MINOR auto-update in Australia stuck in restart cycle - resolving

On June 13 at about 13:00 UTC, MINOR auto-update was changed to the latest solid release 02_05_09. There is an ongoing problem in Australia where the download of an 800K+ release file takes longer than 20 seconds, causing a restart condition in IoTaWatt units running on releases before 02_05_09. I estimate, based on server request volume, that there are about 20 units in Australia afflicted with this condition. There are possibly a handful of units in Finland as well.

IoTaWatt.com uses Cloudflare to cache the release files globally. There are seven servers in Australia. I have no idea why these 20 or so sites don’t get faster access, as the rest of the world does. Possibly one or more ISPs aren’t using the Cloudflare presence and are going back to the IoTaWatt.com server in the USA. But that doesn’t explain why thousands of requests fail and zero succeed. My suspicion is that there is some explicit delay being introduced, or a link to the USA is via satellite.

Whatever the cause, to get past this, the auto-update must be stopped and a manual update performed. To stop the auto-update restart cycle, there are three possible approaches:

  1. Try to run the config app repeatedly to catch a window between restarts and set auto-update to NONE. Some folks have had success with this and it should work with persistence.

  2. If you can disable internet from your WiFi router, the restarts should stop long enough for you to set auto-update to NONE, then you can reconnect internet access.

  3. Power down the IoTaWatt, carefully remove the SDcard, mount in another computer, edit the config.txt file changing the line:
    "update": "MINOR",
    to:
    "update": "NONE",
    Replace the SDcard and restart.
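
Approach 3 can also be scripted once the SD card is mounted on a computer. The following is a minimal Python sketch, assuming the "update" setting appears in config.txt in the form quoted above. The function name `set_update_class` is hypothetical and not part of any IoTaWatt tooling.

```python
import re

def set_update_class(config_text: str, new_class: str = "NONE") -> str:
    """Return config_text with the first "update" setting changed to new_class.

    A plain text substitution is used instead of a JSON parser so that the
    rest of the file is left exactly as it was.
    """
    pattern = re.compile(r'("update"\s*:\s*")[A-Za-z]+(")')
    new_text, count = pattern.subn(
        lambda m: m.group(1) + new_class + m.group(2), config_text, count=1
    )
    if count == 0:
        raise ValueError('no "update" setting found in the config text')
    return new_text

# Example using the line quoted in approach 3.
sample = '    "update": "MINOR",\n'
print(set_update_class(sample), end="")
```

To apply it to a real card, read config.txt from wherever the card is mounted, pass the text through the function, and write the result back. Keeping a copy of the original file first is a sensible precaution.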

Once restarted, download the new release file to your computer with the URL https://iotawatt.com/firmware/bin/02_05_09.bin. Choose to save the file if asked.

Run the IoTaWatt file manager and upload the release file to the download directory:

download/02_05_09.bin

Now send an update command to the IoTaWatt to install the release:

http://iotawatt.local/update?release=02_05_09

Your system should update and restart.
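
For anyone who prefers to script the download and the update command, here is a rough Python sketch. It only uses the two URLs given above. The function names are hypothetical, the timeouts are arbitrary assumptions, and the upload into the download directory is still done with the file manager because the thread does not describe an upload API.

```python
import time
import urllib.request

# Base URL taken from the release link given in this post.
FIRMWARE_BASE = "https://iotawatt.com/firmware/bin"

def release_url(release: str) -> str:
    """URL of the release file, e.g. release '02_05_09'."""
    return f"{FIRMWARE_BASE}/{release}.bin"

def update_url(host: str, release: str) -> str:
    """URL that asks the IoTaWatt at `host` to install `release`."""
    return f"http://{host}/update?release={release}"

def download_release(release: str, dest: str, timeout_s: float = 60.0) -> float:
    """Save the release file to `dest` and return the elapsed seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(release_url(release), timeout=timeout_s) as resp:
        data = resp.read()
    with open(dest, "wb") as out:
        out.write(data)
    return time.monotonic() - start

def send_update_command(host: str, release: str, timeout_s: float = 30.0) -> int:
    """Send the update command and return the HTTP status code."""
    with urllib.request.urlopen(update_url(host, release), timeout=timeout_s) as resp:
        return resp.status

# Nothing here touches the network until one of the functions is called.
print(release_url("02_05_09"))
print(update_url("iotawatt.local", "02_05_09"))
```

A typical sequence would be `download_release("02_05_09", "02_05_09.bin")`, then the file manager upload, then `send_update_command("iotawatt.local", "02_05_09")`. If the server rejects the scripted download, fetching the file with a browser as described above works just as well.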

Given all that is successful, you could help resolve this by posting that you had the problem, your location, who your ISP is and a tracert to iotawatt.com. With enough responses, I might be able to figure out the exact cause.

I am also willing to accept a return for service if you cannot make the above work for you, but time and postage to the USA from Australia makes that a last resort.

UPDATE: 6/20/2020 Estimate that there are about 5 units still experiencing this. A few have documented using the manual update, and the rest have probably finally succeeded in downloading the update. Hopefully this will eventually resolve completely on its own.

UPDATE: 6/24/2020 It appears, from the download statistics, that most of the units affected by this issue have resolved, presumably by eventually downloading the release within the 30 second time constraint. The next hurdle is to update the MAJOR auto-updates. I don’t anticipate as many units affected in Australia, and as with the MINOR update, the rest of the world should upgrade without incident.

I updated my IoTaWatt on 16 June by manually changing Auto-update from NONE to MINOR. All went fine.

6/16/20 11:51:15 Updater: Auto-update class changed to MINOR
6/16/20 11:51:16 Updater: Update from 02_05_05 to 02_05_09
6/16/20 11:51:16 Updater: download 02_05_09
6/16/20 11:51:22 Updater: Release downloaded 5737ms, size 807424
6/16/20 11:51:31 Updater: signature verified
6/16/20 11:51:39 Updater: firmware upgraded to version 02_05_09
6/16/20 11:51:39 Updater: Firmware updated, restarting.

location: city of Hobart, state of Tasmania
ISP: Mint
tracert iotawatt.com

Tracing route to iotawatt.com [104.18.33.157]
over a maximum of 30 hops:

1 3 ms 1 ms 2 ms Broadcom.Home [192.168.20.1]
2 15 ms 14 ms 14 ms lo40.lns01.pmelnxd.vic.vocus.network [202.138.24.239]
3 16 ms 19 ms 14 ms ae2-14.per01.melbvoc.vic.vocus.network [203.134.26.33]
4 15 ms 16 ms 15 ms ae6.per03.melbvoc.vic.vocus.network [203.134.25.210]
5 15 ms 15 ms 15 ms be106-99.bdr02.mel11.vic.vocus.network [114.31.197.36]
6 16 ms 16 ms 15 ms be150.cor02.mel07.vic.vocus.network [114.31.196.54]
7 16 ms 15 ms 15 ms be151.bdr03.mel07.vic.vocus.network [114.31.196.51]
8 15 ms 15 ms 16 ms as13335.vic.ix.asn.au [218.100.78.12]
9 15 ms 15 ms 16 ms 104.18.33.157

Trace complete.

Location: Rural Victoria
ISP: Telstra (ADSL)
traceroute to iotawatt.com (104.18.32.157), 64 hops max, 52 byte packets
1 mymodem (192.168.0.1) 1.956 ms 1.257 ms 2.502 ms
2 172.18.211.19 (172.18.211.19) 28.237 ms 30.981 ms 29.942 ms
3 172.18.67.153 (172.18.67.153) 30.662 ms
172.18.67.157 (172.18.67.157) 31.145 ms 27.163 ms
4 bundle-ether4.win-edge902.melbourne.telstra.net (203.50.76.8) 26.656 ms 29.411 ms 26.333 ms
5 bundle-ether12.win-core10.melbourne.telstra.net (203.50.11.111) 30.467 ms 24.929 ms 29.181 ms
6 bundle-ether12.ken-core10.sydney.telstra.net (203.50.11.122) 38.096 ms 53.747 ms 37.494 ms
7 bundle-ether1.pad-gw11.sydney.telstra.net (203.50.6.61) 38.632 ms 44.859 ms 39.650 ms
8 bundle-ether1.sydp-core04.sydney.reach.com (203.50.13.90) 39.345 ms 39.153 ms 39.062 ms
9 i-20802.eqnx-core02.telstraglobal.net (202.84.141.25) 175.413 ms 172.728 ms 175.604 ms
10 i-93.joua01.telstraglobal.net (202.84.140.214) 227.525 ms 251.081 ms 306.049 ms
11 unknown.telstraglobal.net (202.127.69.182) 307.292 ms 242.414 ms 311.790 ms
12 104.18.32.157 (104.18.32.157) 303.922 ms 222.449 ms 339.043 ms
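
To make traces like the ones in this thread easier to compare, here is a small Python sketch that finds the hop where latency rises the most. It assumes the usual tracert/traceroute text layout (a hop number at the start of the line, round-trip times followed by "ms"). The function names are hypothetical.

```python
import re
import statistics

HOP_START = re.compile(r"^\s*(\d+)\s+(.*)$")   # "9  host (ip)  175.4 ms ..."
RTT = re.compile(r"(\d+(?:\.\d+)?)\s*ms\b")    # "175.413 ms" or "15 ms"

def hop_latencies(trace_text):
    """Return [(hop_number, median_rtt_ms)] for hops that reported any RTT."""
    samples = {}
    current = None
    for line in trace_text.splitlines():
        match = HOP_START.match(line)
        if match:
            current = int(match.group(1))
            samples.setdefault(current, [])
            rest = match.group(2)
        else:
            rest = line  # continuation line of a multi-path hop, or a header
        if current is not None:
            samples[current].extend(float(v) for v in RTT.findall(rest))
    return [(hop, statistics.median(v)) for hop, v in sorted(samples.items()) if v]

def largest_jump(hops):
    """Return (hop_number, increase_ms) for the biggest rise over the previous hop."""
    best = None
    for (_, prev), (num, cur) in zip(hops, hops[1:]):
        delta = cur - prev
        if best is None or delta > best[1]:
            best = (num, delta)
    return best

# Three hops copied from the Telstra trace above.
sample = """\
8 bundle-ether1.sydp-core04.sydney.reach.com (203.50.13.90) 39.345 ms 39.153 ms 39.062 ms
9 i-20802.eqnx-core02.telstraglobal.net (202.84.141.25) 175.413 ms 172.728 ms 175.604 ms
10 i-93.joua01.telstraglobal.net (202.84.140.214) 227.525 ms 251.081 ms 306.049 ms
"""
print(largest_jump(hop_latencies(sample)))
```

On these three hops the largest rise is at hop 9, roughly 136 ms above hop 8, which is where the route leaves Sydney for the telstraglobal.net network.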

My IoTaWatt has been set to MAJOR for many months and is currently running 02_05_02. I switched the update class to MINOR via the UI and got the following:

6/16/20 22:19:44z Updater: Auto-update class changed to MINOR
6/16/20 22:19:46z Updater: Auto-update is current for class MINOR.

Restarted to be sure and got the same output again. Doesn’t look like I’m being offered the update?

Here’s a trace to iotawatt.com from my TPG connection in Brisbane:

traceroute to 104.18.33.157 (104.18.33.157), 64 hops max, 52 byte packets
 1  sentry (192.168.14.140)  2.897 ms  2.019 ms  2.009 ms
 2  10.20.22-5.tpgi.com.au (10.20.22.5)  14.791 ms  14.340 ms  14.670 ms
 3  60-240-241-129.static.tpgi.com.au (60.240.241.129)  15.795 ms  16.361 ms  14.854 ms
 4  203-219-107-125.static.tpgi.com.au (203.219.107.125)  31.465 ms  34.819 ms  32.559 ms
 5  syd-gls-har-int2-be200.tpgi.com.au (203.221.3.68)  29.885 ms  31.445 ms  32.097 ms
 6  cloudflare1-lacp-100g.hkix.net (123.255.90.246)  148.101 ms  150.637 ms  177.823 ms
 7  104.18.33.157 (104.18.33.157)  140.185 ms  139.779 ms  139.270 ms

I am seeing multiple IPv4 addresses for iotawatt.com: 172.67.185.165, 104.18.32.157, and 104.18.33.157. All have the same trace. (Aside: my IPv6 connectivity is via a tunnel and hits an endpoint in Sydney.)

Manual downloads have always seemed fast:

$ curl -4 -o /dev/null https://iotawatt.com/firmware/bin/02_05_09.bin
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  788k  100  788k    0     0   591k      0  0:00:01  0:00:01 --:--:--  591k

Interestingly, a trace to www.cloudflare.com shows my ISP does have decent connectivity for at least some CloudFlare IPs. Looks like same city:

traceroute to www.cloudflare.com (104.17.210.9), 64 hops max, 52 byte packets
 1  sentry (192.168.14.140)  1.879 ms  2.221 ms  1.359 ms
 2  10.20.22-5.tpgi.com.au (10.20.22.5)  15.030 ms  16.515 ms  14.659 ms
 3  60-240-241-193.static.tpgi.com.au (60.240.241.193)  15.881 ms  16.432 ms  14.574 ms
 4  bri-pow-que-crt3-be-10.tpg.com.au (203.219.107.57)  15.865 ms  15.574 ms  15.840 ms
 5  60-240-241-19.static.tpgi.com.au (60.240.241.19)  16.482 ms
    60-240-241-83.static.tpgi.com.au (60.240.241.83)  15.735 ms
    60-240-241-19.static.tpgi.com.au (60.240.241.19)  15.606 ms
 6  be300.bqueebrdr11.aapt.net.au (203.219.107.210)  15.427 ms  15.429 ms  16.897 ms
 7  bundle-ether16.cha-edge902.brisbane.telstra.net (165.228.23.17)  16.089 ms  15.992 ms  15.918 ms
 8  clo2241696.lnk.telstra.net (139.130.42.194)  27.901 ms  17.082 ms  16.151 ms
 9  104.17.210.9 (104.17.210.9)  15.959 ms  16.675 ms  15.758 ms

Since it looks like I’ll probably be affected by this issue (it has taken multiple attempts in the past to get updates to successfully download), I am changing the update class to NONE for now.

@AndrewHeard,
Your update went fine, and the tracert looks like you have a solid internet connection.

@Whatto,
I’m suspicious of the ADSL. How long does it take to download https://iotawatt.com/firmware/02_05_09.bin? The tracert, if I’m reading it correctly, appears to have a lot of latency.

@jasoncodes - Your connection looks reasonably fast, and I would expect the update to succeed. The reason your unit won’t update when you set MINOR is that I have suppressed MINOR 02_05_02 units from updating due to this problem. If you want to have a go at updating to 02_05_09, just set your auto-update class to ALPHA or BETA. If you don’t want to risk this update problem, you can just use the manual update procedure above, but set your auto-update to NONE before doing so. If you leave it at MAJOR, it will immediately revert to 02_05_02, which is the current MAJOR release.

02_05_09 should not have this restart problem.

As bad as ADSL can be, not this time. This report is for getting it to my browser, but it’s still nowhere near the 20 secs it takes to time out:
[image]

I had some spare time again just now to deal with any fallout and tried this update again, this time setting the update class to ALPHA. Looks like, as I expected, I was affected. It took a number of tries but it eventually barely succeeded before the watchdog triggered:

6/17/20 04:27:01z Updater: service started. Auto-update class is ALPHA
6/17/20 04:27:04z Updater: Update from 02_05_02 to 02_05_09
6/17/20 04:27:04z Updater: download 02_05_09
6/17/20 04:27:34z dataLog: datalog WDT - restarting

** Restart **

[trimmed]
6/17/20 04:27:40z Updater: service started. Auto-update class is ALPHA
6/17/20 04:27:41z Updater: Update from 02_05_02 to 02_05_09
6/17/20 04:27:41z Updater: download 02_05_09
6/17/20 04:28:10z dataLog: datalog WDT - restarting

** Restart **

[trimmed]
6/17/20 04:28:19z Updater: service started. Auto-update class is ALPHA
6/17/20 04:28:20z Updater: Update from 02_05_02 to 02_05_09
6/17/20 04:28:20z Updater: download 02_05_09
6/17/20 04:28:50z dataLog: datalog WDT - restarting
6/17/20 04:28:50z Updater: Release downloaded 29274ms, size 807424
6/17/20 04:28:50z WiFi disconnected.

** Restart **

[trimmed]
6/17/20 04:28:55z Updater: service started. Auto-update class is ALPHA
6/17/20 04:28:56z Updater: Update from 02_05_02 to 02_05_09
6/17/20 04:28:56z Updater: download 02_05_09
6/17/20 04:29:12z Updater: Download failed HTTPcode 
6/17/20 04:29:14z Updater: Update from 02_05_02 to 02_05_09
6/17/20 04:29:15z Updater: download 02_05_09
6/17/20 04:29:45z Updater: Release downloaded 29403ms, size 807424
6/17/20 04:29:59z Updater: signature verified
6/17/20 04:30:07z Updater: firmware upgraded to version 02_05_09
6/17/20 04:30:07z Updater: Firmware updated, restarting.

** Restart **

SD initialized.
6/17/20 04:30:14z Real Time Clock is running. Unix time 1592368214 
6/17/20 04:30:14z Reset reason: Software/System restart
6/17/20 04:30:14z Trace:  9:0, 9:1, 8:4, 8:6, 8:8, 8:9, 9:3, 9:5, 9:9, 1:2, 1:3, 1:4, 1:5[19], 1:6, 1:1[2], 1:2[3], 9:0[3], 9:0, 9:1, 8:4, 8:6, 8:8, 8:9, 9:3, 9:5, 9:9, 1:2, 1:3, 1:4, 1:5[5], 5:0, 5:7
6/17/20 04:30:14z ESP8266 ChipID: 6920572
6/17/20 04:30:14z IoTaWatt 4.x, Firmware version 02_05_09
6/17/20 04:30:14z Updater: Installing update files for version 02_05_09
6/17/20 04:30:14z Updater: Installing EDIT.HTM
6/17/20 04:30:14z Updater: Installing GRAPH.HTM
6/17/20 04:30:14z Updater: Installing GRAPH.JS
6/17/20 04:30:15z Updater: Installing GRAPH2.HTM
6/17/20 04:30:16z Updater: Installing GRAPH2.JS
6/17/20 04:30:16z Updater: Installing INDEX.HTM
6/17/20 04:30:18z Updater: Installing TABLES.TXT
6/17/20 04:30:18z Updater: Installing CNFSTYLE.CSS
6/17/20 04:30:18z Updater: Installation complete.
6/17/20 04:30:19z SPIFFS mounted.
[trimmed]
6/17/20 04:30:25z Updater: service started. Auto-update class is NONE
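
A log like the one above can be summarized with a few lines of Python. This is only a sketch: it assumes the message wording stays exactly as it appears in this log ("datalog WDT - restarting" and "Release downloaded ...ms, size ..."), and the function name is hypothetical.

```python
import re

DOWNLOADED = re.compile(r"Release downloaded (\d+)ms, size (\d+)")

def summarize_update_log(log_text):
    """Count watchdog restarts and compute the rate of each completed download."""
    wdt_restarts = sum(
        "datalog WDT - restarting" in line for line in log_text.splitlines()
    )
    downloads = []
    for ms, size in DOWNLOADED.findall(log_text):
        seconds = int(ms) / 1000
        downloads.append(
            {
                "seconds": seconds,
                "bytes": int(size),
                "bytes_per_second": int(size) / seconds,
            }
        )
    return {"wdt_restarts": wdt_restarts, "downloads": downloads}

# Two lines copied from the log above.
sample = """\
6/17/20 04:27:34z dataLog: datalog WDT - restarting
6/17/20 04:29:45z Updater: Release downloaded 29403ms, size 807424
"""
summary = summarize_update_log(sample)
print(summary["wdt_restarts"], round(summary["downloads"][0]["bytes_per_second"]))
```

The download that finally succeeded here ran at a little over 27 KB/s, compared with about 140 KB/s for the 5737 ms download reported from Hobart earlier in the thread.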

Manually updated on 18/6/20 after being unable to access the web interface for a few days.
After killing the WAN connection I was able to set update class to “NONE”, however the Graph+ interface was broken (e.g., appearance similar to unloaded CSS on a web page).

During the update period, “power” for three inputs appears to have been recorded as normal, however the kWh accrual for this period has broken (registering about 30% of typical consumption). I’m guessing this is due to being “down for maintenance” repeatedly.

I have manually updated from 02_05_08 to 02_05_09 and all is working now.

ISP - TPG Internet 50/20 Mbps on fibre to the premises in regional WA. Good wifi signal.

Thanks, that’s helpful. Disabling the WAN (internet) is a good way to get a window to set auto-update to NONE. While the WAN is off, it’s expected that Graph+ would not work as it gets a lot of its JavaScript from a CDN on the Internet.

While still only a few samples, a common thread that is emerging is Telstra. That may be because they are ubiquitous in Oz (I don’t know), or it may be they have some type of throttling in place for whatever reason. Worldwide, I am only seeing this problem in Australia.

The problem definitely looks to be high RTT times to iotawatt.com. Some Googling suggests the ESP IP stack has quite a small TCP window (maybe a couple of packets in flight at a time) so high RTT would greatly affect throughput.
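
A back-of-the-envelope check of that idea: with a fixed TCP window, throughput cannot exceed window / RTT. The Python sketch below assumes a window of two full-size 1460-byte segments, which is only a guess based on "a couple of packets in flight" and is not a measured value for the ESP stack. The release size comes from the logs in this thread and the RTTs are rounded from the traces posted above.

```python
RELEASE_BYTES = 807424       # size reported in the Updater logs
ASSUMED_WINDOW = 2 * 1460    # assumption: two full-size TCP segments in flight

def max_throughput(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on throughput in bytes/second for a fixed window."""
    return window_bytes / rtt_s

def estimated_download_s(size_bytes: float, window_bytes: float, rtt_s: float) -> float:
    """Best-case transfer time, ignoring handshakes, TLS setup and packet loss."""
    return size_bytes / max_throughput(window_bytes, rtt_s)

# Rounded RTTs from the traces in this thread.
for label, rtt_s in [("Hobart", 0.015), ("Brisbane", 0.140), ("Rural Victoria", 0.300)]:
    seconds = estimated_download_s(RELEASE_BYTES, ASSUMED_WINDOW, rtt_s)
    print(f"{label}: about {seconds:.0f} s")
```

Under these assumptions the estimates come out at roughly 4 s, 39 s and 83 s. That is in the same range as the 5.7 s download logged from Hobart and the 29 s downloads logged from Brisbane, so a small window combined with high RTT is at least a plausible explanation. The real window is probably somewhat larger than the two segments assumed here.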

Telstra, Optus, and TPG (which includes iiNet, Internode, AAPT, and many more) are the 3 biggest ISPs in Australia. https://lg.telstraglobal.com/, http://looking-glass.optus.net.au/lg/, and http://looking-glass.connect.com.au/lg/ are all showing the same issue.

My work’s Vocus connection shows iotawatt.com going to Victoria and www.cloudflare.com staying local in Brisbane. Looks like lots of less than ideal routing for the IPs serving iotawatt.com. I wonder if there’s just a missing/incorrect BGP rule somewhere.

Buffer space severely limits packet size and flow.

The firmware should be cached on Cloudflare.

Had the same issue: constant red green red LED and restarting. Last Monday I ended up pulling the SD card out and placing it in a PC. I found a thread that suggested deleting the history files based on the LED pattern; I did this and it has worked ever since.
I’m on 02.05.02. I’m on MAJOR now but I was on MINOR.
I’m also on Telstra fibre (Velocity) at 100Mbps, location Melbourne.

That may not have been the same issue. In any event it’s not the same resolution. Setting to MAJOR only puts off the inevitable. The problem is with 02_05_02, not the new release. Sometime in the not too distant future, MAJOR will be set to 02_05_09. Right now it would be easy for you to set auto-update to NONE and perform the manual update procedure described in the first post of this thread.

Will do. I’ll give it an update in the next few days. Thanks.

After completing all of these steps, I assume we should leave auto-update class set to “NONE” for now?

If you are now on 02_05_09, you can set auto-update to ALPHA, BETA, or MINOR with no problem.