02_03_06 New ALPHA

02_03_06 is now released for the ALPHA auto-update class. This release incorporates the latest ESP/Arduino core release 2.4.1 with lwip 2.0 (the IP layer).

The newer lwip seems to solve problems with accessing over a VPN and some port-forward routers. It also seems to be a little quicker.

The new core also included some housekeeping by the developers and resulted in some significant increases in free heap. I also made a pass through the IoTaWatt firmware to reduce heap use, mostly by utilizing more PROGMEM for const char strings. The net result is that quite a lot of heap is now available - typically well in excess of 20K. Hopefully this will enable use of TLS for select communications in the near future.

Also included is a rework of the way the web server handles requests. This was done to pave the way for user authorization, but also turned out to save quite a bit of heap by not keeping the webserver handler data structures in heap.

influxDB posting, particularly history upload, is quite a bit faster now with a persistent connection and lwip 2.0. Looking into using compression to further speed things up.

Just checked the log and my unit is yet to upgrade to this version. I did a software reset as below but the updater did not ‘see’ a new version.

o clock yet: SD initialized.
5/19/18 05:05:47 Real Time Clock is running. Unix time: 1526706347
5/19/18 05:05:47 Version: 02_03_05
5/19/18 05:05:47 Reset reason: Software/System restart
5/19/18 05:05:47 Trace: 9:1, 8:4, 8:6, 8:8, 9:3, 9:4, 9:5, 9:9, 1:2, 1:3, 10:2, 10:3, 1:3, 1:4, 1:5, 7:0, 7:7, 1:6, 1:3, 1:4, 1:5, 7:0, 7:7, 1:6, 1:3, 1:4, 1:5, 7:0, 7:7, 1:6, 1:1, 9:0
5/19/18 05:05:47 ESP8266 ChipID:2991982
5/19/18 15:05:47 device name: IotaWatt, version: 3
5/19/18 15:05:47 Local time zone: 10
5/19/18 15:05:47 MDNS responder started
5/19/18 15:05:47 You can now connect to http://IotaWatt.local
5/19/18 15:05:47 HTTP server started
5/19/18 15:05:47 dataLog: service started.
5/19/18 15:05:48 dataLog: Last log entry:1526706345
5/19/18 15:05:48 statService: started.
5/19/18 15:05:48 timeSync: service started.
5/19/18 15:05:48 WiFi connected. SSID: Telstra82C2AA, IP: 10.0.0.173
5/19/18 15:05:48 Updater: started.
5/19/18 15:05:48 historyLog: service started.
5/19/18 15:05:48 historyLog: Last log entry:1526706300
5/19/18 15:05:52 influxDB: started.url: 10.0.0.166,port=8086,db=iotawattnew,post interval: 10
5/19/18 15:05:53 influxDB: Start posting from 5/19/18 15:05:50

Unfortunately, I don’t get the update either.

My bad, should be available now.

Update just finished and all working so far.

It works great! I am travelling through Germany and can access the installed devices via VPN and port sharing as if I were on site. Many thanks for that!

Thanks for the report. Next step will be to secure that somewhat with digest authorization.

I tried updating to this new alpha a number of times tonight and kept getting the following:

5/20/18 09:49:39 Updater: Update from 02_03_05 to 02_03_06
5/20/18 09:49:39 Updater: download 02_03_06
5/20/18 09:49:48 Updater: Download failed HTTPcode -11

-11 appears to be a timeout. I retrieved the same URLs manually from my Mac without issue. Not sure what the problem was. I ended up downloading the firmware from the GitHub releases page and flashing manually with ESPtool.
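
For reference, negative codes in an Updater message like this most likely come from the ESP8266HTTPClient library in the Arduino core, which uses them for client-side failures rather than HTTP statuses. Below is a minimal lookup sketch, in Python purely for convenience. The table reflects the constants as I understand them from the library header, so check it against your core version. `describe_http_code` is a hypothetical helper, not part of IoTaWatt.

```python
# Client-side error codes as I understand them from ESP8266HTTPClient.h
# (ASSUMPTION: verify against the header shipped with your core version).
HTTPC_ERRORS = {
    -1: "connection refused",
    -2: "send header failed",
    -3: "send payload failed",
    -4: "not connected",
    -5: "connection lost",
    -6: "no stream",
    -7: "no HTTP server",
    -8: "too less RAM",
    -9: "encoding",
    -10: "stream write",
    -11: "read timeout",
}


def describe_http_code(code: int) -> str:
    """Return a readable label for an HTTP client result code (hypothetical helper)."""
    if code > 0:
        # Positive values are ordinary HTTP status codes.
        return f"HTTP status {code}"
    return HTTPC_ERRORS.get(code, f"unknown client error {code}")


print(describe_http_code(-11))
```

If the table is right, -11 is a read timeout: the connection was made but the data stopped arriving in time, which would fit a noisy wifi link better than a server-side problem.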

I too can confirm access over my VPN is working. The automatic restart on WiFi loss will be most welcome too. Cheers.

Well, glad you got it updated, but be aware that a release is more than the ESP firmware; there are a half-dozen associated files. Not much difference in those here, but for future reference, a new firmware may not work properly without the rest of the release files.

Your Mac, I’m sure, has more robust communications than the lightweight stuff in the ESP. That said, this release has some improvements to that in the new core, and the flow control in the firmware has been improved, so maybe future updates will go smoother. If your IoTaWatt is not in a strong wifi area, you might consider installing a plug-in repeater. They cost around $20 US. I have a D-Link unit to service an outbuilding and it makes a huge difference.

Good point. In this case I diffed the two tags and found only firmware source to be changed so didn’t bother updating the filesystem.

That’s what I’m hoping. I hadn’t had any issues getting the automatic updates to work in the past.

Also a good tip. In my case I have a hardwired access point only a few metres away. There is a lot of 2.4 GHz noise in my area though (apartment building) which is why I wanted to get this new build installed.

Cheers again for this update. The tech details on what has been worked on are also appreciated.

After a few days running this version I noticed that the influxDB logging had stopped. Over the last few days I have done a hard restart, rebooted the Raspberry Pi (which had a problem), had an internet failure for a few hours, and added a wifi extender. I can now see data logging again.

A problem is that the upload is quite slow. It uploads a few minutes of records and then seems to stop for a few minutes for a rest before uploading a few more minutes of records. Is that expected behavior?

Not expected, but a symptom of a problem that I had believed solved. At issue is the new core and specifically lwip 2, the updated IP layer. The new IP layer is supposed to solve a rash of problems associated with the old 1.4 layer, and it does. Unfortunately, it brings with it some new (old?) problems that were not evident in 1.4. This is a relatively new component, and there are some issues filed against it that are being actively addressed. This looks like one of them.

There is a problem called TIME-WAIT where a server does not explicitly close a connection and the ESP waits about two minutes before doing so unilaterally. Unfortunately, each open connection consumes memory, so in aggregate, the problem eats up the heap if there are a lot of transactions.
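
A back-of-envelope sketch of why this adds up, in Python just for the arithmetic. The two-minute hold comes from the description above. The per-connection cost is an assumption chosen for illustration, not a measured lwip figure, and both function names are hypothetical.

```python
HOLD_SEC = 120        # "about two minutes", per the description above
BYTES_PER_CONN = 200  # ASSUMPTION: illustrative only, not a measured lwip value


def lingering_connections(interval_sec: float, hold_sec: float = HOLD_SEC) -> float:
    """Steady-state count of connections still waiting to time out
    when a new connection is opened every interval_sec seconds."""
    return hold_sec / interval_sec


def heap_held(interval_sec: float, bytes_per_conn: int = BYTES_PER_CONN) -> float:
    """Approximate heap tied up by those lingering connections."""
    return lingering_connections(interval_sec) * bytes_per_conn


for interval in (10, 1):
    print(f"one connection every {interval} s -> "
          f"{lingering_connections(interval):.0f} lingering, "
          f"~{heap_held(interval):.0f} bytes held")
```

With these made-up numbers, one new connection every 10 seconds leaves about 12 lingering and costs little, while one per second during a history upload leaves about 120 and would tie up roughly as much heap as a freshly restarted unit has free.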

There is a safety valve in IoTaWatt where transactions are suspended when heap falls below a threshold. That appears to be what you are experiencing. There is another defense against this problem, which is to simply continue to use the existing connection for subsequent transactions. That’s what other IoTaWatt are doing, and it’s actually very fast. The question is why yours isn’t doing that.
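
The two defenses described above can be sketched roughly as follows. This is Python pseudologic, not the actual firmware: the `Poster` class, its fields, and the `HEAP_FLOOR` value are all hypothetical, and the real threshold is not stated in this thread.

```python
from dataclasses import dataclass

HEAP_FLOOR = 15000  # ASSUMPTION: placeholder threshold in bytes, not the firmware's value


@dataclass
class Poster:
    """Hypothetical stand-in for a posting service."""
    connection_open: bool = False
    connections_opened: int = 0
    posts_sent: int = 0

    def post(self, free_heap: int) -> str:
        # Safety valve: suspend transactions while heap is below the floor.
        if free_heap < HEAP_FLOOR:
            return "paused"
        # Reuse the existing connection; open a new one only when none is open.
        if not self.connection_open:
            self.connection_open = True
            self.connections_opened += 1
        self.posts_sent += 1
        return "sent"


poster = Poster()
results = [poster.post(heap) for heap in (23448, 20000, 14034, 22000)]
print(results, poster.connections_opened, poster.posts_sent)
```

In this sketch three posts go out over a single connection and the low-heap attempt is skipped. A unit that ends up opening a fresh connection per post would instead leave a trail of connections waiting to time out.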

There are some changes coming in the core to limit the number of connections that can be left in TIME-WAIT state, and those should be available soon as well. So I’m optimistic that will be overcome.

In this case, I’d like to better understand what’s different here. My experience is that if it can run in Australia, it can run anywhere. There are communications challenges there that seem to bring out the worst in a firmware and so I’d like to take advantage of this opportunity to fix what can be fixed.

First, could you restart the IoTaWatt and note the amount of heap immediately after restart? Then watch as it degrades, and tell me how long before it gets to the paused state.

Now restart again and when the heap gets below, say 19K or so, hit the STOP button on the influx tab of the status display. When it shows stopped, note the available heap. Wait a few minutes and see if the heap comes back to something near restart levels.

Next, could you explain the environment better. Are you also uploading to Emoncms? Is it local on the Rpi or is it Emoncms.org? Is the raspberry pi on your local LAN? Is the instance of influxDB on the RPi? What version of influx is it?

If you are using Emoncms, could you stop the influx posting and let Emoncms update until current? Note if the heap stays healthy during that process. When it’s up to date, restart influx and see whether it works any better.

All this will help me to understand what’s unique about this occurrence.

Thanks

EDIT: Also noticed the null entries in some of the HTML tables of your status display. Those appear to be peculiar to Edge and somewhat benign. I will look into that as well.

I’ll do the restart later today and post results - is that a software restart or a power down and up?

I can PM the port forwarding and you can see if you can get in from other side of world.

Not uploading to Emoncms.
RaspberryPi is on local LAN
InfluxDB is on RPi
Version 1.4.2

Edge looks like the reason for this strange behaviour too.

I have sent you two screen capture files so you can see the results.

Heap after restart - 23448
Pauses 22 seconds later - heap now 14034

Heap at restart - 20536
24 seconds later - heap at 17768; stopped the server and heap jumps to 21000 and stays there
26 seconds later - started the server again; heap starts to fall and upload starts, but it stalls again after only a short time

It certainly looks like the problem I was having, but you would need to wait a few minutes (at least three) to see if the heap comes back. The timeouts take about two minutes, and it takes longer to get all the heap back. Although I’m 90% sure, I need to verify that it all comes back. At issue is whether the memory leak is in the IoTaWatt code or in lower-level code. A leak in IoTaWatt code would be permanent, as I have nothing that would be released after two minutes. So if it all comes back, it’s not IoTaWatt code, and that is the exact symptom explained by the ESP folks.

Next is why this happens to you. I’m also running influx 1.4.2. Is it possible there are a lot of communications problems with your LAN? Not sure why that would cause this, but looking for something different to explore.

So I waited longer this time and memory all came back.

Heap before restart - 14224
Heap after restart - 22640
Drops to about 16720 and stop server
Now heap jumps immediately to 19160 and stays there
Exactly 2 minutes later heap jumps to 20312 then climbs to 26864 over next 20 sec and then stays there.

Not a particularly complicated LAN setup. Wifi router, ethernet hub and wifi repeater (as of last night). Raspberry Pi connected to hub. IoTaWatt now connected to wifi repeater. A couple of phones, a laptop, a wifi camera, a TV and two other gaming PCs on wifi, but overnight nothing much is being used.

Thanks. That confirms the TIME-WAIT problem. I’ll check the status of the “fix” to the core.
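
A quick tally of the figures reported above, with Python used only for the arithmetic. The labels are my paraphrase of the report, and `heap_deltas` is a hypothetical helper.

```python
# Heap readings in bytes, taken from the report above.
readings = [
    ("after restart", 22640),
    ("at stop (low point)", 16720),
    ("immediately after stop", 19160),
    ("two minutes later", 20312),
    ("about 20 s after that", 26864),
]


def heap_deltas(samples, baseline):
    """Return (label, change relative to baseline) for each sample."""
    return [(label, heap - baseline) for label, heap in samples]


low_point = readings[1][1]
for label, delta in heap_deltas(readings[2:], low_point):
    print(f"{label}: {delta:+d} bytes relative to the low point")

recovered = readings[-1][1] - low_point
```

About 2.4K comes back as soon as posting stops, and the rest arrives only after the two-minute mark, for roughly 10K recovered in total. The final figure ends up above the post-restart reading, which is consistent with nothing being permanently lost.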

I too have encountered the issue where InfluxDB logging has stopped sending. The status screen still says “Running” and the timestamp is a few seconds before “WiFi disconnected.” and “WiFi connected.” (one second apart) in the logs. Data logs continue to increment fine. It had been in this state for 12 hours and had not recovered itself. Uptime when the WiFi dropped was about 6.5 days.

I have been monitoring the free heap since it was mentioned a few days ago here. It had been quite stable but there is a noticeable change at the point where WiFi dropped and reconnected:

Clicking the “Stop” button on the InfluxDB status did not unfreeze it. Everything is working again though after restarting via the web UI.

Update: I too am now seeing the issue described above where uploading pauses for approximately 2 minutes and coincides with low free heap:

I suspect once it eventually catches up to realtime, the free heap will stabilise again.