Degraded Heap, Multiple Starts and Confusing Messages

OddJob · July 5, 2019, 6:45pm

Yes, historical data uploaded OK, except for the missing heap memory figures. However there was another ‘Heap memory has degraded below safe minimum’ occurrence just before midday causing IoTaWatt to restart. If this happens frequently I’ll give you another shout along with a copy of the message log.

OddJob · March 9, 2020, 10:27pm

Hello Bob,

Since my last communication quite a few things have happened. Hopefully, with this brief summary I’ll manage to convey the pertinent information. During autumn 2019, gaps in the heap graph became fairly frequent. It was also noticed that the internet disappeared for lengthy periods on an irregular basis (sometimes for more than an hour). As our modem and separate router were fairly old, a new combined modem/router and high speed switch unit were obtained in November. To ensure consistent Wi-Fi communications with the IoTaWatt, the original router stayed in the same physical position it has always been. Although the heap situation improved a bit, the period between restarts did not achieve significant durations. A Powerwall2 was installed recently which required changing one CT (100A to 50A) and adding a further 50A CT.

Having remembered a comment that the amount of data stored in memory before sending it off to the logging servers was possibly causing heap issues, I reduced the ‘Post Interval’ from 10 secs to 5 secs on Emoncms and InfluxDB server setup pages. I found that this option was not available on the PVOutput server setup page. When I looked at PVOutput itself I found that the ‘Post Interval’ & ‘Poll Interval’ were already set to their lowest value of 5 mins. I thought it worth trying these changes as they might be a possible alternative to compressing the data before sending it, which I seem to recall was on your ‘TO DO’ list. Unfortunately, there appeared to be no benefit from the shorter time settings. Current screenshot of the ‘Inputs’

Copy of the iotamsgs.txt file starting from early July 2019
IoTaWatt Message Log 09_03_20.txt (303.2 KB)
I think it important to point out that a few of the recorded restarts may have been caused by activities associated with several nearby housing developments! I live in hope that something can be identified that will result in less frequent restarts.

overeasy · March 10, 2020, 12:53am

I don’t recall that comment. The data is all stored on the SD card. It’s true that larger buffers are needed when filling a backlog, or when “bulk send” is greater than 1, but the “Post Interval” is the frequency at which posts are sent. Changing that from 10 to 5 doubles the amount of data that needs to be uploaded. Not that it’s a particular problem, but it could be if you are sending a real lot of data, particularly to InfluxDB.
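To put rough numbers on that doubling, here is a minimal sketch. It assumes one record per post interval per configured output and ignores bulk send and any protocol overhead. The function names and the figure of 10 outputs are hypothetical and are not taken from the IoTaWatt firmware or from this configuration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_day(post_interval_s: int) -> int:
    """Interval records produced per day for one output (simplified model)."""
    return SECONDS_PER_DAY // post_interval_s


def values_per_day(post_interval_s: int, outputs: int) -> int:
    """Total values uploaded per day across all configured outputs."""
    return records_per_day(post_interval_s) * outputs


if __name__ == "__main__":
    outputs = 10  # hypothetical number of uploaded outputs
    for interval in (10, 5):
        print(interval, "s ->", values_per_day(interval, outputs), "values/day")
```

Under these assumptions, halving the interval from 10 s to 5 s doubles the daily count whatever the number of outputs is.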

Not gonna get into neighbor disputes. Besides, I doubt it.

I’ve looked at the recent logs. Since Jan 20 when your system upgraded from 02_05_00 to 02_05_02, you have had a restart on average every 2-3 days. First let me point out that a restart takes about 5 seconds and no data is lost, so even if that were SOP, not a big problem.

But the good news is just about every one of those restarts can be linked to heap problems, and that is consistent with 2-3 day MTBF. Use of passwords typically exacerbates the problem up to that release.
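For anyone wanting to estimate that mean time between restarts from their own message log, a rough sketch follows. It assumes the restart times have already been pulled out of the log by hand. The helper name and the dates in the example are hypothetical, and no particular log format is parsed here.

```python
from datetime import datetime, timedelta
from typing import Sequence


def mean_time_between(restarts: Sequence[datetime]) -> timedelta:
    """Average gap between consecutive restarts (total span / number of gaps)."""
    if len(restarts) < 2:
        raise ValueError("need at least two restart times")
    ordered = sorted(restarts)
    return (ordered[-1] - ordered[0]) / (len(ordered) - 1)


if __name__ == "__main__":
    # Hypothetical restart dates, for illustration only.
    sample = [
        datetime(2020, 1, 20),
        datetime(2020, 1, 22),
        datetime(2020, 1, 25),
        datetime(2020, 1, 27),
    ]
    print(mean_time_between(sample))
```

Four restarts spread over seven days give three gaps, so the sample works out to an average of 2 days 8 hours between restarts.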

The two major contributing problems to heap degradation are WiFi errors and password authorization. Both seem to be fixed in the current ALPHA and BETA release 02_05_05. It’s not anything I did, it’s an updated core from Espressif with a bunch of WiFi and lwIP fixes. I had exhausted the possibilities for those problems originating in the IoTaWatt firmware and fixing it there. They seem to have disappeared with this new core. Here are some units running on the old core back in December. The green one has all three upload services configured and passwords. You can see where it restarted twice.

Here’s that same system this week running 02_05_05.

Here’s the message log from when 02_05_05 was installed 11 days ago.

02_05_02 is currently the MINOR and MAJOR auto-update release. There are thousands of users, so I push these releases out slowly in the hopes of catching anything nasty with the ALPHA and BETA users. The core changed with 02_05_03, and now you see 02_05_05 out there because there were issues. There are currently no issues outstanding with 02_05_05 and I plan to advance it to MINOR auto-update within a week or two. If all remains well, it will probably be MAJOR by mid-April.

You can stick with MAJOR class and accept the relatively benign 2-3 day MTBF, or you can change to auto-update BETA and see where you stand with that.

BTW, I don’t see anything confusing in the message log. There’s a lot of information in there that helps me respond to reports like this. On the contrary, I think the message log helps eliminate the confusion inherent in the disconnect between reported symptoms and underlying problems, if any.

OddJob · March 13, 2020, 9:14pm

Thank you for explaining why I had misunderstood your earlier reply at 12 above.

I was comparing the frequent restarts with the inbuilt long period time based restarts I noticed during autumn/winter 2017, so I’m pleased they do not indicate a problem.

I am content to wait until 02_05_05 is released. So far the IoTaWatt data has identified a failing freezer, provided a realistic figure for the % solar energy used and also helped with the decision to invest in a Powerwall2.