Constant restarts from heap memory has degraded below safe minimum

So the heap values were consistently uploaded, but I have no values for the CTs going back. Since the restart, the values for the CTs are now being uploaded to InfluxDB, but no prior values were uploaded. I tried changing the data uploader “upload_history_from” setting and restarting, but that didn’t help.

Screenshot of plot of both heap (pink) and one of the CTs (blue).

I’m looking at the two status displays and they look like they are from different IoTaWatt. You don’t have any tags defined in the first influx upload. Are there any in the second? Are you uploading any measurements in both with the same name?

When starting or resuming an uploader, IoTaWatt first does a query to discover the last measurement that was made, and resumes from that point. If there is a duplication between the two uploads, it will create an ambiguity. That’s why it’s recommended with multiple IoTaWatt to add a tag that identifies the source IoTaWatt. That disambiguates.
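To illustrate the ambiguity, here is a toy Python model of the resume step. It is not IoTaWatt's actual code: the `last_time` helper, the `power` measurement name, the `device` tag, and the timestamps are all hypothetical and only show why an untagged shared measurement name can make one unit resume from the other unit's data.

```python
def last_time(points, measurement, tags=None):
    """Newest timestamp for a measurement, optionally limited to matching tags."""
    tags = tags or {}
    times = [
        p["time"]
        for p in points
        if p["measurement"] == measurement
        and all(p["tags"].get(k) == v for k, v in tags.items())
    ]
    return max(times) if times else None


# Two devices write the same measurement name with no tags.
untagged = [
    {"measurement": "power", "tags": {}, "time": 100},  # written by device A
    {"measurement": "power", "tags": {}, "time": 500},  # written by device B
]
# Device A asks "where did I leave off?" and is handed B's newer point.
resume_a_untagged = last_time(untagged, "power")

# Same data, but each device adds a tag that identifies the source.
tagged = [
    {"measurement": "power", "tags": {"device": "A"}, "time": 100},
    {"measurement": "power", "tags": {"device": "B"}, "time": 500},
]
resume_a_tagged = last_time(tagged, "power", {"device": "A"})

print(resume_a_untagged)  # 500
print(resume_a_tagged)    # 100
```

In the untagged case device A would resume from 500 and never upload its own data between 100 and 500. With the tag, its query only sees its own points.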

What is the current status? Are both IoTaWatt updating influx and are both current?

Ah sorry, yeah the latest status was from the second iotawatt. I don’t have tags set on both, but no measurements have the same name. Only one of them is uploading the heap size.

Currently both are uploading fine to InfluxDB and HA, with HA updating at an interval of 60s. I tried some different update intervals; 10s caused them to quickly crash again, though 15s seems OK.

It does seem strange to me that heap size decreases the more often the HA query occurs. Is that expected?

Hi @overeasy, any thoughts on why heap goes down proportional to how often the query is run? Do you see that when you adjust the timing?

I think you likely weren’t able to repro because the heap value didn’t go down enough to crash the device. Maybe mine has a lot more CTs, causing more memory usage.

Assuming the query should finish evaluating within a few seconds, it seems strange to me that the heap is going down for values 60s, 30s, and 15s.

If I could reproduce this, I would be able to diagnose and possibly fix it. But I have not had any success. I have several units running with the 10 second update and plenty of CTs. No issues.

I didn’t see it going down for your 60s stretch. Also, while it’s a suspect, I’m not convinced it’s the query. The overwhelming cases of chronic heap degradation are related to poor WiFi and disappear when that is resolved. My feeling is that the lower-level code may not do adequate cleanup after error recovery. Every query that you do involves a lot of WiFi activity and transfers quite a bit of data.

That said, I’m not a fan of the HASS integration at all. The original version that was contributed was very simple and did all that was needed. When it was merged into HASS, folks there hacked it up in the interest of expediency and with disregard to my objections about useability and long-term ability to refine it to take advantage of the unique capabilities of IoTaWatt. In short, they threw it over the wall to support their new Energy module, which IMO is a shortsighted piece of junk with no data integrity.

My IoTaWatts have an RSSI of -52. What values would you consider “poor wifi”?

That is a good RSSI. I generally consider -70 to be the boundary.

@overeasy I am experiencing this issue on my new IoTaWatt. It arrived with version 2_7_5. I read through the comments here and relocated my access points to improve the wifi connection. Same issue. I’ve since switched to beta 2_8_0. It happens less frequently now but still occurs. Thoughts?

** Restart **

SD initialized.
11/14/22 21:31:27z Real Time Clock is running. Unix time 1668461487
11/14/22 21:31:27z Reset reason: Software/System restart
11/14/22 21:31:27z Trace: 10:58, 25:20, 25:30, 25:31, 10:56, 25:1, 25:1[2], 25:1[3], 10:59, 1:3, 1:1[6], 1:2[7], 9:0[7], 9:0, 9:1, 8:4, 8:6, 8:8, 9:3, 9:5, 9:9, 1:3, 1:3, 1:6[1], 1:6[2], 1:6[3], 1:5[21], 1:6[4], 21:0, 21:1, 21:10, 21:10
11/14/22 21:31:27z ESP8266 ID: 15763464, RTC PCF8523 (68)
11/14/22 21:31:27z IoTaWatt 5.0, Firmware version 02_08_00
11/14/22 21:31:27z SPIFFS mounted.
11/14/22 16:31:27 Local time zone: -5:00, using DST/BST when in effect.
11/14/22 16:31:27 device name: IotaWatt
11/14/22 16:31:27 HTTP server started
11/14/22 16:31:27 timeSync: service started.
11/14/22 16:31:27 statService: started.
11/14/22 16:31:27 dataLog: service started.
11/14/22 16:31:27 dataLog: Last log entry 11/14/22 16:31:20
11/14/22 16:31:32 historyLog: service started.
11/14/22 16:31:32 historyLog: Last log entry 11/14/22 16:31:00
11/14/22 16:31:36 WiFi connected. SSID=****************, IP=192.168.0.62, channel=11, RSSI -58db
11/14/22 16:31:36 Updater: service started. Auto-update class is BETA
11/14/22 16:31:37 Updater: Auto-update is current for class BETA.
11/14/22 19:21:08 Heap memory has degraded below safe minimum, restarting.

While I agree that your WiFi has good vitals, it would be worth trying to change a few things. Unless you live on an otherwise uninhabited island, there are probably other APs within earshot that are using channel 6 - arguably the most popular channel. There are also more and more IOT devices, cordless phones etc. operating in the same bandwidth. It would be worth trying to use one of the other non-overlapping channels 1 or 11 to see what happens.

I am also curious that the trace shows a query ending before the low heap was detected. Can you shed any light on what that might have been? Was it a one-off query from Graph+? Do you have Home Assistant running and, if so, what is the polling frequency?

I’m running Home Assistant. I have a lot of WiFi devices running on a Unifi 4 AP network. I rotate channels as often as they report interference. I have 3 APs in range that are on the other channels, 1 and 11 for 2.4 GHz and a bunch of others on 5 GHz. The device doesn’t change channels easily. I’ve reset the WiFi setup a few times now in order to get it on another channel.

I’m running default scanning interval of 30 secs. I can override in the config. Do you have a suggestion on polling times?

How often does it restart? Can you stop HA and see if it improves?

It restarts every 3 hours or so now that I’ve improved the WiFi. When the WiFi RSSI was around -75 to -79, it was restarting every 20 minutes. I’ll disconnect it from Home Assistant overnight and check the logs.

Same behavior overnight with the configuration removed from HA. I have no other data sources attached. After a few hours, it rebooted. I forgot to mention, the SD card failed shortly after I received my unit. I had a fast SD card lying around, so I replaced it and it came right up after copying the files over.

** Restart **

SD initialized.
11/15/22 08:40:57z Real Time Clock is running. Unix time 1668501657
11/15/22 08:40:57z Reset reason: Software/System restart
11/15/22 08:40:57z Trace: 10:58, 25:20, 25:30, 25:31, 10:56, 25:1, 25:1[2], 25:1[3], 10:59, 1:3, 1:1[3], 1:2[4], 9:0[4], 9:0, 9:1, 8:4, 8:6, 8:8, 9:3, 9:5, 9:9, 1:3, 1:3, 1:6[1], 1:6[2], 1:6[3], 1:5[21], 1:6[4], 21:0, 21:1, 21:10, 21:10
11/15/22 08:40:57z ESP8266 ID: 15763464, RTC PCF8523 (68)
11/15/22 08:40:57z IoTaWatt 5.0, Firmware version 02_08_00
11/15/22 08:40:57z SPIFFS mounted.
11/15/22 03:40:57 Local time zone: -5:00, using DST/BST when in effect.
11/15/22 03:40:57 device name: IotaWatt
11/15/22 03:40:57 HTTP server started
11/15/22 03:40:57 timeSync: service started.
11/15/22 03:40:57 statService: started.
11/15/22 03:40:57 dataLog: service started.
11/15/22 03:40:57 dataLog: Last log entry 11/15/22 03:40:55
11/15/22 03:41:01 WiFi connected. SSID=**************, IP=192.168.0.62, channel=11, RSSI -58db
11/15/22 03:41:01 Updater: service started. Auto-update class is BETA
11/15/22 03:41:02 historyLog: service started.
11/15/22 03:41:02 historyLog: Last log entry 11/15/22 03:40:00
11/15/22 03:41:03 Updater: Auto-update is current for class BETA.
11/15/22 06:26:10 Heap memory has degraded below safe minimum, restarting.

There is still a query right before the low heap is detected. Are you sure the HA integration was stopped? There is usually a lot of restarting involved with getting HA to accept change.

I’m getting curious about exactly how the queries are implicated here. Could you download your /iotawatt/iotamsgs.txt file and post it to me in a PM (private message) please? I’d like to see if it’s involved in most or all of these restarts. Could be the breakthrough I’ve been looking for.
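For anyone who wants to scan their own copy of the message log first, here is a minimal Python sketch that measures how long each session ran before a low-heap restart. It assumes the log looks like the excerpts posted in this thread (a `** Restart **` marker, then `MM/DD/YY HH:MM:SS` timestamps, with a trailing `z` on the early UTC-stamped boot lines). The function name is hypothetical, and local times that straddle a DST change would be off by an hour.

```python
import re
from datetime import datetime

# Timestamp, optional "z" (UTC marker on early boot lines), then the message.
LINE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})(z?) (.*)$")


def uptimes_before_heap_restart(log_text):
    """Return one timedelta per low-heap restart found in the log text."""
    results = []
    session_start = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if line.startswith("** Restart **"):
            session_start = None  # a new boot sequence begins
            continue
        match = LINE.match(line)
        if not match:
            continue
        stamp, is_utc, message = match.groups()
        if is_utc:
            continue  # skip UTC-stamped lines so every time compared is local
        when = datetime.strptime(stamp, "%m/%d/%y %H:%M:%S")
        if session_start is None:
            session_start = when  # first local-time line of this session
        if message.startswith("Heap memory has degraded"):
            results.append(when - session_start)
            session_start = None
    return results


if __name__ == "__main__":
    # Assumes iotamsgs.txt was downloaded to the current directory.
    with open("iotamsgs.txt", encoding="utf-8", errors="replace") as f:
        for uptime in uptimes_before_heap_restart(f.read()):
            print(uptime)
```

Run against the two excerpts posted above, this gives uptimes of 2:49:41 and 2:45:13, which matches the "every 3 hours or so" report.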

Message sent with the logs. I restarted HA just now. I’ll let you know if it happens again.

Wow, I’ve come to join the party :smiley: I can’t even get the web interface to show up anymore. I’m having trouble even with DHCP sometimes; it went down today. Happy to send all my logs. I pulled the IoTaWatt apart thinking it needed reflashing. Perhaps not. I also have InfluxDB and HA running, though it looks like the influx uploading has possibly been failing. Happy to try some things.

Disabled HA integration, iotawatt started up better than it has all day and web interface is back.

There is an undocumented feature that uploads free heap size to influx with each write. To turn it on, you need to manually add the JSON member

"heap": true,

to the influx object in your config.txt file. You can then plot the heap size over time and see if there is gradual or sudden loss.
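If you export the heap series rather than eyeballing the plot, a rough way to tell gradual loss from sudden loss is to compare the largest single drop with the total loss. The Python sketch below is only one possible heuristic: the function name, the 0.5 threshold, and the sample numbers are assumptions for illustration, not measured values or anything built into IoTaWatt.

```python
def describe_heap_loss(samples, sudden_fraction=0.5):
    """Classify heap loss as 'none', 'sudden', or 'gradual'.

    samples: list of (seconds, free_heap_bytes) tuples sorted by time.
    sudden_fraction: share of the total loss that one step must account
    for before the loss is called sudden (arbitrary default).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    total_loss = samples[0][1] - samples[-1][1]
    if total_loss <= 0:
        return "none"  # heap ended at or above where it started
    # Largest decrease between two consecutive samples.
    largest_drop = max(prev[1] - cur[1] for prev, cur in zip(samples, samples[1:]))
    if largest_drop >= sudden_fraction * total_loss:
        return "sudden"
    return "gradual"


# Made-up series: a steady leak versus one big step down.
steady = [(0, 30000), (60, 29000), (120, 28000), (180, 27000), (240, 26000)]
step = [(0, 30000), (60, 29900), (120, 20000), (180, 19900)]

print(describe_heap_loss(steady))  # gradual
print(describe_heap_loss(step))    # sudden
```

With only two samples the single step is the whole loss, so the result is always "sudden"; the check is only meaningful over a longer series.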

I’ve been able to stabilize my install. Based on @overeasy being confident it’s a WiFi problem, I created a new SSID, placed it on a VLAN different from the bulk of my devices, and disabled mDNS. 30 hours and running now. I’ll start walking the changes back to identify exactly what fixed it, but so far it seems to be either the device noise or mDNS.

I am running IoTaWatt firmware ver 275 and a current Unifi UDMPRO.