IoTaWatt rebooting every day

Hi, I just happened to notice that my unit is rebooting every day, and I'm trying to figure out why. Here is the log from the last couple of days.

A datalog watchdog timer (WDT) exception is very rare. The datalog WDT triggers if the datalog has not been updated for 5 minutes. There is not much information available to diagnose the cause, but I can offer some possibilities:

The trace indicates that the web server was active at the time of the WDT expiration. Because the ESP8266 is single-threaded, the web server monopolizes the CPU while handling a transaction. For the most part, transactions are handled quickly and have little or no impact on sampling and data recording. However, some requests take extra time to handle. In particular, queries can take a long time because they can require reading a lot of data from the SDcard. For that reason, query length is limited by default, but there is an override LIMIT parameter that allows longer queries. If you are using query explicitly, you might check the response times for those queries (the browser debugger will time transactions). Even if you are not doing explicit queries, the Home Assistant integration uses query to gather IoTaWatt data, and there could be an issue with that. A simple test would be to disable the integration for a day or so and see if the problem is mitigated.
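If the browser debugger is not convenient, response times can also be measured from a script. The following is only a rough sketch: `time_request` is a name made up for this example, and it simply times one HTTP GET of whatever URL you pass in. The URL in the comment is a placeholder, so substitute the exact query your own client sends.

```python
import time
import urllib.request


def time_request(url, timeout=30.0, opener=urllib.request.urlopen):
    """Return (elapsed_seconds, byte_count) for a single GET of url.

    opener is injectable so the function can be exercised without a network.
    """
    start = time.monotonic()
    with opener(url, timeout=timeout) as response:
        body = response.read()
    return time.monotonic() - start, len(body)


# Example use (placeholder address, replace with your unit and your query):
#   elapsed, size = time_request("http://<your-iotawatt>/query?...")
#   print(f"{size} bytes in {elapsed:.2f} s")
```

A request that regularly takes many seconds would be worth a closer look, since the web server holds the CPU for the whole transaction.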

I wouldn’t rule out datalog corruption or SDcard issues, but there is no direct historical link between those types of problems and datalog WDT restarts, so HA involvement would be my first choice.

K, will disable the HA integration and see. Thanks.

So I disabled the HA integration and I still see it happening (last 2 days). Anything else I could try before doing something with the SDcard?

Don’t have any ideas. SDcard would be my next step.

Would it hurt leaving it alone other than the possibility of the SDcard failing?

I wonder if I got a bad one, since I have only had it for a year.

Don’t know. When the datalog times out, it has had no entries for five minutes, which means no data was logged. It’s possible that whatever is causing the timeout occurs much more frequently but doesn’t quite last five minutes, leaving holes in your log and skewing the log data.
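One way to look for such holes is to scan the timestamps of the logged entries for gaps. This is a minimal sketch, assuming you have exported the entry times as Unix seconds by whatever means you prefer and that entries are normally 5 seconds apart. `find_gaps` and its parameters are illustrative names, not part of IoTaWatt.

```python
def find_gaps(timestamps, interval=5, wdt_limit=300):
    """Return one tuple per hole: (last_before, first_after, missing_s, reaches_wdt).

    interval  : assumed normal spacing between entries, in seconds
    wdt_limit : spacing (seconds) at which the 5-minute datalog WDT would be reached
    """
    gaps = []
    ordered = sorted(timestamps)
    for prev, cur in zip(ordered, ordered[1:]):
        delta = cur - prev
        if delta > interval:
            gaps.append((prev, cur, delta - interval, delta >= wdt_limit))
    return gaps


sample = [0, 5, 10, 130, 135, 440]
print(find_gaps(sample))
# prints [(10, 130, 115, False), (135, 440, 300, True)]
```

Many short gaps that stay under the limit would suggest that the underlying problem happens more often than the restarts do.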

Thinking more about this, the datalog service runs at a high priority and should never miss a single 5-second tick, much less 60 of them. Right now I’m thinking it may be WiFi related. The trace shows the web server is the last thing called before the timeout. It’s possible that’s the problem - probably more likely than an SDcard issue.

If you want to try that, I’d recommend that you disconnect the unit from WiFi from the Tools->>WiFi menu. The LED should go dull red. Leave it that way for two days. You will not be able to access it but it will still be running and should be logging.

After two days (MTBF seems to be less than 24 hours), power cycle the unit. You will get an RGG LED, so follow the docs to connect it back to your WiFi. Once connected (dull green LED), look at the message log to see if it restarted during the disconnected period.

So I checked the log this morning (HA integration still disabled), and the highlighted message came up (heap memory has degraded below safe minimum, restarting. RSSI was -45 dB).

Does that help narrow anything down?

Well, it reinforces the WiFi suspicion. Heap memory degradation is almost always associated with WiFi issues. I don’t believe there are any memory leaks in the IoTaWatt firmware, so my assumption is that there is one in the IP stack, probably the LWIP code supplied with the ESP8266/Arduino IDE, and it seems to be associated with exception handling.

It usually happens with poor RSSI, but there have been some cases reported where the RSSI was very good, like yours. Maybe too good. In any event, have you tried the disconnect experiment that I suggested above?