Rash of current log failures on 02_03_13


#1

There have been about a half-dozen reports of systems “hanging” or otherwise becoming non-responsive. All of these are running release 02_03_13. There were a couple of incidents of this in the ALPHA period, which began on August 18. 02_03_13 went MAJOR about a month after that, on September 12.

As a result of the two failures identified during ALPHA, some changes were made:

  • Remove recent performance-related changes to the datalog routines that may be contributing to the problem.
  • Add some code to try to recover from one of the types of damage.
  • Add code to produce a diagnostic file from a damaged current log, because the logs themselves are just too large to share.
  • Add code to delete a damaged log and restart with a fresh log (the history log remains).

These changes are in 02_03_16, which has just been made MINOR and MAJOR (Oct 6).

As a result, systems that encounter this problem on 02_03_16 will appear to hang for anywhere up to an hour after a restart. The LED will blink RED-GREEN-RED while this is in progress. It must run to completion. Then the unit will restart with a new current log. The history log will remain intact with all history.

If this happens to you, please look for the file /iotawatt/logDiag.txt and if you have one, please post it.


#2

So, my device seems to have been one of the victims of the issue. I noticed a few days ago that its web server seemed to be unresponsive. I got to the main page but none of the buttons worked. This did not concern me too much because that is normal on some of my devices (they all have different browsers on them). But then today I noticed that it was back to working on my Android phone, and it seems that it has deleted ALL of the history and current logs. The file manager in the UI shows no files on the memory card other than burden.txt.

Here is the log:
10/06/18 11:22:04 Updater: Update downloaded and signature verified
10/06/18 11:22:14 Updater: firmware upgraded to version 02_03_16
10/06/18 11:22:14 Firmware updated, restarting.

** Restart **

SD initialized.
10/06/18 19:22:25z Real Time Clock is running. Unix time 1538853745
10/06/18 19:22:25z Version 02_03_16
10/06/18 19:22:25z Updater: Installing update files for version 02_03_16
10/06/18 19:22:25z Updater: Installing EDIT.HTM
10/06/18 19:22:26z Updater: Installing GRAPH.HTM
10/06/18 19:22:26z Updater: Installing GRAPH.JS
10/06/18 19:22:26z Updater: Installing INDEX.HTM
10/06/18 19:22:28z Updater: Installing TABLES.TXT
10/06/18 19:22:28z Updater: Installing CNFSTYLE.CSS
10/06/18 19:22:28z Updater: Installation complete.
10/06/18 19:22:28z Reset reason: Software/System restart
10/06/18 19:22:28z Trace: 18:2, 18:3, 18:2, 18:3, 18:2, 18:3, 18:2, 18:3, 18:2, 18:3, 18:2, 18:3, 18:2, 18:3, 18:4, 18:5, 1:6, 1:3, 1:4, 1:5[19], 1:6, 1:3, 1:4, 1:5[19], 1:6, 1:3, 1:4, 1:5[19], 1:6, 1:3, 1:4, 1:5[5]
10/06/18 19:22:28z ESP8266 ChipID: 2222239
10/06/18 19:22:38z SPIFFS mounted.
10/06/18 11:22:38 Local time zone: -8
10/06/18 11:22:38 device name: IotaWatt, version: 3
10/06/18 11:22:38 MDNS responder started
10/06/18 11:22:38 You can now connect to http://IotaWatt.local
10/06/18 11:22:38 HTTP server started
10/06/18 11:22:38 timeSync: service started.
10/06/18 11:22:38 statService: started.
10/06/18 11:22:38 WiFi connected. SSID FrogmoreN66, IP 192.168.0.38, channel 1, RSSI -68db
10/06/18 11:22:38 Updater: service started. Auto-update class is MINOR
10/06/18 11:22:38 dataLog: service started.
10/06/18 11:22:38 dataLog: New current log created.
10/06/18 11:22:40 Updater: Auto-update is current for class MINOR.
10/06/18 11:24:41 historyLog: service started.
10/06/18 11:24:41 historyLog: first entry 10/6/18 11:23:00
10/06/18 11:26:39 timeSync: adjusting RTC by -1
10/07/18 04:26:39 timeSync: adjusting RTC by -1
10/07/18 16:27:42 timeSync: adjusting RTC by -1


#3

Can you post the entire message log, please, and also a screenshot of the file manager listing of the SD card files? It has to have a config.txt file to start at all, and the log you posted is in /iotawatt/iotamsgs.txt, so this is a real mystery: how can the firmware find the config.txt file to start but not display it in the file manager listing? And how can the file manager UI even work, when it’s also a file on the SD? The burden.txt file is actually in the SPIFFS file system on the ESP8266.

This doesn’t appear to be exactly the same as the problem described above. The new release will only delete logs after release 02_03_16 is installed and running, and then only after producing a diagnostic file (which takes a while) and adding messages to the message log saying that the file was deleted. So while it looks as if the log files were not there at restart, it’s not clear where they went. That said, the timing is right for loading the new release.
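
On the timing: the posted log mixes two clocks. Lines ending in "z" are UTC, and the later lines are local time at the -8 offset the log reports. A minimal Python sketch for lining them up, using the Unix time printed in the log (the function name `log_times` is just for illustration):

```python
from datetime import datetime, timedelta, timezone

def log_times(unix_time, utc_offset_hours):
    """Return (utc, local) datetimes for a Unix time and a fixed UTC offset."""
    utc = datetime.fromtimestamp(unix_time, tz=timezone.utc)
    local = utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    return utc, local

# Values taken from the posted log: Unix time 1538853745, local time zone -8.
utc, local = log_times(1538853745, -8)
print(utc.strftime("%m/%d/%y %H:%M:%S") + "z")   # 10/06/18 19:22:25z
print(local.strftime("%m/%d/%y %H:%M:%S"))       # 10/06/18 11:22:25
```

So the restart at 19:22:25z is the same moment as 11:22:25 local, eleven seconds after the "Firmware updated, restarting." message.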


#4

Umm, that is the entire message log that is displayed. I was kind of surprised to see that too.
I posted this picture in the other thread, since that seemed a better location, but here it is too.

I agree this is a strange one, and I have no explanation for how it could have happened. I have it set to MINOR update and really don’t look at it that often. I have had it for over a year, so maybe the file did get corrupted and maybe the SD card is somewhat corrupted too. There have probably been about 5 power outages in the time I have had it.


#5

I haven’t had any evidence to suggest that power failures cause SD failures in IoTaWatt. What happens is that existing corruption is detected upon restart after a power failure, so it appears to be caused by the power failure.

There have been some suspected SD card failures though, and this does look like an SD card issue. Your IoTaWatt is able to open and read its files, but it isn’t reading the directory.

Could you remove the SD card and try to read the directory from another machine, then post that directory listing?

If you can read the files on an external machine, I’d recommend copying them to a new card and restarting with that.
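
If it helps, here is a rough Python sketch for doing both steps on a PC: listing everything on the card, then copying it to a new one. The function names and the mount paths in the usage comment are hypothetical; substitute wherever your machine mounts the cards. It assumes the old card is readable at all, and it skips nothing and verifies nothing beyond what `shutil.copy2` does.

```python
import shutil
from pathlib import Path

def list_card(root):
    """Return sorted (relative path, size in bytes) for every file under root."""
    root = Path(root)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob("*")
        if p.is_file()
    )

def copy_card(src, dst):
    """Copy every file under src to dst, keeping the directory layout."""
    src, dst = Path(src), Path(dst)
    copied = []
    for rel, _size in list_card(src):
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src / rel, target)  # copies contents and timestamps
        copied.append(rel)
    return copied

# Example usage (hypothetical mount points):
#   for name, size in list_card("/media/old_sd"):
#       print(f"{size:>12}  {name}")
#   copy_card("/media/old_sd", "/media/new_sd")
```

The output of `list_card` is the directory listing to post here.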