Problems with datalog on older units (>1 year)

overeasy · September 9, 2018, 4:38pm

I’ve fielded several problems in the last week that exhibited symptoms of a corrupted current log. Without the actual log, it has been impossible to determine what is going on, or even whether it represents a firmware or hardware issue. That situation changed last night when my own datalog crashed with the same symptoms.

The current datalog is a huge file. It maxes out at close to 2GB, the practical limit in a FAT32 file system. When it reaches the configured limit, it wraps around and starts overwriting the oldest entries. If your log has wrapped, you will see (in the status datalog tab) that the starting date/time advances along with the ending date/time. If the starting date/time stays constant, it hasn’t wrapped yet.
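For anyone who wants to picture the wrap behavior, here is a toy sketch in Python. It is not the firmware's code: the `WrappingLog` class and its capacity of 3 are made up purely for illustration. It only shows why the start time stays fixed until the log fills and then advances with every new entry.

```python
from collections import deque


class WrappingLog:
    """Toy fixed-capacity log that overwrites its oldest entry once full.

    Hypothetical illustration only; the real datalog is a file on the SD card.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.total_written = 0
        # A deque with maxlen silently drops the oldest item on overflow.
        self._timestamps = deque(maxlen=capacity)

    def append(self, timestamp):
        self._timestamps.append(timestamp)
        self.total_written += 1

    @property
    def start(self):
        return self._timestamps[0]

    @property
    def end(self):
        return self._timestamps[-1]

    @property
    def wrapped(self):
        return self.total_written > self.capacity


log = WrappingLog(capacity=3)
for t in (0, 5, 10):
    log.append(t)
before = (log.start, log.end, log.wrapped)  # (0, 10, False): start is constant

log.append(15)
after = (log.start, log.end, log.wrapped)   # (5, 15, True): start now advances
```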

These problems seem to affect only datalogs that have wrapped. The default wrap size is 400 days’ worth of data, or 6,912,000 entries. That may cover more than 400 calendar days in any particular case if there are lapses where the IoTaWatt was not recording.
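As a quick sanity check on those two figures, the arithmetic below works out the implied spacing between entries. The 5-second result is derived from the numbers above rather than stated in this post, so treat it as an inference.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
WRAP_DAYS = 400                 # default wrap size in days (from the post)
WRAP_ENTRIES = 6_912_000        # default wrap size in entries (from the post)

# Total seconds covered by a full log, divided by the number of entries.
implied_interval_seconds = (WRAP_DAYS * SECONDS_PER_DAY) // WRAP_ENTRIES

print(implied_interval_seconds)  # 5
```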

I’m in the process of fixing this issue, and will expedite a release with the fix. I may also be able to introduce code to repair previous damage. Too soon to tell. In the meantime, there is an immediate way to solve the problem if it happens to you. Delete the current-log. That may sound drastic, but all of your history is still in the history log, and it is served up seamlessly when you use the local graph app to view the data. In fact, even with a large current-log, virtually all of the data served to the graph app comes from the history log. What you lose is resolution below 60 seconds. All of the standard graph requests only require 2 minute resolution.

To delete the current-log, enter the following URL in your browser:

http://iotawatt.local/command?deletelog=current
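If you would rather script the request than paste it into a browser, a minimal Python sketch follows. The function names and the `host` parameter are hypothetical helpers, and the sketch assumes the unit answers at `iotawatt.local` as in the URL above. The response body is not described in this post, so only the HTTP status code is returned.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def delete_log_url(host="iotawatt.local", log="current"):
    """Build the same command URL shown above (helper name is hypothetical)."""
    return f"http://{host}/command?{urlencode({'deletelog': log})}"


def delete_current_log(host="iotawatt.local", timeout=10):
    """Send the delete request and return the HTTP status code.

    This deletes the current log on the unit, so call it deliberately.
    """
    with urlopen(delete_log_url(host), timeout=timeout) as response:
        return response.status
```

Calling `delete_current_log()` issues the request; `delete_log_url()` on its own only builds the string, which is handy for checking the URL before sending anything.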

The usual indication that you are experiencing this problem is a solid green LED, and when you look at the message log it ends with the message:

MM/DD/YY HH:MM:SS dataLog: service started.
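If you have saved a copy of the message log as text, a small check like the one below can confirm the log-side symptom. The function name is hypothetical, and it only looks at the final non-blank line; it says nothing about the LED, so it covers half of the indication at best.

```python
def ends_with_datalog_start(message_log_text):
    """Return True if the last non-blank line ends with the dataLog start message."""
    lines = [line.strip() for line in message_log_text.splitlines() if line.strip()]
    return bool(lines) and lines[-1].endswith("dataLog: service started.")
```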

I don’t expect too many more of these, as not that many units were sold during that period. If you suspect you are experiencing this problem but aren’t sure, just post the question along with a brief description of the symptoms and your recent message log.

overeasy · September 10, 2018, 1:08am

I’ve had some time to look this over in greater depth. I found three instances where my datalog had been corrupted: one on March 3, one on July 11, and one on Sept 7. The first two are traceable to a problem that was fixed shortly after July 11 with release 02_03_09, which would have been installed on all but MAJOR release units by July 29, 2018. It had to do with downloading a new release asynchronously. The SD routines are not very robust and can’t be used in an asynchronous callback. There were other symptoms of this problem, in that the downloaded file sometimes would not verify.

The last failure, on Sept 7, appears to be the result of faulty code meant to repair a file with a bad last record or two. It’s impossible to reconstruct everything that might happen as a result, but that is my best guess as to how the corruption occurred. It is unlike the corruption in the first two cases.

What I have done is eliminate the second problem and introduce more diagnostic output in the event of a damaged log. A damaged log will still stop your system for about an hour while it dumps a diagnostic file, after which it will delete the current log and restart. A message will be issued, and during the dump an LED sequence will be displayed (R-G-R).

I’ll try to get this out in ALPHA over the next few days.