Hi Bob. I’m running several IotaWatts, all posting to InfluxDB. From time to time, InfluxDB fails. It restarts, but in the meantime the IotaWatts do this:
1/03/19 04:08:05 influxDB: Start posting at 01/03/19 04:07:10
1/03/19 14:10:41 influxDB: Post Failed: -11
* 1/03/19 19:58:46 Heap memory has degraded below safe minimum, restarting.
(The * indicates a heap memory problem, but this is possibly unrelated?)
Then after the restart:
SD initialized.
1/03/19 09:58:47z Real Time Clock is running. Unix time 1546509527
1/03/19 09:58:47z Version 02_03_20
1/03/19 09:58:47z Reset reason: Software/System restart
1/03/19 09:58:47z Trace: 1:5[7], 7:0, 7:9, 1:6, 1:3, 1:4, 1:5[7], 7:0, 7:9, 1:6, 1:1[1], 1:2[2], 9:0[2], 9:0, 9:1, 8:4, 8:6, 8:8, 8:9, 9:3, 9:5, 9:9, 1:2, 1:3, 1:4, 1:5[7], 7:0, 7:9, 1:6, 1:3, 1:4, 1:5[21]
1/03/19 09:58:47z ESP8266 ChipID: 6910648
1/03/19 09:58:47z SPIFFS mounted.
1/03/19 19:58:48 Local time zone: +10:00
1/03/19 19:58:48 device name: iotaa39
1/03/19 19:58:48 MDNS responder started for hostname iotaa39
1/03/19 19:58:48 LLMNR responder started for hostname iotaa39
1/03/19 19:58:48 HTTP server started
1/03/19 19:58:48 timeSync: service started.
1/03/19 19:58:49 statService: started.
1/03/19 19:58:49 dataLog: service started.
1/03/19 19:58:49 dataLog: Last log entry 01/03/19 19:58:45
1/03/19 19:58:49 historyLog: service started.
1/03/19 19:58:50 historyLog: Last log entry 01/03/19 19:58:00
1/03/19 19:58:53 influxDB: started, url=live.phisaver.com:8086, db=iotaa39, interval=10
1/03/19 19:58:53 WiFi connected. SSID=NetComm 0405, IP=10.1.1.4, channel=1, RSSI -79db
1/03/19 19:58:53 Updater: service started. Auto-update class is MINOR
1/03/19 19:58:58 Updater: Auto-update is current for class MINOR.
1/03/19 19:59:26 influxDB: last entry query failed: -11
1/03/19 19:59:26 influxDB: Stopped. Last post 01/03/19 19:56:50
Since the last entry query failed, the InfluxDB service stops. I guess the query fails because InfluxDB is still down. However, a manual restart 12 hours later still results in a “last entry query failed: -11”, even though InfluxDB is running. Further manual restarts of the service eventually succeed and all is well.
The status of this unit is here, in the middle of back-filling. Looks fine to me:
Looking at the InfluxDB logs on 1/3/19 at 19:59 local time (09:59 UTC):
ts=2019-01-03T09:59:06.980442Z lvl=info msg="Executing query" log_id=0ClVDwel000 service=query query="SELECT last(Watts) FROM iotaa39.autogen.iotaa39 WHERE device = 'iotaa39'"
ts=2019-01-03T09:59:13.745951Z lvl=info msg="Executing query" log_id=0ClVDwel000 service=query query="SELECT last(Watts) FROM iotaa39.autogen.iotaa39 WHERE device = 'iotaa39'"
ts=2019-01-03T09:59:36.328734Z lvl=info msg="Executing query" log_id=0ClVDwel000 service=query query="SELECT last(Watts) FROM iotaa39.autogen.iotaa39 WHERE device = 'iotaa39'"
So it seems InfluxDB was receiving the “last entry” query, but the IotaWatt didn’t get or didn’t understand the response.
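To check the server side independently of the IotaWatt, the same query could be replayed by hand and timed. Below is a minimal Python sketch, assuming the InfluxDB 1.x HTTP `/query` endpoint, plain HTTP, and no authentication (none of which is confirmed by the logs above). The function names and the 10-second timeout are my own choices for illustration.

```python
import json
import time
import urllib.parse
import urllib.request


def last_entry_url(base, db, measurement, device, field="Watts"):
    """Build an InfluxDB 1.x /query URL for the 'last entry' lookup.

    The query text mirrors the one seen in the InfluxDB log.
    """
    query = (
        f"SELECT last({field}) FROM {db}.autogen.{measurement} "
        f"WHERE device = '{device}'"
    )
    params = urllib.parse.urlencode({"db": db, "q": query, "epoch": "s"})
    return f"{base}/query?{params}"


def time_last_entry(url, timeout=10.0):
    """Run the query once; return (elapsed seconds, decoded JSON body)."""
    started = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return time.monotonic() - started, body


# Host and port come from the IotaWatt log; the "http://" scheme is assumed.
URL = last_entry_url("http://live.phisaver.com:8086", "iotaa39", "iotaa39", "iotaa39")

# Uncomment to actually hit the server:
# elapsed, body = time_last_entry(URL)
# print(f"{elapsed:.2f}s", body)
```

Running this while an IotaWatt is reporting “last entry query failed” would at least show whether the server answers this query at all from another client, and how long it takes.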
Also in the log is a query on PF instead of Watts:
ts=2019-01-02T18:08:04.204275Z lvl=info msg="Executing query" log_id=0CjPd8DG000 service=query query="SELECT last(PF) FROM iotaa39.autogen.iotaa39 WHERE device = 'iotaa39'"
Some of the IotaWatts (3 of 7) are posting again; the rest are running but not posting. I couldn’t find the restart logic on GitHub.
So, my question is: What’s the logic/timing for restarting the InfluxDB service? And why is the “last entry” query failing?
In my case, I’d like it to try to reconnect frequently (every 5 minutes) so that InfluxDB outages are not too visible to users.
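To make the behaviour I’m asking for concrete, here is a rough Python sketch of the retry rule: after a failed “last entry” query, wait a fixed interval and try again rather than stopping. This is only an illustration of the request, not a description of how the IotaWatt firmware works; the class, state names, and method names are all hypothetical.

```python
RETRY_INTERVAL_S = 300  # the 5 minutes asked for above


class ReconnectPolicy:
    """Illustrative only: retry a failed start-up query on a fixed interval."""

    def __init__(self, retry_interval_s=RETRY_INTERVAL_S):
        self.retry_interval_s = retry_interval_s
        # QUERY_LAST: a 'last entry' query is due.
        # WAITING:    the last query failed; waiting for the retry time.
        # POSTING:    the query succeeded; normal posting can proceed.
        self.state = "QUERY_LAST"
        self.next_attempt = 0.0

    def on_query_result(self, ok, now):
        """Record the outcome of a 'last entry' query made at time `now`."""
        if ok:
            self.state = "POSTING"
        else:
            self.state = "WAITING"
            self.next_attempt = now + self.retry_interval_s

    def should_query(self, now):
        """Return True when a 'last entry' query should be (re)issued."""
        if self.state == "WAITING" and now >= self.next_attempt:
            self.state = "QUERY_LAST"
        return self.state == "QUERY_LAST"
```

With this rule, a failure at time 0 leads to another attempt at 300 seconds, and so on until a query succeeds, so an InfluxDB outage would heal itself without a manual restart of the service.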
(Obviously a perfect WiFi and InfluxDB would be better! But that’s not feasible!)
Happy to provide further testing / debug as necessary.
Cheers, Brett