Influxdb[v1] "code -11" "query interrupted"

I have 4 IoTaWatt units, one in my basement, 3 in my garage.
IWbsmt - on a subpanel
IWmain01 & IWmain02 - my main/old panel. Have a lot of circuits. Is generator backed, but that’s not important here.
IWsolr - my new panel, my solar inverters backfeed here. Not generator backed.

At approximately 9:06PM EDT IWsolr [has only 4 circuits] and IWbsmt [has 13 monitored circuits] both started reporting “Query failed, Code -11” and stopped pushing to InfluxDB.
The only hint I’ve found so far is that InfluxDB is reporting ‘error: “query interrupted”’ for both of them, during what looks like the “how up to date is the database” queries. Googling suggests a problem with connection reuse, but that was with a much older [1.4.3?] version of InfluxDB.

Previous threads on this board have suggested that I might have too many metrics [which I probably do, I have PF, VA, Watts and Wh], but given that IWsolr is having this problem and has only 4 circuits monitored, I think that’s not the cause.

I have tried: rebooting the IoTaWatt[s] & restarting the influxdb server process. I checked the WiFi and moved the WiFi that IWbsmt is using, such that the IoTaWatt reported RSSI went from -78 to -55 [actually, I did this part first].

What happened at 9:06pm to cause the two IoTaWatt units to restart their uploaders? Sounds like you may have had a power failure and the other two did not restart because they are on a UPS?

Given that two independent units are having trouble with the startup query, it sounds like a problem with the influx server. Did it go down at that time as well? Did your WiFi go down?

What is the status of the main units, are they still uploading and are they current? Can you post the IoTaWatt message log for that event? Where does your influx data stop for each of the units? Lastly, what is the influx host and does it use HDD or SD?

I am not aware of a specific cause for the IoTaWatts to restart. Note that all of them are powered by UPSs [but the VT is not].
The influx server [host] has an uptime of 42 days, and has a USB3 SSD. The hardware platform is an Odroid N2+ with 4GB RAM. I did try stopping lighttpd [in front of grafana on the same host] to see if lowering the load on the influxdb would make a difference. No joy.
Yes, IWmain01 and 02 are both current in their uploads. I have not wanted to touch them in case it breaks them as well.
log from IWsolr “9/16/22 20:07:22 timeSync: Six week routine restart.” unclear why 8PM, probably EST vs EDT?
This message implies that it’s going to happen to the other two soon. IWmain02 & IWmain01 have uptime 39d 13h. 2.5 days left.
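
The estimate above, and the 8 PM versus 9 PM question, can be checked with a few lines of arithmetic. This is a sketch that assumes the routine restart fires after exactly 42 days of uptime (the log only says "six week") and uses fixed UTC offsets instead of a time zone database. In the logs posted later in this thread, IWsolr reports "Local time zone: -5:00" with no DST note, while IWbsmt reports "using DST/BST when in effect", which would account for the one-hour difference.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "six week routine restart" means exactly 42 days of uptime.
RESTART_AFTER = timedelta(weeks=6)

# Uptime reported for IWmain01 and IWmain02 in the post above.
uptime = timedelta(days=39, hours=13)
remaining = RESTART_AFTER - uptime
print(remaining)  # 2 days, 11:00:00

# Moment of the IWsolr restart message, expressed in UTC.
restart_utc = datetime(2022, 9, 17, 1, 7, 22, tzinfo=timezone.utc)
est = timezone(timedelta(hours=-5))  # fixed -5:00, no DST (IWsolr setting)
edt = timezone(timedelta(hours=-4))  # -5:00 with DST in effect (IWbsmt setting)
print(restart_utc.astimezone(est).strftime("%H:%M:%S"))  # 20:07:22
print(restart_utc.astimezone(edt).strftime("%H:%M:%S"))  # 21:07:22
```

So "2.5 days left" is roughly 2 days 11 hours under that assumption, and the 8 PM timestamp looks like a display offset rather than a different restart time.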

Had a thought this morning about a possible cause and optimization: the query that checks database freshness could be taking too long, so one optimization would be to bound that query with the “upload history from” date.
Having trouble proving it’s better b/c I think I’m dealing with a) query queueing b) caching?
So right now it only qualifies as “obvious” not provable.
From a tcpdump capture: “SELECT LAST(Hz) FROM Hz WHERE iotawatt='IWsolr' AND circuit='hz'”
Suggested modification: “SELECT LAST(Hz) FROM Hz WHERE iotawatt='IWsolr' AND circuit='hz' AND time>1662015600s”

tabris@andvari:~$ time echo "SELECT time,LAST(Hz) FROM Hz WHERE iotawatt='IWsolr' AND circuit='hz' AND time>1662015600s;" | influx -database iotawatt -precision=rfc3339
name: Hz
time                 last
----                 ----
2022-09-17T01:06:50Z 60.0239

real    0m0.389s
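
For comparing the two query forms repeatedly, a small helper can build both variants and the matching request URL for the InfluxDB 1.x `/query` endpoint. This is a sketch only: the function names and the `http://localhost:8086` base URL are placeholders made up for illustration, the values are not escaped, and it assumes the field key equals the measurement name, as in the captured query.

```python
from typing import Optional
from urllib.parse import urlencode


def freshness_query(measurement: str, device: str, circuit: str,
                    since_epoch_s: Optional[int] = None) -> str:
    """Build the LAST() query, optionally bounded below by a Unix time."""
    q = (f"SELECT LAST({measurement}) FROM {measurement} "
         f"WHERE iotawatt='{device}' AND circuit='{circuit}'")
    if since_epoch_s is not None:
        q += f" AND time>{since_epoch_s}s"
    return q


def query_url(base_url: str, db: str, q: str) -> str:
    """URL for an InfluxDB 1.x /query request returning epoch seconds."""
    return f"{base_url}/query?" + urlencode({"db": db, "q": q, "epoch": "s"})


unbounded = freshness_query("Hz", "IWsolr", "hz")
bounded = freshness_query("Hz", "IWsolr", "hz", since_epoch_s=1662015600)
print(query_url("http://localhost:8086", "iotawatt", bounded))
```

Timing each URL several times, and alternating the order, may help separate a real difference from the queueing and caching effects mentioned above.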

Back to my questions: What happened at 9:06pm to BOTH of the units? Can I see the full logs covering that period? Can you also post your influx setup for the two units?

The -11 completion is a timeout. IoTaWatt sent a query and did not get a response. It actually sends a lot of queries, one for each measurement configured. I’ve never had a problem with the last post being a long time ago, or nonexistent. Which brings up the question: Have you changed your influx setup for these units recently?

You are showing a tcpdump capture for Hz on unit IWsolr. Are you saying that you know this is the query that is failing? Did you recently add Hz to the setup for either or both units?

from IWsolr:

** Restart **

SD initialized.
8/06/22 00:42:52z Real Time Clock is running. Unix time 1659746572 
8/06/22 00:42:52z Reset reason: Software/System restart
8/06/22 00:42:52z Trace:  34:5, 34:5, 34:5, 34:6[1], 34:10[3], 34:5, 34:5, 34:5, 34:6[1], 34:10[3], 34:5, 34:5, 34:5, 34:6[1], 34:10[3], 34:5, 34:5, 34:5, 34:6[1], 34:10, 32:101, 32:101, 32:102, 32:103, 32:103, 31:105, 31:105, 31:106, 11:45, 11:50, 11:55, 11:70
8/06/22 00:42:52z ESP8266 ID: 16295187, RTC PCF8523 (68)
8/06/22 00:42:52z IoTaWatt 5.0, Firmware version 02_07_05
8/06/22 00:42:52z SPIFFS mounted.
8/05/22 19:42:52 Local time zone: -5:00
8/05/22 19:42:52 device name: IWsolr
8/05/22 19:42:52 HTTP server started
8/05/22 19:42:52 influxDB_v1: Starting, interval:5, url:http://influxdb-iot.tabris.net:8086
8/05/22 19:42:52 timeSync: service started.
8/05/22 19:42:52 statService: started.
8/05/22 19:42:52 dataLog: service started.
8/05/22 19:42:53 dataLog: Last log entry 08/05/22 19:40:25
8/05/22 19:42:54 historyLog: service started.
8/05/22 19:42:54 historyLog: Last log entry 08/05/22 19:40:00
8/05/22 19:45:39 WiFi connected. SSID=IoTnet, IP=192.168.82.53, channel=11, RSSI -71db
8/05/22 19:45:39 MDNS responder started for hostname IWsolr
8/05/22 19:45:39 LLMNR responder started for hostname IWsolr
8/05/22 19:45:39 Updater: service started. Auto-update class is NONE
8/05/22 20:02:31 influxDB_v1: Start posting at 08/03/22 18:57:40
8/16/22 13:15:18 WiFi disconnected.
8/16/22 13:15:42 WiFi connected. SSID=IoTnet, IP=192.168.82.53, channel=6, RSSI -70db
8/16/22 13:37:29 WiFi disconnected.
8/16/22 13:37:50 WiFi connected. SSID=IoTnet, IP=192.168.82.53, channel=6, RSSI -72db
8/17/22 06:07:05 WiFi disconnected.
8/17/22 06:07:14 WiFi connected. SSID=IoTnet, IP=192.168.82.53, channel=1, RSSI -72db
8/17/22 06:14:43 WiFi disconnected.
8/17/22 06:18:10 WiFi connected. SSID=IoTnet, IP=192.168.82.53, channel=6, RSSI -70db
8/21/22 09:56:07 WiFi disconnected.
8/21/22 09:56:32 WiFi connected. SSID=IoTnet, IP=192.168.82.53, channel=6, RSSI -70db
9/16/22 20:07:22 timeSync: Six week routine restart.

** Restart **

SD initialized.
9/17/22 01:07:23z Real Time Clock is running. Unix time 1663376843 
9/17/22 01:07:23z Reset reason: Software/System restart
9/17/22 01:07:23z Trace:  8:8, 8:9, 9:3, 9:5, 9:9, 1:2, 1:3, 1:3, 1:1[2], 1:2[3], 9:0[3], 9:0, 9:1, 8:4, 8:6, 8:8, 8:9, 9:3, 9:5, 9:9, 1:2, 1:3, 1:3, 1:6[1], 1:6[2], 1:6[2], 1:6[2], 1:6[3], 1:5[20], 1:6[4], 20:0, 20:1
9/17/22 01:07:23z ESP8266 ID: 16295187, RTC PCF8523 (68)
9/17/22 01:07:23z IoTaWatt 5.0, Firmware version 02_07_05
9/17/22 01:07:23z SPIFFS mounted.
9/16/22 20:07:23 Local time zone: -5:00
9/16/22 20:07:23 device name: IWsolr
9/16/22 20:07:23 HTTP server started
9/16/22 20:07:23 influxDB_v1: Starting, interval:5, url:http://influxdb-iot.tabris.net:8086
9/16/22 20:07:23 timeSync: service started.
9/16/22 20:07:23 statService: started.
9/16/22 20:07:23 dataLog: service started.
9/16/22 20:07:24 dataLog: Last log entry 09/16/22 20:07:20
9/16/22 20:07:24 historyLog: service started.
9/16/22 20:07:25 historyLog: Last log entry 09/16/22 20:07:00
9/16/22 20:07:28 WiFi connected. SSID=IoTnet, IP=192.168.82.53, channel=6, RSSI -69db
9/16/22 20:07:28 MDNS responder started for hostname IWsolr
9/16/22 20:07:28 LLMNR responder started for hostname IWsolr
9/16/22 20:07:28 Updater: service started. Auto-update class is NONE

From IWbsmt

** Restart **

SD initialized.
8/06/22 00:45:42z Real Time Clock is running. Unix time 1659746742 
8/06/22 00:45:42z Reset Reason: Power-fail restart.
8/06/22 00:45:42z ESP8266 ID: 16274938, RTC PCF8523 (68)
8/06/22 00:45:42z IoTaWatt 5.0, Firmware version 02_07_05
8/06/22 00:45:42z SPIFFS mounted.
8/05/22 20:45:43 Local time zone: -5:00, using DST/BST when in effect.
8/05/22 20:45:43 device name: IWbsmt
8/05/22 20:45:46 Connecting with WiFiManager.
8/05/22 20:45:49 HTTP server started
8/05/22 20:45:49 influxDB_v1: Starting, interval:5, url:http://influxdb-iot.tabris.net:8086
8/05/22 20:45:49 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=11, RSSI -61db
8/05/22 20:45:49 MDNS responder started for hostname IWbsmt
8/05/22 20:45:49 LLMNR responder started for hostname IWbsmt
8/05/22 20:45:49 timeSync: service started.
8/05/22 20:45:49 statService: started.
8/05/22 20:45:49 Updater: service started. Auto-update class is NONE
8/05/22 20:45:49 dataLog: service started.
8/05/22 20:45:50 dataLog: Last log entry 08/03/22 21:35:05
8/05/22 20:45:54 historyLog: service started.
8/05/22 20:45:54 historyLog: Last log entry 08/03/22 21:35:00
8/05/22 21:07:50 influxDB_v1: Start posting at 08/03/22 19:57:55
8/16/22 14:16:42 WiFi disconnected.
8/16/22 14:18:45 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=11, RSSI -94db
8/16/22 14:37:29 WiFi disconnected.
8/16/22 14:38:01 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=11, RSSI -62db
8/21/22 10:56:07 WiFi disconnected.
8/21/22 10:56:58 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=11, RSSI -93db
8/21/22 21:39:51 WiFi disconnected.
8/21/22 21:40:08 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=1, RSSI -67db
9/16/22 21:06:22 timeSync: Six week routine restart.

** Restart **

SD initialized.
9/17/22 01:06:23z Real Time Clock is running. Unix time 1663376783 
9/17/22 01:06:23z Reset reason: Software/System restart
9/17/22 01:06:23z Trace:  9:0, 8:4, 8:6, 8:8, 8:9, 1:2, 1:3, 1:3, 1:1, 1:2[2], 9:0[2], 9:0, 9:1, 8:4, 8:6, 8:8, 8:9, 9:3, 9:5, 9:9, 1:2, 1:3, 1:3, 1:6[1], 1:6[2], 1:6[2], 1:6[2], 1:6[3], 1:5[20], 1:6[4], 20:0, 20:1
9/17/22 01:06:23z ESP8266 ID: 16274938, RTC PCF8523 (68)
9/17/22 01:06:23z IoTaWatt 5.0, Firmware version 02_07_05
9/17/22 01:06:23z SPIFFS mounted.
9/16/22 21:06:23 Local time zone: -5:00, using DST/BST when in effect.
9/16/22 21:06:23 device name: IWbsmt
9/16/22 21:06:23 HTTP server started
9/16/22 21:06:23 influxDB_v1: Starting, interval:5, url:http://influxdb-iot.tabris.net:8086
9/16/22 21:06:23 timeSync: service started.
9/16/22 21:06:23 statService: started.
9/16/22 21:06:23 dataLog: service started.
9/16/22 21:06:25 dataLog: Last log entry 09/16/22 21:06:20
9/16/22 21:06:28 historyLog: service started.
9/16/22 21:06:28 historyLog: Last log entry 09/16/22 21:06:00
9/16/22 21:06:28 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=1, RSSI -65db
9/16/22 21:06:28 MDNS responder started for hostname IWbsmt
9/16/22 21:06:28 LLMNR responder started for hostname IWbsmt
9/16/22 21:06:29 Updater: service started. Auto-update class is NONE
9/16/22 21:25:10 Restart command received.

** Restart **

SD initialized.
9/17/22 01:25:12z Real Time Clock is running. Unix time 1663377912 
9/17/22 01:25:12z Reset reason: Software/System restart
9/17/22 01:25:12z Trace:  1:3, 1:1[4], 1:2[5], 9:0[5], 9:0, 9:1, 8:4, 8:6, 8:8, 8:9, 9:3, 9:5, 9:9, 1:2, 1:3, 1:3, 1:1[5], 1:2[6], 9:0[6], 9:0, 9:1, 8:4, 8:6, 8:8, 8:9, 9:3, 9:5, 9:9, 1:2, 1:3, 10:2, 10:3
9/17/22 01:25:12z ESP8266 ID: 16274938, RTC PCF8523 (68)
9/17/22 01:25:12z IoTaWatt 5.0, Firmware version 02_07_05
9/17/22 01:25:12z SPIFFS mounted.
9/16/22 21:25:12 Local time zone: -5:00, using DST/BST when in effect.
9/16/22 21:25:12 device name: IWbsmt
9/16/22 21:25:12 HTTP server started
9/16/22 21:25:12 influxDB_v1: Starting, interval:5, url:http://influxdb-iot.tabris.net:8086
9/16/22 21:25:12 timeSync: service started.
9/16/22 21:25:12 statService: started.
9/16/22 21:25:12 dataLog: service started.
9/16/22 21:25:14 dataLog: Last log entry 09/16/22 21:25:10
9/16/22 21:25:17 historyLog: service started.
9/16/22 21:25:17 historyLog: Last log entry 09/16/22 21:25:00
9/16/22 21:25:17 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=1, RSSI -65db
9/16/22 21:25:17 MDNS responder started for hostname IWbsmt
9/16/22 21:25:17 LLMNR responder started for hostname IWbsmt
9/16/22 21:25:17 Updater: service started. Auto-update class is NONE
9/16/22 21:36:57 WiFi disconnected.
9/16/22 21:37:04 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=1, RSSI -78db
9/16/22 21:57:49 WiFi disconnected.
9/16/22 21:57:51 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=1, RSSI -77db
9/16/22 21:59:26 WiFi disconnected.
9/16/22 21:59:30 WiFi connected. SSID=IoTnet, IP=192.168.82.50, channel=11, RSSI -52db
9/16/22 22:02:07 influxDB_v1: stopped, Last post 09/16/22 21:06:10

This is also the latest log message from this unit.


a) it’s not consistently Hz; I also have an example where it’s leftinv_black
b) technically I don’t know that this is the query that’s failing, I’m reading it in tshark and the full response message is as follows:

{"results":[{"statement_id":0,"error":"query interrupted"}]}

If pipelining is the issue [and I don’t know that it is] it could be another query that’s being interrupted.
I have made no significant changes to the Influx [or any other part of the IW] config for several months if not when they were setup a year ago.

Forum won’t let me upload a pcap unless I lie and say it’s something else.

Screenshot the relevant captures.

I looked at the old influxV1 discourse surrounding the query interrupted and it does indeed point to some sort of HTTP race/serialization issue. On startup, IoTaWatt issues a query for each of the measurements to find the most recent. (With V2 and flux it is a single query).

In the back of my mind, I’m recalling some of the performance advice for V1 that talks about tags vs fields, and I’m wondering if your measurement and field key both being $units may be causing the queries to take excessive time.

Just a WAG, but IoTaWatt does not issue multiple concurrent HTTP requests. That said, if a request times out it is considered complete, and another can be issued. So, if one query times out, it would be possible to issue another request, which the old influx discussion suggests could interrupt the first query that is still in progress. What I’m saying is that the issue from the IoTaWatt side may be a simple timeout. The query interrupted message in influx may be a red herring, a consequence of IoTaWatt sending another request.
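
A toy timeline makes that sequence easier to follow. This is not IoTaWatt code, only a model of "one request at a time, but a timed-out request counts as complete". The 10-second timeout and the query durations are made-up numbers for illustration.

```python
def simulate(server_seconds, client_timeout):
    """Return time-ordered events for sequential requests with a client timeout."""
    events = []
    clock = 0.0
    for i, duration in enumerate(server_seconds, start=1):
        events.append((clock, f"request {i} sent"))
        if duration > client_timeout:
            # The client gives up, but the server may still be running the query.
            events.append((clock + client_timeout, f"request {i} timed out on the client"))
            events.append((clock + duration, f"query {i} would finish on the server"))
            clock += client_timeout
        else:
            events.append((clock + duration, f"response {i} received"))
            clock += duration
    return sorted(events)


# Hypothetical numbers: a slow 12 s query followed by a fast 0.4 s one.
for when, what in simulate([12.0, 0.4], client_timeout=10.0):
    print(f"{when:5.1f}s  {what}")
```

In this model request 2 is sent while query 1 is still running on the server, which is the window in which the server could report the first query as interrupted.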

You have a lot of data in that database. I don’t know the performance characteristics of the host, but it may be as simple as the host being too slow. Sure, the timeout could be increased, or the query scope reduced, but that isn’t going to happen anytime soon as it would require integrating into a new release.

Some things to consider:

Looking into making your server faster.

Reorganizing your database for better performance. You can convert existing data with a query.

Reducing the size of the 5 second retention database by using a continuous query to consolidate into say a 1-minute retention database, and change the retention of the 5 second database to a smaller timeframe like 3 months.

Switching to influxV2 which has a much more efficient flux query on restart.
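
The downsampling option could be set up with three InfluxQL statements, generated here as strings so they can be reviewed before being pasted into the `influx` shell. This is a sketch under assumptions: the 5-second data is taken to live in the default `autogen` retention policy, the names `one_minute` and `cq_one_minute` are invented for illustration, and `90d` stands in for "3 months".

```python
def downsample_statements(db: str, keep_raw: str = "90d") -> list:
    """InfluxQL statements for a 1-minute rollup plus shorter raw retention."""
    return [
        # Long-lived retention policy that receives the 1-minute averages.
        f'CREATE RETENTION POLICY "one_minute" ON "{db}" DURATION INF REPLICATION 1',
        # Continuous query: average every field of every measurement per minute,
        # keeping all tags (GROUP BY *). Output fields are named mean_<field>.
        f'CREATE CONTINUOUS QUERY "cq_one_minute" ON "{db}" BEGIN '
        f'SELECT mean(*) INTO "{db}"."one_minute".:MEASUREMENT '
        "FROM /.*/ GROUP BY time(1m), * END",
        # Destructive: shortens how long the raw 5-second data is kept.
        f'ALTER RETENTION POLICY "autogen" ON "{db}" DURATION {keep_raw}',
    ]


for statement in downsample_statements("iotawatt"):
    print(statement)
```

A continuous query only processes new data, so existing history would need a one-off `SELECT ... INTO` backfill before the retention change is applied. A plain mean is also a poor fit for cumulative values such as Wh, which would need a different aggregate.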

Some thoughts re InfluxDB2:

  1. that was what I attempted early on, couldn’t make auth work. Can try again.
  2. how do i disable but not delete the old influx1 config? Inputting it is tedious.
  3. I’m running InfluxDB 1.8 which supports Flux. Does this mean that I can use the Influx2 uploader?

On another note… It’s not obvious from some googling how to change the schema, i.e. how to change the field key to value on the previous data.
I know in SQL I’d do it something like INSERT INTO Hz (timestamp, value) SELECT timestamp, hz FROM Hz but I’m guessing it’s not that obvious in Influx.
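
The closest InfluxQL analogue to that SQL is `SELECT ... AS ... INTO`, which copies points into another measurement under a new field name. The sketch below only generates the statements and has not been run against this database. The target measurement name `Hz_value` and the weekly chunk size are hypothetical, and the chunking is there because one copy over a large series could itself time out.

```python
from datetime import datetime, timedelta, timezone


def chunks(start, stop, step=timedelta(days=7)):
    """Yield consecutive [begin, end) ranges covering start..stop."""
    while start < stop:
        end = min(start + step, stop)
        yield start, end
        start = end


def copy_field_statement(measurement, old_field, new_field, target, begin, end):
    """InfluxQL that copies one field under a new key, keeping tags (GROUP BY *)."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"  # RFC3339, assuming UTC datetimes
    return (
        f'SELECT "{old_field}" AS "{new_field}" INTO "{target}" FROM "{measurement}" '
        f"WHERE time >= '{begin.strftime(fmt)}' AND time < '{end.strftime(fmt)}' GROUP BY *"
    )


start = datetime(2022, 8, 1, tzinfo=timezone.utc)
stop = datetime(2022, 8, 15, tzinfo=timezone.utc)
for begin, end in chunks(start, stop):
    print(copy_field_statement("Hz", "Hz", "value", "Hz_value", begin, end))
```

InfluxQL has no statement that renames or drops a single field in place, so the old field key remains in the source measurement until that data is deleted.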

My long-query timeout scenario may not be the problem. It’s just an educated hunch. But assuming that’s what’s going on, there are some things that I could try over the course of a few weeks if you want to pursue it.

I don’t believe 1.8 with flux would work unless it uses the same auth, buckets, etc., but the flux query is smarter. It doesn’t allow an open-ended last() type search. You have to provide a subset with a start and end time. So, what the influx V2 uploader does is start with the past 24 hours, which is by far the most common situation and usually hits. If there is no measurement, it increases the scope to a week, I think, then longer and longer until it finds (or doesn’t find) a measurement.

That would probably be a big improvement in the V1 uploader. Timeouts during the query are common, but I don’t know how many are delayed responses vs poor WiFi. Certainly, many are poor WiFi.

Also, since we are doing a query for each measurement, once a measurement is found it would make sense to not search back before that time for subsequent queries.

I can make those changes in my current development branch and make it available for you to try out. It may take a week or more. It would be diagnostic to see if that change mitigates this issue.
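
The two ideas above (widen the search window step by step, and never search earlier than a point already found) can be sketched as follows. This is not the uploader's code. The window sizes beyond 24 hours and a week are guesses, and `query` is a hypothetical callback standing in for the actual InfluxDB request.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Iterable, Optional

# Hypothetical window sizes: the thread only says 24 hours, then about a
# week, then "longer and longer".
WINDOWS = (timedelta(days=1), timedelta(days=7), timedelta(days=30), timedelta(days=365))

# query(measurement, start, stop) returns the newest timestamp found in
# [start, stop], or None if the range is empty.
Query = Callable[[str, datetime, datetime], Optional[datetime]]


def last_point(measurement: str, query: Query, now: datetime,
               floor: datetime) -> Optional[datetime]:
    """Widen the search window until a point is found or the floor is hit."""
    for window in WINDOWS:
        start = max(now - window, floor)
        found = query(measurement, start, now)
        if found is not None or start == floor:
            return found
    # Windows exhausted without reaching the floor: one last full-range query.
    return query(measurement, floor, now)


def newest_across(measurements: Iterable[str], query: Query, now: datetime,
                  floor: datetime) -> Optional[datetime]:
    """Newest point over all measurements, raising the floor as points are found."""
    newest = None
    for name in measurements:
        found = last_point(name, query, now, floor)
        if found is not None:
            newest = found
            floor = found  # never search earlier than a point already found
    return newest
```

Here `floor` would be the "upload history from" date, so no query ever scans the whole series, and each measurement after the first hit costs at most one short query.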

What I do know at the moment, and it doesn’t precisely prove the solution either way, is that changing IWsolr’s field key to value makes it start uploading again. But now I have 2 other units uploading with the per-$units field key, and for IWsolr the field key is different for different time ranges.

And so far, my attempts to find a rename method for a field key are in vain, mostly due to my not understanding the Flux language. rename() function | Flux 0.x Documentation ← closest solution I’ve found so far, but I can’t seem to make it do anything.

I could spend the time to write a perl script that downloads the data and re-inserts it, but that looks really messy b/c I’d have to a) load it b) delete it c) re-insert it.

When you change the field key to “value”, there are no prior measurements in the series, so it would start at your upload date, or the start date of the current log, whichever is greater. Point is that it is a completely different scenario and no surprise that it works. As you say it doesn’t prove the solution either way.

I think you could convert with flux by producing a new set of measurements, but I’m not positive and don’t want to spend any time figuring it out.

So far influxdb2 is running well. I have only switched IWsolr and IWbsmt so far. IWmain01 and 02 are stopped until I get around to changing the config.