Constant restarts from heap memory has degraded below safe minimum

amosyuen · February 4, 2022, 9:14am

I have an iotawatt set up with InfluxDB upload, but it crashes and restarts multiple times a day, often within an hour of the last restart, though sometimes it takes several hours.

Status

InfluxDB Config

Input Config

No Outputs or Integrators configured

IOTAMSGS.TXT (9.7 KB)

overeasy · February 4, 2022, 1:09pm

Could you stop the influx uploader using the stop button in the status display? Leave it off for a few hours and see if it still restarts, while also watching the heap size in the status display to see if it stays relatively constant.

If it stays up with steady heap, start the uploader again in the status display and watch the heap as it uploads the backlog. See if it degrades.

amosyuen · February 6, 2022, 8:22pm

I left the influxdb uploader off for a day and saw fewer restarts, but still saw at least 3 restarts in a day. I realized I also had the Home Assistant integration set up to query iotawatt every 10s. I tried disabling that also and left it on for a while, and so far it hasn’t restarted in 22h.

overeasy · February 6, 2022, 10:20pm

2 posts were split to a new topic: Problem with Heap degradation restarts

overeasy · February 6, 2022, 10:17pm

Starting to sound like maybe a problem handling the HA query. As a next step, could you restart influx but leave HA off to see what happens?

The influx uploaders have the capability to add a measurement with available heap. It is not documented, but I can turn it on in your config file. Could you PM your config.txt? I’ll add the heap trace and send it back with instructions on how to upload it. Then we can have a trace of the heap to see how it deteriorates and maybe get some insight into what is causing it.

I suspect this may have something to do with the very long names in your configuration.

amosyuen · February 8, 2022, 5:44am

It looks like it is related to HA. Turning on influxdb but leaving HA off, there have been no restarts for a day.

overeasy · February 8, 2022, 12:41pm

Ok, when you have loaded the new config, can you note the time and start HA again, let it go through a couple of restarts, then plot the new heap measurement in your influx database over the time period from when you restarted HA?

amosyuen · February 10, 2022, 4:51am

Heap graph from last 24 hours (1m resolution):

Looking at message logs, it has been restarting a lot, 7 times from 9/22 21:03 to 9/22 21:45.
iotamsgs.txt (9.7 KB)

overeasy · February 10, 2022, 5:30pm

Can you zero in on about 20 minutes of that plot with the highest resolution possible please?

amosyuen · February 10, 2022, 8:13pm

Last hour (5s resolution):

Last 15m (5s resolution):

overeasy · February 10, 2022, 8:24pm

Thanks, now can I get the latest message log? The old one doesn’t cover today.

amosyuen · February 10, 2022, 9:16pm

Graph

Msglog: iotamsgs.txt (9.6 KB)

overeasy · February 11, 2022, 2:23pm

I’ve got enough here to work on this. It may take some time to recreate and diagnose. The heap clearly degrades at a steady rate, suggesting a memory leak somewhere, and probably in query.

One thing that seems different in your setup is the use of unusually long names. That may be a red herring but I am suspicious.

Will keep you posted.

amosyuen · February 11, 2022, 8:54pm

Thanks for the updates!

overeasy · February 12, 2022, 2:19am

I can’t make it fail on my home system running four uploaders and the HA integration. I didn’t get any heap degradation over the last 24 hours. Looking at the influx plot I can see something happening about every 10 seconds, and it doesn’t appear to fully recover its heap after each occurrence. You mentioned above that you had modified the HA integration to update every 10 seconds, so I think one of the update queries is the culprit.

Can you post some details of how you modified the update interval?

amosyuen · February 13, 2022, 6:47am

In the HA iotawatt integration I clicked the three dots → System Options → disabled Enable polling for updates.

Then added this automation to run every 10s:

- alias: Iotawatt - Update interval
  mode: single
  max_exceeded: silent
  trigger:
    - platform: time_pattern
      seconds: "/10"
  action:
    - service: homeassistant.update_entity
      data:
        entity_id:
          - sensor.furnace_w # Can be any iotawatt sensor

overeasy · February 15, 2022, 4:34pm

I’ve tried to reproduce this problem with no success. I set my HASS to do the 10 second polling, defined one of the inputs with a really long name, ran it for the last 12 hours and got no heap degradation.

There are no other HA users reporting this problem, so maybe we need to revisit from a different perspective. I’m going on these reported observations:

The problem goes away when influx is stopped and HASS updates are suspended.
The problem goes away with influx running and HASS updates suspended.

Since I can’t reproduce it, I’ll have to rely on trying things on your end. First thing I’d like to do is try changing the HASS update interval back to 30 seconds, or even a minute. Can you do that and report back please? If it continues to fail, can you post the heap trace and message log please?
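
If the 10 second polling comes from the automation posted above, a minimal sketch of the change would be to edit only the trigger and leave everything else as is. This assumes Home Assistant's time_pattern trigger, where a value such as "/30" matches whenever the seconds value is divisible by 30. It is only an illustration of the interval change, not a tested fix:

```yaml
- alias: Iotawatt - Update interval
  mode: single
  max_exceeded: silent
  trigger:
    - platform: time_pattern
      # Was "/10" (every 10 seconds); "/30" fires every 30 seconds.
      # For a one-minute interval, replace this line with: minutes: "/1"
      seconds: "/30"
  action:
    - service: homeassistant.update_entity
      data:
        entity_id:
          - sensor.furnace_w # Can be any iotawatt sensor
```

Another way to get a slower rate should be to remove the automation and re-enable "Enable polling for updates" in the integration's System Options, which restores the integration's own polling.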

amosyuen · February 16, 2022, 6:58pm

It seems that using a longer interval of one minute for the HA integration resulted in no crashes and the heap stabilizing.

amosyuen · February 16, 2022, 7:15pm

Hmm, though I just realized that apparently the InfluxDB upload hasn’t succeeded since last month.

Interesting that the heap was being uploaded fine before, but starting and stopping the InfluxDB uploader seems to have caused it to resume. But also now it seems that my heap is uploaded at 1m intervals instead of 5s intervals.

I don’t think it should affect it, but one other thing I forgot to mention is that I am running two iotawatts, both being queried by the HA instance and uploading to the same InfluxDB.

overeasy · February 16, 2022, 10:15pm

The influx code 422 indicates “unprocessable entity”. That could be a problem with your datalog. When does your influx data begin? I can see that your datalog only goes back to Jan 10. The question is whether any data has been uploaded into your influx? The heap entries are timestamped with the time of the upload. The actual data is timestamped with the time of the measurement being uploaded. I think this has something to do with your problem.