IotaWatt stoped monitoring after wi-fi disconnect


#1

Hi all:

From my understanding, if the IotaWatt losses the wi-fi signal it would continue monitoring power and saving it in the SD card.

I have some questions:
Will the unit upload data to influx once the wifi connection is restored? (Just trying to confirm)
If the backlog is large, how will data be uploaded? At what rate?
Why could the unit stop saving data when the wifi connection is lost?

Here’s the relevant info about this case:

Last data points are on November 20 at 17:27:30. From the CSV output:

2018-11-20 17:27:20, 1078.7
2018-11-20 17:27:25, 1086.8
2018-11-20 17:27:30, 1090.3
2018-11-20 17:27:35, null
2018-11-20 17:27:40, null
2018-11-20 17:27:45, null

Data started getting logged again on November 26, 16:30:10, once the wifi access point was restored and the IotaWatt reset.

2018-11-26 16:30:00, null
2018-11-26 16:30:05, null
2018-11-26 16:30:10, 1271.2
2018-11-26 16:30:15, 1284.8
2018-11-26 16:30:20, 1280.1

The message log shows the wifi disconnection on November 20 at 17:27:31 so it makes sense, but again, I would expect logging to continue with no wifi. The log also shows the unit reset a few minutes before:

11/20/18 22:04:54z Reset reason: Exception

(This is 17:04:54 local time.)

I’m adding a short version of the log file.

iotalog.txt (15.4 KB)

Thanks a lot!

Carlos

#2

Yes, it will upload after WiFi isrestored.
If the backlog is large, it will use larger buffers, more than bulk load, to send more measurements in each post. The rate depends on the number of measurements in each interval, as well as other factors like transmission time and how fast influx responds.

I looked at your log, and I’m not exactly sure what happened. Clearly there was some WiFi drop off. The first event recovered ok, and it kept logging. Then there was an exception. I have seen this before. It happens in some low level esp8266 code after a WiFi failure. It’s usually not a big deal because the unit restarts and moves on.

In your case, for some unknown reason, it spent six days in the WiFi manager code, and then It apparently resolved without a restart. I think there’s more to the story because there was a power cycle a few minutes later.

Nevertheless, this is all definitely WiFi related and in the core or other included code. I believe you have some other units running there as well. If so, what do their logs show for that period?

You are using 02_03_16. The latest general release is 02_03_18, which has newer versions of the core and some of the underlying support code. The newer release has been very solid, and if you look at the forum support topic, there have been very few problems with it. The ESP folks have been working on the newer lwip2 and it’s getting very solid. So I’d encourage you to set auto-update to MINOR and keep current with the improvements.


#3

Thanks for replying. So far, we’ve updated the version now.

Sorry. I should probably have given a few more details. Here they are:

At some time on 11/20 power to the access point, EMILab, was removed (without a warning)
At some time on 11/26 we restored power to the access point, maybe just before 17:30.
At 17:37 we power cycled the IotaWatt. The LED was solid-red before this.

Thanks again.

Carlos


#4

Hi Carlos,

I got a similar problem reported today, and I want to follow up on something came out of that report. Was the device that you used to configure the IoTaWatt within wifi range when the power went off? I’m looking at a scenario where the IoTaWatt comes back up in AP mode before the WiFi router. Then the device that was originally used to connect to the IoTaWatt AP, which might ordinarily wait to connect to the WiFi router, instead connects to the IoTaWatt because it’s available and the WiFi is not. They remain connected indefinitely until something else happens to break them up. Is that a possibility in your case?


#5

That’s a very interesting case.

If you mean when the Iota was power cycled, then yes. But that’s when things started going well again because the access point was working already.

The whole thing was as follows. Iota was logging and AP was working

  1. At some time on the 20th power to the AP was removed

  2. At 17:04 on the 20th there was an Exception Reset

  3. At 18:27:34 on the 20th there was a Software/System restart because “WiFi disconnected more than 60 minutes”

  4. The Iota got stuck on 11/20/18 18:27:38 Connecting with WiFiManager.

  5. Next message in log is: 11/26/18 17:30:04 No WiFi connection.
    This is, apparently, when the AP was powered again.
    Note that the device (laptop) for configuring the Iota was probably within range at this point.

  6. The laptop’s wifi was connected to the AP to check for internet access

  7. Tried to access the Iota from the laptop without success (LED was red)

  8. Power cycled the Iota and green led was blinking rapidly (wifi connected)

  9. Searched for the Iota in the network (we can’t use http://Iota3.local, so we use Bonjour software)

  10. Found the Iota, with a different IP than before, btw.


#6

I mean on the 20th, when power to the router (when I say AP I mean the IoTaWatt in AP mode for configuration) was removed, and more specifically at 18:27:38 when the IoTaWatt was trying to connect with WiFi Manager - was that laptop or any other device that might have been previously connected to the Iotaxxxxxx SSID in the vicinity.

When the IoTaWatt starts up, if it cannot see the configured SSID that it uses to connect to a local network router in STAtion mode, it will activate AP mode as SSID Iotannnnn so that the user can connect to it and configure a different network. If the original local network becomes available before you connect to the AP, it’s supposed to make that connection, stop the AP mode, and move on. What happens if it makes a connection before the local network becomes available is the case I’m exploring. It’s possible it just hangs there waiting for a dialogue to configure a new network.

There are a few hundred issues outstanding against the WiFi Manager package. I sympathize with the author because there are so many different routers and clients out there it’s probably impossible to test. I think he may have lost interest in keeping up with this fast changing environment as well. I have elevated the priority of making an alternative.

Most clients rely on DHCP (Dynamic Host Configuration Protocol) to get their IP address assigned by the router. A client can try to use a fixed IP, but if it conflicts with the DHCP pool, there could be a conflict. If you would like your IoTaWatt to have a fixed IP address (which is handy when accessing from the internet), then you would need to configure whatever facility your router offers to fix it’s IP assignment based on the MAC address.


#7

No. Not a chance. The Iotas are only used by us involved in the project and we’re usually like 400 ft away from that particular IotaWatt.

As for the problem you’re investigating, if we find something we’ll let you know. That’s a very interesting problem and in hindsight something we probably faced before while doing some early tests, when we would sometimes plug and unplug the IotaWatts several times a day and also change routers.

Yeah, we’ve been using DHCP in all of out units, but I don’t recall any of them changing their IP just because the wifi was down, and we use the same DHCP server for all of our network. Anyway, I though that could possibly be related to the issue. Guess it’s not.


#8

Could be. Might indicate something weird happened to the router, or could just be business as usual. I’ve already put in changes so that it only invokes WiFi Manager after a power fail restart. That fixes a known problem and maybe helps this issue as well because it will not invoke WiFi Manager after a “WiFi disconnected more than 60 minutes” restart. Every little bit helps.


#9

Hi: I’m adding here two logs; just in case they’re of any help.

Iota 3 minilog.txt (3.4 KB)

Iota 4 minilog.txt (3.2 KB)

Since we’ve been having access (wifi/internet) problems with the IotaWatt that stopped logging data (Iota3) we decided to set up another unit next to this one. The access problems have been always present, but we wanted to correlate setting up this second unit (Iota4).

We just had a disconnect issue today (12/13) and we couldn’t access neither unit from our cubes. The Iotas are on a different subnetwork and we believe there’s network connectivity access from there (hence the failed Influx posts; the Influx server is in our cubes) but there’s also the WiFi disconnections.

Since I really don’t understand most of the messages in the log, you’ll know more of what happened.

Consider that both units were restarted early today for a time zone change (from -5 to -6) we had neglected since the end of DST… Also, the LED in both units was blinking green at the time we went to check them at about 14:30… but there was no internet access (connecting from a laptop to EMILab AP); although access was intermitent. Also, at one point both Iota4 and laptop could be reached from out cubes, but not Iota3, so we restarted it (that’s the restart at 14:35:39).

Saludos,

 Carlos

#10

The new release in ALPHA has DST support. Also, it’s not necessary to restart to change time zone. Actually, except for the new PVoutput support, all of the internal timing is done with UTC except for the message log which is converted to local-time.

blinking green means it’s sampling, and also that it believes it is connected to WiFi. It would blink red if sampling but not connected to WiFi. In either case, the sampling is being recorded in the datalog, as appears to be the case here. All of the data backlog should have gone to influx when the connection was re-established.

HTTcode -4 means that a connection could not be established. It is possible that you have something interfering with your WiFi. I can see that both IoTa has problems at ~12:31 and again at ~13:14 where Iota4 was disconnected and Iota3 simply failed to connect.

I’d suggest you try using different Wifi channels. That is set in your router. Conventional wisdom recommends 1, 6 and 11. (you are on 9). It might be as simple as that.


#11

Thanks.

I think we can’t use automatic DST since DST here is different than in the US.

We tried changing channels without success. The issue may be the router we’re using. We’ll try to change it. As you may recall the Iotas there are #3 and #4

Now we had brief issues with the Iotas 1 and 2 that we have in the original installation and the diesel power plant next to them. Anyway, those had been very stable (after October 2: 2 Exceptions, several power failures, 1 reset from “WiFi disconnected more than 60 minutes”) until recently.

First thing I’ve noticed, now happening in the 4 units, is the new message: “Incomplete HTTP request detected, id 7, restarting”. I don’t know if that’s just a new message for a previously known issue or if it may be an actual new issue, but it’s been happening on units 3,4 that have the disconnection issues. Here’s the logs just right after the upgrades for units 1 and 2. Notice how the influx issues occur during several days after the upgrade until the “Incomplete HTTP request” message on both units at very similar times. The behavior after the upgrade and is exactly the same in both units for almost 3 weeks.

iota 1 FW upgrade.txt (3.2 KB)
iota 2 FW upgrade.txt (3.2 KB)

Note that as I said, the “Incomplete HTTP request” message also has happened several times after the upgrade, in units 3 and 4.

Back to the Iota1 and Iota2 issues. Here are the extended logs.

iota 1 FW upgrade_extended.txt (14.0 KB)
iota 2 FW upgrade_extended.txt (4.7 KB)

On the morning of the 18, we had no access to these two units and later noticed our access point (IotaNetwork) was knocked from the desk where it was placed (but not disconnected). This happened before 9:00 in the morning and at about 9:00 the AP was moved back to the desk, but without resetting it. The units couldn’t be accessed.

Later in the day, at aroung 18:00 we went again to check the installation. The wifi signal of the AP was available from our laptop but we couldn’t actually access it. Iota1 was apparently in AP Mode (red-green-green) and Iota 2 was fast blinking green. From what we did and the logs this happened:

  • Apparently when the AP was moved back to the desk, Iota1 changed its status
    12/18/18 06:34:50 Connecting with WiFiManager.
    12/18/18 09:02:49 MDNS responder started for hostname Iota1
  • Note that during the day, there were several “Incomplete HTTP request detected, id 7, restarting.” cases
  • A few minutes after 18:00 we power cycled the AP and status changed to slow blinking red. The log shows the following, however, it seems as if the change to “slow blinking red” happened just after we tried to access the unit.
    12/18/18 14:27:29 Connecting with WiFiManager.
    12/18/18 18:13:02 No WiFi connection.
  • At 18:19 we powercycled Iota1, the Iota connected to the AP and it stayed there.

For Iota2 it looks much simpler.
12/18/18 18:09:54 WiFi disconnected.
12/18/18 18:11:07 WiFi connected. SSID=IoTaNetwork, IP=10.0.0.215, channel=4, RSSI -59db

A few notes and questions:

It seems as if the Iotas (or at least the logging) sometimes are getting stuck at the following point, sometimes for a few minutes, sometimes for days.
12/18/18 14:27:29 Connecting with WiFiManager
However, I’m not sure if this normal.

Apparently some of the connectivity issues have increased since version 02_03_18

Connectivity problems look very different for Iota1 and Iota2, even when they are very close together and only a few meters away from the access point.


#12

Where are you and what is your DST rule? The DST rule is encoded in the config file when you activate DST. If your rule is different from US eastern time, it can be added to the rule tables. There are currently four rules available for different areas.

There was a problem where logging just stopped, and there was no indication why. It was always related to poor WiFi and was up to the user to notice and restart the IoTaWatt. The problem seemed to be related to HTTP requests not completing when WiFi drops. This all seems to happen at a lower level beyond my control. So this WDT was added to identify when a HTTP request gets completely lost. Since there doesn’t seem to be anything I can do about it, and I don’t know exactly what happened, the question is what to do when I detect it? I could ignore it and just retry the operation, but that is risky and could have lots of unintended consequences. So I just restart, which resets everything. There is no loss of data because sampling was ongoing.

Right. Lets be clear, both systems worked perfectly for two weeks after the upgrade to 02_03_18. From 11/28/18 to 12/13/18 at 14:23. The WiFi dropped for a minute on 12/10/18 at 13:59 but both systems reconnected uneventfully.

Then the wheels came off the wagon. There are a series of connection failures on 12/13 and 12/14 which effect both systems at about the same times. Then all quiet for a couple of days when it starts up again on 12/17 - 10:57, 12:13, 12/42, 13:01 - both systems, both recovered fine. This is clearly a router or interference issue external to the IoTaWatt. Then sometime around 13:46 or so, something happened that caused both systems to lose TCP context and this time the HTTP requests were completely lost, the WDT detected it, and both systems restarted.

Now after restart, neither system was able to connect to the router, so they hung waiting for the router from 14:02 (09) until 16:26 when the router apparently came back online and both systems connected. You lost data during that period. That issue is fixed in the latest release now in ALPHA.

Half an hour after the connection at 16:26, the WiFi problems started again for iota1. iota2 was unable to get started because it could not query for the last update, so it does not have subsequent connection problems because it is no longer trying to connect.

Bottom line is that everything worked great for two weeks after upgrade to 02_03_18 until something external happened to cause a lot of WiFi issues. When it does connect, the RSSI is great, so I have to wonder if there is something else in the environment that is causing the problem. What changed on 12/13?

Right, that’s a problem with WiFi Manager, which is out of my control. What I have done in 02_03_20 is only use WiFi manager after a power failure, so there should be no significant loss of sampling after an incomplete HTTP request restart.

As I said, it worked great for about two weeks after that. Something else happened on 12/13.

They look exactly the same to me. Where they seem different on 12/17-12/18 it’s because iota2 failed to get the influx service running and so wasn’t doing HTTP transactions. iota1 was and getting a lot of errors with the poor WiFi.


#13

Hello again. Thanks for replying.

I’m in Mexico. There are several rules here, depending on the region (border) and states. The one that applies to us is
First Sunday in April clock moves from 2:00 to 3:00
Last Sunday in October clock moves from 2:00 to 1:00

Sounds reasonable.

Good. We’ll update the units.

We changed the time zone to account for the DST end (which was on OCT 28 but we forgot about). BTW, only Iota4 was manually reset after the change.
However, on the 12/14 , we changed the wifi channel from channel 9 to 6 based on your suggestion. Of course I’m not blaming you; we should have checked other channels and behavior. It’s weird however, that the change to channel 6 improved the RSSI from -72db to around -64db

OK. Is this 02_03_20 the current ALPHA version with the improvements?

Thanks again.


#14

I’ll code and test that rule. What is your standard time offset?

Couple of things. Something happened on 12/13 and restarting after a time zone change could not cause these WiFi issues, especially on two different systems. The log shows that you are on channel 4, not that it’s a problem, just different than what you said, and I don’t see any change between 11/28 and 12/17 in the log.

Yes

de nada, Feliz Navidad.


#15

It’s UTC-6:00. Thanks for the extrawork.

Sorry; the change was on EMILab access point, which is where Iota3 and Iota4 connect to. Here’s the change from Iota3 message log:

12/14/18 16:48:42 WiFi connected. SSID=EMILab, IP=10.0.6.164, channel=9, RSSI -71db
12/14/18 16:48:42 Updater: service started. Auto-update class is NONE
12/14/18 16:48:42 dataLog: service started.
12/14/18 16:48:43 dataLog: Last log entry 12/14/18 16:29:20
12/14/18 16:48:43 historyLog: service started.
12/14/18 16:48:43 historyLog: Last log entry 12/14/18 16:29:00
12/14/18 16:48:46 influxDB: started, url=10.0.3.170:8086, db=example, interval=30
12/14/18 16:48:46 influxDB: Start posting at 12/14/18 15:53:30
12/14/18 16:54:32 influxDB: Post Failed: -4
12/14/18 17:29:55 WiFi disconnected.
12/14/18 17:47:12 WiFi connected. SSID=EMILab, IP=10.0.6.164, channel=6, RSSI -77db
12/14/18 17:53:33 influxDB: Post Failed: -4
12/14/18 18:00:48 WiFi disconnected.
12/14/18 18:00:55 WiFi connected. SSID=EMILab, IP=10.0.6.164, channel=6, RSSI -74db
12/14/18 18:11:35 influxDB: Post Failed: -4
12/14/18 18:16:40 WiFi disconnected.
12/14/18 18:16:47 WiFi connected. SSID=EMILab, IP=10.0.6.164, channel=6, RSSI -73db
12/14/18 18:22:33 influxDB: Post Failed: -4
12/14/18 20:34:00 Incomplete HTTP request detected, id 7, restarting.

Although this shows the power getting worse from channel 9 to 6, most of the WiFi connected messageson later dates show an improvement in power.

Haha ! Cool. ¡ Feliz Navidad para tí también!


#16

Hello

Feliz año 2019 :slight_smile:

The Iota’s behaved very well during vacations. There were some wifi disconnections but apparently monitoring was never interrupted.

The only unexpected messages were these in Iota2. Although I believe it’s nothing to worry about, I’d like to hear your opinion. Thanks !!!

1/07/19 01:19:20 timesync: Kiss-o’-Death, code RATE, ip: 107.181.191.189
1/07/19 01:19:35 timesync: Kiss-o’-Death, code RATE, ip: 107.181.191.189
1/07/19 01:19:50 timesync: Kiss-o’-Death, code RATE, ip: 107.181.191.189


#17

Those are informative messages indicating that an internet time server rejected time requests because it sensed that your devices were doing so at an excessive rate.
This is a problem that crops up from time to time in installations where there are multiple IoTaWatt. Apparently the servers are unable to discern that these are different devices.

I run into that here as well with sometimes as many as eight IoTaWatt running at one time. They will retry and eventually sync up to UTC, and the real time clocks typically don’t vary by more than a second or two per day anyway.

It does become a problem when trying to setup a new IoTaWatt where the RTC hasn’t been initialized. It can take an hour or more to get a timesync in that situation.

Glad to hear your WiFi issues are looking better.


#18

Thanks. Makes sense to see it the messages now that we have our 4 units working…