FR: Prometheus Support

I would love to use my IoTaWatt with Prometheus. Everything else I have goes into it rather than into something like Influx. I opened a feature request here: FR: Prometheus Support · Issue #331 · boblemaire/IoTaWatt · GitHub

Prometheus is super simple, it's just a plain-text page of metric_name{labels} value lines.
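To make that concrete, a scrape would see something like the lines below; the metric and channel names here are made up for illustration, not anything from the IoTaWatt code:

# TYPE iotawatt_power_watts gauge
iotawatt_power_watts{channel="Kitchen"} 132.5
iotawatt_power_watts{channel="Heat_pump"} 2840.0

Each sample is just a name, optional labels, and a value, and that whole page is all a Prometheus scrape expects.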

I’m not afraid per se to try doing it myself but I really don’t want to end up bricking my device.

Is it pretty easy to test and flash firmware? I looked through the code and it wasn't immediately clear to me whether there was already a kWh counter somewhere that I could use.

The Git repo is set up to use VS Code with PlatformIO as the IDE. Others have had success compiling and flashing with those tools.

IoTaWatt has a base class that supports uploaders. The influx uploaders use it, so take a look at that code and see how well you understand it.

Prometheus works backwards from how the uploaders work: it pulls data at a regular interval rather than having the device push it.

My C++ is a little rusty, but I've got it connected, done a test build, and it works.

Are there global values somewhere that I can access and spit out when someone calls a GET request?
I see the IotaInputChannel type and the dataBucket and accum1/2 values, but I don't follow what all of those are…

I would love to be able to access all the channels: their names, the current watt draw, ideally a running Wh counter, and maybe the current voltage and frequency.

Are those values readily available? I can probably put together a PR if you can point me to the best place to access those values.

If you want to pull data, look at the query API. You can access any of the inputs and outputs in any of the supported units, and best of all it would not need any changes to IoTaWatt.
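From memory of the docs, the shape of a query is roughly like this; the hostname and input name are placeholders, and the exact parameters are spelled out in the query API documentation:

http://iotawatt.local/query?select=[time.iso,Input_1.watts]&begin=d&end=s&group=h&format=json

There is also a show=series form (query?show=series) that lists every input and output you can select from.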

I think that there is a Home Assistant plugin that uses the query API. You should be able to see their code on the appropriate GitHub page.

I can pull metrics from the query API, but it wouldn't work with Prometheus or any other cloud monitoring platform that uses the OpenMetrics format.

I think I figured it out, though: I opened this PR to add a /metrics route that exposes the metrics in that format. I'm not totally sure I got all the metrics I should have, but I grabbed whatever looked useful. Happy to make any changes if you'd be willing to consider it. Not everyone uses Influx.
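For anyone curious what that looks like, here is a rough sketch of the general idea using the stock ESP8266WebServer API rather than IoTaWatt's actual web server code; the channel helpers and metric names are placeholders, not anything from the firmware or from the PR itself:

#include <ESP8266WebServer.h>

ESP8266WebServer server(80);

// Placeholders for whatever the firmware exposes -- not real IoTaWatt functions.
int channelCount();
const char *channelName(int i);
double channelWatts(int i);

// Rough shape of a /metrics handler: build the exposition text and return it as plain text.
void handleMetrics() {
    String body;
    body.reserve(1024);                          // cut down on reallocation while appending
    for (int i = 0; i < channelCount(); i++) {
        body += "iotawatt_power_watts{channel=\"";
        body += channelName(i);
        body += "\"} ";
        body += String(channelWatts(i), 1);      // one decimal place
        body += "\n";
    }
    server.send(200, "text/plain; version=0.0.4", body);
}

void setup() {
    server.on("/metrics", handleMetrics);        // Prometheus scrapes this path
    server.begin();
}

void loop() {
    server.handleClient();
}

In the actual firmware this would hang off IoTaWatt's own web server and channel structures rather than a standalone sketch like this.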

If you want to take this route, you can maintain your own branch of the firmware. The resources in IoTaWatt are limited, and adding what I would consider redundant functionality to reformat data that is already available isn't a good use of what little resources are left.

Most of what you are extracting, and more, is available in the existing status endpoint. You could easily request that and reformat it with something like Node-RED.

There are many thousands of IoTaWatt units in service worldwide. It makes little sense to burden all of them with this change when there is very little demand for Prometheus support and it can be accomplished with existing off-the-shelf tools.

Have you considered using Telegraf with the prometheus_client output plugin?

You can ingest the IoTaWatt data into Telegraf using the query API and then output it to Prometheus.

Is there any chance you would reconsider? I am happy to maintain my own fork, but I believe there is just going to be more and more demand for Prometheus going forward. It's probably one of the most widely used metrics collectors in the world (if not the most used), especially when it comes to cloud computing.

Heck, I hadn't even noticed that the PR before mine was from someone who was also working on a Prometheus fork and accidentally opened it against the main repo, so I know I'm not the only one who would prefer this format.

While I understand the concern for resources, the code in this PR should have zero impact on performance unless you call the endpoint (and even then I don't think there's much), and as far as resources on the ESP8266 go, the footprint is pretty small, all things considered.

Current master branch:
RAM:   [=====     ]  53.8% (used 44084 bytes from 81920 bytes)
Flash: [=====     ]  54.7% (used 570897 bytes from 1044464 bytes)

With this PR:
RAM:   [=====     ]  55.0% (used 45028 bytes from 81920 bytes)
Flash: [=====     ]  54.9% (used 573013 bytes from 1044464 bytes)

The way that Prometheus operates, using a pull mechanism, makes it more difficult to manage. Having to use one or more intermediaries to pull metrics and translate them is a bit of a hassle when the device could do it without much difficulty. I would totally understand if it were some random format, but Prometheus has a huge user base.

It doesn't seem like much, but 1K of RAM means 1K less heap. I'm already trying to mitigate the 1K lost in the next release due to the latest core. Probably the most common problem reported these days is low-heap restarts. While that is nearly always associated with WiFi issues, it doesn't help when the starting point is lower. Dynamically, each call will temporarily reduce heap by at least several K due to the verbose JSON response. The heap loss can be reduced by using the more efficient PROGMEM for the formatting, but you are dumping so many metrics into the response that the buffer will still be very large, and if it exceeds about 1,400 bytes it will cause multiple writes, which are blocking and suspend sampling.
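For what it's worth, the PROGMEM point looks roughly like this in practice; snprintf_P() and PSTR() are the stock ESP8266 core calls, and the metric and helper names are just illustrative, not anything from the firmware:

#include <cstdio>
#include <pgmspace.h>

// The format string lives in flash (PSTR) instead of RAM; only the finished
// line occupies a small caller-supplied buffer. Names are illustrative only.
int formatWattsLine(char *out, size_t len, const char *channel, double watts) {
    return snprintf_P(out, len,
                      PSTR("iotawatt_power_watts{channel=\"%s\"} %.1f\n"),
                      channel, watts);
}

Even then, the assembled response still has to live somewhere, so the number of metrics exported is what really drives the buffer size and the write behavior described above.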

While it's doable, there is a lot more that would need to be resolved. What you have coded serves your needs, but maybe not others'. As I ponder the metrics I see being sent to Influx, amps are pretty common, some folks like to see power factor, and sometimes VAR.

If you just dump out everything available, it would be prohibitively large. But it doesn't stop there: there is no capability to export outputs, and that is a fundamental part of all of the uploaders. In fact, everything that is uploaded is essentially an output, since the uploaders use the script system to develop it. A simple case in point is a US split-phase usage upload, where the total usage is the sum of the two mains. Nobody is really interested in the individual inputs, just the total. When solar is included, scripts are used to produce the usage, import, and export metrics.

I guarantee that there would be a push to include script outputs in any Prometheus endpoint. There is infrastructure within IoTaWatt for defining a scriptset to be used, for including that in the config.txt file, and for generating the metrics in the endpoint handler. It would require considerable work in the setup app (again, the infrastructure is there with the calculator), and it would require building a response handler that processes that data. It would also consume more heap.

Stepping back a bit and reading the Prometheus comparison to Influx, the big contrast to me is that Prometheus is a real-time-only database. They state clearly that Influx is better suited to billing, which can be interpreted as historical accuracy. Prometheus appears to just log what it gets, when it gets it. All of the existing uploaders in IoTaWatt maintain the integrity of the external database regardless of interruptions in communications or in the database system itself. On startup, each uploader queries the database, determines the time of the last update, and resumes timestamped uploads from that point, so the data has historical integrity. The value of any historical reporting from a Prometheus database would be questionable.

In many ways the use of Prometheus is similar to the various reporting systems that use MQTT. I have resisted supporting that as well, for the same reason: the data is real-time, with no ability to correct for gaps caused by external failures. The Home Assistant Energy integration suffers from the same problem, but it is third party and uses the query API.

With limited resources you have to draw the line somewhere. I can count on one hand the number of inquiries expressing an interest in Prometheus support; it's about the same interest level as MongoDB. On the other hand, the interest in Influx is and has been very strong.

I appreciate the well-thought-out response. Prometheus is indeed a real-time source, but that's why, for the most part, you typically work with counter values (like watt-hours): even if you miss a poll or two, your data is still correct, and you can still turn the counter delta into an average watt value for the time period. For example, if the Wh counter advances by 50 Wh across a missed five-minute window, that window still works out to an average of 600 W.

With that said, I will continue to roll with a fork for now. I may try to rework it to include scripts, and possibly figure out how to make it use less RAM (since there is a lot of static text, maybe it can be copied directly to the network instead of going through a buffer).

Thanks for taking the time to explain the reasoning.