Prometheus exporter

Hi all! I just ordered my IoTaWatt last night, and I’m pretty excited to get the unit and get it installed.

For most of my home automation and monitoring solutions, I use a system called Prometheus as a data store. It is similar in nature to InfluxDB, although a bit more specialized. I have alerting and graphing (Grafana) capabilities linked up with my Prometheus installation.

Prometheus functions differently than InfluxDB, et al.; it uses a pull model rather than a push model. The Prometheus server reaches out to other systems over HTTP and queries /metrics (or a configurable URL), which responds with the current status of the system being monitored in a text-based format (example: docs/exposition_formats.md in the prometheus/docs repository on GitHub). Prometheus queries /metrics on a fixed schedule (e.g. every 30s, 60s, …) and records this information in its time-series database.
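To make the text format concrete, here is a minimal Python sketch of what a /metrics response body could look like. The metric name `iotawatt_input_watts`, the `input` label, and the readings are all made up for illustration; nothing here comes from the IoTaWatt firmware.

```python
def render_gauges(readings, metric="iotawatt_input_watts",
                  help_text="Instantaneous power per input in watts."):
    """Render a dict of {input_name: watts} as Prometheus text exposition.

    The metric and label names are hypothetical, chosen for this example only.
    """
    lines = [
        f"# HELP {metric} {help_text}",
        f"# TYPE {metric} gauge",
    ]
    for name, value in readings.items():
        # Label values must escape backslash, double quote, and newline.
        escaped = (name.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        lines.append(f'{metric}{{input="{escaped}"}} {value}')
    # The exposition format expects a trailing newline.
    return "\n".join(lines) + "\n"


# Hypothetical readings for two inputs.
print(render_gauges({"main_A": 1234.5, "main_B": 987.0}), end="")
```

Each scrape would return only the current values in this shape; Prometheus attaches its own timestamp when it stores them.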

I’d like to contribute an update to the IoTaWatt firmware to support a /metrics endpoint for Prometheus collection. But before I do this development work, I wanted to reach out to see whether such an addition would be welcome as a contribution; I don’t want to support my own custom firmware forever. If this wasn’t welcome, I would just write an external tool to bridge Prometheus → Tool → IoTaWatt.

A Prometheus firmware contribution would effectively look like a copy of the handleStatus /status HTTP handler that reformats the output into the Prometheus-supported format. I’d be happy to include documentation and follow any recommended best practices for development.

Would this contribution be welcomed, or would it be better to implement this as an independent piece of software bridging to the existing HTTP data sources?

Thanks!

Sounds like you have reviewed the code in the webserver and understand what’s involved, so in principle I’d be interested in a PR to support this pull protocol.

That said, I want to convey a couple of thoughts:

Outages occur. WiFi can drop off, and servers go down. Meanwhile, IoTaWatt keeps on ticking. All of the push protocols work off the datalog and fill in the hole after an outage, so an influx database for instance will not have any holes.

When you pull real time, there will be holes.

The source of measurements used in the /status handler is a damped recent value intended to represent the most recent couple of seconds. If Prometheus is looking for a 15 or 30 second data point, it would probably be more appropriate to go to the datalog to get that. It’s fairly easy, and the scripting system is very useful for extracting the various metrics available from the datalog.

Can you point me at a concise description of the Prometheus pull facilities? It may be possible to simply use the new IoTaWatt /query to pull the data, which would be very slick.

I agree with your points regarding reliability: IoTaWatt’s persistent data storage, which lets it catch up and fill holes in InfluxDB, is a stronger design in this regard. I’m still somewhat interested since Prometheus is a system I’m more familiar with using, but it’s a good argument that this isn’t an optimal path. Maybe when I see how poor my WiFi is near my load center, I might change my mind. :slight_smile:

Prometheus does not support much (or really, any) flexibility in the data format that it reads from a system. Typically it relies on either a system having native support for Prometheus data formats, or using a middleware/bridge, referred to as an “exporter” in Prometheus terminology (examples: memcached_exporter, postgres_exporter).

It does support flexibility in the HTTP layer; it could invoke /query with custom arguments such as /query?...&format=prometheus. But looking at the CSVquery code, I think this would likely be more of a hack than useful code re-use. Prometheus is designed to read only the current values from a metric source and import them, not a recent history of values.

Regarding the damped recent values… I think a typical approach to this problem with Prometheus would be to retrieve metrics that are counters, rather than gauges, in Prometheus terminology. A gauge represents an instantaneous measurement, while a counter is a monotonically increasing value. So if I had main_A as an input, a gauge would be the instantaneous power, and a counter would be the cumulative energy used since IoTaWatt boot (or configuration). Prometheus collecting a counter would minimize the risk of missing power utilization: if there was a 29s spike between two 30s collections, the counter would theoretically include it, while the gauge readings at 0s and 30s might miss it. Later, Prometheus could calculate rates of power usage based upon the counters.
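A small simulation of the spike scenario shows the difference. This is only a sketch with invented numbers (a 100 W baseline and a 5000 W spike that starts just after one scrape and ends just before the next), and the counter is modelled as watt-seconds accumulated in 1-second steps, which is an assumption and not how IoTaWatt actually accumulates.

```python
def power_at(t):
    """Hypothetical load in watts: 5000 W spike for 1 <= t < 30, else 100 W."""
    return 5000.0 if 1 <= t < 30 else 100.0


def energy_counter(t_end):
    """Cumulative energy in watt-seconds from t=0 up to t_end (1 s steps)."""
    return sum(power_at(t) for t in range(t_end))


scrape_times = [0, 30, 60]  # a 30 s scrape interval

# Gauge: whatever the instantaneous power happens to be at scrape time.
gauge_samples = [power_at(t) for t in scrape_times]

# Counter: the running energy total at scrape time.
counter_samples = [energy_counter(t) for t in scrape_times]

# Average power per interval, derived from counter deltas
# (the same idea as a rate calculation over a counter).
avg_power = [
    (c1 - c0) / (t1 - t0)
    for (t0, c0), (t1, c1) in zip(
        zip(scrape_times, counter_samples),
        zip(scrape_times[1:], counter_samples[1:]),
    )
]

print("gauge samples:", gauge_samples)    # every sample lands on the baseline
print("counter samples:", counter_samples)
print("avg power from counter:", avg_power)
```

The gauge samples all read 100 W, so the spike is invisible, while the counter delta for the first interval works out to roughly 4837 W on average. The counter cannot say when within the interval the spike happened, only that the energy was used.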

This might suggest that exporting the input channel’s data bucket accumulators is more what Prometheus would expect/support. But I feel a bit like the data flowing into Prometheus is a pretty abstract combination of calculations, which I don’t fully understand, so I’m not sure how accurate an analysis based upon Prometheus would end up being.

I’m on the road today, so can’t continue a conversation at this level. I did take a quick look at the docs link in the first post and that appeared to me to be a timestamped push protocol, but I might be missing something obvious.

It’s clear I need a Prometheus definition of terms. I’ll review this tonight or tomorrow and get back to you.

@overeasy | @mfenniak - Did anything come of Prometheus support?

Looks like I dropped the ball on this. I recall reading up somewhat on Prometheus and don’t recall seeing anything that improved on influx.

Apparently there was no clarification on my observation that there did appear to be a push protocol.

Can anyone make a case for the virtues of Prometheus over influx?

Might be a bit biased but Prometheus has a comparison here.

Was curious if Prometheus could be installed on a Raspberry Pi, but found this.

I am curious if I could set up a local Raspberry Pi, with a push from there to a TimescaleDB in the cloud.

Prometheus is a far more common metrics system in server environments, especially Kubernetes environments. It does have a push protocol, but it is only enabled with an additional service (pushgateway) and it has limits as to what it can handle. The pull protocol is designed so that if it misses points you never actually lose data entirely, as all data is reported as absolutes (ever-increasing numbers) that it will do the math to resolve. The main benefit over Influx is that it supports histograms using a constant set of buckets, a sum and a count.
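For anyone unfamiliar with that histogram shape, here is a rough Python sketch of how a fixed set of buckets plus a sum and a count is exposed in the text format. The metric name `iotawatt_power_watts`, the bucket bounds, and the observations are invented for the example.

```python
import math

# Hypothetical, fixed upper bounds in watts; the last bucket is always +Inf.
BOUNDS = [100.0, 500.0, 1000.0, 5000.0, math.inf]


def histogram_lines(name, values, bounds=BOUNDS):
    """Render observations as Prometheus histogram exposition lines.

    Buckets are cumulative: each one counts every observation that is
    less than or equal to its upper bound ("le").
    """
    lines = [f"# TYPE {name} histogram"]
    for bound in bounds:
        le = "+Inf" if math.isinf(bound) else str(bound)
        cumulative = sum(1 for v in values if v <= bound)
        lines.append(f'{name}_bucket{{le="{le}"}} {cumulative}')
    lines.append(f"{name}_sum {sum(values)}")
    lines.append(f"{name}_count {len(values)}")
    return lines


# Hypothetical power observations in watts.
observations = [50.0, 250.0, 250.0, 1200.0, 7000.0]
print("\n".join(histogram_lines("iotawatt_power_watts", observations)))
```

Because the bucket counts, the sum, and the count are themselves ever-increasing numbers on a real exporter, they behave like counters across scrapes.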

It should not be necessary to integrate initially, as you can use an influxdb exporter to expose the metrics. This is not the best method, as it will lose track of the data if it gets restarted, but it will work.