Prometheus exporter

Hi all! I just ordered my IoTaWatt last night, and I’m pretty excited to get the unit and get it installed.

For most of my home automation and monitoring solutions, I use a system called Prometheus as a data store. It is similar in nature to InfluxDB, although a bit more specialized. I have alerting and graphing (Grafana) capabilities linked up with my Prometheus installation.

Prometheus functions differently than InfluxDB, et al.; it uses a pull model rather than a push model. The Prometheus server reaches out to other systems over HTTP and queries /metrics (or a configurable URL), which responds with the current status of the system being monitored in a text-based format (example: https://github.com/prometheus/docs/blob/master/content/docs/instrumenting/exposition_formats.md#text-format-example). Prometheus queries /metrics on a fixed schedule (e.g. every 30s, 60s, …) and records this information in its time-series database.
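To make the pull model concrete, here is a minimal sketch of a target that answers a scrape, using only the Python standard library. The metric name `power_watts`, the `input` label, and the reading itself are made up for illustration; nothing here is IoTaWatt code.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical reading; a real target would report its live state here.
CURRENT_WATTS = {"main_A": 412.5}


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A scrape target only has to answer GET /metrics with plain text.
        if self.path != "/metrics":
            self.send_error(404)
            return
        lines = ["# TYPE power_watts gauge"]
        for name, watts in CURRENT_WATTS.items():
            lines.append(f'power_watts{{input="{name}"}} {watts}')
        body = ("\n".join(lines) + "\n").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def scrape_once():
    """Start the target on a free local port, scrape it once, return the text."""
    server = HTTPServer(("127.0.0.1", 0), MetricsHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/metrics"
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()
```

`scrape_once()` plays the role of the Prometheus server here: it makes one HTTP request and gets back only the values that are current at that moment.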

I’d like to contribute an update to the IoTaWatt firmware to support a /metrics endpoint for Prometheus collection. But before I do this development work, I wanted to reach out to see whether such an addition would be welcome as a contribution; I don’t want to support my own custom firmware forever. If it isn’t welcome, I would just write an external tool to act as a bridge: Prometheus -> Tool -> IoTaWatt.

A Prometheus firmware contribution would effectively look like a copy of the handleStatus /status HTTP handler, but with the output reformatted into the Prometheus-supported format. I’d be happy to include documentation and follow any recommended best practices for development.
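As a rough sketch of the reformatting step (in Python rather than the firmware’s C++, just to show the shape of the output): the `status` dictionary layout, the `inputs`/`name`/`watts` keys, and the metric name `iotawatt_input_watts` are all assumptions for illustration, not the actual /status response.

```python
def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline, as the text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_prometheus(status: dict) -> str:
    """Render a status-like dict as Prometheus text exposition format."""
    lines = [
        "# HELP iotawatt_input_watts Recent damped power reading per input.",
        "# TYPE iotawatt_input_watts gauge",
    ]
    for entry in status.get("inputs", []):
        name = escape_label_value(str(entry["name"]))
        watts = float(entry["watts"])
        lines.append(f'iotawatt_input_watts{{input="{name}"}} {watts}')
    return "\n".join(lines) + "\n"


# Hypothetical data standing in for whatever the status handler reports.
example_status = {
    "inputs": [
        {"name": "main_A", "watts": 1234.5},
        {"name": 'dryer "240V"', "watts": 0},
    ]
}
```

Calling `render_prometheus(example_status)` yields one sample line per input, with the input name carried as a label rather than baked into the metric name.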

Would this contribution be welcomed, or would it be better to implement this as an independent piece of software bridging to the existing HTTP data sources?

Thanks!

Sounds like you have reviewed the code in the webserver and understand what’s involved, so in principle I’d be interested in a PR to support this pull protocol.

That said, I want to convey a couple of thoughts:

Outages occur. WiFi can drop off, and servers go down. Meanwhile, IoTaWatt keeps on ticking. All of the push protocols work off the datalog and fill in the hole after an outage, so an influx database for instance will not have any holes.

When you pull real time, there will be holes.

The source of measurements used in the /status handler is a damped recent value intended to represent the most recent couple of seconds. If Prometheus is looking for a 15 or 30 second data point, it would probably be more appropriate to go to the datalog to get that. It’s fairly easy, and the scripting system is very useful for extracting the various metrics available from the datalog.

Can you point me at a concise description of the Prometheus pull facilities? It may be possible to simply use the new IoTaWatt /query to pull the data, which would be very slick.

I agree with your points regarding reliability: IoTaWatt’s persistent data storage, which lets it catch up and fill holes in InfluxDB, is a stronger design in this regard. I’m still somewhat interested since Prometheus is a system I’m more familiar with using, but it’s a good argument that this isn’t an optimal path. Maybe when I see how poor my WiFi is near my load center, I might change my mind. :)

Prometheus does not support much (or really, any) flexibility in the data format that it reads from a system. Typically it relies either on a system having native support for the Prometheus data format, or on a middleware/bridge, referred to as an “exporter” in Prometheus terminology (examples: memcached_exporter, postgres_exporter).

It does support flexibility in the HTTP layer; it could invoke /query with custom arguments such as /query?...&format=prometheus. But having looked at the CSVquery code, I think this would be more of a hack than useful code reuse. Prometheus is designed to read only the current values from a metric source and import them, not a recent history of values.

Regarding the damped recent values… I think a typical approach to this problem with Prometheus would be to retrieve metrics that are counters, rather than gauges, in Prometheus terminology. A gauge represents an instantaneous measurement, while a counter is a monotonically increasing value. So if I had main_A as an input, a gauge would be the instantaneous power draw, and a counter would be the cumulative energy used since IoTaWatt boot (or configuration). Collecting a counter would minimize the risk of missing usage: if there was a 29s spike between two 30s collections, the counter would theoretically include it, while the gauge readings at 0s and 30s might miss it. Later, Prometheus could calculate rates of power usage from the counters.
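Here is a small simulation of that 29s spike, as a sketch with made-up numbers: a 100 W baseline, a 3000 W spike during seconds 1 through 29, and scrapes every 30 s. The function names are mine, and the gauge estimate simply holds each scraped reading for a full interval.

```python
def build_profile():
    """60 one-second samples: 100 W baseline, 3000 W spike during seconds 1-29."""
    return [3000.0 if 1 <= t <= 29 else 100.0 for t in range(60)]


def energy_from_gauge(profile, interval):
    """Estimate watt-seconds by holding each scraped gauge reading for one interval."""
    return sum(profile[t] * interval for t in range(0, len(profile), interval))


def energy_from_counter(profile, interval):
    """Estimate watt-seconds from the difference between scraped counter values."""
    counter = [0.0]  # cumulative watt-seconds, one entry per second boundary
    for watts in profile:
        counter.append(counter[-1] + watts * 1.0)  # each sample covers 1 s
    scrapes = [counter[t] for t in range(0, len(profile) + 1, interval)]
    return scrapes[-1] - scrapes[0]


profile = build_profile()
gauge_ws = energy_from_gauge(profile, 30)      # 6000.0, spike is invisible
counter_ws = energy_from_counter(profile, 30)  # 90100.0, spike is included
```

The gauge scrapes at 0s and 30s both land on the 100 W baseline, so the spike never shows up, while the counter difference equals the true total. In PromQL, `rate()` over a counter performs this kind of difference-over-time calculation.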

This might suggest that exporting the input channels’ data bucket accumulators is closer to what Prometheus would expect/support. But I feel a bit like the data flowing into Prometheus would be a pretty abstract combination of calculations, which I don’t fully understand, so I’m not sure how accurate an analysis based upon Prometheus would end up being.

I’m on the road today, so I can’t continue a conversation at this level. I did take a quick look at the docs link in the first post, and it appeared to me to be a timestamped push protocol, but I might be missing something obvious.

It’s clear I need a Prometheus definition of terms. I’ll review this tonight or tomorrow and get back to you.