influxDB support

overeasy · March 24, 2018, 1:45pm

Hi Pete,
You’re holding my feet to the fire on this. That’s good. Better to have interested parties critique the approach.

I don’t recall making any proposals. I did change the way it all works, and said I’m open to making more changes while still in ALPHA. What you describe above doesn’t match either the old or new implementations. It’s interesting how you integrated the newer “units” functionality into this, but that wasn’t the intent.

I’m still not convinced of the advantages of sending multiple measurements as you show above.

At the end of the day, at 5 second intervals, the IoTaWatt is going to have posted 17,280 times. Under the new implementation, that will mean 17,280 measurements. No matter how it’s sliced and diced, there will be 17,280 unique timestamps. With the old implementation, that became 17,280 x <#fields> points, and created 17,280 new hashes in <#fields> indices.

What you describe above creates 17,280 x 4 = 69,120 points, and adds 69,120 new hashes to the output tag index.

The new implementation creates 17,280 data points, and adds 17,280 new hashes to <#tags> indices.

All queries, whether implicitly or explicitly, reference the timestamp. They retrieve all of the points in the union of that time domain and any other criteria in the where clause. So in the query:

for the day described, would process the output tag index for the time domain and yield references to 34,560 data points. Those data points would be retrieved and the mean of the power field computed.

In the flat scheme of the new implementation, the query might be:

select mean(/light/) from measurement where <time-clause>

With this query, only the basic flat 17,280 data points are retrieved, and the mean of all of the fields matching the regex is computed. Same can be done for intergal(/regex/,1000h) to get aggregate kwh for the group matching the regex.

I think this is more efficient on a variety of levels, and actually seems like a simpler more efficient query in that multiple fields or regex grouped fields can be processed in a single query, fetching the basic data points only once.

Backing away from the household example, the driving force behind this revamping of influxDB support is the growing interest in using IoTaWatt in multi-site environments. There is growing interest from the property management sector as well as energy management. By exploding the number of points from a single 5sec IoTaWatt measurement into many points, the indices and overhead of the extra labels, tags, timestamps, etc becomes a concerning issue. Retrieving a day or even a month worth of points for a single installation is fairly trivial, and processing as a flat file isn’t going to amount to a lot of work for influx. When a database grows to several IoTaWatt at 100 locations, those extra tag indices would become a real bottleneck, both as the thousands of points are posted every 5 seconds, and when trying to retrieve useful cross-location comparative data. The intent of the tag fields is to allow organizing those various IoTaWatt data into regions, buildings, owners, even countries, and to allow differentiating input from IoTaWatt from other souces contributing the same measurements to the database.

Take a fresh look at what I’m suggesting with using regex on the fields of a flat file and see if it isn’t workable.