influxDB support

Matt,
I’m looking at this and I’m still not sure I understand the issue. I need more time to ponder the question, but my thinking right now is that although it may seem to simplify the query, I think it actually may make the retrieval process less efficient. As long as you are querying a specific measurement (power) and filtering by time period, the database engine will need to retrieve as many rows in the table as fit the time period. If those rows contain all of the fields of a particular point, then the values of the various fields can be summed and averaged straightaway. When you specify two field names as tags, I think you are going to do some index work to determine the union of the two tag indices, and arrive at the same set of data points. Then there’s the extra day-to-day overhead in time and storage to maintain all those indices for the tags.

Does that make sense?

(Love the graphs - thanks)

Influx is designed to be indexed by tags, and value compression/retrieval is improved by keeping similar values within the same series.

There is lots of schema design suggestions here.

Distilling my philosophical concerns from the practical “how’s” are mainly:

  • I’d like to be able to perform any mathematics on any number of inputs and not rely on Iota scripting and be able to utilize influx’s built in capabilities.
  • I’d like to perform my data-mapping within influx (which easily supports renaming fields) instead of within the iota.
  • I’d like to measure the statistics and deviation across different SCT type sensors and identify when the linearity begins to change (based on temperature it seems some).

There are downsides from Iota’s perspective (additional string work, additional data sent per measurement), but I think its mostly positive to continue exposing the internal functions to influx. The more data that gets exposed, the more operations across series can be done.

All of that said. I currently work around this by aggregating individual measurements using continuous queries and TICK scripting. Similar to [Solved] Writing a TICKscript to aggregate multiple measurements into one - Telegraf - InfluxData Community Forums

Again, I’m just sharing a different data-philosophy. TSDB’s are a bit “different” than databases in how they work and aggregate data. I do agree that its more painful to deal with on the metric emitting side and totally understand if its not worth tackling at the moment.

Thanks for considering the proposal!

OK Matt,

This is all interesting, but I’m glad you appreciate how much i’ve got on my plate right now. I’ll put those reference on my summer reading list (northern hemisphere) and hopefully over time will be able to balance the utility of what you are saying with the practicality of the impact on IoTaWatt.

Philosophically, my primary concern is keeping the user interface simple and straightforward, so as to make the device useful and unintimidating to as many as possible. Power users like you, if I may put you in that category, will find solutions as you have with the Tick stack.

Hi Matt, I’m also trying to follow your proposal: Do I get it right that you mainly propose to have a possibility to set tags for each channel? This way all power value could have the same field name (for example „power“ and they would only be differentiated with a tag which could for example be “channal_1”.
Saving voltage and frequency in an own measurement seems not so important for the queries you showed.

Is that correct?

@overeasy, Just want to see if I have got your original proposal right:

  • sample period is batched into a single influxdb write transaction
  • each measurement has the same custom name e.g. “measurement”
  • each measurement has a tag based on the associated Output (CT/VT) name
  • each measurement contains fields that correspond to the “units” selected for that Output

So in influxdb line protocol for a particular 5 second period would be something like:
measurement,output=hallway_lights power=100,volts=240,kwh=1.23,pf=1.0 14222900257 measurement,output=bedroom_lights power=80,volts=240,kwh=1.01,pf=1.0 14222900257 measurement,output=main_ring power=150,volts=240,kwh=2.75,pf=0.8 14222900257 measurement,output=house_feed power=330,volts=240,kwh=4.99,pf=0.8 14222900257

Which would allow for influxdb tag filtering/aliasing/regex in the query, e.g. select mean("power") where output=/light/ to include all of my light CTs or something like where output!=/house/ to exclude the CT on the main feed. But will repeat the VT information across each of the CT measurements if selected by the user.

Does this simplify the IotaWatt UI to allow you to select multiple “unit” checkboxes for each output e.g. power, kwh, volts… rather than having to create a multitude of outputs to get all of the desired fields. That would be sweet.

Hi Pete,
You’re holding my feet to the fire on this. That’s good. Better to have interested parties critique the approach.

I don’t recall making any proposals. I did change the way it all works, and said I’m open to making more changes while still in ALPHA. What you describe above doesn’t match either the old or new implementations. It’s interesting how you integrated the newer “units” functionality into this, but that wasn’t the intent.

I’m still not convinced of the advantages of sending multiple measurements as you show above.

At the end of the day, at 5 second intervals, the IoTaWatt is going to have posted 17,280 times. Under the new implementation, that will mean 17,280 measurements. No matter how it’s sliced and diced, there will be 17,280 unique timestamps. With the old implementation, that became 17,280 x <#fields> points, and created 17,280 new hashes in <#fields> indices.

What you describe above creates 17,280 x 4 = 69,120 points, and adds 69,120 new hashes to the output tag index.

The new implementation creates 17,280 data points, and adds 17,280 new hashes to <#tags> indices.

All queries, whether implicitly or explicitly, reference the timestamp. They retrieve all of the points in the union of that time domain and any other criteria in the where clause. So in the query:

for the day described, would process the output tag index for the time domain and yield references to 34,560 data points. Those data points would be retrieved and the mean of the power field computed.

In the flat scheme of the new implementation, the query might be:

select mean(/light/) from measurement where <time-clause>

With this query, only the basic flat 17,280 data points are retrieved, and the mean of all of the fields matching the regex is computed. Same can be done for intergal(/regex/,1000h) to get aggregate kwh for the group matching the regex.

I think this is more efficient on a variety of levels, and actually seems like a simpler more efficient query in that multiple fields or regex grouped fields can be processed in a single query, fetching the basic data points only once.

Backing away from the household example, the driving force behind this revamping of influxDB support is the growing interest in using IoTaWatt in multi-site environments. There is growing interest from the property management sector as well as energy management. By exploding the number of points from a single 5sec IoTaWatt measurement into many points, the indices and overhead of the extra labels, tags, timestamps, etc becomes a concerning issue. Retrieving a day or even a month worth of points for a single installation is fairly trivial, and processing as a flat file isn’t going to amount to a lot of work for influx. When a database grows to several IoTaWatt at 100 locations, those extra tag indices would become a real bottleneck, both as the thousands of points are posted every 5 seconds, and when trying to retrieve useful cross-location comparative data. The intent of the tag fields is to allow organizing those various IoTaWatt data into regions, buildings, owners, even countries, and to allow differentiating input from IoTaWatt from other souces contributing the same measurements to the database.

Take a fresh look at what I’m suggesting with using regex on the fields of a flat file and see if it isn’t workable.

1 Like

@matt Your grafana dashboard looks absolutely stunning, you don’t care to share the json?

grafana.iota.json.txt (56.0 KB)

No problem sharing the JSON – here it is.

2 Likes

Hey Bob,

I attempted to upgrade to ALPHA again today; but I’m back at BETA for now. Influx porting turned out more complicated than I expected to upgrade (also ran into backfill issues, not 100% sure what those are yet. If I figured that out I’ll make a new thread but it may have been related to all my influx tinkering)

I have another couple of comments regarding Influx usage. The regex field based selection you advocate seems to mostly work. The logging of each “unit” is also great.

After some experimentation, I ended up with:

Right_Mains_W
Right_Mains_A
Right_Mains_VA
Left_Mains_W
Left_Mains_A
Left_Mains_VA
...

This pattern, is however, pretty painful to follow for entering in via the UI (I ended up writing a script to generate the JSON) and its even worse for graphing within Grafana.

While I can write a query such as select mean(/Left_Mains/) from measurement where device='iotawatt' that just seems… strangely formed.

Why is “measurement” there? Is this because of the need to support multiple IotaWatt devices (I would expect the “static” tagging to enable that, not the static measurement).

select mean(watt) from Right_Main where device='iotawatt' seems much more “clear” to me. Perhaps the unit can inform what the field will be? And there would be no need for me to duplicate the “output” field for every measurement I was interested in. Unfortunately this construction does make it /impossible/ to actually do math across measurements; Influx’s continuous queries would be needed here.
Inverting this…
select mean(Right_Main) from watt where device='iotawatt'
is almost better, but unfortunately you could only combine like units since influx doesn’t permit math across measurements.

Rehashing previous discussion in a new form, I’ll advocate a more influx-idiomatic format:
select mean(watt) from iotawatt where device='iotawatt' and output='Right_Main'
or
select mean(watt) from iotawatt_device1 where output='Right_Main'
Each of these allow arbitrary combinations and segmentation based on requirements.

Hi Matt,
First the backfill issue. When the influx service starts up, it queries the database for the last entry of the specified measurement with the first specified tag. If there are no tags, it looks for the last entry. Uploading starts with the next time interval after that. If the measurement doesn’t exist, it will start uploading from a week ago, thus providing a week’s worth of data right away.

The one week window was arbitrary. I could have gone with the oldest entry in your current log, or the present time, or a month ago. Now that it’s all working, I’ll try to provide a way to specify the starting time for upload for a new measurement.

There is a workaround. If you are making a new measurement, say “homepower” in a database called “iotawatt” with no tags, and you will want to start uploading from Jan 1, 2018. Before starting an iotawatt service for that measurement, you can seed your database with a starting entry using the following CLI command:

insert homepower first=0 1514764800000000

To get the time string I used one of the online unix time converters to get the unix millisecond time for 1/1/2018 and added three zeroes to convert to nanoseconds. There may be a way to specify the date directly but I couldn’t do it easily. In any event, that will seed your database for that measurement and the IoTaWatt will start uploading from there. If you are using tags, add the first tag to the insert using the standard line protocol.

As I look at what you are doing, the scenario is three outputs for each input = potentially 14x3 = 42 outputs. That’s a lot. No matter how you slice it, your gonna do a lot of typing. I might caution you as well that will generate some very large transactions to influxDB and you are doing it every 5 seconds. This is, after all, an IoT device and there are limits to the size and throughput available. I think it will handle it, but bulk uploading after a communication lapse for instance could take a really long time and dim the lights for quite awhile.

but the primary mission of IoTaWatt is to be able to easily configure and produce accurate basic usage data. To the extent that you can do what you want to do, have at it, but complicating the user interface to facilitate that level of detail is problematic. That said, I can see now that the underlying Json configuration might be restructured to allow producing multiple measurements and tags that are derived from input fields as you would like. With something like that, the door would be open to a separate application to edit a more complex payload specification - to the extent that heap and HTTP bandwidth will run it.

That is strange. You are using iotawatt for the measurement and also as a key field. Unless you have multiple iotawatt that you want to be able to differentiate, the key would seem to be superfluous.

The line protocol requires it. It is fundamental

<measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]

I’ll refer you to the Schema design discussion from influx. While we don’t know exactly how it works under the hood, there are specific recommendations that influence my thinking:

Store data in fields if you plan to use them with an InfluxQL function

So that says to me that mean and sum and integral work best with data values in fields. I suspect the filed data is stored in time weighted form (so for instance watts is really Wh or maybe Wnanoseconds) That makes the integral and mean functions possible without processing all of the points within a grouping.

Store data in fields if you need them to be something other than a string - tag values are always interpreted as strings

Again, these watts and volts are numbers, storing them as strings is inefficient.

Don’t have too many series

Tags containing highly variable information like UUIDs, hashes, and random strings will lead to a large number of series in the database, known colloquially as high series cardinality. High series cardinality is a primary driver of high memory usage for many database workloads.

I think in this context watts (to several decimal places) would be considered highly variable. Do you really need to query all of the measurements in which the Left_Main is 1246.34 watts?

Thanks for all this feedback. Even in presenting these counterpoints, I have had to go back and rethink much of this and it helps a lot. Working with grafana now, I’m exploring some interesting features with variables that I think will make it possible to build an interactive panel where you can select fields and tags and possibly units (watts or Wh) as well as time period.

1 Like

Great Thanks! This is a helpful tip; the error from iota at the time was actually that this query kept failing. This helped me resolve that.

During my experimentation I also somehow ended up with new series named…

There must be a concatenation bug somewhere…

This was a bit rhetorical – I intended to raise a question regarding why do we have a static value here that isn’t valuable in any queries. I also intended it to question you for other use cases that may not be clear to me (i.e. was there tooling that isn’t obvious already built elsewhere that required this…)

This comment makes me think we are talking past each other here. I don’t understand the issue your raising with cardinality (I’m only proposing tags for the sensor input names – those are strings and shouldn’t change).

Your concerns about storage are answered here (in which they recommend device identifiers are added as tags :wink: )

Setup background: I am running Influx in a Docker instance on an intel i5 quad (not an rPi!) consuming lots of local IoT metrics (not just IoTaWatt) at a rate of (100 events/s @ 2% cpu usage and only 5MB/12hr) so I agree that I’m not a normal user… never claimed to be.

I also acknowledge your point about the limits of the ESP device – that’s why I’d love to have all of this data in influx with as little CPU/BW as possible and avoid complicated script execution!! JFYI – to level-set context – I interact with global-scale Influx-like environments on a daily basis, I probably am making assumptions on shared-experiences that aren’t there, I’m just trying to steer these metrics with some lessons learned from a different life.

Anyway – let’s ground a use-case here.

My examples in the previous post were – bad – and lacked context. I have a lot of half-baked ideas for the IoTaWatt (this is why I love the OSS format, thank you again for all your work) but here are the ones I am currently working on (happy to contribute back if others are interested), that I’d love to see full support for with the new influx format.

Goals for influx data:

  • Calculate un-monitored components (W/A/VA/PF/AC Cycles on channel).
  • Calculate total usage across Right/Left of 220v input (W/VA/PF)
  • Integrate for KWH across specific components
  • Calculate VA/PF derivative across specific components/sensors
  • Estimate W shifts/error based on V shifts on non-shifting components (i.e. LED 100khz smps constant current pwr supply won’t shift W usage based on voltage changes so these effectively detect real IoTaWatt measurement errors).
  • Other Data-is-fun projects using ML which require lots of “signal” instead of just /watts/.

Setup:

  1. I use 12 inputs at the moment, and I have a web-scraper that monitors some of the internal IoTaWatt metrics (from the json); side-bar - Publishing those cycle counts/total counts to variables that could make it to influx would be great.
  2. 2 inputs are on the 120v US split-220v-main; right and left.

Acknowledging our goals are different… and recognizing scripting could make this easier AND also to avoid talking past each other, lets try to solve my first goal – purely in InfluxQL with 15m increments (with the intent to graph in grafana) using the various data-models we have discussed and that are built today:

BETA InfluxQL:

In influx-client:
create continuous query combined_cq on iotawatt
RESAMPLE EVERY 15m
begin
select sum(“watts”) AS “total_watts” into combined from (
select integral(value, 15m) as “watts” from /^power_.*/ GROUP BY time(15m) fill(null)
) GROUP BY time(15m) fill(null)
end;

create continuous query combined_cqv on power
resample every 15m
begin
select mean(value) as “volts” into combined from voltage group by time(15m) fill(null)
END;

create continuous query combined_cq_other on power
resample every 15m
begin
select integral(value,15m) as “counted_watts” into combined from /^sensor_.*/ group by time(15m) fill(null)
END;

In Grafana:
Watts Graph

  1. Calculate Unlabeled Power:
    select total_watts - counted_watts from combined WHERE ($timeFilter)
  2. Calculated Labeled Power:
    select total_watts from combined where ($timeFilter)
  3. Calculate Total Power:
    select integral(value, 15m) as “watts” from /^power_.*/ where ($timeFilter) GROUP BY time(15m) fill(null)

Amps Graph:

  1. Unlabeled:
    select (total_watts - counted_watts)/volts from combined WHERE ($timeFilter)
  2. Labeled:
    select total_watts/volts from combined where ($timeFilter)
  3. Total:
    select integral(value, 15m)/volts from /^power_.*/ where ($timeFilter) GROUP BY time(15m) fill(null)

In Alpha

Watts Graph:

  1. Unlabeled:
    select (integral(power_right_w,15m)+integral(power_left_w,15m)-(integral(“sensor1”,15m)+integral(“sensor2”,15m)+…) from iotawatt WHERE ($timeFilter) GROUP BY time(15m)
    a) List each sensor here in expansion … regexes not supported in integral function ;(
  2. Labeled:
    select (integral(power_right_w,15m)+integral(power_left_w,15m) from iotawatt WHERE ($timeFilter) GROUP BY time(15m)
  3. Sum Labeled:
    select integral(“sensor1”,15m)+integral(“sensor2”,15m)+… from iotawatt where ($timeFilter) GROUP BY time(15m) fill(null)
    a) List each sensor here in expansion … regexes not supported in integral function ;(
  4. Total:
    select (integral(power_right_w,15m)+integral(power_left_w,15m) from iotawatt WHERE ($timeFilter) GROUP BY time(15m)

    Removes the need for CQ’s – this is great and a huge improvement above the current usage of series, still somewhat complicated queries that aren’t definable within Grafana (harder for someone to just use grafana to set this up) and requires maintenance/mapping manually.

Using tagging:

Where the output tag is the sensor_output name, and watt/amp/hz is a field name.

Watts Graph:

  1. Unlabeled:
    select integral(watt,15m) -
    ( select integral(watt,15m) from iotawatt where output=/^sensor_.*/ and ($timeFilter) GROUP BY time(15) fill(0))
    from iotawatt where output=/^main_.*/ and ($timeFilter) GROUP BY time(15) fill(null)

  2. Labeled:
    select integral(watt) from iotawatt where ($timeFilter) GROUP by time(15m) fill(null)
    a) Each sensor: select integral(watt) from iotawatt where ($timeFilter) GROUP by output, time(15m) fill(null)

  3. Sum Labeled:
    select integral(watt) from iotawatt where output=/^sensor_.*/ and ($timeFilter) GROUP by output, time(15m) fill(null)

  4. Total:
    select integral(watt,15m) from iotawatt where output=/^main_.*/ and ($timeFilter) GROUP BY time(15m) fill(null)

Almost fully data-driven – only one query is difficult to replicate (the subquery) within grafana. It can probably be achieved in another way with Grafana. Grafana’s alias by tagging is really nice combined with this tag grouping [$tag_output]

There may be easier ways to do each of these things, just trying to highlight the benefits of tagging when writing data-driven queries.

First the easier stuff. There is a use case and I’ll try to explain it. There is interest in a lot of sectors (energy auditing, property management, solar PV networks) to monitor multiple installations. In that scenario, the measurements are the collection of power sensor outputs (fields) related to each installation, at each time interval. The individual sites are defined and categorized using tags. All of the data is uploaded to a single influxDB database. Depending on the industry the data is sliced and diced into region, user, organization, facility type, whatever. I have users who need this capability, so I’ve been leaning toward this relatively flat layout because it’s simple to understand, deploy, and use. Once the points are atomized by unique tags, these users would need to fall back on regular expressions, as you have, to accomplish what are simple tasks under the current scheme. With respect to influxDB I see the IoTaWatt as more a node in a network of measuring locations than a single home appliance.

I admit that I’m over my head but that’s my point. When it gets that complicated, it’s time to question if that’s what the device should be doing. As you say, that’s the beauty of OSS, you can fork it and rework the influxService to your liking.

So thinking out load about how that works… In the old implementation, the input name was the measurement name. IMO the IoTaWatt measures power every 5 seconds. I like to equate that to a measurement. So maybe each of the (14) inputs becomes a different point within the same measurement? as in:
power,output="main_left" value=932.61 152236784
That produces a tag output with a cardinality equal to the number of outputs, a relatively low value. But it also forces the data for one 5 second interval to be stored as 14 points (thanks for the technical references).

true that. Not going to dump my resume, but let’s just say I’m rusty. I’m wearing a lot of hats (all the hats) trying to get this in line and working reliably. I appreciate that you understand I have limitations. Right now I’m in the middle of sourcing accessories from Asia, regulatory and safety compliance testing, developing distribution channels, and organizing this influx/grafana component to make the device more useful. I’d love to just program and maybe port to the ESP32, but that’s not the job right now.

Thanks for all your input.

@matt,

I’ve been working with this some more and looking at it from a different perspective. Here’s what I’m thinking:

For each “output” from IotaWatt - produce a measurement that:

Has a measurement name of the output.
Has a field called “value” that is the output metric.
Contains the global tag-set specified for that IotaWatt.

This is pretty much like the original implementation except for the addition of the tag-set, which will allow for identification of nodes, regions, etc.

What drove me to this was my inability to get grafana to produce a composite graph from a multi value variable of fields. I could easily make a query to do it, but I couldn’t get grafana to do that. Their forum was unresponsive.

What I would like to know is if this would better accommodate your needs?

1 Like

This sounds exactly what grafana is experiencing (and the way HomeAssistant is uploading data to influxdb).

May I suggest that the measurement part follow the unit of the measurement too (Watt, VA, PF, Wh, etc.).
I know that the bandwidth will increase with this format but it is apparently the “correct way”. (And if all measurements for all input could be sent as a long string that is newline separated it should not be that big issue.)

I was thinking of doing that. So you are saying instead of using the field key “value” use “watts” or “amps” etc? That makes sense. I was trying to come up with a way to specify multiple units for one measurement, so for instance you could ask for watts and pf for a circuit and the entry would have two entries in the field set for those units. In any event, the format leaves the door open for that in the future.

That’s how it works now. When it gets behind, or if bulk-send is greater than 1, it will aggregate the measurements into a larger HTTP transactions.

1 Like

According to this https://docs.influxdata.com/influxdb/v1/guides/write_data/ I think the following format should be perfect:

watts,input=1,name=MyFirstInput value=0.55 1422568543702900257
amps,input=1,name=MyFirstInput value=2.0 1422568543702900257

etc…

@overeasy@fredrike – I’m just happy we’re discussing this. I haven’t submitted any PR’s yet, just a happy user, will look into doing so in the future. (Would you prefer PR’s in GitHub over the forums here?)

Influx isn’t your only consumer and this data-model complexity may not be worth it (I can always reorganize locally). Just popped in since I saw the database format was changing. I think my queries previously capture the format/query goals I was trying to advocate.

If i had to come up with a UI for this, perhaps it would be a IoTaWatt input label + some check boxes:
INPUT_1_LABEL [ amps ] [ watts ] [ va ] [ pf ] ...
OUTPUT_X configured_output_unit [ enable ]

Which would then correspond to an influx “line protocol” of something very simlar to @fredrike’s proposed format:
<unit>,name=INPUT_1_LABEL,input=1 <configured value>=XX.XXf <time>
^^ i really liked the name and the input here – allows some improved mapping from the measurement to the database without double de-aliasing.

Just to highlight the tags here – Grafana’s alias via tags is a very powerful feature that I’ve grown to love.

1 Like

We should also support user custom tags here to uniquely identify each IoTaWatt unit.

Should we really have <configured value> should that not “always” just be value?

I created an issue on github a couple of days ago and in the comment I show how data from Home-Assistant is represented in influxdb vs. data from IoTaWatt (Better support for outputs · Issue #145 · boblemaire/IoTaWatt · GitHub), allowing grouping by (in the Home-Assistant case) friendly_name allows for really easy and powerful graphs in grafana.

Don’t forget the ability to use Variable syntax | Grafana documentation for selecting subset of “inputs”.

With some pressure to finalize this, I’ve had to make some decisions. My sense is that while it doesn’t seem possible to accommodate all of the suggestions above, the input has helped shape an approach that should be workable for everyone.

I take some comfort in seeing that the influData folks themselves have different approaches in the several examples that they use. For instance the basic CPU load measurement examples and the entomological census example. It all bears on how the data will be used.

So here’s what I have in first level testing:

As hinted at a few days ago, the basic line protocol measurement is:

<output-name>[,<1st-tag-key=1st-tag-value[,2nd-tag-key=2nd-tag-value[,....]]] value=<output-value> <time>

I liked the idea of using the measurement units instead on “value” in the field-set-key, but this doesn’t seem to play well with grafana in specifying aggregation functions with regular expressions. The basic configuration screen looks like this:


Notice that there are some new specifiers:

  • retention policy has been added. I’m not sure why this hasn’t been mentioned before by anybody, but it needed to added and if specified will be associated with the measurements. If left off, influx will use the default policy.
  • upload history from: If specified, establishes the starting point for uploading available history. If there are any entries in the database, qualified by the first tag, the time of the last entry will override. This is mostly useful when first setting up an influx server, or for cleaning existing data and uploading a modified data view.

The status display now has a tab for influxDB:
image
The influx service can now be started and stopped. This way, you don’t need to discard your configuration to suspend updating. The current running/stopped state is displayed, and the time of the most recent posted measurements is continuously displayed. When the service is stopped, it will query the database for most recent updates when restarted. So if you drop the most recent measurements, it’s possible to upload fresh measurements using a modified specification.

I hope to have this available for ALPHA by the weekend. The standard disclaimers apply to ALPHA subscribers.