influxDB support

overeasy · April 9, 2018, 2:44pm

Hi Matt,
First the backfill issue. When the influx service starts up, it queries the database for the last entry of the specified measurement with the first specified tag. If there are no tags, it looks for the last entry. Uploading starts with the next time interval after that. If the measurement doesn’t exist, it will start uploading from a week ago, thus providing a week’s worth of data right away.
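The start-time rule described above can be sketched in a few lines of Python. This is only an illustration, not the IoTaWatt firmware; `upload_start` and its parameters are made-up names, and the "last entry" lookup is assumed to have been done already.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

def upload_start(last_entry: Optional[datetime],
                 now: datetime,
                 interval: timedelta) -> datetime:
    """Pick the first timestamp to upload (hypothetical helper)."""
    if last_entry is None:
        # Measurement doesn't exist yet: go back one week.
        return now - timedelta(weeks=1)
    # Otherwise resume with the interval after the last stored entry.
    return last_entry + interval

now = datetime(2018, 4, 9, 14, 44, tzinfo=timezone.utc)
print(upload_start(None, now, timedelta(seconds=5)))
print(upload_start(now - timedelta(hours=1), now, timedelta(seconds=5)))
```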

The one week window was arbitrary. I could have gone with the oldest entry in your current log, or the present time, or a month ago. Now that it’s all working, I’ll try to provide a way to specify the starting time for upload for a new measurement.

There is a workaround. Say you are creating a new measurement, “homepower”, in a database called “iotawatt” with no tags, and you want to start uploading from Jan 1, 2018. Before starting the IoTaWatt influx service for that measurement, you can seed your database with a starting entry using the following CLI command:

insert homepower first=0 1514764800000000000

To get the time string I used one of the online unix time converters to get the unix millisecond time for 1/1/2018 (1514764800000) and added six zeroes to convert to nanoseconds, which is the default timestamp precision for line protocol. There may be a way to specify the date directly but I couldn’t do it easily. In any event, that will seed your database for that measurement and the IoTaWatt will start uploading from there. If you are using tags, add the first tag to the insert using the standard line protocol.
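If you would rather compute the timestamp than look it up, here is a minimal Python sketch. It assumes the start date is midnight UTC and that the timestamp should be in nanoseconds (19 digits for a 2018 date); `seed_line` is a made-up helper name.

```python
from datetime import datetime, timezone

def seed_line(measurement: str, start: datetime, field: str = "first") -> str:
    """Build a line-protocol seed entry with a nanosecond timestamp."""
    # Work in whole seconds, then scale with integer math so no
    # precision is lost at nanosecond scale.
    seconds = int(start.timestamp())
    return f"{measurement} {field}=0 {seconds * 1_000_000_000}"

start = datetime(2018, 1, 1, tzinfo=timezone.utc)
print("insert " + seed_line("homepower", start))
# prints: insert homepower first=0 1514764800000000000
```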

As I look at what you are doing, the scenario is three outputs for each input = potentially 14x3 = 42 outputs. That’s a lot. No matter how you slice it, you’re going to do a lot of typing. I might caution you as well that this will generate some very large transactions to influxDB, and you are doing it every 5 seconds. This is, after all, an IoT device and there are limits to the size and throughput available. I think it will handle it, but bulk uploading after a communication lapse, for instance, could take a really long time and dim the lights for quite a while.
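To put rough numbers on that, here is a back-of-the-envelope calculation in Python. It only counts field values waiting to be uploaded and says nothing about actual payload bytes or upload time; `backlog_values` is a made-up name.

```python
def backlog_values(outputs: int, interval_s: int, lapse_s: int) -> int:
    """Field values queued for upload after a communication lapse."""
    intervals = lapse_s // interval_s  # posting intervals missed
    return intervals * outputs

print(backlog_values(42, 5, 3600))   # one-hour lapse: 30240
print(backlog_values(42, 5, 86400))  # one-day lapse: 725760
```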

But the primary mission of IoTaWatt is to be able to easily configure and produce accurate basic usage data. To the extent that you can do what you want to do, have at it, but complicating the user interface to facilitate that level of detail is problematic. That said, I can see now that the underlying JSON configuration might be restructured to allow producing multiple measurements and tags that are derived from input fields as you would like. With something like that, the door would be open to a separate application to edit a more complex payload specification - to the extent that heap and HTTP bandwidth can handle it.

That is strange. You are using iotawatt for the measurement and also as a key field. Unless you have multiple IoTaWatts that you want to be able to differentiate, the key would seem to be superfluous.

The line protocol requires it. It is fundamental:

<measurement>[,<tag_key>=<tag_value>[,<tag_key>=<tag_value>]] <field_key>=<field_value>[,<field_key>=<field_value>] [<timestamp>]
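As an illustration of that syntax, here is a small Python sketch that assembles one line. It handles only float fields and the basic backslash escaping of commas, equals signs and spaces; the tag and field names in the example are invented, and `to_line` is not part of any influx library.

```python
from typing import Dict, Optional

def escape(value: str, chars: str) -> str:
    """Backslash-escape each character in `chars`."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value

def to_line(measurement: str, tags: Dict[str, str],
            fields: Dict[str, float],
            timestamp_ns: Optional[int] = None) -> str:
    if not fields:
        # Tags and timestamp are optional; at least one field is not.
        raise ValueError("line protocol needs at least one field")
    line = escape(measurement, ", ")
    for key, val in tags.items():
        line += "," + escape(key, ",= ") + "=" + escape(val, ",= ")
    line += " " + ",".join(
        f"{escape(k, ',= ')}={float(v)}" for k, v in fields.items())
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line

print(to_line("iotawatt", {"device": "garage"},
              {"Left_Main": 1246.34, "Volts": 121.5},
              1514764800000000000))
# prints: iotawatt,device=garage Left_Main=1246.34,Volts=121.5 1514764800000000000
```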

I’ll refer you to the Schema design discussion from influx. While we don’t know exactly how it works under the hood, there are specific recommendations that influence my thinking:

Store data in fields if you plan to use them with an InfluxQL function

So that says to me that mean and sum and integral work best with data values in fields. I suspect the field data is stored in time-weighted form (so for instance watts is really Wh or maybe watt-nanoseconds). That would make the integral and mean functions possible without processing all of the points within a grouping.

Store data in fields if you need them to be something other than a string - tag values are always interpreted as strings

Again, these watts and volts are numbers; storing them as strings is inefficient.

Don’t have too many series

Tags containing highly variable information like UUIDs, hashes, and random strings will lead to a large number of series in the database, known colloquially as high series cardinality. High series cardinality is a primary driver of high memory usage for many database workloads.

I think in this context watts (to several decimal places) would be considered highly variable. Do you really need to query all of the measurements in which the Left_Main is 1246.34 watts?
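A toy Python example shows the effect. It assumes a series is identified by the measurement plus its tag set, and it uses a handful of made-up readings; `series_count` is a hypothetical helper, not an influx call.

```python
# Made-up wattage readings for one input.
readings = [1246.34, 1246.35, 1250.10, 1246.34, 1198.77]

def series_count(points) -> int:
    """Count distinct (measurement, tag set) combinations."""
    return len({(m, tuple(sorted(tags.items()))) for m, tags in points})

# Watts stored as a tag value: every new value creates a new series.
as_tag = [("iotawatt", {"Left_Main": str(w)}) for w in readings]
# Watts stored as a field: the tag set stays empty, so one series.
as_field = [("iotawatt", {}) for _ in readings]

print(series_count(as_tag))    # 4
print(series_count(as_field))  # 1
```

With real data at 5-second intervals the tag version would grow by roughly one series per distinct reading, which is the cardinality problem the quote warns about.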

Thanks for all this feedback. Even in presenting these counterpoints, I have had to go back and rethink much of this and it helps a lot. Working with Grafana now, I’m exploring some interesting features with variables that I think will make it possible to build an interactive panel where you can select fields and tags and possibly units (watts or Wh) as well as time period.