Lately I’ve begun developing an ESP32 version of IoTaWatt. I had a breadboarded ESP32 hooked up to an SD card that I used for experiments with SDMMC speed (it’s faster, but really hits its stride with longer transfers).
So I extended that project board by wiring it to an ESP8266-less IoTaWatt.
It looks like a ragtag mess cobbled together, but it is actually working flawlessly. Front and center is the ESP32 DevKit. To its left is an SD card adapter transformed into a microSD socket connected to the ESP’s SDMMC interface. To the right is a 3.3V supply that powers the IoTaWatt ADCs, which of course are on the standard IoTaWatt V5 board in the background.
The ESP8266 has one processor running at 80MHz. The ESP32 has two processors running at 240MHz. That’s roughly six times the raw horsepower (2 × 240MHz against 1 × 80MHz). The V5 (ESP8266) version spends 67% of its time sampling power, and 33% doing everything else, like running the web server, uploading data and servicing queries. Sometimes that other stuff slows down sampling. Sometimes that other stuff takes a long time. Beyond the performance issues there is a general resource scarcity in the V5 that inhibits the development of new capabilities.
So the first order of business with the ESP32 was to decide the basic architecture of the firmware - in a broad sense, how to use the resources. Right now, I’m using the Arduino platform that is layered on the ESP-IDF and the FreeRTOS operating system. The Arduino platform seemed a natural choice because the V5 uses it, but now I’m rethinking that. There is a lot of useful functionality and control available in the ESP-IDF that is not exposed through the standard Arduino core. At the least I’d like to run Arduino on top of a custom IDF build, but really, it would be nice to lose the Arduino layer altogether.
In any event, the new firmware takes advantage of FreeRTOS, using multiple tasks to make efficient use of the resources and to prioritize the work.
The fundamental approach that I’ve settled on is to dedicate one processor to sampling power. That involved moving all of the existing environment to core 0, and starting a dedicated sampling task on core 1. The result is that the number of cycles sampled per second is now equal to the line frequency. Where the V5 strives to sample 40 out of 60 cycles per second, the ESP32 sampler does 60 out of 60. The resulting samples are post-processed by a high priority task running on the other core. The FreeRTOS queues make it easy to synchronize the workflow into and out of the sampler as it runs flat out against line frequency.
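The split described above can be sketched roughly as follows. This is only an illustration of the pattern, not the IoTaWatt code: the buffer layout, the buffer count, the task names, stack sizes and priorities, and the `sample_one_cycle()` and `process_cycle()` hooks are all hypothetical. The FreeRTOS calls themselves (`xQueueCreate`, `xQueueSend`, `xQueueReceive`, `xTaskCreatePinnedToCore`) are the standard ones provided by the ESP-IDF.

```c
#include <stdint.h>
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "freertos/queue.h"

#define NUM_BUFFERS 4     /* assumption: a few buffers to absorb post-processing jitter */
#define MAX_SAMPLES 1024  /* assumption: headroom above the samples taken in one AC cycle */

/* Hypothetical layout for the samples of one AC cycle. */
typedef struct {
    uint16_t count;
    int16_t  voltage[MAX_SAMPLES];
    int16_t  current[MAX_SAMPLES];
} cycle_buffer_t;

static cycle_buffer_t buffers[NUM_BUFFERS];
static QueueHandle_t free_queue;    /* empty buffers: post-processor -> sampler */
static QueueHandle_t filled_queue;  /* sampled cycles: sampler -> post-processor */

/* Hypothetical hooks standing in for the real ADC sampling and the math. */
extern void sample_one_cycle(cycle_buffer_t *buf);
extern void process_cycle(const cycle_buffer_t *buf);

/* Core 1: take an empty buffer, fill it with one cycle, hand it off. */
static void sampler_task(void *arg)
{
    (void)arg;
    cycle_buffer_t *buf;
    for (;;) {
        xQueueReceive(free_queue, &buf, portMAX_DELAY);
        sample_one_cycle(buf);
        xQueueSend(filled_queue, &buf, portMAX_DELAY);
    }
}

/* Core 0: wait for a sampled cycle, process it, recycle the buffer. */
static void postprocess_task(void *arg)
{
    (void)arg;
    cycle_buffer_t *buf;
    for (;;) {
        xQueueReceive(filled_queue, &buf, portMAX_DELAY);
        process_cycle(buf);
        xQueueSend(free_queue, &buf, portMAX_DELAY);
    }
}

void start_sampling(void)
{
    /* The queues carry pointers, so no sample data is copied between cores. */
    free_queue   = xQueueCreate(NUM_BUFFERS, sizeof(cycle_buffer_t *));
    filled_queue = xQueueCreate(NUM_BUFFERS, sizeof(cycle_buffer_t *));
    if (free_queue == NULL || filled_queue == NULL) {
        return;  /* out of heap; a real build would report this */
    }
    for (int i = 0; i < NUM_BUFFERS; i++) {
        cycle_buffer_t *p = &buffers[i];
        xQueueSend(free_queue, &p, 0);
    }
    /* The last argument is the core the task is pinned to. */
    xTaskCreatePinnedToCore(sampler_task, "sampler", 4096, NULL,
                            configMAX_PRIORITIES - 1, NULL, 1);
    xTaskCreatePinnedToCore(postprocess_task, "postproc", 4096, NULL,
                            configMAX_PRIORITIES - 2, NULL, 0);
}
```

Passing buffer pointers through a pair of queues means the sampler only blocks if the post-processor falls behind by a full set of buffers, which is one way to let it keep pace with every cycle.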
So that took a long time to get working properly. It’s a complete rewrite of the sample code. The SPI works differently on the ESP32, so it was necessary to hack some of the HAL (hardware abstraction layer) code to get the needed speed and flexibility. The V5 does some equivalent things to utilize the wait time during SPI transfers on the ESP8266.
The V5 uses the standard synchronous ESP8266WebServer. With the ESP32, I’ve changed over to me-no-dev’s Async Web Server. The V5 was already using the AsyncTCP component for outbound client I/O in the upload servers. The new web server went in pretty smoothly, although there is a lot left to do to add authorization and all of the handlers in the V5.
The ESP32 has robust support for SNTP, so time services needed to be revamped. That touches everything, but is much more straightforward than the code in the V5 as it uses standard C time services with native support for timezones. More work is needed there, but it promises global timezone and DST support.
Once the datalog started working, query support was ported in with just a few changes to the way time is handled. The result is that Graph+ is working. This is a milestone because it offers a real insight into ESP32 performance. Where a typical graph query/plot is two or three seconds on the V5, it’s usually about a second on the ESP32. The query is typically 1/2 second while the rest is the browser doing its JavaScript thing.
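As an illustration of what the standard C time services give you, here is a minimal sketch using only POSIX calls (`setenv`, `tzset`, `localtime_r`). The helper name `local_time_in_zone` and the US Eastern rule string are my own choices for the example, not anything from the IoTaWatt code, and I am assuming the ESP32 C library accepts the same POSIX `TZ` strings as a desktop system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* expose setenv, tzset and localtime_r */
#endif
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: convert a UTC epoch to broken-down local time
 * under a POSIX TZ rule such as "EST5EDT,M3.2.0,M11.1.0"
 * (UTC-5, DST from the second Sunday in March to the first Sunday in November). */
struct tm local_time_in_zone(time_t utc_epoch, const char *posix_tz)
{
    struct tm local;
    assert(posix_tz != NULL);
    setenv("TZ", posix_tz, 1);  /* the whole timezone and DST rule is this one string */
    tzset();                    /* make the C library re-read TZ */
    localtime_r(&utc_epoch, &local);
    return local;
}
```

With that rule, epoch 1700000000 (2023-11-14 22:13:20 UTC) comes back as 17:13:20 standard time, while epoch 1688000000 (2023-06-29 00:53:20 UTC) comes back as 20:53:20 on June 28 with `tm_isdst` set. Supporting another region is then a matter of storing a different rule string.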
Here’s what it looks like now:
Some significant differences:
- Samples per AC cycle are up to 717 from about 643 in the V5 (I predict 860 for 50Hz up from about 770 in the V5).
- AC cycles sampled/second is equal to the number of cycles. This is a 50% increase, and so the architecture would support another ADC to add 8 more inputs.
- Free Heap is eight times the ESP8266 heap. That will go down a little as the project moves forward, but it should stay well over 100K, so it is possible that HTTPS will become a reality, along with other big improvements.
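The 50Hz prediction and the 50% figure in the list above are simple scaling, sketched here under the assumption that the ADC sample rate does not depend on line frequency, so a longer cycle just collects proportionally more samples. The function names are made up for the example.

```c
#include <assert.h>

/* Samples per cycle at a different line frequency, assuming a fixed
 * ADC sample rate (samples per second). */
double scale_samples_per_cycle(double samples_at_ref, double ref_hz, double target_hz)
{
    assert(ref_hz > 0.0 && target_hz > 0.0);
    return samples_at_ref * ref_hz / target_hz;
}

/* Percentage change going from one value to another. */
double percent_increase(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

For example, `scale_samples_per_cycle(717, 60, 50)` gives about 860 and `scale_samples_per_cycle(643, 60, 50)` gives about 772, in line with the figures quoted for 50Hz, and `percent_increase(40, 60)` gives the 50% gain in cycles sampled per second.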
This is a long way from ready for prime-time. There’s a lot left to do, and once the ice is gone, I’ll be sailing again. Even once there is a viable unit, I’ll have to decide whether to spend the money needed to get FCC, ETL, and CE certifications.
Meanwhile, the development work presents a rewarding challenge.