Weather Station with Light, Humidity, Temperature: More Data QA/QC

Capture and Store Data

Three approaches to ensure data accuracy and to quantify its uncertainty are:

  1. Use multiple sensors. For temperature, this version has 4 sensors.
  2. Display each parameter, such as light intensity, as the average with a ± 2σ band to represent a 95% confidence level.
  3. Chart local data against other open-source data for comparison.

The third step will wait for another time; this tutorial would get too long with so much in one place.
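As a taste of the second approach, the average and the ± 2σ band are easy to compute with NumPy. The readings below are made-up values for illustration only:

```python
import numpy as np

# hypothetical readings of one parameter (e.g. lux) from several light sensors
readings = np.array([512.0, 498.0, 505.0, 521.0])

mean = readings.mean()
band = 2 * readings.std()   # 2 sigma covers ~95% of a normal distribution

print(f"{mean:.1f} lux, +/- {band:.1f}")   # → 509.0 lux, +/- 17.0
```

With the band in hand, a reading that falls outside mean ± band is a candidate for flagging during QA/QC.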

5.1 Expected MQTT Message

By now, with deep sleep in place, the expected output from the Serial Monitor looks like the below:

mqtt sent

It is hard to see, and too late for recovery since I did not keep another copy of this screenshot. Basically, with deep sleep in place, after the ESP boots up it prints some garbage characters first, then connects to WiFi and the MQTT broker. Next, it takes readings from the sensors and publishes them to the broker. Finally, it takes a 5-minute deep-sleep nap. This is the expected outcome on the client side (the ESP8266 with sensors). Everything seemed to be working, and the message was pushed to the local broker in the photo above.

5.2 Capture and Store Data

The message is published to a topic on the MQTT broker, so we need a script that works as a client, listens to the topic, and stores the data in a database. My current choices are Python and SQLite.

I used the same approach for listening to MQTT messages and storing the data into a SQL database here, so I would rather not repeat myself entirely. The basic steps are:

  • Install Python libraries. I use Linux for the demonstration, but the steps are similar on Windows and Mac; on Windows you have to install Python first.
  • Create a blank table in the SQL database
  • Use the paho-mqtt library to listen for MQTT messages and write the data to the database
Let's move on:

  1. Install libraries:

     sudo apt install sqlite3
     pip3 install paho-mqtt numpy pandas --user

    (--user) is optional, in case of a problem with user permissions. NumPy is a fast library for calculating basic statistics, and Pandas handles large tabular data for the next step.

  2. Create blank SQL tables, node1 for raw data and node1_post for pre-processed data:

     CREATE TABLE node1 (time DATETIME, ldr_1 INT, ldr_2 INT, temt6000 INT, bh1750 INT,
                         tsl1_f INT, tsl1_lux INT, tsl2_f INT, tsl2_lux INT, ds_1 REAL, ds_2 REAL,
                         tdht REAL, tsht REAL, hdht REAL, hsht REAL, hic REAL);
  3. Finally, execute the Python code to capture data:

     python3

    Add nohup before python3 to dump the output to a file, and & at the end to keep the script running even after the terminal is closed.

Download the Python script on GitHub
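Before walking through it, the capture path can be stripped to its core: parse the JSON payload and insert one row. Here is a minimal sketch with a trimmed, hypothetical two-column schema (the real script handles the full node1 schema):

```python
import json
import sqlite3

# hypothetical trimmed schema; the real node1 table has the full column list
dbFile = ":memory:"                    # use a file path such as 'weather.db' in practice
conn = sqlite3.connect(dbFile)
conn.execute("CREATE TABLE node1 (time DATETIME, bh1750 INT, tsht REAL)")

def store(raw_message, time_):
    '''parse one MQTT payload and insert it as a row'''
    payload = json.loads(raw_message)
    conn.execute("INSERT INTO node1 VALUES (?, ?, ?)",
                 (time_, payload["bh1750"], payload["tsht"]))
    conn.commit()

store('{"sensor": "node1", "bh1750": 420, "tsht": 25.31}', "2021-01-01 00:00:00")
print(conn.execute("SELECT * FROM node1").fetchall())
```

The parameterized query (the ? placeholders) is one design option that avoids the quoting headaches of assembling the INSERT string by hand.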

What does this script do?

  1. It starts at the end (of the file):

     client = mqtt.Client()
     client.username_pw_set(username=mqtt_username, password=mqtt_password)
     client.connect(mqtt_broker_ip, 1883, 60)
     client.on_connect = on_connect
     client.on_message = on_message
     client.on_disconnect = on_disconnect
     client.loop_forever()

    by creating an instance of the mqtt.Client() class, setting up the authentication, and hooking up the callback functions. loop_forever() restarts the instance after a 1-second sleep if it disconnects.

  2. After connecting, the instance waits for a new message published to the MQTT broker. When a message arrives, it checks the retain flag to make sure the message is new and comes from the right client:

     if msg.retain == 0:
         if payload['sensor'] == 'node1':
  3. then it creates a SQL command for the node1 table and the node1_post table:

     sql_cmd_one = """INSERT INTO {1[sensor]} VALUES ('{0}', {1[ldr_1]}, {1[ldr_2]}, {1[t6000]}, {1[bh1750]},
                         {1[tsl1_f]}, {1[tsl1_l]}, {1[tsl2_f]}, {1[tsl2_l]},
                         {1[ds_1]:.2f}, {1[ds_2]:.2f}, {1[tdht]}, {1[tsht]:.2f},
                         {1[hdht]:.2f}, {1[hsht]:.2f}, {1[hic]:.2f});""".format(time_, payload)
  4. finally, it executes the SQL command to store the data and waits for the next MQTT message:

     def writeToDB(sql_command):
         '''store captured data'''
         conn = sqlite3.connect(dbFile)
         cur = conn.cursor()
         cur.execute(sql_command)
         conn.commit()
         conn.close()
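Putting the retain check and the database write together, the on_message callback can be sketched as below. The FakeMsg class is only a stand-in for paho's message object, so the logic can be exercised without a running broker (in real paho, msg.payload arrives as bytes):

```python
import json

def on_message(client, userdata, msg):
    '''hypothetical sketch of the callback wired up above'''
    payload = json.loads(msg.payload)
    if msg.retain == 0:                     # skip stale retained messages
        if payload['sensor'] == 'node1':    # only accept the expected client
            return f"store row for {payload['sensor']}"
    return "ignored"

class FakeMsg:
    '''stand-in for paho's MQTTMessage, for a dry run'''
    def __init__(self, payload, retain):
        self.payload, self.retain = payload, retain

print(on_message(None, None, FakeMsg('{"sensor": "node1"}', retain=0)))  # → store row for node1
print(on_message(None, None, FakeMsg('{"sensor": "node1"}', retain=1)))  # → ignored
```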

For data pre-processing, I used the NumPy library. For a few inputs this approach is overkill, but you can imagine it is the cleaner way once you have a few dozen readings of one parameter. NumPy and Pandas are the right tools for data organized in tabular format: calculating the average and the standard deviation (σ) is a breeze. In the next step, I will show how the Pandas library runs a SQL query and loads external data into a DataFrame. Working with a DataFrame removes a lot of bookkeeping, but it does require users to think at that level of abstraction before the commands feel intuitive.
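As a small preview of that next step, pandas can pull rows straight from SQLite into a DataFrame and compute the statistics in one pass. The table and values here are made up for illustration:

```python
import sqlite3
import pandas as pd

# hypothetical mini version of the node1 table, populated with fake readings
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node1 (time DATETIME, bh1750 INT)")
conn.executemany("INSERT INTO node1 VALUES (?, ?)",
                 [("2021-01-01 00:00:00", 500),
                  ("2021-01-01 00:05:00", 520),
                  ("2021-01-01 00:10:00", 510)])
conn.commit()

# load the query result into a DataFrame, then compute mean and the 2-sigma band
df = pd.read_sql_query("SELECT * FROM node1", conn)
mean = df["bh1750"].mean()
band = 2 * df["bh1750"].std(ddof=0)   # population sigma, matching NumPy's default

print(f"bh1750: {mean:.1f} +/- {band:.2f}")   # → bh1750: 510.0 +/- 16.33
```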

▣ ▣ ▣