Test lowcost temperature sensors

Aug. 8, 2021

Comparing temperature sensors (Dallas DS18B20, Microchip MCP9808, Resistance Temperature Detector (RTD) PT-100, Sensirion SHT(3x, 21).

Cleaning and reformatting complex data for pandas. The data was stored in text file with list of dictionary and the key as another dictionary

Using seaborn to display data by using its powerful feature hue


Processing records of experiment with lowcost temperature probes

{'ds18_1': {'temp': 25.0625},
 'ds18_2': {'temp': 25.4375},
 'epoch': 1619566926,
 'hostname': 'temperature__6faf24',
 'mcp_1': {'temp': -11},
 'restart': False,
 'rtd_1': {'rtd': 8255, 'temp': 21.40634},
 'rtd_2': {'rtd': 8449, 'temp': 27.96803},
 'rtd_3': {'rtd': 8432, 'temp': 27.39254},
 'sensor': 'temperature',
 'sht21_1': {'humid': 118.9924, 'temp': 23.58172},
 'sht3x_1': {'humid': 98.06, 'temp': 24.1},
 'sht3x_2': {'humid': 96.53, 'temp': 24.29},
 'time': '2021-04-27 23:42:11'}

TOP

Library & Setup

In [1]:
import matplotlib
%matplotlib inline
In [2]:
import matplotlib.pyplot as plt
import datetime
import pandas as pd
import numpy as np
In [3]:
import seaborn as sns
In [4]:
# plt.rcParams['axes.titlesize']= 'large'
# plt.rcParams['axes.titleweight'] = 'bold'
# plt.rcParams['font.size']=13
# plt.rcParams['font.sans-serif'] = 'Open Sans'
# # plt.rcParams['font.family'] = 'sans-serif'
# plt.rcParams['text.color'] = '#4c4c4c'
# plt.rcParams['axes.labelcolor']= '#4c4c4c'
# plt.rcParams['xtick.color'] = '#4c4c4c'
# plt.rcParams['ytick.color'] = '#4c4c4c'
In [5]:
import json
In [6]:
plt.style.use('default')
plt.rcParams["figure.figsize"] = (8,6)

TOP

Experiment Setup

In [7]:
from IPython.core.display import HTML
import base64
def convert_img_base64(fpath=None):
    '''convert image to bytes and display on jupyter'''
    
    with open(fpath, "rb") as image_file:
        encoded_string = base64.b64encode(image_file.read()).decode('utf-8')
    img = f'data:image/png;base64,{encoded_string}'
#     return img
#     print(img[:200])
    return HTML(f"{img}>")

Here are photos of the experiments. I intended to make a comparision of several temperature probes and place ice inside the box to reduce the temperature

List of temperature probes are

  1. Three PT-100 with two configued as 3-wired and one at 2-wired setup
  2. Total 3 Sensirion sensor (SHT31, SHT30, SHT21)
  3. Two Dallas one-wired DS18B20
  4. One Microchip MCP9808

MCU ESP8266 was used to read data from the probes. Collected data packed into dictionary and sent over MQTT server.

Entire breadboards, probes and the battery were placed inside a plastic box (2L) with a pack of ice, isolated a layer of cotton shirt, then thick blanket

In [8]:
convert_img_base64('rtd1.jpg')
Out[8]:
In [9]:
convert_img_base64('rtd2.jpg')
Out[9]:

TOP

Clean data

In [10]:
# file = 'http://localhost:8000/media/files/shared/temperature.txt'
file = 'https://b-io.info/media/files/shared/temperature.txt'
In [11]:
import requests
In [12]:
req = requests.get(file)
req.status_code
Out[12]:
200
In [13]:
lines = req.text.split('\n')
In [14]:
# with open(file) as f:
#     lines = f.readlines()
In [15]:
len(lines)
Out[15]:
550
In [16]:
lines[0]
Out[16]:
'{"sensor": "temperature", "restart": true, "time": "2021-04-27 21:32:21"}'
In [17]:
lines[-2]
Out[17]:
'{"sensor": "temperature", "hostname": "temperature__6faf24", "restart": false, "epoch": 1619566926, "rtd_1": {"rtd": 8255, "temp": 21.40634}, "rtd_2": {"rtd": 8449, "temp": 27.96803}, "rtd_3": {"rtd": 8432, "temp": 27.39254}, "sht3x_1": {"temp": 24.1, "humid": 98.06}, "sht3x_2": {"temp": 24.29, "humid": 96.53}, "mcp_1": {"temp": -11}, "sht21_1": {"temp": 23.58172, "humid": 118.9924}, "ds18_1": {"temp": 25.0625}, "ds18_2": {"temp": 25.4375}, "time": "2021-04-27 23:42:11"}'
In [18]:
from pprint import pprint
In [19]:
pprint(json.loads(lines[-2]))
{'ds18_1': {'temp': 25.0625},
 'ds18_2': {'temp': 25.4375},
 'epoch': 1619566926,
 'hostname': 'temperature__6faf24',
 'mcp_1': {'temp': -11},
 'restart': False,
 'rtd_1': {'rtd': 8255, 'temp': 21.40634},
 'rtd_2': {'rtd': 8449, 'temp': 27.96803},
 'rtd_3': {'rtd': 8432, 'temp': 27.39254},
 'sensor': 'temperature',
 'sht21_1': {'humid': 118.9924, 'temp': 23.58172},
 'sht3x_1': {'humid': 98.06, 'temp': 24.1},
 'sht3x_2': {'humid': 96.53, 'temp': 24.29},
 'time': '2021-04-27 23:42:11'}
In [20]:
# we will convert a list of records into dataframe
def clean_data_sensor(data, debug=False):
    '''read file into a list of line then pack into a dataframe'''
    
#     with open(file) as f:
#         data = f.readlines()
    cdata = list()
    try: 
        for line in data:
            cdata.append(json.loads(line))
    except Exception as e:
        print(e, line)
        pass

    df = pd.DataFrame(data=cdata)
    if debug:
        max_keys = max(len(line.keys()) for line in cdata)
        for line in cdata:
            if len(line.keys()) == max_keys:
                print(line.keys())
                print(line)
                break
        print(df.info())
    return df    
In [21]:
df = clean_data_sensor(lines)
df.head(3)
Expecting value: line 1 column 1 (char 0) 
Out[21]:
sensor restart time hostname epoch rtd_1 rtd_2 rtd_3 sht3x_1 sht3x_2 mcp_1 sht21_1 ds18_1 ds18_2
0 temperature True 2021-04-27 21:32:21 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 temperature True 2021-04-27 21:32:52 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 temperature False 2021-04-27 21:32:56 temperature__6faf24 1.619559e+09 {'rtd': 8360, 'temp': 24.92235} {'rtd': 8491, 'temp': 29.3564} {'rtd': 0, 'temp': 28.91611} {'temp': 28.47, 'humid': 74.15} {'temp': 28.61, 'humid': 74.51} {'temp': 25.25} {'temp': 27.96828, 'humid': 74.35278} {'temp': -127} {'temp': -127}
In [22]:
df.info()

RangeIndex: 549 entries, 0 to 548
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   sensor    549 non-null    object 
 1   restart   549 non-null    bool   
 2   time      549 non-null    object 
 3   hostname  545 non-null    object 
 4   epoch     545 non-null    float64
 5   rtd_1     545 non-null    object 
 6   rtd_2     545 non-null    object 
 7   rtd_3     545 non-null    object 
 8   sht3x_1   545 non-null    object 
 9   sht3x_2   545 non-null    object 
 10  mcp_1     545 non-null    object 
 11  sht21_1   545 non-null    object 
 12  ds18_1    545 non-null    object 
 13  ds18_2    545 non-null    object 
dtypes: bool(1), float64(1), object(12)
memory usage: 56.4+ KB
In [23]:
# we can drop some columns that are the same for each sensor
# also NaN value on epoch
df.drop(columns=['sensor', 'hostname', 'time', 'restart'], inplace=True)
df.dropna(subset=['epoch'], inplace=True)
In [24]:
df.head()
Out[24]:
epoch rtd_1 rtd_2 rtd_3 sht3x_1 sht3x_2 mcp_1 sht21_1 ds18_1 ds18_2
2 1.619559e+09 {'rtd': 8360, 'temp': 24.92235} {'rtd': 8491, 'temp': 29.3564} {'rtd': 0, 'temp': 28.91611} {'temp': 28.47, 'humid': 74.15} {'temp': 28.61, 'humid': 74.51} {'temp': 25.25} {'temp': 27.96828, 'humid': 74.35278} {'temp': -127} {'temp': -127}
3 1.619559e+09 {'rtd': 8361, 'temp': 24.92235} {'rtd': 8492, 'temp': 29.39024} {'rtd': 8478, 'temp': 28.98388} {'temp': 28.5, 'humid': 74.1} {'temp': 28.62, 'humid': 74.38} {'temp': 25.25} {'temp': 27.97901, 'humid': 73.39148} {'temp': -127} {'temp': -127}
4 1.619559e+09 {'rtd': 8360, 'temp': 24.95614} {'rtd': 8493, 'temp': 29.42408} {'rtd': 8479, 'temp': 28.98388} {'temp': 28.5, 'humid': 73.92} {'temp': 28.68, 'humid': 74.2} {'temp': 25.25} {'temp': 27.98973, 'humid': 73.49066} {'temp': -127} {'temp': -127}
5 1.619559e+09 {'rtd': 8361, 'temp': 24.98994} {'rtd': 8492, 'temp': 29.42408} {'rtd': 8481, 'temp': 27.90028} {'temp': 28.51, 'humid': 73.86} {'temp': 28.69, 'humid': 74.1} {'temp': 25.125} {'temp': 28.00046, 'humid': 73.31519} {'temp': -127} {'temp': -127}
6 1.619559e+09 {'rtd': 8363, 'temp': 25.02383} {'rtd': 8492, 'temp': 29.42408} {'rtd': 8481, 'temp': 29.01772} {'temp': 28.56, 'humid': 73.93} {'temp': 28.73, 'humid': 74.18} {'temp': 25.25} {'temp': 28.03263, 'humid': 73.80347} {'temp': -127} {'temp': -127}
In [25]:
# colummn names
cols = list(df.columns)[1:]
cols
Out[25]:
['rtd_1',
 'rtd_2',
 'rtd_3',
 'sht3x_1',
 'sht3x_2',
 'mcp_1',
 'sht21_1',
 'ds18_1',
 'ds18_2']
In [26]:
# use epoch as timestamp
df.epoch = df.epoch.astype(int)
df.set_index('epoch', inplace=True)
df.head()
Out[26]:
rtd_1 rtd_2 rtd_3 sht3x_1 sht3x_2 mcp_1 sht21_1 ds18_1 ds18_2
epoch
1619559174 {'rtd': 8360, 'temp': 24.92235} {'rtd': 8491, 'temp': 29.3564} {'rtd': 0, 'temp': 28.91611} {'temp': 28.47, 'humid': 74.15} {'temp': 28.61, 'humid': 74.51} {'temp': 25.25} {'temp': 27.96828, 'humid': 74.35278} {'temp': -127} {'temp': -127}
1619559186 {'rtd': 8361, 'temp': 24.92235} {'rtd': 8492, 'temp': 29.39024} {'rtd': 8478, 'temp': 28.98388} {'temp': 28.5, 'humid': 74.1} {'temp': 28.62, 'humid': 74.38} {'temp': 25.25} {'temp': 27.97901, 'humid': 73.39148} {'temp': -127} {'temp': -127}
1619559198 {'rtd': 8360, 'temp': 24.95614} {'rtd': 8493, 'temp': 29.42408} {'rtd': 8479, 'temp': 28.98388} {'temp': 28.5, 'humid': 73.92} {'temp': 28.68, 'humid': 74.2} {'temp': 25.25} {'temp': 27.98973, 'humid': 73.49066} {'temp': -127} {'temp': -127}
1619559208 {'rtd': 8361, 'temp': 24.98994} {'rtd': 8492, 'temp': 29.42408} {'rtd': 8481, 'temp': 27.90028} {'temp': 28.51, 'humid': 73.86} {'temp': 28.69, 'humid': 74.1} {'temp': 25.125} {'temp': 28.00046, 'humid': 73.31519} {'temp': -127} {'temp': -127}
1619559220 {'rtd': 8363, 'temp': 25.02383} {'rtd': 8492, 'temp': 29.42408} {'rtd': 8481, 'temp': 29.01772} {'temp': 28.56, 'humid': 73.93} {'temp': 28.73, 'humid': 74.18} {'temp': 25.25} {'temp': 28.03263, 'humid': 73.80347} {'temp': -127} {'temp': -127}

The column name ( rtd - Resistance Temperature Detector) are the type of sensor. I also have SHT from Sensirion, MCP from Microchip, DS18B20 from Dallas

The row are timestamp in epoch (number of seconds since 1970), and the value in cell are a dictionary with many pairs key:value as needed

TOP

Game plan

An ultimate goal to arrange data in cell by its value (numeric or categorical), and in column for its name

We essentially turn a wide table to a long table

Let try to unpack one column to a new dataframe with the columns for paramaters and cells are the values

In [27]:
df['rtd_1']
Out[27]:
epoch
1619559174    {'rtd': 8360, 'temp': 24.92235}
1619559186    {'rtd': 8361, 'temp': 24.92235}
1619559198    {'rtd': 8360, 'temp': 24.95614}
1619559208    {'rtd': 8361, 'temp': 24.98994}
1619559220    {'rtd': 8363, 'temp': 25.02383}
                           ...               
1619566866    {'rtd': 8249, 'temp': 21.16982}
1619566881    {'rtd': 8250, 'temp': 21.23744}
1619566896     {'rtd': 8253, 'temp': 21.2712}
1619566911    {'rtd': 8254, 'temp': 21.37258}
1619566926    {'rtd': 8255, 'temp': 21.40634}
Name: rtd_1, Length: 545, dtype: object
In [28]:
# unpack dictionary
df['rtd_1'].apply(pd.Series)
Out[28]:
rtd temp
epoch
1619559174 8360.0 24.92235
1619559186 8361.0 24.92235
1619559198 8360.0 24.95614
1619559208 8361.0 24.98994
1619559220 8363.0 25.02383
... ... ...
1619566866 8249.0 21.16982
1619566881 8250.0 21.23744
1619566896 8253.0 21.27120
1619566911 8254.0 21.37258
1619566926 8255.0 21.40634

545 rows × 2 columns

In [29]:
# pivot table, turn columns to rows
df['rtd_1'].apply(pd.Series).unstack().reset_index()
Out[29]:
level_0 epoch 0
0 rtd 1619559174 8360.00000
1 rtd 1619559186 8361.00000
2 rtd 1619559198 8360.00000
3 rtd 1619559208 8361.00000
4 rtd 1619559220 8363.00000
... ... ... ...
1085 temp 1619566866 21.16982
1086 temp 1619566881 21.23744
1087 temp 1619566896 21.27120
1088 temp 1619566911 21.37258
1089 temp 1619566926 21.40634

1090 rows × 3 columns

In [30]:
# now we can write a function to process other columns
def rearrange_col(df, col):
    dft = pd.DataFrame()
    dft = df[col].apply(pd.Series).unstack().reset_index()
    dft.columns = ['parameter', 'epoch', 'value']
    dft['sensor'] = col
    return dft
In [31]:
ls_df = list()
for col in cols:
    print(col)
    dft = rearrange_col(df, col)
    ls_df.append(dft)

# concatinate a list of dataframe to one df
dfs = pd.concat(ls_df)
rtd_1
rtd_2
rtd_3
sht3x_1
sht3x_2
mcp_1
sht21_1
ds18_1
ds18_2
In [32]:
dfs.head()
Out[32]:
parameter epoch value sensor
0 rtd 1619559174 8360.0 rtd_1
1 rtd 1619559186 8361.0 rtd_1
2 rtd 1619559198 8360.0 rtd_1
3 rtd 1619559208 8361.0 rtd_1
4 rtd 1619559220 8363.0 rtd_1
In [33]:
# sensor columns contains all sensors
dfs['sensor'].unique()
Out[33]:
array(['rtd_1', 'rtd_2', 'rtd_3', 'sht3x_1', 'sht3x_2', 'mcp_1',
       'sht21_1', 'ds18_1', 'ds18_2'], dtype=object)
In [34]:
# column parameter is for the key in dictionary
dfs['parameter'].unique()
Out[34]:
array(['rtd', 'temp', 'humid'], dtype=object)
In [35]:
# now we use powerful seaborn library for plotting
sns.scatterplot(data=dfs, x='epoch', y='value', hue='sensor')
Out[35]: