The temperature data from the low-cost unit was first compared with duplicate DS18B20 sensors, as shown in Figure 1. The DS18B20 is a truly affordable, reliable, and robust sensor.
Figure 1 gives a broad view of the monthly trend, which fits the temperature pattern in Hanoi. Figs. 2-4 are cutouts of three occasions in 2019 that give a closer look at individual readings and at aggregated data such as hourly and daily averages.
With low-cost sensors, one key disadvantage is unstable readings, in which the output (in this case, the temperature) falls back to a default error value such as -127. This happens when the electrical connection is weak and no temperature reading is available in the sensor's buffer (memory), so a default code is sent instead. The second problem is the longevity of the sensor, which is a headache for capacitive sensors such as those that measure humidity. After more than one year of operation, the DS18B20 has not shown such issues.
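Error readings like these are easy to screen out before any averaging. The snippet below is a minimal sketch, assuming the readings are held in a plain Python list and that -127 is the only error code in use; the name `drop_error_readings` and the sample values are hypothetical.

```python
# Assumption: -127 is the only error sentinel, as described in the text.
ERROR_VALUE = -127.0


def drop_error_readings(readings, error_value=ERROR_VALUE):
    """Return (clean_readings, number_dropped) with sentinel values removed."""
    clean = [r for r in readings if r != error_value]
    return clean, len(readings) - len(clean)


# Made-up sample: two failed reads among otherwise plausible temperatures.
raw = [28.4, 28.5, -127.0, 28.6, -127.0, 28.7]
clean, n_dropped = drop_error_readings(raw)
print(clean, n_dropped)
```

Counting the dropped readings, rather than silently discarding them, gives a rough indicator of how unstable the connection is.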
Reproducibility is another dimension of low-cost sensors: sensors of the same type, or measuring the same parameter, should yield similar readings. Fig. 5 shows aggregated data from two DS18B20 sensors over one week.
Next, I analyze the differences between the readings of these two sensors. First, Fig. 6 shows five bins of differences in temperature readings, with each x-axis label marking the average of its bin. The data behind this distribution are hourly averages, and each hour includes 12 readings. Two-thirds of the hourly differences are less than 0.4°C. This is in line with the manufacturer's specification, which lists an accuracy of ±0.5°C for the DS18B20 between -10°C and +85°C.
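The comparison described above can be sketched in a few lines. This is an illustrative example using only the standard library, not the code behind Fig. 6: the sample data are synthetic, and the names `hourly_means` and `share_below` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def hourly_means(samples):
    """Average (timestamp, value) samples into {hour_start: mean_value}."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour_start].append(value)
    return {hour: mean(values) for hour, values in buckets.items()}


def share_below(sensor_a, sensor_b, threshold=0.4):
    """Fraction of shared hours whose absolute difference is below threshold."""
    means_a, means_b = hourly_means(sensor_a), hourly_means(sensor_b)
    shared_hours = sorted(means_a.keys() & means_b.keys())
    if not shared_hours:
        raise ValueError("the two sensors share no hours")
    diffs = [abs(means_a[h] - means_b[h]) for h in shared_hours]
    return sum(d < threshold for d in diffs) / len(diffs), diffs


# Synthetic data: three hours, 12 readings per hour (one every 5 minutes).
start = datetime(2019, 6, 1, 0, 0)
times = [start + timedelta(minutes=5 * i) for i in range(36)]
offsets = [0.1, 0.3, 0.6]  # made-up offset of sensor B in each hour
sensor_a = [(t, 30.0) for t in times]
sensor_b = [(t, 30.0 + offsets[t.hour]) for t in times]

share, diffs = share_below(sensor_a, sensor_b)
print(f"{share:.0%} of hourly differences are below 0.4")
```

Only hours present in both series are compared, so a gap in one sensor does not distort the result.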
Of course, by digging into the details with open tools such as pandas, we can see how the difference between two sensors of the same type varies by hour of the day and by month, as shown in Figs. 7-8.
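A breakdown of this kind takes only a couple of `groupby` calls in pandas. The example below is a sketch on synthetic data, not the code behind Figs. 7-8: the column names `sensor_a` and `sensor_b`, the constant temperatures, and the artificial bump at 14:00 are all assumptions made for illustration.

```python
import pandas as pd

# Synthetic stand-in for one year of hourly averages from two sensors.
index = pd.date_range("2019-01-01", "2019-12-31 23:00", freq="60min")
df = pd.DataFrame({"sensor_a": 25.0, "sensor_b": 25.2}, index=index)

# Made-up effect: sensor B reads higher in the mid-afternoon hour.
df.loc[df.index.hour == 14, "sensor_b"] += 0.5

# Absolute difference between the two sensors at each timestamp.
diff = (df["sensor_a"] - df["sensor_b"]).abs()

# Mean difference by hour of day (0-23) and by month (1-12).
by_hour = diff.groupby(diff.index.hour).mean()
by_month = diff.groupby(diff.index.month).mean()

print(by_hour.idxmax())  # hour of day with the largest mean difference
```

Grouping on `diff.index.hour` and `diff.index.month` relies on the index being a `DatetimeIndex`, so timestamps read from a file would need to be parsed first.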
The outcome is a little counter-intuitive. Higher temperatures in the summer led to a smaller difference, but glaring sunlight in mid-afternoon appears to contribute to a larger difference between readings from sensors of the same type.
In a sample box, I have another DS18B20 sensor installed about 2 cm above the Raspberry Pi. The sample box has a fan at the rear to draw air out through the front grid window, while the low-cost unit only has a window cut out of the plastic box. The Pi is a low-power System-on-Chip (SoC) device, similar to the ESP8266, which is the microcontroller in the low-cost weather unit, but with much more computing power. One would hypothesize that the Pi generates more heat and thus leads to a higher temperature reading. Fig. 9 presents data from the sensor on top of the Pi and from the two outside. The data do not support the hypothesis, possibly because the heat was drawn out by the fan and the Pi was running at around 10% CPU usage.
Above, I analyzed the DS18B20 data in detail. In today's market, the DS18B20 is only one of many low-cost sensors available. Some of the popular temperature sensors are DHT11, DHT22, BME280, SHTxx, HTU21, SHT7i. Sensors in this class include a capacitive element that measures the humidity content, which is then converted to relative humidity.
The distribution is right-skewed, which is desirable. With a mean of 1.3°C, the distribution shows a larger variation than the 0.5°C accuracy listed for each sensor. At the same time, this value reflects the nature of low-cost sensors. For scientific observation, these drawbacks should therefore be noted and mitigated, for example with a denser grid of sensors or multiple sensors installed in one place.
With access to open APIs, we can query global forecasting systems such as the NOAA GFS model. Extracting data directly from the GFS model output can be a daunting task with limited storage and computing power. Alternatively, websites such as OpenWeatherMap.org or Darksky.net offer simple, programmer-friendly tools to get a time series for specific coordinates. Fig. 12 compares the actual measurements (highrise) with forecast data from DarkSky and data from the UNIS School website.
The pattern is clear here. The daily averages in the highrise building follow the same trend as the two other forecast datasets. The absolute values, however, are distinctly different: the temperatures from the forecast data are lower than the actual records. The difference can be attributed to heat retention in the building, which makes it slower to follow changes in the ambient temperature.
An average of 3.4°C higher in the building is an important outcome. It indicates that the building is consistently warmer than the forecast suggests.
Next, we will compare the data from the Láng station, operated by the Vietnam Meteorological and Hydrological Administration, taken from the upper-air dataset. In Fig. 15, the hourly data from MERRA-2 are compared with the observed values. The station dataset contains only two points a day, but the comparison is already messy. Figs. 16-18 show snapshots over shorter periods to compare the two datasets.
These three close-up snapshots indicate that the two data sources are generally close but can differ markedly at times. The general trend is that the Láng station's readings are higher than MERRA-2's. Fig. 19 confirms this outcome by plotting the distribution of the differences in readings for 2019.
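Because the station reports only twice a day while the reanalysis is hourly, the two series have to be paired on shared timestamps before the differences can be computed. The sketch below shows one way to do that with the standard library. It assumes both series are stored as dictionaries keyed by timestamps in the same time zone. The name `paired_differences` is hypothetical and the values are synthetic, not actual station or MERRA-2 data.

```python
from datetime import datetime, timedelta
from statistics import mean


def paired_differences(station, reanalysis):
    """Return station minus reanalysis at timestamps present in both series."""
    shared = sorted(station.keys() & reanalysis.keys())
    return [station[t] - reanalysis[t] for t in shared]


# Synthetic hourly "reanalysis" series covering two days.
start = datetime(2019, 7, 1, 0, 0)
reanalysis = {
    start + timedelta(hours=h): 26.0 + 0.1 * (h % 24) for h in range(48)
}

# Synthetic "station" series: two readings a day, set 2 degrees warmer.
station = {
    t: value + 2.0 for t, value in reanalysis.items() if t.hour in (0, 12)
}

diffs = paired_differences(station, reanalysis)
print(len(diffs), round(mean(diffs), 2))
```

Hours with no station reading are simply ignored, so the hourly reanalysis does not need to be resampled first.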
A difference of two degrees Celsius is significant. The value is in line with the urban heat island (UHI) effect, in which the city is warmer than its surroundings because of surface properties and energy use in the city. However, it would be naive to assume that MERRA-2 does not account for UHI.
Finally, the three datasets (local observations in a highrise building, forecast data, and a reanalysis set) are charted together, as shown in Figures 20-21. One drawback of this analysis is that I cannot specify the exact data products behind DarkSky.net. The full list of the data sources of this open API is here.