Working with temperature data sources for Hanoi, 2019

3. Results and Discussion

3.1 Low-cost sensor performance

The temperature data from the low-cost unit were first compared with data from duplicate DS18B20 sensors, as shown in Figure 1. The DS18B20 is an affordable, reliable, and robust sensor.

Fig. 1: Temperature data for 2019 measured by a low-cost DIY unit. The unit is mounted on a balcony of a high-rise building.

Figure 1 gives a broad view of the monthly trend, which fits the temperature pattern in Hanoi. Figs. 2-4 are cutouts of three periods in 2019 that give a closer look at each reading and at aggregated data such as hourly and daily averages.

Fig. 2: Temperature data in springtime, Hanoi 2019.
Fig. 3: Temperature data in summertime, Hanoi 2019.
Fig. 4: Temperature data in wintertime, Hanoi 2019.

With low-cost sensors, one key disadvantage is unstable readings: the output, in this case the temperature, can fall back to a default error value such as -127. This happens when a weak electrical connection leaves no temperature reading available in the sensor's buffer (memory), so a default code is sent instead. The second problem is sensor longevity, which is a headache for capacitive sensors such as those that measure humidity. After more than one year of operation, the DS18B20 has not shown such issues.
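As a minimal sketch of how such error codes could be screened out before any averaging, the snippet below drops readings equal to the -127 value mentioned above. The series name, the 5-minute sampling interval, and the sample values are illustrative assumptions, not the actual logger output.

```python
import pandas as pd

# Default error code described in the text; adjust if your logger uses another value.
ERROR_VALUE = -127.0


def drop_error_readings(readings: pd.Series, error_value: float = ERROR_VALUE) -> pd.Series:
    """Return a copy of the series without readings equal to the error code."""
    return readings[readings != error_value]


# Hypothetical raw readings at 5-minute intervals (made-up values).
raw = pd.Series(
    [24.5, 24.6, -127.0, 24.7, -127.0, 24.9],
    index=pd.date_range("2019-03-01 10:00", periods=6, freq="5min"),
    name="temp_c",
)

clean = drop_error_readings(raw)
print(clean)
print(f"dropped {len(raw) - len(clean)} of {len(raw)} readings")
```

Dropping the rows rather than replacing them with a fill value keeps the hourly and daily averages from being pulled toward an artificial number.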

Reproducibility is another concern with low-cost sensors: sensors of the same type, or sensors measuring the same parameter, should yield similar readings. Fig. 5 shows aggregated data from two DS18B20 sensors for one week.

Fig. 5: Data from two DS18B20 sensors installed in duplicate.

Next, I analyze the differences between the readings of these two sensors. First, Fig. 6 shows five bins of differences in temperature readings, with each x-axis label marking the average of the bin. The data in this distribution are hourly averages, and each hour includes 12 readings. Two-thirds of the readings differ by less than 0.4 °C. This is in line with the manufacturer's specification, in which the DS18B20 has an accuracy of ±0.5 °C from -10 °C to +85 °C.

Fig. 6: Distribution of the difference in reading between two DS18B20 sensors.
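A distribution like the one in Fig. 6 could be built roughly as follows. This is a sketch on synthetic data, not the code behind the figure: the column names `sensor_a` and `sensor_b`, the noise level, and the use of five equal-width bins from `pd.cut` are all assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for two co-located sensors sampled every 5 minutes
# (12 readings per hour) over two days. Values are made up.
rng = np.random.default_rng(0)
index = pd.date_range("2019-06-01", periods=2 * 24 * 12, freq="5min")
base = 28.0 + 4.0 * np.sin(2 * np.pi * index.hour.to_numpy() / 24)
readings = pd.DataFrame(
    {
        "sensor_a": base + rng.normal(0.0, 0.2, len(index)),
        "sensor_b": base + rng.normal(0.0, 0.2, len(index)),
    },
    index=index,
)

# Hourly averages, then the absolute difference between the two sensors.
hourly = readings.resample("1h").mean()
abs_diff = (hourly["sensor_a"] - hourly["sensor_b"]).abs()

# Five equal-width bins and the percentage of hours falling in each.
bins = pd.cut(abs_diff, bins=5)
percent = bins.value_counts(normalize=True, sort=False) * 100

# Share of hourly differences below the 0.4 degree threshold, in percent.
share_below = (abs_diff < 0.4).mean() * 100

print(percent.round(1))
print(f"{share_below:.1f}% of hourly differences are below 0.4 °C")
```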

Of course, we can dig into the details with open tools such as pandas and look at how the difference between two temperature sensors of the same type varies by hour of the day and by month, as shown in Figs. 7-8.

Fig. 7: Difference in reading between two DS18B20 sensors by hour of the day
Fig. 8: Difference in reading between two DS18B20 sensors by month
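The grouping behind plots like Figs. 7-8 can be sketched with a pandas `groupby` on the timestamp index. The hourly difference series here is synthetic, and its shape (a mid-afternoon bump and a slightly smaller summer difference) is chosen only to mimic the pattern described in the text; it is not the measured data.

```python
import numpy as np
import pandas as pd

# Synthetic hourly absolute differences for one year (made-up values).
index = pd.date_range("2019-01-01", "2019-12-31 23:00", freq="1h")
hour = index.hour.to_numpy()
month = index.month.to_numpy()
abs_diff = pd.Series(
    0.25
    + 0.30 * np.exp(-(((hour - 14) / 3.0) ** 2))    # mid-afternoon bump
    - 0.05 * np.cos(2 * np.pi * (month - 7) / 12),  # smaller difference in summer
    index=index,
    name="abs_diff_c",
)

# Mean difference for each hour of the day (0-23) and each month (1-12).
by_hour = abs_diff.groupby(abs_diff.index.hour).mean()
by_month = abs_diff.groupby(abs_diff.index.month).mean()

print(by_hour.round(2))
print(by_month.round(2))
```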

The outcome is a little counter-intuitive. Higher temperatures in the summer led to a smaller difference, but glaring sunlight in mid-afternoon appears to contribute to a larger difference between readings from sensors of the same type.

In a sample box, I have another DS18B20 sensor installed about 2 cm above the Raspberry Pi. The sample box has a fan on the rear end that draws air out through the front grid window, while the low-cost unit only has a window cut out of the plastic box. The Pi is a low-power System-on-Chip (SoC) device, similar to the ESP8266, which is the microcontroller for the low-cost weather unit, but with much more computing power. One would hypothesize that the Pi generates more heat and thus leads to a higher temperature reading. Fig. 9 presents data from the sensor on top of the Pi and from the two sensors outside. The data do not support the hypothesis, possibly because the heat was drawn out by the fan and the Pi was running at around 10% CPU usage.

Fig. 9: Temperature readings by DS18B20 sensors

Above, I analyzed the DS18B20 data in detail. In today's market, the DS18B20 is only one of many low-cost sensors available. Some of the popular temperature sensors are DHT11, DHT22, BME280, SHTxx, HTU21, SHT7i. Sensors in this class also include a capacitive element that measures the humidity content, which is then converted to relative humidity.

Fig. 10: Temperature readings by low-cost sensors
Fig. 11: Distribution of the difference

The distribution is right-skewed, which is desirable. With a mean of 1.3 °C, the distribution shows a larger variation than the 0.5 °C accuracy listed for each sensor. At the same time, this value reflects the nature of low-cost sensors. Thus, for scientific observation, these drawbacks should be noted and mitigated, for example with a denser grid of sensors or multiple sensors installed in one place.
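One way to quantify the disagreement among several sensors at the same location is the per-timestamp spread (maximum minus minimum across sensors), together with its mean and skewness. The text does not state how the difference in Fig. 11 was defined, so the spread used here is an assumption, and the readings below are made-up numbers for illustration only.

```python
import pandas as pd

# Hypothetical simultaneous readings (°C) from three different sensor types.
readings = pd.DataFrame(
    {
        "ds18b20": [25.0, 26.0, 27.0, 28.0, 29.0, 30.0],
        "dht22": [25.4, 26.5, 27.3, 28.6, 29.5, 33.0],
        "bme280": [25.8, 26.9, 27.9, 29.0, 30.1, 31.0],
    }
)

# Spread across sensors at each timestamp (assumed definition of "difference").
spread = readings.max(axis=1) - readings.min(axis=1)

mean_spread = spread.mean()
skewness = spread.skew()  # positive values indicate a right-skewed distribution

print(f"mean spread: {mean_spread:.2f} °C, skewness: {skewness:.2f}")
```

A positive skewness together with a mean above the median means that most spreads are small and only a few are large, which is the favorable case for a multi-sensor setup.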

3.2 Local measurement vs. forecast data

With access to open APIs, we can query global forecasting systems such as the NOAA GFS model. Extracting data directly from the GFS model output can be a daunting task with limited storage and computing power. Alternatively, some websites offer simple, programming-friendly tools to get time series for specific coordinates. Fig. 12 shows the actual measurement (high-rise) together with forecast data from DarkSky and data from the UNIS School website.

Fig. 12: Temperature in building vs. forecasting
Fig. 13: Temperature in building vs. forecasting with the hourly average in the background

The pattern is clear here. The trend of the daily averages in the high-rise building matches that of the two forecast datasets. The absolute values, however, are distinctly different: the temperature from the forecast data is lower than the actual records. The difference can be attributed to heat retention in the building, which makes it slower to follow changes in the ambient temperature.

Fig. 14: Distribution of the difference in hourly average temperature between the building and the forecast

An average of 3.4 °C higher in the building is an important outcome. It indicates that the temperature in the building is higher than the forecast temperature.
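The comparison of daily averages and the mean offset between the two series can be sketched as below. Both series are synthetic, with an arbitrary 2 °C offset that is not the measured result; the column names `local` and `forecast` are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic hourly temperatures for three days (made-up values).
index = pd.date_range("2019-07-01", periods=72, freq="1h")
phase = 2 * np.pi * (index.hour.to_numpy() - 9) / 24
local = pd.Series(31.0 + 3.0 * np.sin(phase), index=index)     # building: smaller daily swing
forecast = pd.Series(29.0 + 4.0 * np.sin(phase), index=index)  # forecast: larger daily swing

# Put both series on a common index, then aggregate to daily means.
hourly = pd.DataFrame({"local": local, "forecast": forecast})
daily = hourly.resample("1D").mean()
daily["diff"] = daily["local"] - daily["forecast"]

# Hourly differences give the distribution; their mean is the average offset.
hourly_diff = hourly["local"] - hourly["forecast"]

print(daily.round(2))
print(f"mean hourly difference: {hourly_diff.mean():.2f} °C")
```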

Next, we compare the data from Láng station, operated by the Vietnam Meteorological and Hydrological Administration and taken from the upper-air dataset, with MERRA-2 reanalysis data. In Fig. 15, the hourly data from MERRA-2 are compared with the observational values. The observational dataset contains only two points a day, but the comparison is already messy. Figs. 16-18 show snapshots of shorter periods for comparing the two datasets.

Fig. 15: Observational temperature from Láng station overlaid with reanalysis temperature in the Hanoi area
Fig. 16: Observational temperature vs. reanalysis temperature in Hanoi area, February 2019
Fig. 17: Observational temperature vs. reanalysis temperature in Hanoi area, July 2019
Fig. 18: Observational temperature vs. reanalysis temperature in Hanoi area, December 2019
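Because the station reports twice a day while the reanalysis is hourly, the two series have to be paired on time before they can be differenced. A minimal sketch using `pandas.merge_asof` is given below; the column names, the 00:00 and 12:00 observation times, the 30-minute tolerance, and all temperature values are assumptions for illustration.

```python
import pandas as pd

# Synthetic hourly reanalysis series for two days (made-up values).
hourly_times = pd.date_range("2019-02-01", periods=48, freq="1h")
reanalysis = pd.DataFrame(
    {"time": hourly_times, "merra2_c": 15.0 + 0.1 * hourly_times.hour.to_numpy()}
)

# Synthetic station observations, assumed here to fall at 00:00 and 12:00.
obs_times = hourly_times[hourly_times.hour.isin([0, 12])]
station = pd.DataFrame({"time": obs_times, "station_c": [16.0, 19.0, 16.5, 18.5]})

# For each observation, take the nearest reanalysis value within 30 minutes.
paired = pd.merge_asof(
    station,
    reanalysis,
    on="time",
    direction="nearest",
    tolerance=pd.Timedelta("30min"),
)
paired["diff_c"] = paired["station_c"] - paired["merra2_c"]

print(paired)
print(f"mean difference: {paired['diff_c'].mean():.2f} °C")
```

Matching on the nearest timestamp with a tolerance, rather than requiring exact equality, leaves unmatched observations as missing values instead of pairing them with a distant hour.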

These three close-up snapshots indicate that the two data sources give close readings at times but can also be distinctly different. The general trend is that the Láng station readings are higher than MERRA-2's. Fig. 19 confirms this by plotting the distribution of the difference in readings for 2019.

Fig. 19: The difference in temperature data between MERRA-2 and Láng station

A difference of two degrees Celsius is significant. The value is in line with the urban heat island (UHI) effect, in which the city is warmer than its surroundings because of surface properties and energy use in the city. However, it would be naive to assume that MERRA-2 does not account for UHI.

Finally, the three datasets (local observation in a high-rise building, forecast data, and a reanalysis set) are charted on one graph, as shown in Figs. 20-21. One drawback of this analysis is that I cannot specify the exact digital products behind the open forecast API. The full list of the data sources of this open API is here.

Fig. 20: A gallery of local, forecast, and reanalysis data of temperature in Hanoi, 2019.
Fig. 21: Distribution of temperature values between a forecast set and reanalysis data in Hanoi, Vietnam (2019-2020).