CO2 daily

Oct. 1, 2021

This series of CO2 data analyses is part of my effort to interpret CO2 data from low-cost sensors. The sensors are located in a high-rise building near the center of Hanoi.

In this post, we will work with ESRL/NOAA data from several sites around the world. The samples were taken daily using flask sampling. The sites were selected to reflect a variety of conditions: strong photosynthesis, a tiny island, the reference station at Mauna Loa (Hawaii)...


In [1]:
import requests
import pandas as pd
import re
import datetime
import matplotlib.pyplot as plt
import numpy as np

Clean data

In [2]:
# to store location of the site
site_meta = dict()
In [3]:
def clean_flask_co2(url, verbose=False):
    '''return a dataframe of CO2 concentration'''
    
    res = requests.get(url)
    if res.status_code != 200:
        return "Bad request, check url"
    lines = res.text.split('\n')
    for i, line in enumerate(lines):
        if 'data_fields' in line:
            print(f'Header line: {i}')
            break
    cols = re.split(r'\s+', lines[i].split(':')[-1].strip())
    if verbose:
        print(f'headers {cols}')
    data = lines[i+1:]
    data = [re.split(r'\s+', line) for line in data]
    df = pd.DataFrame(data=data, columns=cols)
    df.dropna(inplace=True)
    
    # get metadata
    _line = df.iloc[0]
    site= _line['sample_site_code']
    
    global site_meta
    
    site_data = {site:{
        'lat': _line['sample_latitude'],
        'lon': _line['sample_longitude'],
        'alt': _line['sample_altitude'],
        'ele': _line['sample_elevation'],
        'intake_h': _line['sample_intake_height']
    }}
    site_meta.update(site_data)
    
    df['time'] = df.apply(make_dt, axis=1)
    df = df[['time', 'analysis_value']]   
    df.columns = ['time', site]
    
    df[site] = pd.to_numeric(df[site])
    df.set_index('time', inplace=True)
    print(df.head())
    return df
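As a quick illustration of how the column names are extracted, here is a minimal sketch; the sample header line below is hypothetical but mimics the data_fields comment line in the NOAA flask files:

sample = '# data_fields: sample_site_code sample_year sample_month analysis_value'
cols = re.split(r'\s+', sample.split(':')[-1].strip())
print(cols)
# ['sample_site_code', 'sample_year', 'sample_month', 'analysis_value']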
In [4]:
def make_dt(row, verbose=False):
    '''convert time data to datetime object'''
    
    try:
        year = int(row['sample_year'])
        month =  int(row['sample_month'])
        day = int(row['sample_day'])
        hour = int(row['sample_hour'])
        minute = int(row['sample_minute'])
        seconds = int(row['sample_seconds'])
        dt = datetime.datetime(year, month, day, hour, minute, seconds)
        return dt
    except Exception as e:
        print(f'Exception converting datetime: {e}')
        print(row)
    return None
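A quick check of make_dt with a hypothetical row (the column names follow the flask file header; the values are made up):

row = pd.Series({'sample_year': '2020', 'sample_month': '6', 'sample_day': '15',
                 'sample_hour': '8', 'sample_minute': '30', 'sample_seconds': '0'})
print(make_dt(row))
# 2020-06-15 08:30:00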

TOP

Get data

  • I am interested in sites close to Vietnam. The sites available for data download are listed on gml.noaa.gov .
  • Here is a map for easier visualization ( map )
  • I selected five sites for comparison:
    1. MLO (Mauna Loa, Hawaii): used as the global reference
    2. BKT (Bukit Kototabang, Indonesia): surrounded by forest and strongly influenced by photosynthesis
    3. DSI (Dongsha Island, Taiwan): in the middle of nowhere, on a tiny island
    4. AMY (Anmyeon-do, Korea): a site located near a big city
    5. NWR (Niwot Ridge, near Denver): close to a big city and at high altitude
In [5]:
sites = ['MLO', 'BKT', 'DSI', 'AMY', 'NWR']
In [6]:
# get the data and store it in a dictionary keyed by site code, with a DataFrame as each value
dfs = dict()
for site in sites:
    print(site)
    aurl = f'https://gml.noaa.gov/aftp/data/trace_gases/co2/flask/surface/co2_{site.lower()}_surface-flask_1_ccgg_event.txt'
    dfs[site] =  clean_flask_co2(aurl)
MLO
Header line: 70
                        MLO
time                       
1969-08-20 17:55:00  323.17
1969-08-20 17:55:00  324.72
1969-08-20 18:30:00  331.02
1969-08-20 18:30:00 -999.99
1969-08-27 19:15:00 -999.99
BKT
Header line: 69
                        BKT
time                       
2004-01-08 08:13:00  371.77
2004-01-08 08:13:00  372.04
2004-01-14 07:36:00  374.63
2004-01-14 07:36:00  375.01
2004-01-21 06:40:00  374.31
DSI
Header line: 68
                        DSI
time                       
2010-03-05 01:51:00  391.36
2010-03-05 01:51:00  391.25
2010-03-09 00:55:00  401.42
2010-03-09 00:55:00  401.40
2010-03-12 02:40:00  394.46
AMY
Header line: 69
                        AMY
time                       
2013-12-03 05:44:00  406.54
2013-12-03 05:44:00  406.59
2013-12-10 05:00:00  402.59
2013-12-10 05:00:00  402.63
2013-12-19 05:00:00  405.47
NWR
Header line: 69
                        NWR
time                       
1967-05-18 19:06:00  324.85
1967-05-18 19:06:00  324.73
1967-05-18 19:15:00  327.53
1967-05-18 19:15:00  329.40
1967-07-06 18:01:00  323.86
In [7]:
dfs.keys()
Out[7]:
dict_keys(['MLO', 'BKT', 'DSI', 'AMY', 'NWR'])
In [8]:
site_meta
Out[8]:
{'MLO': {'lat': '19.5300',
  'lon': '-155.5800',
  'alt': '3399.00',
  'ele': '3397.00',
  'intake_h': '2.00'},
 'BKT': {'lat': '-0.2000',
  'lon': '100.3200',
  'alt': '875.00',
  'ele': '845.00',
  'intake_h': '30.00'},
 'DSI': {'lat': '20.6992',
  'lon': '116.7297',
  'alt': '8.00',
  'ele': '3.00',
  'intake_h': '5.00'},
 'AMY': {'lat': '36.5389',
  'lon': '126.3295',
  'alt': '87.00',
  'ele': '47.00',
  'intake_h': '40.00'},
 'NWR': {'lat': '40.0500',
  'lon': '-105.6300',
  'alt': '3526.00',
  'ele': '3523.00',
  'intake_h': '3.00'}}
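The metadata can also be arranged as a small table for easier reading (a convenience sketch, not part of the original workflow; the values stay the same):

# transpose so each site becomes a row, and convert the string values to floats
meta_df = pd.DataFrame(site_meta).T.astype(float)
print(meta_df)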

TOP

Site map

In [9]:
import folium
from folium.features import DivIcon
In [10]:
def add_marker(m, metadata):
    '''return each marker for each station'''
    for station, data in metadata.items():
        location = [data['lat'], data['lon']]
        
        tooltip = (f"Elevation, m: {data['ele']}\n"
                   f"Intake height, m: {data['intake_h']}")
        folium.Circle(
            location=location,
            fill=True,
            radius=200_000,
            tooltip=tooltip,
            color='maroon',
        ).add_to(m)
        folium.map.Marker(
            location=location,
            icon=DivIcon(
                icon_size=(150, 36),
                icon_anchor=(0, 0),
                html='%s' % station,
            ),
        ).add_to(m)
    return m
In [11]:
# convert to float before comparing; comparing the raw strings gives lexicographic
# (and therefore wrong) bounds for the western longitude
south = min(float(line['lat']) for line in site_meta.values())
west = min(float(line['lon']) for line in site_meta.values())
north = max(float(line['lat']) for line in site_meta.values())
east = max(float(line['lon']) for line in site_meta.values())
south, west, north, east
Out[11]:
(-0.2, -155.58, 40.05, 126.3295)
In [21]:
m = folium.Map(location=[45.5236, -122.6750])
m = add_marker(m, site_meta)
m.fit_bounds([[south, west], [north, east]])
m
Out[21]:
[Interactive folium map with circles and labels for the five sampling sites]

TOP

Concentration chart

In [14]:
dfs.keys()
Out[14]:
dict_keys(['MLO', 'BKT', 'DSI', 'AMY', 'NWR'])
In [15]:
year = 2015
for site, df in dfs.items():
    
    # clean up outliers and the -999.99 missing-value flags
    df.where(df[site] > 375, np.NaN, inplace=True)
    df.where(df[site] < 500, np.NaN, inplace=True)
    dfs[site] = df[df.index.year >= year]
#     dfs[site] = df[df.index.year <= 2020]
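To make the thresholding concrete, here is a minimal sketch with made-up values showing how where masks everything outside the 375-500 ppm window, including the -999.99 missing-value flag:

check = pd.DataFrame({'X': [-999.99, 380.0, 410.0, 512.3]})
check.where(check['X'] > 375, np.nan, inplace=True)
check.where(check['X'] < 500, np.nan, inplace=True)
print(check['X'].tolist())
# [nan, 380.0, 410.0, nan]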
In [16]:
plt.style.use('default')
In [17]:
# the data shows up, but it is noisy

fig, ax = plt.subplots()
for site, df in dfs.items():
    ax.plot(df, label=site, lw=0.8)
ax.legend()
fig.autofmt_xdate()
In [18]:
plt.rcParams['font.family'] = 'monospace'
WHITE = "#FFFCFC"
In [19]:
add_text = "Data is provided by ESRL/NOAA. The concentration of $\mathrm{CO_2}$ across the network depends on location and altitude.\nThe bold lines show the 30-day (monthly) moving average of the concentration. Visualization by Binh Nguyen, Oct. 2021."
In [20]:
import matplotlib.dates as mdates

colors = {
    'MLO': 'black',
    'BKT': 'darkgreen',
    'DSI': 'navy',
    'AMY': 'blue',
    'NWR': 'firebrick'
}
fig, ax = plt.subplots(figsize=(10,6))
fig.patch.set_facecolor(WHITE)
ax.set_facecolor(WHITE)
ax.set_title(add_text, fontsize='small')
fig.suptitle('Surface $\mathrm{CO_2}$ Concentration sampled by Flasks')
for site, df in dfs.items():
    ax.plot(df.rolling('1D').mean(), lw=0.8, alpha=0.3, color=colors[site])
    ax.plot(df.rolling('30D').mean(), label=site, lw=1.5, alpha=0.8, color=colors[site])
ax.legend()

ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('\n%Y'),)
ax.xaxis.set_minor_locator(mdates.MonthLocator(bymonth=[1, 4, 7, 10]))
ax.xaxis.set_minor_formatter(mdates.DateFormatter('%b'))

ax.tick_params(axis='both', direction='in', length=8,)
ax.tick_params(axis='both', direction='in', length=4, which='minor')
fig.tight_layout()
fig.savefig('co2.png', dpi=300)
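For reference, the 30-day smoothing in the figure above is just a time-based rolling mean; below is a minimal sketch with synthetic data (the variable names and values are illustrative only):

idx = pd.date_range('2021-01-01', periods=90, freq='D')
demo = pd.DataFrame({'co2': 410 + np.random.randn(90)}, index=idx)
# a 30-day window keyed on the DatetimeIndex, as used in the plot above
print(demo.rolling('30D').mean().tail())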
  • This graph does not yet provide any insight into the question of the CO2 trend at hourly resolution. It does suggest that the CO2 concentration varies across locations, seasons, and time.
  • We will continue with higher-resolution data (and a bigger file) to see what we can make sense of.

TOP

References

In [ ]:
 
get Jupyter Notebook: