Working with weather data
This example shows how to manipulate a toy weather dataset with xray alongside NumPy, pandas, and seaborn.
Shared setup:
import xray
import numpy as np
import pandas as pd
import seaborn as sns # pandas aware plotting library
np.random.seed(123)
times = pd.date_range('2000-01-01', '2001-12-31', name='time')
annual_cycle = np.sin(2 * np.pi * (times.dayofyear / 365.25 - 0.28))
base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 5 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 5 * np.random.randn(annual_cycle.size, 3)
ds = xray.Dataset({'tmin': (('time', 'location'), tmin_values),
                   'tmax': (('time', 'location'), tmax_values)},
                  {'time': times, 'location': ['IA', 'IN', 'IL']})
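The sinusoid in the setup fixes when the synthetic year is warmest and coldest. As a rough check, the standard-library sketch below evaluates the same formula one day at a time; the names `annual_cycle_at`, `warmest_day`, `coldest_day`, and `base_range` are illustrative and not part of the original example.

```python
import math

def annual_cycle_at(day_of_year):
    """Same sinusoid as the setup code, evaluated for a single day of year."""
    return math.sin(2 * math.pi * (day_of_year / 365.25 - 0.28))

days = range(1, 367)  # 2000 is a leap year, so day 366 is included

# Days of year on which the cycle peaks and bottoms out
warmest_day = max(days, key=annual_cycle_at)
coldest_day = min(days, key=annual_cycle_at)

# Baseline used for tmin before noise is added: 10 + 15 * annual_cycle
base_range = (10 + 15 * annual_cycle_at(coldest_day),
              10 + 15 * annual_cycle_at(warmest_day))

print(warmest_day, coldest_day, base_range)
```

Under this formula the baseline swings between roughly -5 and 25, bottoming out in mid-January and peaking in mid-July, which is why freezing days cluster in the winter months.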
Examine a dataset with pandas and seaborn
In [1]: ds
Out[1]:
<xray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * location  (location) |S2 'IA' 'IN' 'IL'
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04 ...
Variables:
    tmax      (time, location) float64 18.15 2.038 7.818 -2.705 7.169 4.621 5.444 6.993 ...
    tmin      (time, location) float64 -10.21 0.2062 -3.366 -12.35 -7.715 3.435 -16.99 ...
In [2]: ds.to_dataframe().head()
Out[2]:
                          tmax       tmin
location time
IA       2000-01-01  18.154567 -10.208631
         2000-01-02  -2.705392 -12.353746
         2000-01-03   5.444285 -16.993078
         2000-01-04  -0.255829  -9.226395
         2000-01-05  -2.067176   2.535651
In [3]: ds.to_dataframe().describe()
Out[3]:
              tmax         tmin
count  2193.000000  2193.000000
mean     20.187129     9.965787
std      11.693882    11.610836
min      -9.250351   -19.214657
25%      10.294622     0.330388
50%      20.000428     9.655506
75%      30.017643    19.963175
max      48.870196    39.157475
In [4]: ds.mean(dim='location').to_dataframe().plot()
Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x7fd4e2012a10>
In [5]: sns.pairplot(ds[['tmin', 'tmax', 'time.month']].to_dataframe(),
...: vars=ds.vars, hue='time.month')
...:
Out[5]: <seaborn.axisgrid.PairGrid at 0x7fd4e1a49510>
Probability of freeze by calendar month
In [6]: freeze = (ds['tmin'] <= 0).groupby('time.month').mean('time')
In [7]: freeze
Out[7]:
<xray.DataArray 'tmin' (location: 3, time.month: 12)>
array([[ 0.83870968,  0.63157895,  0.27419355, ...,  0.09677419,
         0.35      ,  0.74193548],
       [ 0.72580645,  0.70175439,  0.19354839, ...,  0.0483871 ,
         0.4       ,  0.72580645],
       [ 0.82258065,  0.66666667,  0.20967742, ...,  0.01612903,
         0.3       ,  0.67741935]])
Coordinates:
  * location    (location) |S2 'IA' 'IN' 'IL'
  * time.month  (time.month) int64 1 2 3 4 5 6 7 8 9 10 11 12
In [8]: freeze.to_series().unstack('location').plot()
Out[8]: <matplotlib.axes._subplots.AxesSubplot at 0x7fd4e13cd950>
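The freeze calculation relies on the mean of a boolean series being the fraction of `True` values. The sketch below spells that out in plain Python for a single location; the temperatures in `tmin_by_day` are made up for illustration, and this is not how xray implements `groupby`.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily minimum temperatures (deg C) for one location
tmin_by_day = {
    date(2000, 1, 1): -10.2,
    date(2000, 1, 2): -12.4,
    date(2000, 1, 3): 1.5,
    date(2000, 1, 4): -9.2,
    date(2000, 7, 1): 18.0,
    date(2000, 7, 2): 21.3,
}

# Group the "did it freeze?" flags by calendar month
flags_by_month = defaultdict(list)
for day, tmin in tmin_by_day.items():
    flags_by_month[day.month].append(tmin <= 0)

# The mean of booleans is the fraction of days that froze
freeze_probability = {month: sum(flags) / len(flags)
                      for month, flags in flags_by_month.items()}

print(freeze_probability)
```

Here three of the four January days are at or below zero, so the January entry is 0.75, and neither July day freezes.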
Monthly averaging
def year_month(xray_obj):
    """Given an xray object with a 'time' coordinate, return a DataArray
    with values given by the first date of the month in which each time
    falls.
    """
    time = xray_obj.coords['time']
    # Truncate each timestamp to the first day of its month
    values = time.to_index().to_period('M').to_timestamp()
    return xray.DataArray(values, [time], name='year_month')
In [9]: monthly_avg = ds.groupby(year_month(ds)).mean()
In [10]: monthly_avg.to_dataframe().plot(style='s-')
Out[10]: <matplotlib.axes._subplots.AxesSubplot at 0x7fd4f0167c90>
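Grouping by `year_month` differs from grouping by `'time.month'`: each label is the first day of a specific month in a specific year, so January 2000 and January 2001 stay separate. A minimal standard-library sketch of the same idea follows; the values in `daily` are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical daily values
daily = {
    date(2000, 1, 30): 2.0,
    date(2000, 1, 31): 4.0,
    date(2000, 2, 1): 6.0,
    date(2001, 1, 1): 10.0,
}

# Label every day with the first day of its month, then collect values per label
groups = defaultdict(list)
for day, value in daily.items():
    groups[day.replace(day=1)].append(value)

# Average within each (year, month) group, in chronological order
monthly_avg = {label: mean(values) for label, values in sorted(groups.items())}

for label, value in monthly_avg.items():
    print(label, value)
```

The two January 2000 values average to 3.0, while the single January 2001 value forms its own group.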
Calculate monthly anomalies
In climatology, an “anomaly” is the difference between an observation and the typical weather for that time of year. Unlike the raw observations, anomalies should not show any seasonal cycle.
In [11]: climatology = ds.groupby('time.month').mean('time')
In [12]: anomalies = ds.groupby('time.month') - climatology
In [13]: anomalies.mean('location').reset_coords(drop=True).to_dataframe().plot()
Out[13]: <matplotlib.axes._subplots.AxesSubplot at 0x7fd4e160ac50>
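Subtracting `climatology` from the grouped dataset removes each calendar month's mean from every observation that falls in that month. The plain-Python sketch below walks through the same arithmetic on a few hypothetical observations in `obs`; it illustrates the calculation only and is not xray's implementation.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical observations (deg C): two years, two calendar months
obs = {
    date(2000, 1, 15): -4.0,
    date(2001, 1, 15): -6.0,
    date(2000, 7, 15): 24.0,
    date(2001, 7, 15): 26.0,
}

# Climatology: mean over all years for each calendar month
by_month = defaultdict(list)
for day, value in obs.items():
    by_month[day.month].append(value)
climatology = {month: mean(values) for month, values in by_month.items()}

# Anomaly: observation minus the climatology of its calendar month
anomalies = {day: value - climatology[day.month] for day, value in obs.items()}

for day, value in sorted(anomalies.items()):
    print(day, value)
```

The observations differ by about 30 degrees between January and July, but the anomalies are all within one degree of zero, so the seasonal cycle is gone.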