Toy weather data

This example shows how to manipulate a toy weather dataset using xray and other recommended Python libraries:

Shared setup:

import xray
import numpy as np
import pandas as pd
import seaborn as sns # pandas aware plotting library

np.random.seed(123)

times = pd.date_range('2000-01-01', '2001-12-31', name='time')
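# np.asarray guarantees a plain NumPy array, so .reshape below works even when dayofyear is a pandas Index
annual_cycle = np.sin(2 * np.pi * (np.asarray(times.dayofyear) / 365.25 - 0.28))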

base = 10 + 15 * annual_cycle.reshape(-1, 1)
tmin_values = base + 5 * np.random.randn(annual_cycle.size, 3)
tmax_values = base + 10 + 5 * np.random.randn(annual_cycle.size, 3)

ds = xray.Dataset({'tmin': (('time', 'location'), tmin_values),
                   'tmax': (('time', 'location'), tmax_values)},
                  {'time': times, 'location': ['IA', 'IN', 'IL']})

Examine a dataset with pandas and seaborn

In [1]: ds
Out[1]: 
<xray.Dataset>
Dimensions:   (location: 3, time: 731)
Coordinates:
  * location  (location) |S2 'IA' 'IN' 'IL'
  * time      (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04 ...
Data variables:
    tmax      (time, location) float64 18.15 2.038 7.818 -2.705 7.169 4.621 5.444 6.993 ...
    tmin      (time, location) float64 -10.21 0.2062 -3.366 -12.35 -7.715 3.435 -16.99 ...

In [2]: ds.to_dataframe().head()
Out[2]: 
                          tmax       tmin
location time                            
IA       2000-01-01  18.154567 -10.208631
         2000-01-02  -2.705392 -12.353746
         2000-01-03   5.444285 -16.993078
         2000-01-04  -0.255829  -9.226395
         2000-01-05  -2.067176   2.535651

In [3]: ds.to_dataframe().describe()
Out[3]: 
              tmax         tmin
count  2193.000000  2193.000000
mean     20.187129     9.965787
std      11.693882    11.610836
min      -9.250351   -19.214657
25%      10.294622     0.330388
50%      20.000428     9.655506
75%      30.017643    19.963175
max      48.870196    39.157475

In [4]: ds.mean(dim='location').to_dataframe().plot()
Out[4]: <matplotlib.axes._subplots.AxesSubplot at 0x7feebf842410>
../_images/examples_tmin_tmax_plot.png

In [5]: for var in ['tmin', 'tmax']:
   ...:     sns.kdeplot(ds[var].to_series())
   ...: 
../_images/examples_pairplot.png

Probability of freeze by calendar month

In [6]: freeze = (ds['tmin'] <= 0).groupby('time.month').mean('time')

In [7]: freeze
Out[7]: 
<xray.DataArray 'tmin' (month: 12, location: 3)>
array([[ 0.83870968,  0.72580645,  0.82258065],
       [ 0.63157895,  0.70175439,  0.66666667],
       [ 0.27419355,  0.19354839,  0.20967742],
       ..., 
       [ 0.09677419,  0.0483871 ,  0.01612903],
       [ 0.35      ,  0.4       ,  0.3       ],
       [ 0.74193548,  0.72580645,  0.67741935]])
Coordinates:
  * month     (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
  * location  (location) |S2 'IA' 'IN' 'IL'

In [8]: freeze.to_pandas().T.plot()
Out[8]: <matplotlib.axes._subplots.AxesSubplot at 0x7feeba9dd050>
../_images/examples_freeze_prob.png
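The freeze probability above relies on the fact that the mean of a boolean array is the fraction of True values. As a minimal sketch of the same idea in plain pandas, assuming a single location and hypothetical stand-in data (the names rng, is_freeze, freeze_prob and by_hand are illustrative and not part of the example above):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: one location, two years of daily minima.
rng = np.random.RandomState(0)
times = pd.date_range('2000-01-01', '2001-12-31', name='time')
tmin = pd.Series(10 + 5 * rng.randn(times.size), index=times, name='tmin')

# True where the daily minimum is at or below freezing.
is_freeze = tmin <= 0

# Averaging booleans counts True as 1 and False as 0, so the mean within
# each calendar month is the fraction of days with a freeze.
month = is_freeze.index.month
freeze_prob = is_freeze.groupby(month).mean()

# The same quantity computed explicitly as (freeze days) / (all days).
by_hand = is_freeze.groupby(month).sum() / is_freeze.groupby(month).size()
```

Both freeze_prob and by_hand hold one value per calendar month, each between 0 and 1.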

Monthly averaging

In [9]: monthly_avg = ds.resample('1MS', dim='time', how='mean')

In [10]: monthly_avg.sel(location='IA').to_dataframe().plot(style='s-')
Out[10]: <matplotlib.axes._subplots.AxesSubplot at 0x7feeba478dd0>
../_images/examples_tmin_tmax_plot_mean.png

Note that MS here stands for Month-Start, so each average is labeled with the first day of its month; M would instead label it with Month-End (the last day of the month).
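To make the difference between the two labels concrete, here is a small pandas-only sketch. It uses the explicit offset objects pd.offsets.MonthBegin and pd.offsets.MonthEnd rather than the alias strings; that is a choice made for this sketch, not something the example above requires.

```python
import pandas as pd

# Month-start labels: the first day of each month.
month_start = pd.date_range('2000-01-01', periods=3,
                            freq=pd.offsets.MonthBegin())

# Month-end labels: the last day of each month (2000 is a leap year).
month_end = pd.date_range('2000-01-01', periods=3,
                          freq=pd.offsets.MonthEnd())

print(list(month_start.strftime('%Y-%m-%d')))
# ['2000-01-01', '2000-02-01', '2000-03-01']
print(list(month_end.strftime('%Y-%m-%d')))
# ['2000-01-31', '2000-02-29', '2000-03-31']
```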

Calculate monthly anomalies

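In climatology, “anomalies” refer to the difference between observations and typical weather for a particular season. Here the typical weather is estimated as the mean for each calendar month, and that mean is subtracted from every observation in the same month. Unlike observations, anomalies should not show any seasonal cycle.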

In [11]: climatology = ds.groupby('time.month').mean('time')

In [12]: anomalies = ds.groupby('time.month') - climatology

In [13]: anomalies.mean('location').to_dataframe()[['tmin', 'tmax']].plot()
Out[13]: <matplotlib.axes._subplots.AxesSubplot at 0x7feeba454690>
../_images/examples_anomalies_plot.png
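One way to check that the seasonal cycle has been removed is to confirm that the anomalies average to zero within every calendar month. The following is a rough pandas-only sketch of that check, assuming a single hypothetical series built like the toy data; the names rng, cycle and residual are illustrative and do not appear in the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical single-location series with a seasonal cycle plus noise.
rng = np.random.RandomState(0)
times = pd.date_range('2000-01-01', '2001-12-31', name='time')
cycle = np.sin(2 * np.pi * (np.asarray(times.dayofyear) / 365.25 - 0.28))
tmin = pd.Series(10 + 15 * cycle + 5 * rng.randn(times.size), index=times)

month = tmin.index.month

# Climatology: the mean for each calendar month (12 values).
climatology = tmin.groupby(month).mean()

# transform('mean') broadcasts each month's mean back onto its own days,
# so the subtraction removes the monthly mean from every observation.
anomalies = tmin - tmin.groupby(month).transform('mean')

# By construction, the anomalies average to (numerically) zero per month.
residual = anomalies.groupby(month).mean()
print(np.allclose(residual.values, 0))
```

Some day-to-day seasonal drift remains inside each month, because only the monthly mean is removed, but the spread of the anomalies is much smaller than that of the raw series.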