Interpolating data

xarray offers flexible interpolation routines with an interface similar to that of its indexing methods.

Note

interp() requires scipy to be installed.

Scalar and 1-dimensional interpolation

Interpolating a DataArray works much like labeled indexing of a DataArray:

In [1]: da = xr.DataArray(np.sin(0.3 * np.arange(12).reshape(4, 3)),
   ...:                   [('time', np.arange(4)),
   ...:                    ('space', [0.1, 0.2, 0.3])])
   ...: 

# label lookup
In [2]: da.sel(time=3)
Out[2]: 
<xarray.DataArray (space: 3)>
array([ 0.42738 ,  0.14112 , -0.157746])
Coordinates:
    time     int64 3
  * space    (space) float64 0.1 0.2 0.3

# interpolation
In [3]: da.interp(time=3.5)
Out[3]: 
<xarray.DataArray (space: 3)>
array([nan, nan, nan])
Coordinates:
  * space    (space) float64 0.1 0.2 0.3
    time     float64 3.5

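The result here is NaN because time=3.5 lies outside the original time range (0 to 3), and interp() does not extrapolate by default. Like indexing, interp() also accepts an array-like and returns the interpolated result as an array.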

# label lookup
In [4]: da.sel(time=[2, 3])
Out[4]: 
<xarray.DataArray (time: 2, space: 3)>
array([[ 0.973848,  0.863209,  0.675463],
       [ 0.42738 ,  0.14112 , -0.157746]])
Coordinates:
  * time     (time) int64 2 3
  * space    (space) float64 0.1 0.2 0.3

# interpolation
In [5]: da.interp(time=[2.5, 3.5])
Out[5]: 
<xarray.DataArray (time: 2, space: 3)>
array([[0.700614, 0.502165, 0.258859],
       [     nan,      nan,      nan]])
Coordinates:
  * space    (space) float64 0.1 0.2 0.3
  * time     (time) float64 2.5 3.5
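
The behaviour at the edge of the original range can be checked directly. The following is a minimal sketch, assuming the same array as in In [1]; the variable names (inside, midpoint, outside) are illustrative only. It compares linear interpolation at time=2.5 with the midpoint of the neighbouring rows, and confirms that a point beyond the last time label yields NaN.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.sin(0.3 * np.arange(12).reshape(4, 3)),
    [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
)

# Inside the original range: linear interpolation at time=2.5 is the
# midpoint of the rows at time=2 and time=3.
inside = da.interp(time=2.5)
midpoint = 0.5 * (da.sel(time=2).values + da.sel(time=3).values)
matches_midpoint = bool(np.allclose(inside.values, midpoint))

# Outside the original range (time > 3): NaN by default, no extrapolation.
outside = da.interp(time=3.5)
all_nan_outside = bool(np.isnan(outside.values).all())

print(matches_midpoint, all_nan_outside)
```

If extrapolated values are needed instead of NaN, the fill_value keyword described under "Interpolation methods" is the relevant option.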

Note

Currently, interpolation only works on regular grids. Therefore, as with sel(), only 1D coordinates along a dimension can be used as the original coordinate to be interpolated.

Multi-dimensional Interpolation

Like sel(), interp() accepts multiple coordinates. In this case, multidimensional interpolation is carried out.

# label lookup
In [6]: da.sel(time=2, space=0.1)
Out[6]: 
<xarray.DataArray ()>
array(0.973848)
Coordinates:
    time     int64 2
    space    float64 0.1

# interpolation
In [7]: da.interp(time=2.5, space=0.15)
Out[7]: 
<xarray.DataArray ()>
array(0.601389)
Coordinates:
    time     float64 2.5
    space    float64 0.15

Array-like coordinates are also accepted:

# label lookup
In [8]: da.sel(time=[2, 3], space=[0.1, 0.2])
Out[8]: 
<xarray.DataArray (time: 2, space: 2)>
array([[0.973848, 0.863209],
       [0.42738 , 0.14112 ]])
Coordinates:
  * time     (time) int64 2 3
  * space    (space) float64 0.1 0.2

# interpolation
In [9]: da.interp(time=[1.5, 2.5], space=[0.15, 0.25])
Out[9]: 
<xarray.DataArray (time: 2, space: 2)>
array([[0.888106, 0.867052],
       [0.601389, 0.380512]])
Coordinates:
  * time     (time) float64 1.5 2.5
  * space    (space) float64 0.15 0.25
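
To make the multidimensional (here bilinear) case concrete, the value at time=2.5, space=0.15 can be reproduced by hand from the four surrounding grid points. This is a sketch in plain Python, not xarray's implementation; grid_value and bilinear are hypothetical helper names introduced for illustration.

```python
import math


def grid_value(time, space_index):
    """Value of the example array at an integer time and column index."""
    return math.sin(0.3 * (3 * time + space_index))


def bilinear(f00, f01, f10, f11, wt, ws):
    """Blend four corner values; wt and ws are fractional offsets in [0, 1]."""
    low = f00 * (1 - ws) + f01 * ws    # along space at the lower time
    high = f10 * (1 - ws) + f11 * ws   # along space at the upper time
    return low * (1 - wt) + high * wt  # then along time


# Target: time=2.5 (between 2 and 3), space=0.15 (between 0.1 and 0.2).
wt = (2.5 - 2) / (3 - 2)
ws = (0.15 - 0.1) / (0.2 - 0.1)
value = bilinear(
    grid_value(2, 0), grid_value(2, 1),
    grid_value(3, 0), grid_value(3, 1),
    wt, ws,
)
print(round(value, 6))
```

The printed value should agree with the result of da.interp(time=2.5, space=0.15) shown in Out[7].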

Interpolation methods

We use scipy.interpolate.interp1d() for 1-dimensional interpolation and scipy.interpolate.interpn() for multi-dimensional interpolation.

The interpolation method can be specified by the optional method argument.

In [10]: da = xr.DataArray(np.sin(np.linspace(0, 2 * np.pi, 10)), dims='x',
   ....:                   coords={'x': np.linspace(0, 1, 10)})
   ....: 

In [11]: da.plot.line('o', label='original')
Out[11]: [<matplotlib.lines.Line2D at 0x7f1eec336e80>]

In [12]: da.interp(x=np.linspace(0, 1, 100)).plot.line(label='linear (default)')
Out[12]: [<matplotlib.lines.Line2D at 0x7f1eec2cf668>]

In [13]: da.interp(x=np.linspace(0, 1, 100), method='cubic').plot.line(label='cubic')
Out[13]: [<matplotlib.lines.Line2D at 0x7f1eec3366a0>]

In [14]: plt.legend()
Out[14]: <matplotlib.legend.Legend at 0x7f1f00cec0b8>
_images/interpolation_sample1.png
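
The difference between methods can also be quantified without a plot. The sketch below assumes the same sine data as above and compares each method against the analytic sine curve; the max_error dictionary is an illustrative name, not part of xarray.

```python
import numpy as np
import xarray as xr

x = np.linspace(0, 1, 10)
da = xr.DataArray(np.sin(2 * np.pi * x), dims="x", coords={"x": x})

x_new = np.linspace(0, 1, 100)
truth = np.sin(2 * np.pi * x_new)

# Largest absolute deviation from the true sine curve for each method.
max_error = {}
for method in ["nearest", "linear", "cubic"]:
    estimate = da.interp(x=x_new, method=method)
    max_error[method] = float(np.abs(estimate.values - truth).max())

for method, err in max_error.items():
    print(f"{method:8s} {err:.4f}")
```

For smooth data like this, cubic interpolation is expected to track the underlying curve more closely than linear, which in turn does better than nearest.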

Additional keyword arguments can be passed to scipy’s functions through the kwargs argument.

# fill 0 for the outside of the original coordinates.
In [15]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={'fill_value': 0.0})
Out[15]: 
<xarray.DataArray (x: 10)>
array([ 0.      ,  0.      ,  0.      ,  0.813798,  0.604023, -0.604023,
       -0.813798,  0.      ,  0.      ,  0.      ])
Coordinates:
  * x        (x) float64 -0.5 -0.2778 -0.05556 0.1667 0.3889 0.6111 0.8333 ...

# extrapolation
In [16]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={'fill_value': 'extrapolate'})
Out[16]: 
<xarray.DataArray (x: 10)>
array([-2.892544, -1.606969, -0.321394,  0.813798,  0.604023, -0.604023,
       -0.813798,  0.321394,  1.606969,  2.892544])
Coordinates:
  * x        (x) float64 -0.5 -0.2778 -0.05556 0.1667 0.3889 0.6111 0.8333 ...
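
The three ways of handling points outside the original range can be compared side by side. This is a minimal sketch assuming the same sine data as above; the outside mask and the variable names are illustrative.

```python
import numpy as np
import xarray as xr

x = np.linspace(0, 1, 10)
da = xr.DataArray(np.sin(2 * np.pi * x), dims="x", coords={"x": x})

x_new = np.linspace(-0.5, 1.5, 10)
outside = (x_new < 0) | (x_new > 1)  # points beyond the original coordinate

default = da.interp(x=x_new)                                    # NaN outside
zero_filled = da.interp(x=x_new, kwargs={"fill_value": 0.0})    # 0 outside
extrapolated = da.interp(x=x_new, kwargs={"fill_value": "extrapolate"})

print(int(outside.sum()), "of", x_new.size, "points lie outside [0, 1]")
print(default.values[outside])
print(zero_filled.values[outside])
print(extrapolated.values[outside])
```

All three calls should agree on the points inside the original range; only the treatment of the outside points differs.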

Advanced Interpolation

interp() accepts DataArray objects in the same way as sel(), which enables more advanced interpolation. The dimensions of the result are determined by the dimensions of the new coordinates passed to interp().

For example, if you want to interpolate a two-dimensional array along a particular dimension, as illustrated below, you can pass two 1-dimensional DataArrays with a common dimension as the new coordinates.

advanced indexing and interpolation

For example:

In [17]: da = xr.DataArray(np.sin(0.3 * np.arange(20).reshape(5, 4)),
   ....:                   [('x', np.arange(5)),
   ....:                    ('y', [0.1, 0.2, 0.3, 0.4])])
   ....: 

# advanced indexing
In [18]: x = xr.DataArray([0, 2, 4], dims='z')

In [19]: y = xr.DataArray([0.1, 0.2, 0.3], dims='z')

In [20]: da.sel(x=x, y=y)
Out[20]: 
<xarray.DataArray (z: 3)>
array([ 0.      ,  0.42738 , -0.772764])
Coordinates:
    x        (z) int64 0 2 4
    y        (z) float64 0.1 0.2 0.3
Dimensions without coordinates: z

# advanced interpolation
In [21]: x = xr.DataArray([0.5, 1.5, 2.5], dims='z')

In [22]: y = xr.DataArray([0.15, 0.25, 0.35], dims='z')

In [23]: da.interp(x=x, y=y)
Out[23]: 
<xarray.DataArray (z: 3)>
array([ 0.556264,  0.634961, -0.466433])
Coordinates:
    x        (z) float64 0.5 1.5 2.5
    y        (z) float64 0.15 0.25 0.35
Dimensions without coordinates: z

where values on the original coordinates (x, y) = ((0.5, 0.15), (1.5, 0.25), (2.5, 0.35)) are obtained by the 2-dimensional interpolation and mapped along a new dimension z.
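
The distinction between passing plain lists and passing DataArrays with a shared dimension can be checked with a short sketch. It assumes the same array as in In [17]; x_points, y_points, grid, and pointwise are illustrative names.

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.sin(0.3 * np.arange(20).reshape(5, 4)),
    [("x", np.arange(5)), ("y", [0.1, 0.2, 0.3, 0.4])],
)
x_points = [0.5, 1.5, 2.5]
y_points = [0.15, 0.25, 0.35]

# Plain lists: every x is paired with every y, giving a 3 x 3 grid.
grid = da.interp(x=x_points, y=y_points)

# DataArrays sharing dimension 'z': x[i] is paired only with y[i].
pointwise = da.interp(
    x=xr.DataArray(x_points, dims="z"),
    y=xr.DataArray(y_points, dims="z"),
)

# The pointwise result should match the diagonal of the full grid.
same_as_diagonal = bool(np.allclose(pointwise.values, np.diag(grid.values)))
print(grid.shape, pointwise.dims, same_as_diagonal)
```

The pointwise form avoids computing the off-diagonal combinations, which matters when interpolating along a long path through a large array.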

If you want to add a coordinate to the new dimension z, you can supply DataArrays with a coordinate:

In [24]: x = xr.DataArray([0.5, 1.5, 2.5], dims='z', coords={'z': ['a', 'b','c']})

In [25]: y = xr.DataArray([0.15, 0.25, 0.35], dims='z',
   ....:                  coords={'z': ['a', 'b','c']})
   ....: 

In [26]: da.interp(x=x, y=y)
Out[26]: 
<xarray.DataArray (z: 3)>
array([ 0.556264,  0.634961, -0.466433])
Coordinates:
    x        (z) float64 0.5 1.5 2.5
    y        (z) float64 0.15 0.25 0.35
  * z        (z) <U1 'a' 'b' 'c'

For the details of the advanced indexing, see more advanced indexing.

Interpolating arrays with NaN

Our interp() handles arrays containing NaN in the same way that scipy.interpolate.interp1d and scipy.interpolate.interpn do. The linear and nearest methods return arrays that include NaN, while other methods such as cubic or quadratic return arrays that are entirely NaN.

In [27]: da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims='x',
   ....:                   coords={'x': range(5)})
   ....: 

In [28]: da.interp(x=[0.5, 1.5, 2.5])
Out[28]: 
<xarray.DataArray (x: 3)>
array([ 1., nan, nan])
Coordinates:
  * x        (x) float64 0.5 1.5 2.5

In [29]: da.interp(x=[0.5, 1.5, 2.5], method='cubic')
Out[29]: 
<xarray.DataArray (x: 3)>
array([nan, nan, nan])
Coordinates:
  * x        (x) float64 0.5 1.5 2.5

To avoid this, you can drop the NaN values with dropna() and then interpolate:

In [30]: dropped = da.dropna('x')

In [31]: dropped
Out[31]: 
<xarray.DataArray (x: 4)>
array([0.  , 2.  , 3.  , 3.25])
Coordinates:
  * x        (x) int64 0 1 3 4

In [32]: dropped.interp(x=[0.5, 1.5, 2.5], method='cubic')
Out[32]: 
<xarray.DataArray (x: 3)>
array([1.190104, 2.507812, 2.929688])
Coordinates:
  * x        (x) float64 0.5 1.5 2.5

If NaNs are distributed randomly in your multidimensional array, dropping every column that contains one or more NaNs with dropna() may discard a significant amount of information. In such a case, you can fill the NaNs with interpolate_na(), which is similar to pandas.Series.interpolate().

In [33]: filled = da.interpolate_na(dim='x')

In [34]: filled
Out[34]: 
<xarray.DataArray (x: 5)>
array([0.  , 2.  , 2.5 , 3.  , 3.25])
Coordinates:
  * x        (x) int64 0 1 2 3 4

This fills NaN by interpolating along the specified dimension. After filling NaNs, you can interpolate:

In [35]: filled.interp(x=[0.5, 1.5, 2.5], method='cubic')
Out[35]: 
<xarray.DataArray (x: 3)>
array([1.308594, 2.316406, 2.738281])
Coordinates:
  * x        (x) float64 0.5 1.5 2.5
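
The trade-off between dropna() and interpolate_na() is easier to see on a two-dimensional array with scattered NaNs. The following sketch uses a small made-up array (not data from this page) in which two different columns each contain a single NaN.

```python
import numpy as np
import xarray as xr

# Synthetic 4 x 5 array with NaNs in two different columns (illustrative data).
data = np.arange(20, dtype=float).reshape(4, 5)
data[0, 1] = np.nan
data[2, 3] = np.nan
arr = xr.DataArray(
    data,
    dims=("y", "x"),
    coords={"y": np.arange(4), "x": np.arange(5)},
)

# dropna removes every x label whose column holds at least one NaN.
dropped = arr.dropna("x")

# interpolate_na fills each gap from its neighbours along x and keeps the shape.
filled = arr.interpolate_na(dim="x")

print(dropped.sizes["x"], "of", arr.sizes["x"], "columns survive dropna")
print(filled.shape, float(filled[0, 1]), float(filled[2, 3]))
```

Here two NaNs cost two full columns (8 values) under dropna(), while interpolate_na() replaces only the two missing entries.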

For the details of interpolate_na(), see Missing values.

Example

Let’s see how interp() works on real data.

# Raw data
In [36]: ds = xr.tutorial.load_dataset('air_temperature').isel(time=0)

In [37]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))

In [38]: ds.air.plot(ax=axes[0])
Out[38]: <matplotlib.collections.QuadMesh at 0x7f1ed672b1d0>

In [39]: axes[0].set_title('Raw data')
Out[39]: Text(0.5,1,'Raw data')

# Interpolated data
In [40]: new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims['lon'] * 4)

In [41]: new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims['lat'] * 4)

In [42]: dsi = ds.interp(lat=new_lat, lon=new_lon)

In [43]: dsi.air.plot(ax=axes[1])
Out[43]: <matplotlib.collections.QuadMesh at 0x7f1eef015940>

In [44]: axes[1].set_title('Interpolated data')
Out[44]: Text(0.5,1,'Interpolated data')
_images/interpolation_sample3.png
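
The same upsampling pattern can be tried without downloading the tutorial dataset. The sketch below is an assumption-laden stand-in: it builds a small synthetic Dataset with a variable named air on a lat/lon grid (the field function and grid sizes are invented for illustration) and refines both coordinates by a factor of four.

```python
import numpy as np
import xarray as xr

lat = np.linspace(20.0, 70.0, 10)
lon = np.linspace(240.0, 300.0, 12)


def field(lat_deg, lon_deg):
    """Smooth synthetic 'temperature' field (illustrative, not real data)."""
    return 280.0 + 10.0 * np.cos(np.deg2rad(lat_deg)) * np.sin(np.deg2rad(lon_deg))


ds = xr.Dataset(
    {"air": (("lat", "lon"), field(lat[:, None], lon[None, :]))},
    coords={"lat": lat, "lon": lon},
)

# Four times as many points along each coordinate, same end points.
new_lat = np.linspace(lat[0], lat[-1], ds.sizes["lat"] * 4)
new_lon = np.linspace(lon[0], lon[-1], ds.sizes["lon"] * 4)
dsi = ds.interp(lat=new_lat, lon=new_lon)

# Compare against the analytic field on the fine grid.
exact = field(new_lat[:, None], new_lon[None, :])
max_error = float(np.abs(dsi.air.values - exact).max())
print(dsi.air.shape, max_error)
```

Because the new grid shares its end points with the original one, no interpolated point falls outside the data and the result contains no NaN.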

Our advanced interpolation can be used to remap the data to new coordinates. Consider the new coordinates x and z on a two-dimensional plane. The remapping can be done as follows:

# new coordinate
In [45]: x = np.linspace(240, 300, 100)

In [46]: z = np.linspace(20, 70, 100)

# relation between new and original coordinates
In [47]: lat = xr.DataArray(z, dims=['z'], coords={'z': z})

In [48]: lon = xr.DataArray((x[:, np.newaxis]-270)/np.cos(z*np.pi/180)+270,
   ....:                    dims=['x', 'z'], coords={'x': x, 'z': z})
   ....: 

In [49]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))

In [50]: ds.air.plot(ax=axes[0])
Out[50]: <matplotlib.collections.QuadMesh at 0x7f1ed6712358>

# draw the new coordinate on the original coordinates.
In [51]: for idx in [0, 33, 66, 99]:
   ....:     axes[0].plot(lon.isel(x=idx), lat, '--k')
   ....: 

In [52]: for idx in [0, 33, 66, 99]:
   ....:     axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), '--k')
   ....: 

In [53]: axes[0].set_title('Raw data')
Out[53]: Text(0.5,1,'Raw data')

In [54]: dsi = ds.interp(lon=lon, lat=lat)

In [55]: dsi.air.plot(ax=axes[1])
Out[55]: <matplotlib.collections.QuadMesh at 0x7f1ed67cd400>

In [56]: axes[1].set_title('Remapped data')
Out[56]: Text(0.5,1,'Remapped data')
_images/interpolation_sample4.png
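
A remapping of this kind can be sanity-checked on synthetic data where the right answer is known. The sketch below assumes a made-up Dataset whose air variable is a linear function of lat and lon, so linear interpolation should reproduce it exactly at any interior point; the grid sizes and the narrower x and z ranges are chosen for illustration so that every remapped point stays inside the original grid.

```python
import numpy as np
import xarray as xr

# Synthetic source grid; 'air' is linear in lat and lon (illustrative data).
lat0 = np.linspace(15.0, 75.0, 25)
lon0 = np.linspace(200.0, 330.0, 53)
ds = xr.Dataset(
    {"air": (("lat", "lon"), lat0[:, None] + 0.1 * lon0[None, :])},
    coords={"lat": lat0, "lon": lon0},
)

# New coordinates and their relation to the original ones.
x = np.linspace(260, 280, 30)
z = np.linspace(20, 60, 40)
lat = xr.DataArray(z, dims=["z"], coords={"z": z})
lon = xr.DataArray(
    (x[:, np.newaxis] - 270) / np.cos(z * np.pi / 180) + 270,
    dims=["x", "z"],
    coords={"x": x, "z": z},
)

dsi = ds.interp(lon=lon, lat=lat)

# The result lives on the new (x, z) plane.
remapped = dsi.air.transpose("x", "z")
expected = (lat + 0.1 * lon).transpose("x", "z")
print(remapped.shape, bool(np.allclose(remapped.values, expected.values)))
```

With wider x or z ranges, some target points would map outside the source grid and come back as NaN unless extrapolation is requested.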