xarray.Dataset.reindex

Dataset.reindex(self, indexers: Mapping[Hashable, Any] = None, method: str = None, tolerance: numbers.Number = None, copy: bool = True, fill_value: Any = <NA>, **indexers_kwargs: Any) → 'Dataset'

Conform this object onto a new set of indexes, filling in missing values with fill_value. The default fill value is NaN.

Parameters
  • indexers (dict, optional) – Dictionary with keys given by dimension names and values given by arrays of coordinate tick labels. Any mismatched coordinate values will be filled in with fill_value (NaN by default), and any mismatched dimension names will simply be ignored. One of indexers or indexers_kwargs must be provided.

  • method ({None, 'nearest', 'pad'/'ffill', 'backfill'/'bfill'}, optional) –

    Method to use for filling index values in indexers not found in this dataset:

    • None (default): don’t fill gaps

    • pad / ffill: propagate last valid index value forward

    • backfill / bfill: propagate next valid index value backward

    • nearest: use nearest valid index value

  • tolerance (optional) – Maximum distance between original and new labels for inexact matches. The values of the index at the matching locations must satisfy the equation abs(index[indexer] - target) <= tolerance.

  • copy (bool, optional) – If copy=True, data in the return value is always copied. If copy=False and reindexing is unnecessary, or can be performed with only slice operations, then the output may share memory with the input. In either case, a new xarray object is always returned.

  • fill_value (scalar, optional) – Value to use for newly missing values.

  • sparse (bool, optional) – Use a sparse array. By default, False.

  • **indexers_kwargs ({dim: indexer, ..}, optional) – Keyword arguments in the same form as indexers. One of indexers or indexers_kwargs must be provided.
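As a minimal sketch of how indexers, indexers_kwargs, method and tolerance fit together, the following uses a small made-up dataset. The variable names and values are illustrative assumptions and do not come from the examples below.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with a numeric, monotonically increasing index.
ds = xr.Dataset(
    {"temperature": ("x", [10.0, 20.0, 30.0])},
    coords={"x": [0.0, 1.0, 2.0]},
)

# The dictionary form and the keyword form describe the same reindexing.
by_dict = ds.reindex({"x": [0.0, 1.0, 3.0]})
by_kwargs = ds.reindex(x=[0.0, 1.0, 3.0])

# With method="nearest", each new label takes the value at the closest
# original label, but only when abs(original - new) <= tolerance.
# Here 2.6 is 0.6 away from its nearest label (2.0), so it stays NaN.
near = ds.reindex(x=[0.1, 1.4, 2.6], method="nearest", tolerance=0.5)

print(by_dict.identical(by_kwargs))
print(near["temperature"].values)
```

The label 3.0 has no match in the original index, so both by_dict and by_kwargs hold NaN at that position.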

Returns

reindexed – Another dataset, with this dataset’s data but replaced coordinates.

Return type

Dataset

Examples

Create a dataset with some fictional data.

>>> import numpy as np
>>> import xarray as xr
>>> import pandas as pd
>>> x = xr.Dataset(
...     {
...         "temperature": ("station", 20 * np.random.rand(4)),
...         "pressure": ("station", 500 * np.random.rand(4))
...     },
...     coords={"station": ["boston", "nyc", "seattle", "denver"]})
>>> x
<xarray.Dataset>
Dimensions:      (station: 4)
Coordinates:
  * station      (station) <U7 'boston' 'nyc' 'seattle' 'denver'
Data variables:
    temperature  (station) float64 18.84 14.59 19.22 17.16
    pressure     (station) float64 324.1 194.3 122.8 244.3
>>> x.indexes
station: Index(['boston', 'nyc', 'seattle', 'denver'], dtype='object', name='station')

Create a new index and reindex the dataset. By default, values in the new index that do not have corresponding records in the dataset are assigned NaN.

>>> new_index = ['boston', 'austin', 'seattle', 'lincoln']
>>> x.reindex({'station': new_index})
<xarray.Dataset>
Dimensions:      (station: 4)
Coordinates:
  * station      (station) object 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 18.84 nan 19.22 nan
    pressure     (station) float64 324.1 nan 122.8 nan

We can fill in the missing values by passing a value to the keyword fill_value.

>>> x.reindex({'station': new_index}, fill_value=0)
<xarray.Dataset>
Dimensions:      (station: 4)
Coordinates:
  * station      (station) object 'boston' 'austin' 'seattle' 'lincoln'
Data variables:
    temperature  (station) float64 18.84 0.0 19.22 0.0
    pressure     (station) float64 324.1 0.0 122.8 0.0

Because the index is not monotonically increasing or decreasing, we cannot use the method keyword to fill the missing values.

>>> x.reindex({'station': new_index}, method='nearest')
Traceback (most recent call last):
...
    raise ValueError('index must be monotonic increasing or decreasing')
ValueError: index must be monotonic increasing or decreasing

To further illustrate the filling functionality in reindex, we will create a dataset with a monotonically increasing index (for example, a sequence of dates).

>>> x2 = xr.Dataset(
...     {
...         "temperature": ("time", [15.57, 12.77, np.nan, 0.3081, 16.59, 15.12]),
...         "pressure": ("time", 500 * np.random.rand(6))
...     },
...     coords={"time": pd.date_range('01/01/2019', periods=6, freq='D')})
>>> x2
<xarray.Dataset>
Dimensions:      (time: 6)
Coordinates:
  * time         (time) datetime64[ns] 2019-01-01 2019-01-02 ... 2019-01-06
Data variables:
    temperature  (time) float64 15.57 12.77 nan 0.3081 16.59 15.12
    pressure     (time) float64 103.4 122.7 452.0 444.0 399.2 486.0

Suppose we decide to expand the dataset to cover a wider date range.

>>> time_index2 = pd.date_range('12/29/2018', periods=10, freq='D')
>>> x2.reindex({'time': time_index2})
<xarray.Dataset>
Dimensions:      (time: 10)
Coordinates:
  * time         (time) datetime64[ns] 2018-12-29 2018-12-30 ... 2019-01-07
Data variables:
    temperature  (time) float64 nan nan nan 15.57 ... 0.3081 16.59 15.12 nan
    pressure     (time) float64 nan nan nan 103.4 ... 444.0 399.2 486.0 nan

The index entries that did not have a value in the original dataset (for example, 2018-12-29) are by default filled with NaN. If desired, we can fill in the missing values using one of several options.

For example, to propagate the next valid value backward to fill the NaN values, pass bfill as an argument to the method keyword.

>>> x3 = x2.reindex({'time': time_index2}, method='bfill')
>>> x3
<xarray.Dataset>
Dimensions:      (time: 10)
Coordinates:
  * time         (time) datetime64[ns] 2018-12-29 2018-12-30 ... 2019-01-07
Data variables:
    temperature  (time) float64 15.57 15.57 15.57 15.57 ... 16.59 15.12 nan
    pressure     (time) float64 103.4 103.4 103.4 103.4 ... 399.2 486.0 nan
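For comparison, a forward fill should behave the other way around: dates after the original range pick up the last available value, while dates before it stay NaN. The sketch below rebuilds only the temperature variable so that it runs on its own; the name filled_forward is introduced here for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same temperature series and date range as in the example above.
x2 = xr.Dataset(
    {"temperature": ("time", [15.57, 12.77, np.nan, 0.3081, 16.59, 15.12])},
    coords={"time": pd.date_range("2019-01-01", periods=6, freq="D")},
)
time_index2 = pd.date_range("2018-12-29", periods=10, freq="D")

# ffill looks for the closest original label at or before each new label.
# 2018-12-29 to 2018-12-31 have no earlier label, so they remain NaN;
# 2019-01-07 takes the value recorded for 2019-01-06.
filled_forward = x2.reindex({"time": time_index2}, method="ffill")

print(filled_forward["temperature"].values)
```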

Please note that the NaN value present in the original dataset (at index value 2019-01-03) will not be filled by any of the value propagation schemes.

>>> x2.where(x2.temperature.isnull(), drop=True)
<xarray.Dataset>
Dimensions:      (time: 1)
Coordinates:
  * time         (time) datetime64[ns] 2019-01-03
Data variables:
    temperature  (time) float64 nan
    pressure     (time) float64 452.0
>>> x3.where(x3.temperature.isnull(), drop=True)
<xarray.Dataset>
Dimensions:      (time: 2)
Coordinates:
  * time         (time) datetime64[ns] 2019-01-03 2019-01-07
Data variables:
    temperature  (time) float64 nan nan
    pressure     (time) float64 452.0 nan

This is because filling while reindexing does not look at dataset values, but only compares the original and desired indexes. If you do want to fill in the NaN values present in the original dataset, use the fillna() method.
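A minimal sketch of that last step, assuming a constant replacement value of 0.0 is acceptable (the value is arbitrary and chosen only for illustration). The dataset is rebuilt with the temperature variable alone so that the snippet is self-contained.

```python
import numpy as np
import pandas as pd
import xarray as xr

x2 = xr.Dataset(
    {"temperature": ("time", [15.57, 12.77, np.nan, 0.3081, 16.59, 15.12])},
    coords={"time": pd.date_range("2019-01-01", periods=6, freq="D")},
)
time_index2 = pd.date_range("2018-12-29", periods=10, freq="D")

# Reindexing with bfill leaves two NaNs: the one already present in the
# data (2019-01-03) and the trailing date with no later label (2019-01-07).
x3 = x2.reindex({"time": time_index2}, method="bfill")
remaining = int(x3["temperature"].isnull().sum())

# fillna works on the data values, so it replaces both of them.
x4 = x3.fillna(0.0)

print(remaining)
print(x4["temperature"].values)
```

Unlike the method keyword, fillna does not consult the index at all, which is why it also reaches the NaN that existed before reindexing.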