# Reshaping and reorganizing data

These methods allow you to reorganize your data. The examples below assume that numpy and xarray have been imported as np and xr.

## Reordering dimensions

To reorder dimensions on a DataArray or across all variables on a Dataset, use transpose() or the .T property:

In [1]: ds = xr.Dataset({'foo': (('x', 'y', 'z'), [[[42]]]), 'bar': (('y', 'z'), [[24]])})

In [2]: ds.transpose('y', 'z', 'x')
Out[2]:
<xarray.Dataset>
Dimensions:  (x: 1, y: 1, z: 1)
Coordinates:
* x        (x) int64 0
* y        (y) int64 0
* z        (z) int64 0
Data variables:
foo      (y, z, x) int64 42
bar      (y, z) int64 24

In [3]: ds.T
Out[3]:
<xarray.Dataset>
Dimensions:  (x: 1, y: 1, z: 1)
Coordinates:
* x        (x) int64 0
* y        (y) int64 0
* z        (z) int64 0
Data variables:
foo      (z, y, x) int64 42
bar      (z, y) int64 24
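
The same two operations apply to a single DataArray. The following is a minimal sketch, assuming a recent xarray and numpy; the array contents and dimension sizes are made up so that the reordering is visible in the shape:

```python
import numpy as np
import xarray as xr

# Three distinct dimension sizes make the reordering easy to see.
arr = xr.DataArray(np.zeros((2, 3, 4)), dims=('x', 'y', 'z'))

# Explicit order: name every dimension in the order you want.
reordered = arr.transpose('y', 'z', 'x')

# .T reverses the order of all dimensions.
reversed_dims = arr.T

print(reordered.dims, reordered.shape)
print(reversed_dims.dims, reversed_dims.shape)
```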


## Converting between datasets and arrays

To convert from a Dataset to a DataArray, use to_array():

In [4]: arr = ds.to_array()

In [5]: arr
Out[5]:
<xarray.DataArray (variable: 2, x: 1, y: 1, z: 1)>
array([[[[42]]],

[[[24]]]])
Coordinates:
* y         (y) int64 0
* x         (x) int64 0
* z         (z) int64 0
* variable  (variable) |S3 'foo' 'bar'


This method broadcasts all data variables in the dataset against each other, then concatenates them along a new dimension into a new array while preserving coordinates.

To convert back from a DataArray to a Dataset, use to_dataset():

In [6]: arr.to_dataset(dim='variable')
Out[6]:
<xarray.Dataset>
Dimensions:  (x: 1, y: 1, z: 1)
Coordinates:
* y        (y) int64 0
* x        (x) int64 0
* z        (z) int64 0
Data variables:
foo      (x, y, z) int64 42
bar      (x, y, z) int64 24
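
As a quick check of the round trip, the sketch below converts a small dataset to an array and back. The variable names and values are illustrative only:

```python
import xarray as xr

ds = xr.Dataset({
    'foo': (('x', 'y'), [[1, 2]]),
    'bar': (('x', 'y'), [[3, 4]]),
})

# to_array() adds a new dimension named 'variable' that labels each data variable.
arr = ds.to_array()

# to_dataset(dim=...) splits the array back into one variable per label.
roundtrip = arr.to_dataset(dim='variable')

print(arr.dims)
print(sorted(roundtrip.data_vars))
```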


The broadcasting behavior of to_array means that the resulting array includes the union of data variable dimensions:

In [7]: ds2 = xr.Dataset({'a': 0, 'b': ('x', [3, 4, 5])})

# the input dataset has 4 elements
In [8]: ds2
Out[8]:
<xarray.Dataset>
Dimensions:  (x: 3)
Coordinates:
* x        (x) int64 0 1 2
Data variables:
a        int64 0
b        (x) int64 3 4 5

# the resulting array has 6 elements
In [9]: ds2.to_array()
Out[9]:
<xarray.DataArray (variable: 2, x: 3)>
array([[0, 0, 0],
[3, 4, 5]])
Coordinates:
* variable  (variable) |S1 'a' 'b'
* x         (x) int64 0 1 2


Otherwise, the result could not be represented as an orthogonal array.
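
The same rule applies when the variables have entirely different dimensions. In the sketch below (with made-up variable names and values), one variable lies along x and the other along y, so the result spans both:

```python
import xarray as xr

ds = xr.Dataset({'a': ('x', [1, 2]), 'b': ('y', [10, 20, 30])})

# Each variable is broadcast to the union of dimensions (x and y)
# before being concatenated along 'variable'.
arr = ds.to_array()

print(arr.sizes)
# 'a' is repeated along y, 'b' is repeated along x.
print(arr.sel(variable='a').transpose('x', 'y').values)
```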

If you use to_dataset without supplying the dim argument, the DataArray will be converted into a Dataset of one variable:

In [10]: arr.to_dataset(name='combined')
Out[10]:
<xarray.Dataset>
Dimensions:   (variable: 2, x: 1, y: 1, z: 1)
Coordinates:
* y         (y) int64 0
* x         (x) int64 0
* z         (z) int64 0
* variable  (variable) |S3 'foo' 'bar'
Data variables:
combined  (variable, x, y, z) int64 42 24
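
The single variable takes its name either from the name argument or from the array's own name. A small sketch of both cases follows; the names 'combined' and 'temperature' are arbitrary:

```python
import xarray as xr

arr = xr.DataArray([1, 2, 3], dims='x')

# An unnamed array needs an explicit name for the resulting variable.
ds = arr.to_dataset(name='combined')

# An array that already has a name can be converted without arguments.
named = arr.rename('temperature').to_dataset()

print(list(ds.data_vars), list(named.data_vars))
```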


## Stack and unstack

As part of xarray’s nascent support for pandas.MultiIndex, we have implemented stack() and unstack() methods for combining or splitting dimensions:

In [11]: array = xr.DataArray(np.random.randn(2, 3),
....:                      coords=[('x', ['a', 'b']), ('y', [0, 1, 2])])
....:

In [12]: stacked = array.stack(z=('x', 'y'))

In [13]: stacked
Out[13]:
<xarray.DataArray (z: 6)>
array([ 0.469, -0.283, -1.509, -1.136,  1.212, -0.173])
Coordinates:
* z        (z) object ('a', 0) ('a', 1) ('a', 2) ('b', 0) ('b', 1) ('b', 2)

In [14]: stacked.unstack('z')
Out[14]:
<xarray.DataArray (x: 2, y: 3)>
array([[ 0.469, -0.283, -1.509],
[-1.136,  1.212, -0.173]])
Coordinates:
* x        (x) object 'a' 'b'
* y        (y) int64 0 1 2


These methods are modeled on the pandas.DataFrame methods of the same name, although in xarray they always create new dimensions rather than adding to the existing index or columns.
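
Because the example above uses random data, here is a deterministic sketch of the same round trip. It assumes a recent xarray, where the indexes mapping exposes the stacked dimension as a pandas.MultiIndex:

```python
import numpy as np
import pandas as pd
import xarray as xr

array = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords=[('x', ['a', 'b']), ('y', [0, 1, 2])],
)

# Combine x and y into one new dimension z.
stacked = array.stack(z=('x', 'y'))

# The new dimension is indexed by a pandas.MultiIndex with levels x and y.
index = stacked.indexes['z']
print(isinstance(index, pd.MultiIndex), list(index.names))

# Splitting z again recovers the original two dimensions.
restored = stacked.unstack('z')
print(restored.dims)
```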

Like DataFrame.unstack, xarray’s unstack always succeeds, even if the multi-index being unstacked does not contain all possible levels. Missing levels are filled in with NaN in the resulting object:

In [15]: stacked2 = stacked[::2]

In [16]: stacked2
Out[16]:
<xarray.DataArray (z: 3)>
array([ 0.469, -1.509,  1.212])
Coordinates:
* z        (z) object ('a', 0) ('a', 2) ('b', 1)

In [17]: stacked2.unstack('z')
Out[17]:
<xarray.DataArray (x: 2, y: 3)>
array([[ 0.469,    nan, -1.509],
[   nan,  1.212,    nan]])
Coordinates:
* x        (x) object 'a' 'b'
* y        (y) int64 0 1 2
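
To see exactly which positions are filled, the sketch below repeats the idea with known values and uses isnull() to locate the gaps. The data is illustrative only:

```python
import numpy as np
import xarray as xr

array = xr.DataArray(
    np.arange(6.0).reshape(2, 3),
    coords=[('x', ['a', 'b']), ('y', [0, 1, 2])],
)

# Keep every other stacked element: ('a', 0), ('a', 2) and ('b', 1).
partial = array.stack(z=('x', 'y'))[::2]

# Unstacking restores the full x-by-y grid; absent combinations become NaN.
filled = partial.unstack('z')
missing = filled.isnull()

print(int(missing.sum()))
```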


However, xarray’s stack differs from pandas in one important way: it does not automatically drop missing values. Compare:

In [18]: array = xr.DataArray([[np.nan, 1], [2, 3]], dims=['x', 'y'])

In [19]: array.stack(z=('x', 'y'))
Out[19]:
<xarray.DataArray (z: 4)>
array([ nan,   1.,   2.,   3.])
Coordinates:
* z        (z) object (0, 0) (0, 1) (1, 0) (1, 1)

In [20]: array.to_pandas().stack()
Out[20]:
x  y
0  1    1
1  0    2
1    3
dtype: float64


We departed from pandas’s behavior here because predictable shapes for new array dimensions are necessary for out-of-core computation with dask.
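
If you do want the missing values removed, one option is to drop them explicitly after stacking. This is a minimal sketch using dropna() along the stacked dimension; it is one way to get a pandas-like result, not something stack() does for you:

```python
import numpy as np
import xarray as xr

array = xr.DataArray([[np.nan, 1], [2, 3]], dims=['x', 'y'])

# stack() keeps all four elements, including the NaN.
stacked = array.stack(z=('x', 'y'))

# Dropping NaN afterwards is an explicit, separate step.
dropped = stacked.dropna('z')

print(stacked.sizes['z'], dropped.sizes['z'])
```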

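## Shift and roll

To move data relative to its coordinate labels, use the shift() and roll() methods. shift() offsets the data and fills the vacated positions with NaN, while roll() rotates the values around the end of the dimension: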

In [21]: array = xr.DataArray([1, 2, 3, 4], dims='x')

In [22]: array.shift(x=2)
Out[22]:
<xarray.DataArray (x: 4)>
array([ nan,  nan,   1.,   2.])
Coordinates:
* x        (x) int64 0 1 2 3

In [23]: array.roll(x=2)
Out[23]:
<xarray.DataArray (x: 4)>
array([3, 4, 1, 2])
Coordinates:
* x        (x) int64 2 3 0 1
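
The difference in the data values can be checked directly. Whether roll() also rotates the coordinate labels has varied between xarray versions, so this sketch inspects only the values:

```python
import numpy as np
import xarray as xr

array = xr.DataArray([1, 2, 3, 4], dims='x')

# shift() moves values two places along x; the vacated slots become NaN,
# so the integer data is promoted to float.
shifted = array.shift(x=2)

# roll() moves values two places along x and wraps them around, so nothing is lost.
rolled = array.roll(x=2)

print(shifted.values)
print(rolled.values)
```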