Data Structures
DataArray
xarray.DataArray is xarray’s implementation of a labeled, multi-dimensional array. It has several key properties:
- values: a numpy.ndarray holding the array’s values
- dims: dimension names for each axis (e.g., ('x', 'y', 'z'))
- coords: a dict-like container of arrays (coordinates) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings)
- attrs: an OrderedDict to hold arbitrary metadata (attributes)
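The four properties listed above can be inspected directly on an instance. The following is a minimal sketch, assuming numpy, pandas and xarray are installed; the data, the state labels and the units attribute are made up for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical example data: 4 time steps by 3 locations.
data = np.arange(12.0).reshape(4, 3)
times = pd.date_range("2000-01-01", periods=4)
locs = ["IA", "IL", "IN"]

arr = xr.DataArray(
    data,
    coords=[times, locs],
    dims=["time", "space"],
    attrs={"units": "arbitrary"},  # illustrative metadata only
)

print(type(arr.values))  # the underlying numpy.ndarray
print(arr.dims)          # ('time', 'space')
print(list(arr.coords))  # names of the coordinate arrays
print(arr.attrs)         # the metadata mapping
```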
xarray uses dims and coords to enable its core metadata-aware operations. Dimensions provide names that xarray uses instead of the axis argument found in many numpy functions. Coordinates enable fast label-based indexing and alignment, building on the functionality of the index found on a pandas DataFrame or Series.
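To make the contrast with numpy concrete, the sketch below reduces over a named dimension rather than a positional axis, then selects by coordinate label. The array contents and labels are illustrative assumptions, not taken from the examples in this section.

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.arange(12.0).reshape(4, 3)
arr = xr.DataArray(
    data,
    coords=[pd.date_range("2000-01-01", periods=4), ["IA", "IL", "IN"]],
    dims=["time", "space"],
)

# numpy: reduce over a positional axis.
by_axis = data.mean(axis=0)

# xarray: reduce over a named dimension instead.
by_name = arr.mean(dim="time")

# Label-based selection using the coordinate values, similar to pandas .loc.
il_series = arr.sel(space="IL")

print(by_name.dims)    # ('space',)
print(il_series.dims)  # ('time',)
```

Both reductions compute the same numbers; the named form keeps working if the dimensions are later reordered.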
DataArray objects can also have a name and can hold arbitrary metadata in the form of their attrs property (an ordered dictionary). Names and attributes are strictly for users and user-written code: xarray makes no attempt to interpret them, and propagates them only in unambiguous cases (see FAQ, What is your approach to metadata?).
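A short sketch of attaching a name and attributes follows. The name "temperature" and the attribute keys are hypothetical; xarray stores them without interpreting them.

```python
import numpy as np
import xarray as xr

temps = xr.DataArray(
    np.zeros((2, 3)),
    dims=["x", "y"],
    name="temperature",         # hypothetical name
    attrs={"units": "kelvin"},  # hypothetical metadata
)

# attrs behaves like a dictionary and can be updated in place.
temps.attrs["source"] = "synthetic example"

print(temps.name)
print(dict(temps.attrs))
```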
Creating a DataArray
The DataArray constructor takes:
- data: a multi-dimensional array of values (e.g., a numpy ndarray, Series, DataFrame or Panel)
- coords: a list or dictionary of coordinates
- dims: a list of dimension names. If omitted, dimension names are taken from coords if possible.
- attrs: a dictionary of attributes to add to the instance
- name: a string that names the instance
In [1]: data = np.random.rand(4, 3)
In [2]: locs = ['IA', 'IL', 'IN']
In [3]: times = pd.date_range('2000-01-01', periods=4)
In [4]: foo = xr.DataArray(data, coords=[times, locs], dims=['time', 'space'])
In [5]: foo
Out[5]:
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.12697 , 0.966718, 0.260476],
[ 0.897237, 0.37675 , 0.336222],
[ 0.451376, 0.840255, 0.123102],
[ 0.543026, 0.373012, 0.447997]])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
* space (space) <U2 'IA' 'IL' 'IN'
Only data is required; all other arguments will be filled in with default values:
In [6]: xr.DataArray(data)
Out[6]:
<xarray.DataArray (dim_0: 4, dim_1: 3)>
array([[ 0.12697 , 0.966718, 0.260476],
[ 0.897237, 0.37675 , 0.336222],
[ 0.451376, 0.840255, 0.123102],
[ 0.543026, 0.373012, 0.447997]])
Dimensions without coordinates: dim_0, dim_1
As you can see, dimension names are always present in the xarray data model: if you do not provide them, defaults of the form dim_N will be created. However, coordinates are always optional, and dimensions do not have automatic coordinate labels.
Note
This is different from pandas, where axes always have tick labels, which default to the integers [0, ..., n-1].
Prior to xarray v0.9, xarray copied this behavior: default coordinates for each dimension would be created if coordinates were not supplied explicitly. This is no longer the case.
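The difference from pandas can be checked directly. This sketch assumes xarray v0.9 or later; the zero-filled array is a placeholder.

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.zeros((4, 3))

arr = xr.DataArray(data)
print(arr.dims)               # ('dim_0', 'dim_1')
print("dim_0" in arr.coords)  # False: no coordinate was created

# pandas, by contrast, always attaches default integer labels.
frame = pd.DataFrame(data)
print(list(frame.index))      # [0, 1, 2, 3]
```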
Coordinates can be specified in the following ways:
- A list of values with length equal to the number of dimensions, providing coordinate labels for each dimension. Each value must be of one of the following forms:
  - A DataArray or Variable
  - A tuple of the form (dims, data[, attrs]), which is converted into arguments for Variable
  - A pandas object or scalar value, which is converted into a DataArray
  - A 1D array or list, which is interpreted as values for a one dimensional coordinate variable along the same dimension as its name
- A dictionary of {coord_name: coord} where values are of the same form as the list. Supplying coordinates as a dictionary allows other coordinates than those corresponding to dimensions (more on these later). If you supply coords as a dictionary, you must explicitly provide dims.
As a list of tuples:
In [7]: xr.DataArray(data, coords=[('time', times), ('space', locs)])
Out[7]:
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.12697 , 0.966718, 0.260476],
[ 0.897237, 0.37675 , 0.336222],
[ 0.451376, 0.840255, 0.123102],
[ 0.543026, 0.373012, 0.447997]])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
* space (space) <U2 'IA' 'IL' 'IN'
As a dictionary:
In [8]: xr.DataArray(data, coords={'time': times, 'space': locs, 'const': 42,
...: 'ranking': ('space', [1, 2, 3])},
...: dims=['time', 'space'])
...:
Out[8]:
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.12697 , 0.966718, 0.260476],
[ 0.897237, 0.37675 , 0.336222],
[ 0.451376, 0.840255, 0.123102],
[ 0.543026, 0.373012, 0.447997]])
Coordinates:
const int64 42
* time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
ranking (space) int64 1 2 3
* space (space) <U2 'IA' 'IL' 'IN'
As a dictionary with coords across multiple dimensions:
In [9]: xr.DataArray(data, coords={'time': times, 'space': locs, 'const': 42,
...: 'ranking': (('time', 'space'), np.arange(12).reshape(4,3))},
...: dims=['time', 'space'])
...:
Out[9]:
<xarray.DataArray (time: 4, space: 3)>
array([[ 0.12697 , 0.966718, 0.260476],
[ 0.897237, 0.37675 , 0.336222],
[ 0.451376, 0.840255, 0.123102],
[ 0.543026, 0.373012, 0.447997]])
Coordinates:
const int64 42
* time (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03 2000-01-04
ranking (time, space) int64 0 1 2 3 4 5 6 7 8 9 10 11
* space (space) <U2 'IA' 'IL' 'IN'
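The examples above use two-element tuples. The optional third element of the (dims, data[, attrs]) form attaches attributes to the coordinate itself, as in this sketch; the "description" key and its value are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

data = np.zeros((4, 3))
times = pd.date_range("2000-01-01", periods=4)
locs = ["IA", "IL", "IN"]

arr = xr.DataArray(
    data,
    coords={
        "time": times,
        "space": locs,
        # (dims, data, attrs): the attrs entry here is hypothetical.
        "ranking": ("space", [1, 2, 3], {"description": "example rank"}),
    },
    dims=["time", "space"],
)

print(arr.coords["ranking"].dims)   # ('space',)
print(arr.coords["ranking"].attrs)

# "ranking" lies along "space" but is not itself a dimension.
print("ranking" in arr.dims)        # False
```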
If you create a DataArray by supplying a pandas Series, DataFrame or Panel, any non-specified arguments in the DataArray constructor will be filled in from the pandas object:
In [10]: df = pd.DataFrame({'x': [0, 1], 'y': [2, 3]}, index=['a', 'b'])
In [11]: df.index.name = 'abc'
In [12]: df.columns.name = 'xyz'
In [13]: df
Out[13]:
xyz x y
abc
a 0 2
b 1 3
In [14]: xr.DataArray(df)
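As a rough check of what gets filled in, the sketch below rebuilds the same DataFrame and inspects the result. It assumes a recent pandas and xarray; the dimension names come from the index and column names of the DataFrame.

```python
import pandas as pd
import xarray as xr

df = pd.DataFrame({"x": [0, 1], "y": [2, 3]}, index=["a", "b"])
df.index.name = "abc"
df.columns.name = "xyz"

arr = xr.DataArray(df)

print(arr.dims)                        # ('abc', 'xyz')
print(list(arr.coords["abc"].values))  # row labels
print(list(arr.coords["xyz"].values))  # column labels
```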