What’s New

Warning

Xarray plans to drop support for Python 2.7 at the end of 2018. This means that new releases of xarray published after this date will only be installable on Python 3+ environments, but older versions of xarray will always be available to Python 2.7 users. For more information see the following references

v0.10.9 (21 September 2018)

This minor release contains a number of backwards compatible enhancements.

Announcements of note:

  • Xarray is now a NumFOCUS fiscally sponsored project! Read the announcement for more details.
  • We have a new Development roadmap that outlines our future development plans.

Enhancements

  • DataArray.differentiate() and Dataset.differentiate() are newly added. (GH1332) By Keisuke Fujii.
  • Default colormap for sequential and divergent data can now be set via set_options() (GH2394) By Julius Busecke.
  • The min_count option is now supported in DataArray.sum(), DataArray.prod(), Dataset.sum(), and Dataset.prod(). (GH2230) By Keisuke Fujii.
  • plot() now accepts the kwargs xscale, yscale, xlim, ylim, xticks, yticks just like Pandas. Also xincrease=False, yincrease=False now use matplotlib’s axis inverting methods instead of setting limits. By Deepak Cherian. (GH2224)
  • DataArray coordinates and Dataset coordinates and data variables are now displayed as a b … y z rather than a b c d …. (GH1186) By Seth P.
  • A new CFTimeIndex-enabled cftime_range() function for use in generating dates from standard or non-standard calendars. By Spencer Clark.
  • When interpolating over a datetime64 axis, you can now provide a datetime string instead of a datetime64 object. E.g. da.interp(time='1991-02-01') (GH2284) By Deepak Cherian.
  • A clear error message is now displayed if a set or dict is passed in place of an array (GH2331) By Maximilian Roos.
  • Applying unstack to a large DataArray or Dataset is now much faster if the MultiIndex has not been modified after stacking the indices. (GH1560) By Maximilian Maahn.
  • You can now control whether or not to offset the coordinates when using the roll method, via the roll_coords keyword argument. The current default behavior, rolling the coordinates, raises a deprecation warning unless the keyword argument is set explicitly. (GH1875) By Andrew Huang.
  • You can now call unstack without arguments to unstack every MultiIndex in a DataArray or Dataset. By Julia Signell.
  • Added the ability to pass a data kwarg to copy to create a new object with the same metadata as the original object but using new values. By Julia Signell.
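
A short sketch exercising several of the enhancements above (differentiate(), min_count, the roll_coords keyword and copy(data=...)) on a made-up DataArray. It assumes a recent xarray and NumPy are installed; the values and variable names are illustrative only. Because the default of roll_coords may differ between releases, the sketch always passes it explicitly.

```python
import numpy as np
import xarray as xr

# Illustrative data: y = x**2 sampled at x = 0, 1, 2, 3.
da = xr.DataArray(
    [0.0, 1.0, 4.0, 9.0], dims="x", coords={"x": [0.0, 1.0, 2.0, 3.0]}
)

# differentiate() takes a coordinate name; interior points use central
# differences and the two edges use one-sided differences.
slope = da.differentiate("x")

# min_count: summing an all-NaN array gives 0.0 by default (NaNs are skipped),
# but NaN when fewer than min_count valid values are present.
all_nan = xr.DataArray([np.nan, np.nan], dims="x")
default_sum = all_nan.sum()
guarded_sum = all_nan.sum(min_count=1)

# roll_coords decides whether the coordinate labels move with the data.
data_only = da.roll(x=1, roll_coords=False)
with_coords = da.roll(x=1, roll_coords=True)

# copy(data=...) keeps dims, coords and attrs but swaps in new values.
zeros = da.copy(data=np.zeros(4))
```

With roll_coords=False only the values shift, so the label x=0 now points at the value that used to sit at x=3.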

Bug fixes

  • xarray.plot.imshow() correctly uses the origin argument. (GH2379) By Deepak Cherian.
  • Fixed DataArray.to_iris() failure while creating DimCoord by falling back to creating AuxCoord. Fixed dependency on var_name attribute being set. (GH2201) By Thomas Voigt.
  • Fixed a bug in zarr backend which prevented use with datasets with invalid chunk size encoding after reading from an existing store (GH2278). By Joe Hamman.
  • Tests can be run in parallel with pytest-xdist. By Tony Tung.
  • Followed up on the renaming in dask from dask.ghost to dask.overlap. By Keisuke Fujii.
  • Now raises a ValueError when there is a conflict between dimension names and level names of MultiIndex. (GH2299) By Keisuke Fujii.
  • Now xr.apply_ufunc() raises a ValueError when the size of input_core_dims is inconsistent with the number of arguments. (GH2341) By Keisuke Fujii.
  • Fixed Dataset.filter_by_attrs() behavior not matching netCDF4.Dataset.get_variables_by_attributes(). When more than one key=value is passed into Dataset.filter_by_attrs() it will now return a Dataset with variables which pass all the filters. (GH2315) By Andrew Barna.
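
A minimal sketch of the corrected filter_by_attrs() behavior, assuming a recent xarray; the dataset, its variable names and its attributes are invented for the example. Both variables share units="K", but only one also carries the requested standard_name, so only that one passes all the filters.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset: both variables are in kelvin, only one has a standard_name.
ds = xr.Dataset(
    {
        "temp": ("x", np.zeros(3), {"units": "K", "standard_name": "air_temperature"}),
        "dewpoint": ("x", np.zeros(3), {"units": "K"}),
    }
)

# With several key=value filters, a variable must satisfy every one of them.
matched = ds.filter_by_attrs(units="K", standard_name="air_temperature")
```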

v0.10.8 (18 July 2018)

Breaking changes

  • Xarray no longer supports Python 3.4. Additionally, the minimum supported versions of the following dependencies have been updated and/or clarified:

    • Pandas: 0.18 -> 0.19
    • NumPy: 1.11 -> 1.12
    • Dask: 0.9 -> 0.16
    • Matplotlib: unspecified -> 1.5

    (GH2204). By Joe Hamman.

Enhancements

Bug fixes

v0.10.7 (7 June 2018)

Enhancements

Bug fixes

  • Fixed a bug in rasterio backend which prevented use with distributed. The rasterio backend now returns pickleable objects (GH2021). By Joe Hamman.

v0.10.6 (31 May 2018)

This minor release includes a number of bug-fixes and backwards compatible enhancements.

Enhancements

  • New PseudoNetCDF backend for many atmospheric data formats including GEOS-Chem, CAMx, NOAA arlpacked bit and many others. See Formats supported by PseudoNetCDF for more details. By Barron Henderson.
  • The Dataset constructor now aligns DataArray arguments in data_vars to indexes set explicitly in coords, where previously an error would be raised. (GH674) By Maximilian Roos.
  • sel(), isel() & reindex(), (and their Dataset counterparts) now support supplying a dict as a first argument, as an alternative to the existing approach of supplying kwargs. This allows for more robust behavior of dimension names which conflict with other keyword names, or are not strings. By Maximilian Roos.
  • rename() now supports supplying **kwargs, as an alternative to the existing approach of supplying a dict as the first argument. By Maximilian Roos.
  • cumsum() and cumprod() now support aggregation over multiple dimensions at the same time. This is the default behavior when dimensions are not specified (previously this raised an error). By Stephan Hoyer
  • DataArray.dot() and dot() are partly supported with older dask<0.17.4. (related to GH2203) By Keisuke Fujii.
  • Xarray now uses Versioneer to manage its version strings. (GH1300). By Joe Hamman.
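
The dict-as-first-argument form of sel() matters most when a dimension name collides with a keyword of the method itself. A minimal sketch, assuming a recent xarray; the dimension deliberately named "method" (which clashes with sel()'s own method= keyword) and the variable names are hypothetical.

```python
import xarray as xr

# A dimension named "method" cannot be selected with sel(method=...),
# because that keyword is already taken by sel() itself.
da = xr.DataArray([10, 20, 30], dims="method", coords={"method": ["a", "b", "c"]})

# Passing a dict as the first argument sidesteps the clash.
picked = da.sel({"method": "b"})

# Dataset.rename() also accepts keyword arguments instead of a dict.
ds = xr.Dataset({"temp": ("x", [1.0, 2.0])})
renamed = ds.rename(temp="temperature")
```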

Bug fixes

  • Fixed a regression in 0.10.4, where explicitly specifying dtype='S1' or dtype=str in encoding with to_netcdf() raised an error (GH2149). By Stephan Hoyer.
  • apply_ufunc() now directly validates output variables (GH1931). By Stephan Hoyer.
  • Fixed a bug where to_netcdf(..., unlimited_dims='bar') yielded NetCDF files with spurious 0-length dimensions (i.e. b, a, and r) (GH2134). By Joe Hamman.
  • Removed spurious warnings with Dataset.update(Dataset) (GH2161) and array.equals(array) when array contains NaT (GH2162). By Stephan Hoyer.
  • Aggregations with Dataset.reduce() (including mean, sum, etc) no longer drop unrelated coordinates (GH1470). Also fixed a bug where non-scalar data-variables that did not include the aggregation dimension were improperly skipped. By Stephan Hoyer.
  • Fix stack() with non-unique coordinates on pandas 0.23 (GH2160). By Stephan Hoyer.
  • Selecting data indexed by a length-1 CFTimeIndex with a slice of strings now behaves as it does when using a length-1 DatetimeIndex (i.e. it no longer falsely returns an empty array when the slice includes the value in the index) (GH2165). By Spencer Clark.
  • Fix DataArray.groupby().reduce() mutating coordinates on the input array when grouping over dimension coordinates with duplicated entries (GH2153). By Stephan Hoyer.
  • Fix Dataset.to_netcdf() being unable to create groups with engine="h5netcdf" (GH2177). By Stephan Hoyer.

v0.10.4 (16 May 2018)

This minor release includes a number of bug-fixes and backwards compatible enhancements. A highlight is CFTimeIndex, which offers support for non-standard calendars used in climate modeling.

Documentation

Enhancements

  • Add an option for using a CFTimeIndex for indexing times with non-standard calendars and/or outside the Timestamp-valid range; this index enables a subset of the functionality of a standard pandas.DatetimeIndex. See Non-standard calendars and dates outside the Timestamp-valid range for full details. (GH789, GH1084, GH1252) By Spencer Clark with help from Stephan Hoyer.
  • Allow for serialization of cftime.datetime objects (GH789, GH1084, GH2008, GH1252) using the standalone cftime library. By Spencer Clark.
  • Support writing lists of strings as netCDF attributes (GH2044). By Dan Nowacki.
  • to_netcdf() with engine='h5netcdf' now accepts h5py encoding settings compression and compression_opts, along with the NetCDF4-Python style settings gzip=True and complevel. This allows using any compression plugin installed in hdf5, e.g. LZF (GH1536). By Guido Imperiale.
  • dot() on dask-backed data will now call dask.array.einsum(). This greatly boosts speed and allows chunking on the core dims. The function now requires dask >= 0.17.3 to work on dask-backed data (GH2074). By Guido Imperiale.
  • plot.line() learned new kwargs: xincrease, yincrease that change the direction of the respective axes. By Deepak Cherian.
  • Added the parallel option to open_mfdataset(). This option uses dask.delayed to parallelize the open and preprocessing steps within open_mfdataset. This is expected to provide performance improvements when opening many files, particularly when used in conjunction with dask’s multiprocessing or distributed schedulers (GH1981). By Joe Hamman.
  • New compute option in to_netcdf(), to_zarr(), and save_mfdataset() to allow for the lazy computation of netCDF and zarr stores. This feature is currently only supported by the netCDF4 and zarr backends. (GH1784). By Joe Hamman.

Bug fixes

v0.10.3 (13 April 2018)

This minor release includes a number of bug-fixes and backwards compatible enhancements.

Enhancements

  • New DataArray.isin() and Dataset.isin() methods test whether each value in the array is contained in the supplied list, returning a bool array, similar to the np.isin function. See Selecting values with isin for full details. By Maximilian Roos.
  • Improved the speed of constructing DataArrayRolling objects (GH1993). By Keisuke Fujii.
  • Handle variables with different values for missing_value and _FillValue by masking values for both attributes; previously this resulted in a ValueError. (GH2016) By Ryan May.
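
A minimal sketch of isin(), assuming a recent xarray; the array values are made up. The boolean mask has the same dimensions as the input, so it can be fed straight back into where().

```python
import xarray as xr

da = xr.DataArray([1, 2, 3, 4], dims="x")

# Boolean DataArray: True where the value appears in the supplied list.
mask = da.isin([2, 4])

# Keep only the matching entries; where() fills with NaN, so the dtype becomes float.
selected = da.where(mask, drop=True)
```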

Bug fixes

  • Fixed decode_cf function to operate lazily on dask arrays (GH1372). By Ryan Abernathey.
  • Fixed labeled indexing with slice bounds given by xarray objects with datetime64 or timedelta64 dtypes (GH1240). By Stephan Hoyer.
  • Attempting to convert an xarray.Dataset into a numpy array now raises an informative error message. By Stephan Hoyer.
  • Fixed a bug in decode_cf_datetime where int32 arrays weren’t parsed correctly (GH2002). By Fabien Maussion.
  • When calling xr.auto_combine() or xr.open_mfdataset() with a concat_dim, the resulting dataset will have that one-element dimension (it was silently dropped, previously) (GH1988). By Ben Root.

v0.10.2 (13 March 2018)

This minor release includes a number of bug-fixes and enhancements, along with one possibly backwards incompatible change.

Backwards incompatible changes

  • The addition of __array_ufunc__ for xarray objects (see below) means that NumPy ufunc methods (e.g., np.add.reduce) that previously worked on xarray.DataArray objects by converting them into NumPy arrays will now raise NotImplementedError instead. In all cases, the work-around is simple: convert your objects explicitly into NumPy arrays before calling the ufunc (e.g., with .values).

Enhancements

  • Added xr.dot(), equivalent to np.einsum(). Also, DataArray.dot() now supports a dims option, which specifies the dimensions to sum over. (GH1951) By Keisuke Fujii.

  • Support for writing xarray datasets to netCDF files (netcdf4 backend only) when using the dask.distributed scheduler (GH1464). By Joe Hamman.

  • Support lazy vectorized indexing. After this change, flexible indexing such as orthogonal/vectorized indexing becomes possible for all the backend arrays. Lazy transpose is now also supported. (GH1897) By Keisuke Fujii.

  • Implemented NumPy’s __array_ufunc__ protocol for all xarray objects (GH1617). This enables using NumPy ufuncs directly on xarray.Dataset objects with recent versions of NumPy (v1.13 and newer):

    In [1]: ds = xr.Dataset({'a': 1})
    
    In [2]: np.sin(ds)
    Out[2]: 
    <xarray.Dataset>
    Dimensions:  ()
    Data variables:
        a        float64 0.8415
    

    This obviates the need for the xarray.ufuncs module, which will be deprecated in the future when xarray drops support for older versions of NumPy. By Stephan Hoyer.

  • Improve rolling() logic. DataArrayRolling objects now support a construct() method that returns a view of the DataArray / Dataset object with the rolling-window dimension added to the last axis. This enables more flexible operations, such as strided rolling, windowed rolling, ND-rolling, short-time FFT and convolution. (GH1831, GH1142, GH819) By Keisuke Fujii.

  • line() learned to make plots with data on x-axis if so specified. (GH575) By Deepak Cherian.
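
A sketch of two of the additions above, assuming a recent xarray and NumPy; the arrays are invented. xr.dot() is called without naming dimensions, in which case it sums over every dimension the inputs share (here "y"), matching the equivalent np.einsum() call. rolling(...).construct() then exposes each window as a new trailing dimension, and its stride argument gives strided rolling.

```python
import numpy as np
import xarray as xr

a = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))
b = xr.DataArray([10, 20], dims="y")

# Sums over the shared "y" dimension, like np.einsum("xy,y->x", ...).
product = xr.dot(a, b)
reference = np.einsum("xy,y->x", a.values, b.values)

# Each row of `windows` is one length-3 window; incomplete windows are NaN-padded.
series = xr.DataArray(np.arange(5.0), dims="x")
windows = series.rolling(x=3).construct("window")

# stride=2 keeps every second window.
strided = series.rolling(x=3).construct("window", stride=2)
```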

Bug fixes

v0.10.1 (25 February 2018)

This minor release includes a number of bug-fixes and backwards compatible enhancements.

Documentation

Enhancements

New functions and methods:

Plotting enhancements:

Other enhancements:

  • Reduce methods such as DataArray.sum() now handle object-type arrays.

    In [3]: da = xr.DataArray(np.array([True, False, np.nan], dtype=object), dims='x')
    
    In [4]: da.sum()
    Out[4]: 
    <xarray.DataArray ()>
    array(1)
    

    (GH1866) By Keisuke Fujii.

  • Reduce methods such as DataArray.sum() now accept a dtype argument. (GH1838) By Keisuke Fujii.

  • Added nodatavals attribute to DataArray when using open_rasterio(). (GH1736). By Alan Snow.

  • Use pandas.Grouper class in xarray resample methods rather than the deprecated pandas.TimeGrouper class (GH1766). By Joe Hamman.

  • Experimental support for parsing ENVI metadata to coordinates and attributes in xarray.open_rasterio(). By Matti Eskelinen.

  • Reduce memory usage when decoding a variable with a scale_factor, by converting 8-bit and 16-bit integers to float32 instead of float64 (PR1840), and keeping float16 and float32 as float32 (GH1842). Correspondingly, encoded variables may also be saved with a smaller dtype. By Zac Hatfield-Dodds.

  • Speed of reindexing/alignment with dask array is orders of magnitude faster when inserting missing values (GH1847). By Stephan Hoyer.

  • Fix axis keyword ignored when applying np.squeeze to DataArray (GH1487). By Florian Pinault.

  • netcdf4-python has moved its time handling in the netcdftime module to a standalone package (netcdftime). As such, xarray now considers netcdftime an optional dependency. One benefit of this change is that it allows for encoding/decoding of datetimes with non-standard calendars without the netcdf4-python dependency (GH1084). By Joe Hamman.

New functions/methods

Bug fixes

  • Rolling aggregation with the center=True option now gives the same result as pandas, including the last element (GH1046). By Keisuke Fujii.
  • Support indexing with a 0d-np.ndarray (GH1921). By Keisuke Fujii.
  • Added warning in api.py of a netCDF4 bug that occurs when the filepath has 88 characters (GH1745). By Liam Brannigan.
  • Fixed encoding of multi-dimensional coordinates in to_netcdf() (GH1763). By Mike Neish.
  • Fixed chunking with non-file-based rasterio datasets (GH1816) and refactored rasterio test suite. By Ryan Abernathey.
  • Bug fix in open_dataset(engine='pydap') (GH1775). By Keisuke Fujii.
  • Bug fix in vectorized assignment (GH1743, GH1744). Now item assignment to DataArray.__setitem__() checks coordinates of target, destination and keys. If there are any conflicts among these coordinates, IndexError will be raised. By Keisuke Fujii.
  • Properly point DataArray.__dask_scheduler__() to dask.threaded.get. By Matthew Rocklin.
  • Bug fixes in DataArray.plot.imshow(): all-NaN arrays and arrays with size one in some dimension can now be plotted, which is good for exploring satellite imagery (GH1780). By Zac Hatfield-Dodds.
  • Fixed UnboundLocalError when opening netCDF file (GH1781). By Stephan Hoyer.
  • The variables, attrs, and dimensions properties have been deprecated as part of a bug fix addressing an issue where backends were unintentionally loading the datastore's data and attributes repeatedly during writes (GH1798). By Joe Hamman.
  • Compatibility fixes to plotting module for Numpy 1.14 and Pandas 0.22 (GH1813). By Joe Hamman.
  • Bug fix in encoding coordinates with {'_FillValue': None} in netCDF metadata (GH1865). By Chris Roth.
  • Fix indexing with lists for arrays loaded from netCDF files with engine='h5netcdf' (GH1864). By Stephan Hoyer.
  • Corrected a bug with incorrect coordinates for non-georeferenced geotiff files (GH1686). Internally, we now use the rasterio coordinate transform tool instead of doing the computations ourselves. A parse_coordinates kwarg has been added to open_rasterio() (set to True by default). By Fabien Maussion.
  • The colors of discrete colormaps are now the same regardless of whether seaborn is installed (GH1896). By Fabien Maussion.
  • Fixed dtype promotion rules in where() and concat() to match pandas (GH1847). A combination of strings/numbers or unicode/bytes now promote to object dtype, instead of strings or unicode. By Stephan Hoyer.
  • Fixed bug where isnull() was loading data stored as dask arrays (GH1937). By Joe Hamman.
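
A quick way to see the rolling center=True fix is to compare against pandas directly. This sketch assumes recent xarray, pandas and NumPy; the input is simply 0 through 4. With a window of 3, both libraries leave NaN only at the two ends, where a full centered window does not fit.

```python
import numpy as np
import pandas as pd
import xarray as xr

values = np.arange(5.0)

# Centered rolling mean in xarray and in pandas on identical data.
xr_result = xr.DataArray(values, dims="x").rolling(x=3, center=True).mean()
pd_result = pd.Series(values).rolling(3, center=True).mean()
```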

v0.10.0 (20 November 2017)

This is a major release that includes bug fixes, new features and a few backwards incompatible changes. Highlights include:

  • Indexing now supports broadcasting over dimensions, similar to NumPy’s vectorized indexing (but better!).
  • resample() has a new groupby-like API like pandas.
  • apply_ufunc() facilitates wrapping and parallelizing functions written for NumPy arrays.
  • Performance improvements, particularly for dask and open_mfdataset().

Breaking changes

  • xarray now supports a form of vectorized indexing with broadcasting, where the result of indexing depends on dimensions of indexers, e.g., array.sel(x=ind) with ind.dims == ('y',). Alignment between coordinates on indexed and indexing objects is also now enforced. Due to these changes, existing uses of xarray objects to index other xarray objects will break in some cases.

    The new indexing API is much more powerful, supporting outer, diagonal and vectorized indexing in a single interface. The isel_points and sel_points methods are deprecated, since they are now redundant with the isel / sel methods. See Vectorized Indexing for the details (GH1444, GH1436). By Keisuke Fujii and Stephan Hoyer.

  • A new resampling interface to match pandas’ groupby-like API was added to Dataset.resample() and DataArray.resample() (GH1272). Timeseries resampling is fully supported for data with arbitrary dimensions as is both downsampling and upsampling (including linear, quadratic, cubic, and spline interpolation).

    Old syntax:

    In [5]: ds.resample('24H', dim='time', how='max')
    Out[5]: 
    <xarray.Dataset>
    [...]
    

    New syntax:

    In [6]: ds.resample(time='24H').max()
    Out[6]: 
    <xarray.Dataset>
    [...]
    

    Note that both versions are currently supported, but using the old syntax will produce a warning encouraging users to adopt the new syntax. By Daniel Rothenberg.

  • Calling repr() or printing xarray objects at the command line or in a Jupyter Notebook will no longer automatically compute dask variables or load data on arrays lazily loaded from disk (GH1522). By Guido Imperiale.

  • Supplying coords as a dictionary to the DataArray constructor without also supplying an explicit dims argument is no longer supported. This behavior was deprecated in version 0.9 but will now raise an error (GH727).

  • Several existing features have been deprecated and will change to new behavior in xarray v0.11. If you use any of them with xarray v0.10, you should see a FutureWarning that describes how to update your code:

    • Dataset.T has been deprecated as an alias for Dataset.transpose() (GH1232). In the next major version of xarray, it will provide shortcut lookup for variables or attributes with name 'T'.
    • DataArray.__contains__ (e.g., key in data_array) currently checks for membership in DataArray.coords. In the next major version of xarray, it will check membership in the array data found in DataArray.values instead (GH1267).
    • Direct iteration over and counting a Dataset (e.g., [k for k in ds], ds.keys(), ds.values(), len(ds) and if ds) currently includes all variables, both data and coordinates. For improved usability and consistency with pandas, in the next major version of xarray these will change to only include data variables (GH884). Use ds.variables, ds.data_vars or ds.coords as alternatives.
  • Changes to minimum versions of dependencies:

    • Old numpy < 1.11 and pandas < 0.18 are no longer supported (GH1512). By Keisuke Fujii.
    • The minimum supported version of bottleneck has increased to 1.1 (GH1279). By Joe Hamman.
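
A runnable sketch of two of the breaking changes above, vectorized indexing and the groupby-like resample() API, assuming recent xarray, pandas and NumPy. The arrays, the "points" dimension name and the daily timestamps are illustrative choices. Indexers that share a new dimension are applied pointwise, which here selects the diagonal.

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(np.arange(12).reshape(3, 4), dims=("x", "y"))

# Both indexers live on the new "points" dimension, so they are paired up
# element by element: (0, 0), (1, 1), (2, 2).
ix = xr.DataArray([0, 1, 2], dims="points")
iy = xr.DataArray([0, 1, 2], dims="points")
diagonal = da.isel(x=ix, y=iy)

# New resample syntax: name the time dimension as a keyword, then aggregate.
times = pd.date_range("2000-01-01", periods=4, freq="D")
ds = xr.Dataset({"t": ("time", [1.0, 5.0, 2.0, 8.0])}, coords={"time": times})
two_day_max = ds.resample(time="2D").max()
```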

Enhancements

New functions/methods

  • New helper function apply_ufunc() for wrapping functions written to work on NumPy arrays to support labels on xarray objects (GH770). apply_ufunc also supports automatic parallelization for many functions with dask. See Wrapping custom computation and Automatic parallelization for details. By Stephan Hoyer.

  • Added new method Dataset.to_dask_dataframe(), which converts a dataset into a dask dataframe. This allows lazy loading of data from a dataset containing dask arrays (GH1462). By James Munroe.

  • New function where() for conditionally switching between values in xarray objects, like numpy.where():

    In [7]: import xarray as xr
    
    In [8]: arr = xr.DataArray([[1, 2, 3], [4, 5, 6]], dims=('x', 'y'))
    
    In [9]: xr.where(arr % 2, 'odd', 'even')
    Out[9]: 
    <xarray.DataArray (x: 2, y: 3)>
    array([['odd', 'even', 'odd'],
           ['even', 'odd', 'even']],
          dtype='<U4')
    Dimensions without coordinates: x, y
    

    Equivalently, the where() method also now supports the other argument, for filling with a value other than NaN (GH576). By Stephan Hoyer.

  • Added show_versions() function to aid in debugging (GH1485). By Joe Hamman.
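
A minimal sketch of apply_ufunc() and of where() with the other argument, assuming a recent xarray and NumPy; the array values are made up. input_core_dims names the dimension the wrapped function consumes; apply_ufunc moves that dimension to the last axis, which is why np.mean is told to reduce over axis=-1.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dims=("y", "x"))

# Wrap a plain NumPy reduction; the core dimension "x" is removed from the result.
row_means = xr.apply_ufunc(np.mean, da, input_core_dims=[["x"]], kwargs={"axis": -1})

# where() with `other` fills rejected positions with 0.0 instead of NaN.
clipped = da.where(da > 2, other=0.0)
```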

Performance improvements

  • concat() was computing variables that aren’t in memory (e.g. dask-based) multiple times; open_mfdataset() was loading them multiple times from disk. Now, both functions will instead load them at most once and, if they do, store them in memory in the concatenated array/dataset (GH1521). By Guido Imperiale.
  • Speed-up (x 100) of decode_cf_datetime(). By Christian Chwala.

IO related improvements

  • Unicode strings (str on Python 3) are now round-tripped successfully even when written as character arrays (e.g., as netCDF3 files or when using engine='scipy') (GH1638). This is controlled by the _Encoding attribute convention, which is also understood directly by the netCDF4-Python interface. See String encoding for full details. By Stephan Hoyer.

  • Support for data_vars and coords keywords from concat() added to open_mfdataset() (GH438). Using these keyword arguments can significantly reduce memory usage and increase speed. By Oleksandr Huziy.

  • Support for pathlib.Path objects added to open_dataset(), open_mfdataset(), to_netcdf(), and save_mfdataset() (GH799):

    In [10]: from pathlib import Path  # In Python 2, use pathlib2!
    
    In [11]: data_dir = Path("data/")
    
    In [12]: one_file = data_dir / "dta_for_month_01.nc"
    
    In [13]: xr.open_dataset(one_file)
    Out[13]: 
    <xarray.Dataset>
    [...]
    

    By Willi Rath.

  • You can now explicitly disable any default _FillValue (NaN for floating point values) by passing the encoding {'_FillValue': None} (GH1598). By Stephan Hoyer.

  • More attributes available in attrs dictionary when raster files are opened with open_rasterio(). By Greg Brener.

  • Support for NetCDF files using an _Unsigned attribute to indicate that a signed integer data type should be interpreted as unsigned bytes (GH1444). By Eric Bruning.

  • Support using an existing, opened netCDF4 Dataset with NetCDF4DataStore. This permits creating a Dataset from a netCDF4 Dataset that has been opened using other means (GH1459). By Ryan May.

  • Changed PydapDataStore to take a Pydap dataset. This permits opening Opendap datasets that require authentication, by instantiating a Pydap dataset with a session object. Also added xarray.backends.PydapDataStore.open() which takes a url and session object (GH1068). By Philip Graae.

  • Support reading and writing unlimited dimensions with h5netcdf (GH1636). By Joe Hamman.

Other improvements

  • Added _ipython_key_completions_ to xarray objects, to enable autocompletion for dictionary-like access in IPython, e.g., ds['tem + tab -> ds['temperature'] (GH1628). By Keisuke Fujii.
  • Support passing keyword arguments to load, compute, and persist methods. Any keyword arguments supplied to these methods are passed on to the corresponding dask function (GH1523). By Joe Hamman.
  • Encoding attributes are now preserved when xarray objects are concatenated. The encoding is copied from the first object (GH1297). By Joe Hamman and Gerrit Holl.
  • Support applying rolling window operations using bottleneck’s moving window functions on data stored as dask arrays (GH1279). By Joe Hamman.
  • Experimental support for the Dask collection interface (GH1674). By Matthew Rocklin.

Bug fixes

Bug fixes after rc1

  • Suppress warning in IPython autocompletion, related to the deprecation of .T attributes (GH1675). By Keisuke Fujii.
  • Fix a bug in lazily-indexing netCDF array. (GH1688) By Keisuke Fujii.
  • (Internal bug) MemoryCachedArray now supports orthogonal indexing. Also made some internal cleanups around array wrappers (GH1429). By Keisuke Fujii.
  • (Internal bug) MemoryCachedArray now always wraps np.ndarray by NumpyIndexingAdapter. (GH1694) By Keisuke Fujii.
  • Fix importing xarray when running Python with -OO (GH1706). By Stephan Hoyer.
  • Saving a netCDF file with a coordinate that has spaces in its name now raises an appropriate warning (GH1689). By Stephan Hoyer.
  • Fix two bugs that were preventing dask arrays from being specified as coordinates in the DataArray constructor (GH1684). By Joe Hamman.
  • Fixed apply_ufunc with dask='parallelized' for scalar arguments (GH1697). By Stephan Hoyer.
  • Fix “Chunksize cannot exceed dimension size” error when writing netCDF4 files loaded from disk (GH1225). By Stephan Hoyer.
  • Validate the shape of coordinates with names matching dimensions in the DataArray constructor (GH1709). By Stephan Hoyer.
  • Raise NotImplementedError when attempting to save a MultiIndex to a netCDF file (GH1547). By Stephan Hoyer.
  • Remove netCDF dependency from rasterio backend tests. By Matti Eskelinen

Bug fixes after rc2

  • Fixed unexpected behavior in Dataset.set_index() and DataArray.set_index() introduced by Pandas 0.21.0. Setting a new index with a single variable resulted in 1-level pandas.MultiIndex instead of a simple pandas.Index (GH1722). By Benoit Bovy.
  • Fixed unexpected memory loading of backend arrays after print. (GH1720). By Keisuke Fujii.

v0.9.6 (8 June 2017)

This release includes a number of backwards compatible enhancements and bug fixes.

Enhancements

Bug fixes

  • Fix error from repeated indexing of datasets loaded from disk (GH1374). By Stephan Hoyer.
  • Fix a bug where .isel_points wrongly assigns unselected coordinate to data_vars. By Keisuke Fujii.
  • Tutorial datasets are now checked against a reference MD5 sum to confirm successful download (GH1392). By Matthew Gidden.
  • DataArray.chunk() now accepts dask specific kwargs like Dataset.chunk() does. By Fabien Maussion.
  • Support for engine='pydap' with recent releases of Pydap (3.2.2+), including on Python 3 (GH1174).

Documentation

Testing

  • Fix test suite failure caused by changes to pandas.cut function (GH1386). By Ryan Abernathey.
  • Enhanced tests suite by use of @network decorator, which is controlled via --run-network-tests command line argument to py.test (GH1393). By Matthew Gidden.

v0.9.5 (17 April 2017)

Remove an inadvertently introduced print statement.

v0.9.3 (16 April 2017)

This minor release includes bug-fixes and backwards compatible enhancements.

Enhancements

Bug fixes

  • Fix .where() with drop=True when arguments do not have indexes (GH1350). This bug, introduced in v0.9, resulted in xarray producing incorrect results in some cases. By Stephan Hoyer.
  • Fixed writing to file-like objects with to_netcdf() (GH1320). By Stephan Hoyer.
  • Fixed explicitly setting engine='scipy' with to_netcdf when not providing a path (GH1321). By Stephan Hoyer.
  • Fixed open_dataarray not properly passing its parameters to open_dataset (GH1359). By Stephan Hoyer.
  • Ensure test suite works when run from an installed version of xarray (GH1336). Use @pytest.mark.slow instead of a custom flag to mark slow tests. By Stephan Hoyer.

v0.9.2 (2 April 2017)

This minor release includes bug-fixes and backwards compatible enhancements.

Enhancements

  • .rolling() on Dataset is now supported (GH859). By Keisuke Fujii.
  • When bottleneck version 1.1 or later is installed, use bottleneck for rolling var, argmin, argmax, and rank computations. Also, rolling median now accepts a min_periods argument (GH1276). By Joe Hamman.
  • When .plot() is called on a 2D DataArray and only one dimension is specified with x= or y=, the other dimension is now guessed (GH1291). By Vincent Noel.
  • Added new method assign_attrs() to DataArray and Dataset, a chained-method compatible implementation of the dict.update method on attrs (GH1281). By Henry S. Harrison.
  • Added new autoclose=True argument to open_mfdataset() to explicitly close opened files when not in use to prevent occurrence of an OS Error related to too many open files (GH1198). Note, the default is autoclose=False, which is consistent with previous xarray behavior. By Phillip J. Wolfram.
  • The repr() of Dataset and DataArray attributes uses a similar format to coordinates and variables, with vertically aligned entries truncated to fit on a single line (GH1319). Hopefully this will stop people writing data.attrs = {} and discarding metadata in notebooks for the sake of cleaner output. The full metadata is still available as data.attrs. By Zac Hatfield-Dodds.
  • Enhanced tests suite by use of @slow and @flaky decorators, which are controlled via --run-flaky and --skip-slow command line arguments to py.test (GH1336). By Stephan Hoyer and Phillip J. Wolfram.
  • New aggregation on rolling objects DataArray.rolling(...).count(), which provides a rolling count of valid values (GH1138).
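
A sketch of assign_attrs() and the rolling count() aggregation, assuming a recent xarray and NumPy; the values and the "units" attribute are illustrative. min_periods=1 is set so that windows with a single valid value still report a count rather than NaN.

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0, 4.0], dims="x")

# assign_attrs returns a new object, so it can be chained; `da` itself is unchanged.
labelled = da.assign_attrs(units="m")

# Number of non-NaN values in each length-2 window.
valid_counts = da.rolling(x=2, min_periods=1).count()
```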

Bug fixes

v0.9.1 (30 January 2017)

Renamed the “Unindexed dimensions” section in the Dataset and DataArray repr (added in v0.9.0) to “Dimensions without coordinates” (GH1199).

v0.9.0 (25 January 2017)

This major release includes five months' worth of enhancements and bug fixes from 24 contributors, including some significant changes that are not fully backwards compatible. Highlights include:

Breaking changes

  • Index coordinates for each dimension are now optional, and no longer created by default (GH1017). You can identify such dimensions without coordinates by their appearance in the list of “Dimensions without coordinates” in the Dataset or DataArray repr:

    In [14]: xr.Dataset({'foo': (('x', 'y'), [[1, 2]])})
    Out[14]: 
    <xarray.Dataset>
    Dimensions:  (x: 1, y: 2)
    Dimensions without coordinates: x, y
    Data variables:
        foo      (x, y) int64 1 2
    

    This has a number of implications:

    • align() and reindex() can now error if dimension labels are missing and dimensions have different sizes.
    • Because pandas does not support missing indexes, methods such as to_dataframe/from_dataframe and stack/unstack no longer roundtrip faithfully on all inputs. Use reset_index() to remove undesired indexes.
    • Dataset.__delitem__ and drop() no longer delete/drop variables that have dimensions matching a deleted/dropped variable.
    • DataArray.coords.__delitem__ is now allowed on variables matching dimension names.
    • .sel and .loc now handle indexing along a dimension without coordinate labels by doing integer based indexing. See Missing coordinate labels for an example.
    • indexes is no longer guaranteed to include all dimension names as keys. The new method get_index() has been added to get an index for any dimension, falling back to a default RangeIndex if necessary.
  • The default behavior of merge is now compat='no_conflicts', so some merges will now succeed in cases that previously raised xarray.MergeError. Set compat='broadcast_equals' to restore the previous default. See Merging with ‘no_conflicts’ for more details.

  • Reading values no longer always caches values in a NumPy array (GH1128). Caching of .values on variables read from netCDF files on disk is still the default for open_dataset() (cache=True). By Guido Imperiale and Stephan Hoyer.

  • Pickling a Dataset or DataArray linked to a file on disk no longer caches its values into memory before pickling (GH1128). Instead, pickle stores file paths and restores objects by reopening file references. This enables preliminary, experimental use of xarray for opening files with dask.distributed. By Stephan Hoyer.

  • Coordinates used to index a dimension are now loaded eagerly into pandas.Index objects, instead of loading the values lazily. By Guido Imperiale.

  • Automatic levels for 2d plots are now guaranteed to land on vmin and vmax when these kwargs are explicitly provided (GH1191). The automated level selection logic also slightly changed. By Fabien Maussion.

  • DataArray.rename() behavior changed to strictly change the DataArray.name if called with string argument, or strictly change coordinate names if called with dict-like argument. By Markus Gonser.

  • By default, to_netcdf() adds a _FillValue = NaN attribute to float types. By Frederic Laliberte.

  • repr on DataArray objects uses a shortened display for NumPy array data that is less likely to overflow onto multiple pages (GH1207). By Stephan Hoyer.

  • xarray no longer supports python 3.3, versions of dask prior to v0.9.0, or versions of bottleneck prior to v1.0.
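
The following sketch exercises several of these breaking changes against a recent xarray release. It illustrates current behavior under that assumption rather than reproducing v0.9.0 itself, and the names ds, a, b and named are arbitrary. compat is passed to merge() explicitly so the example does not depend on the default:

```python
import numpy as np
import xarray as xr

# A dataset whose dimensions have no coordinate labels
ds = xr.Dataset({"foo": (("x", "y"), [[1, 2]])})
assert "x" not in ds.indexes  # no index is created by default

# get_index() falls back to a default integer index
x_index = ds.get_index("x")

# .sel on a dimension without labels falls back to positional indexing
selected = ds["foo"].sel(y=1)

# compat="no_conflicts": NaN values are not treated as conflicts
a = xr.DataArray([1.0, np.nan], dims="t", coords={"t": [0, 1]}, name="v")
b = xr.DataArray([np.nan, 2.0], dims="t", coords={"t": [0, 1]}, name="v")
merged = xr.merge([a, b], compat="no_conflicts")

# The stricter "broadcast_equals" sees differing values as a conflict
try:
    xr.merge([a, b], compat="broadcast_equals")
    raised = False
except xr.MergeError:
    raised = True

# rename(): a string changes only the name, a dict renames coordinates
named = xr.DataArray([10, 20], dims="t", coords={"t": [0, 1]}, name="old")
renamed_array = named.rename("new")
renamed_coord = named.rename({"t": "time"})
```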

Deprecations

  • Renamed the Coordinate class from xarray’s low level API to IndexVariable. Variable.to_variable and Variable.to_coord have been renamed to to_base_variable() and to_index_variable().
  • Deprecated supplying coords as a dictionary to the DataArray constructor without also supplying an explicit dims argument. The old behavior encouraged relying on the iteration order of dictionaries, which is a bad practice (GH727).
  • Removed a number of methods deprecated since v0.7.0 or earlier: load_data, vars, drop_vars, dump, dumps and the variables keyword argument to Dataset.
  • Removed the dummy module that enabled import xray.

Enhancements

Bug fixes

  • groupby_bins now restores empty bins by default (GH1019). By Ryan Abernathey.
  • Fix issues for dates outside the valid range of pandas timestamps (GH975). By Mathias Hauser.
  • Unstacking produced a flipped array after stacking decreasing coordinate values (GH980). By Stephan Hoyer.
  • Setting dtype via the encoding parameter of to_netcdf failed if the encoded dtype was the same as the dtype of the original array (GH873). By Stephan Hoyer.
  • Fix issues with variables where both attributes _FillValue and missing_value are set to NaN (GH997). By Marco Zühlke.
  • .where() and .fillna() now preserve attributes (GH1009). By Fabien Maussion.
  • Applying broadcast() to an xarray object based on the dask backend won’t accidentally convert the array from dask to numpy anymore (GH978). By Guido Imperiale.
  • Dataset.concat() now preserves variables order (GH1027). By Fabien Maussion.
  • Fixed an issue with pcolormesh (GH781). A new infer_intervals keyword gives control on whether the cell intervals should be computed or not. By Fabien Maussion.
  • Grouping over a dimension with non-unique values with groupby gives correct groups. By Stephan Hoyer.
  • Fixed accessing coordinate variables with non-string names from .coords. By Stephan Hoyer.
  • rename() now simultaneously renames the array and any coordinate with the same name, when supplied via a dict (GH1116). By Yves Delley.
  • Fixed sub-optimal performance in certain operations with object arrays (GH1121). By Yves Delley.
  • Fix .groupby(group) when group has datetime dtype (GH1132). By Jonas Sølvsteen.
  • Fixed a bug with facetgrid (the norm keyword was ignored, GH1159). By Fabien Maussion.
  • Resolved a concurrency bug that could cause Python to crash when simultaneously reading and writing netCDF4 files with dask (GH1172). By Stephan Hoyer.
  • Fix to make .copy() actually copy dask arrays, which will be relevant for future releases of dask in which dask arrays will be mutable (GH1180). By Stephan Hoyer.
  • Fix opening NetCDF files with multi-dimensional time variables (GH1229). By Stephan Hoyer.

Performance improvements

  • isel_points() and sel_points() now use vectorised indexing in numpy and dask (GH1161), which can result in several orders of magnitude speedup. By Jonathan Chambers.

v0.8.2 (18 August 2016)

This release includes a number of bug fixes and minor enhancements.

Breaking changes

Enhancements

Bug fixes

  • Ensure xarray works with h5netcdf v0.3.0 for arrays with dtype=str (GH953). By Stephan Hoyer.
  • Dataset.__dir__() (i.e. the method python calls to get autocomplete options) failed if one of the dataset’s keys was not a string (GH852). By Maximilian Roos.
  • Dataset constructor can now take arbitrary objects as values (GH647). By Maximilian Roos.
  • Clarified copy argument for reindex() and align(), which now always return new xarray objects (GH927).
  • Fix open_mfdataset with engine='pynio' (GH936). By Stephan Hoyer.
  • groupby_bins sorted bin labels as strings (GH952). By Stephan Hoyer.
  • Fix bug introduced by v0.8.0 that broke assignment to datasets when both the left and right side have the same non-unique index values (GH956).

v0.8.1 (5 August 2016)

Bug fixes

  • Fix bug in v0.8.0 that broke assignment to Datasets with non-unique indexes (GH943). By Stephan Hoyer.

v0.8.0 (2 August 2016)

This release includes four months of new features and bug fixes, including several breaking changes.

Breaking changes

  • Dropped support for Python 2.6 (GH855).
  • Indexing on a multi-index now drops levels, which is consistent with pandas. It also changes the name of the dimension / coordinate when the multi-index is reduced to a single index (GH802).
  • Contour plots no longer add a colorbar by default (GH866). Filled contour plots are unchanged.
  • DataArray.values and .data now always return a NumPy array-like object, even for 0-dimensional arrays with object dtype (GH867). Previously, .values returned native Python objects in such cases. To convert the values of scalar arrays to Python objects, use the .item() method.
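
A short sketch of the difference between .values and .item() on a 0-dimensional object array, assuming a recent xarray release. decimal.Decimal is just a convenient example of an arbitrary Python object:

```python
from decimal import Decimal

import numpy as np
import xarray as xr

# A 0-dimensional array holding an arbitrary Python object
arr = xr.DataArray(np.array(Decimal("1.5"), dtype=object))

values = arr.values  # a 0-d NumPy array, not the bare object
scalar = arr.item()  # .item() unwraps it into the native Python object
```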

Enhancements

  • Groupby operations now support grouping over multidimensional variables. A new method called groupby_bins() has also been added to allow users to specify bins for grouping. The new features are described in Multidimensional Grouping and Working with Multidimensional Coordinates. By Ryan Abernathey.
  • DataArray and Dataset method where() now supports a drop=True option that clips coordinate elements that are fully masked. By Phillip J. Wolfram.
  • New top level merge() function allows for combining variables from any number of Dataset and/or DataArray objects. See Merge for more details. By Stephan Hoyer.
  • DataArray and Dataset method resample() now supports the keep_attrs=False option that determines whether variable and dataset attributes are retained in the resampled object. By Jeremy McGibbon.
  • Better multi-index support in DataArray and Dataset sel() and loc() methods, which now behave more closely to pandas and which also accept dictionaries for indexing based on given level names and labels (see Multi-level indexing). By Benoit Bovy.
  • New (experimental) decorators register_dataset_accessor() and register_dataarray_accessor() for registering custom xarray extensions without subclassing. They are described in the new documentation page on xarray Internals. By Stephan Hoyer.
  • Round trip boolean datatypes. Previously, writing boolean datatypes to netCDF formats would raise an error since netCDF does not have a bool datatype. This feature reads/writes a dtype attribute to boolean variables in netCDF files. By Joe Hamman.
  • 2D plotting methods now have two new keywords (cbar_ax and cbar_kwargs), allowing more control on the colorbar (GH872). By Fabien Maussion.
  • New Dataset method filter_by_attrs(), akin to netCDF4.Dataset.get_variables_by_attributes, to easily filter data variables using their attributes. By Filipe Fernandes.
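
The sketch below touches groupby_bins(), where(..., drop=True), filter_by_attrs() and a registered accessor, assuming a recent xarray release. The accessor name value_range and its span() method are hypothetical, chosen only for illustration:

```python
import xarray as xr

da = xr.DataArray([1, 2, 3, 4], dims="x", coords={"x": [1, 2, 3, 4]}, name="v")

# groupby_bins: group along x into the bins (0, 2] and (2, 4], then sum
binned = da.groupby_bins("x", bins=[0, 2, 4]).sum()

# where(..., drop=True): drop labels where the condition is False everywhere
clipped = da.where(da > 2, drop=True)

# filter_by_attrs: keep only data variables whose attributes match
ds = xr.Dataset(
    {
        "temp": ("x", [280.0, 281.0], {"units": "K"}),
        "pres": ("x", [1000.0, 990.0], {"units": "hPa"}),
    }
)
kelvin_vars = ds.filter_by_attrs(units="K")


# A custom accessor registered without subclassing DataArray
# (hypothetical name "value_range", for illustration only)
@xr.register_dataarray_accessor("value_range")
class ValueRangeAccessor:
    def __init__(self, xarray_obj):
        self._obj = xarray_obj

    def span(self):
        # Difference between the largest and smallest value
        return float(self._obj.max() - self._obj.min())
```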

Bug fixes

  • Attributes were being retained by default for some resampling operations when they should not. With the keep_attrs=False option, they will no longer be retained by default. This may be backwards-incompatible with some scripts, but the attributes may be kept by adding the keep_attrs=True option. By Jeremy McGibbon.
  • Concatenating xarray objects along an axis with a MultiIndex or PeriodIndex preserves the nature of the index (GH875). By Stephan Hoyer.
  • Fixed bug in arithmetic operations on DataArray objects whose dimensions are numpy structured arrays or recarrays GH861, GH837. By Maciek Swat.
  • decode_cf_timedelta now accepts arrays with ndim > 1 (GH842). This fixes issue GH665. By Filipe Fernandes.
  • Fix a bug where xarray.ufuncs that take two arguments would incorrectly use numpy functions instead of dask.array functions (GH876). By Stephan Hoyer.
  • Support for pickling functions from xarray.ufuncs (GH901). By Stephan Hoyer.
  • Variable.copy(deep=True) no longer converts MultiIndex into a base Index (GH769). By Benoit Bovy.
  • Fixes for groupby on dimensions with a multi-index (GH867). By Stephan Hoyer.
  • Fix printing datasets with unicode attributes on Python 2 (GH892). By Stephan Hoyer.
  • Fixed incorrect test for dask version (GH891). By Stephan Hoyer.
  • Fixed dim argument for isel_points/sel_points when a pandas.Index is passed. By Stephan Hoyer.
  • contour() now plots the correct number of contours (GH866). By Fabien Maussion.

v0.7.2 (13 March 2016)

This release includes two new, entirely backwards compatible features and several bug fixes.

Enhancements

  • New DataArray method DataArray.dot() for calculating the dot product of two DataArrays along shared dimensions. By Dean Pospisil.

  • Rolling window operations on DataArray objects are now supported via a new DataArray.rolling() method. For example:

    In [15]: import xarray as xr; import numpy as np
    
    In [16]: arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5),
                               dims=('x', 'y'))
    
    In [17]: arr
    Out[17]: 
    <xarray.DataArray (x: 3, y: 5)>
    array([[ 0. ,  0.5,  1. ,  1.5,  2. ],
           [ 2.5,  3. ,  3.5,  4. ,  4.5],
           [ 5. ,  5.5,  6. ,  6.5,  7. ]])
    Coordinates:
      * x        (x) int64 0 1 2
      * y        (y) int64 0 1 2 3 4
    
    In [18]: arr.rolling(y=3, min_periods=2).mean()
    Out[18]: 
    <xarray.DataArray (x: 3, y: 5)>
    array([[  nan,  0.25,  0.5 ,  1.  ,  1.5 ],
           [  nan,  2.75,  3.  ,  3.5 ,  4.  ],
           [  nan,  5.25,  5.5 ,  6.  ,  6.5 ]])
    Coordinates:
      * x        (x) int64 0 1 2
      * y        (y) int64 0 1 2 3 4
    

    See Rolling window operations for more details. By Joe Hamman.
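
A minimal sketch of DataArray.dot(), assuming a recent xarray release. With no dimension argument, the products are summed over every dimension the two arrays share (here only y):

```python
import xarray as xr

a = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))
b = xr.DataArray([10, 100], dims="y")

# Sum of the element-wise products over the shared dimension "y"
result = a.dot(b)
print(result.values)  # one value per remaining "x" position
```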

Bug fixes

  • Fixed an issue where plots using pcolormesh and Cartopy axes were being distorted by the inference of the axis interval breaks. This change chooses not to modify the coordinate variables when the axes have the attribute projection, allowing Cartopy to handle the extent of pcolormesh plots (GH781). By Joe Hamman.
  • 2D plots now better handle additional coordinates which are not DataArray dimensions (GH788). By Fabien Maussion.

v0.7.1 (16 February 2016)

This is a bug fix release that includes two small, backwards compatible enhancements. We recommend that all users upgrade.

Enhancements

  • Numerical operations now return empty objects when there are no overlapping labels, rather than raising ValueError (GH739).
  • Series is now supported as valid input to the Dataset constructor (GH740).
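
A sketch of both enhancements, assuming a recent xarray release where arithmetic alignment still defaults to an inner join. The array values and the index name time are illustrative:

```python
import pandas as pd
import xarray as xr

left = xr.DataArray([1, 2], dims="x", coords={"x": [0, 1]})
right = xr.DataArray([3, 4], dims="x", coords={"x": [2, 3]})

# Arithmetic aligns on the intersection of labels, so disjoint
# labels give an empty result instead of an error
empty = left + right

# A pandas.Series is accepted as a Dataset variable; its named
# index becomes the dimension and its coordinate
series = pd.Series([1.5, 2.5], index=pd.Index([10, 20], name="time"))
ds = xr.Dataset({"a": series})
```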

Bug fixes

  • Restore checks for shape consistency between data and coordinates in the DataArray constructor (GH758).
  • Single dimension variables no longer transpose as part of a broader .transpose. This behavior was causing pandas.PeriodIndex dimensions to lose their type (GH749).
  • Dataset labels remain as their native type on .to_dataset. Previously they were coerced to strings (GH745).
  • Fixed a bug where replacing a DataArray index coordinate would improperly align the coordinate (GH725).
  • DataArray.reindex_like now maintains the dtype of complex numbers when reindexing leads to NaN values (GH738).
  • Dataset.rename and DataArray.rename support the old and new names being the same (GH724).
  • Fix from_dataframe() for DataFrames with Categorical column and a MultiIndex index (GH737).
  • Fixes to ensure xarray works properly after the upcoming pandas v0.18 and NumPy v1.11 releases.

Acknowledgments

The following individuals contributed to this release:

  • Edward Richards
  • Maximilian Roos
  • Rafael Guedes
  • Spencer Hill
  • Stephan Hoyer

v0.7.0 (21 January 2016)

This major release includes redesign of DataArray internals, as well as new methods for reshaping, rolling and shifting data. It includes preliminary support for pandas.MultiIndex, as well as a number of other features and bug fixes, several of which offer improved compatibility with pandas.

New name

The project formerly known as “xray” is now “xarray”, pronounced “x-array”! This avoids a namespace conflict with the entire field of x-ray science. Renaming our project seemed like the right thing to do, especially because some scientists who work with actual x-rays are interested in using this project in their work. Thanks for your understanding and patience in this transition. You can now find our documentation and code repository at new URLs:

To ease the transition, we have simultaneously released v0.7.0 of both xray and xarray on the Python Package Index. These packages are identical. For now, import xray still works, except it issues a deprecation warning. This will be the last xray release. Going forward, we recommend switching your import statements to import xarray as xr.

Breaking changes

  • The internal data model used by DataArray has been rewritten to fix several outstanding issues (GH367, GH634, this stackoverflow report). Internally, DataArray is now implemented in terms of ._variable and ._coords attributes instead of holding variables in a Dataset object.

    This refactor ensures that if a DataArray has the same name as one of its coordinates, the array and the coordinate no longer share the same data.

    In practice, this means that creating a DataArray with the same name as one of its dimensions no longer automatically uses that array to label the corresponding coordinate. You will now need to provide coordinate labels explicitly. Here’s the old behavior:

    In [19]: xray.DataArray([4, 5, 6], dims='x', name='x')
    Out[19]: 
    <xray.DataArray 'x' (x: 3)>
    array([4, 5, 6])
    Coordinates:
      * x        (x) int64 4 5 6
    

    and the new behavior (compare the values of the x coordinate):

    In [20]: xray.DataArray([4, 5, 6], dims='x', name='x')
    Out[20]: 
    <xray.DataArray 'x' (x: 3)>
    array([4, 5, 6])
    Coordinates:
      * x        (x) int64 0 1 2
    
  • It is no longer possible to convert a DataArray to a Dataset with xray.DataArray.to_dataset() if it is unnamed. This will now raise ValueError. If the array is unnamed, you need to supply the name argument.

Enhancements

  • Basic support for MultiIndex coordinates on xray objects, including indexing, stack() and unstack():

    In [21]: df = pd.DataFrame({'foo': range(3),
       ....:                    'x': ['a', 'b', 'b'],
       ....:                    'y': [0, 0, 1]})
       ....: 
    
    In [22]: s = df.set_index(['x', 'y'])['foo']
    
    In [23]: arr = xray.DataArray(s, dims='z')
    
    In [24]: arr
    Out[24]: 
    <xray.DataArray 'foo' (z: 3)>
    array([0, 1, 2])
    Coordinates:
      * z        (z) object ('a', 0) ('b', 0) ('b', 1)
    
    In [25]: arr.indexes['z']
    Out[25]: 
    MultiIndex(levels=[[u'a', u'b'], [0, 1]],
               labels=[[0, 1, 1], [0, 0, 1]],
               names=[u'x', u'y'])
    
    In [26]: arr.unstack('z')
    Out[26]: 
    <xray.DataArray 'foo' (x: 2, y: 2)>
    array([[  0.,  nan],
           [  1.,   2.]])
    Coordinates:
      * x        (x) object 'a' 'b'
      * y        (y) int64 0 1
    
    In [27]: arr.unstack('z').stack(z=('x', 'y'))
    Out[27]: 
    <xray.DataArray 'foo' (z: 4)>
    array([  0.,  nan,   1.,   2.])
    Coordinates:
      * z        (z) object ('a', 0) ('a', 1) ('b', 0) ('b', 1)
    

    See Stack and unstack for more details.

    Warning

    xray’s MultiIndex support is still experimental, and we have a long to-do list of desired additions (GH719), including better display of multi-index levels when printing a Dataset, and support for saving datasets with a MultiIndex to a netCDF file. User contributions in this area would be greatly appreciated.

  • Support for reading GRIB, HDF4 and other file formats via PyNIO. See Formats supported by PyNIO for more details.

  • Better error message when a variable is supplied with the same name as one of its dimensions.

  • Plotting: more control on colormap parameters (GH642). vmin and vmax will not be silently ignored anymore. Setting center=False prevents automatic selection of a divergent colormap.

  • New shift() and roll() methods for shifting/rotating datasets or arrays along a dimension:

    In [28]: array = xray.DataArray([5, 6, 7, 8], dims='x')
    
    In [29]: array.shift(x=2)
    Out[29]: 
    <xarray.DataArray (x: 4)>
    array([nan, nan,  5.,  6.])
    Dimensions without coordinates: x
    
    In [30]: array.roll(x=2)
    Out[30]: 
    <xarray.DataArray (x: 4)>
    array([7, 8, 5, 6])
    Dimensions without coordinates: x
    

    Notice that shift moves data independently of coordinates, but roll moves both data and coordinates.

  • Assigning a pandas object directly as a Dataset variable is now permitted. Its index names correspond to the dims of the Dataset, and its data is aligned.

  • Passing a pandas.DataFrame or pandas.Panel to a Dataset constructor is now permitted.

  • New function broadcast() for explicitly broadcasting DataArray and Dataset objects against each other. For example:

    In [31]: a = xray.DataArray([1, 2, 3], dims='x')
    
    In [32]: b = xray.DataArray([5, 6], dims='y')
    
    In [33]: a
    Out[33]: 
    <xarray.DataArray (x: 3)>
    array([1, 2, 3])
    Dimensions without coordinates: x
    
    In [34]: b
    Out[34]: 
    <xarray.DataArray (y: 2)>
    array([5, 6])
    Dimensions without coordinates: y
    
    In [35]: a2, b2 = xray.broadcast(a, b)
    
    In [36]: a2
    Out[36]: 
    <xarray.DataArray (x: 3, y: 2)>
    array([[1, 1],
           [2, 2],
           [3, 3]])
    Dimensions without coordinates: x, y
    
    In [37]: b2
    Out[37]: 
    <xarray.DataArray (x: 3, y: 2)>
    array([[5, 6],
           [5, 6],
           [5, 6]])
    Dimensions without coordinates: x, y
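
A minimal sketch of assigning a pandas.Series to a Dataset, assuming a recent xarray release. The Series index is deliberately given in reverse order to show that values are matched by label, not by position:

```python
import pandas as pd
import xarray as xr

ds = xr.Dataset({"a": ("x", [1, 2])}, coords={"x": [10, 20]})

# The index name "x" is matched to the Dataset dimension, and the
# values are aligned to the existing x labels [10, 20]
ds["b"] = pd.Series([200, 100], index=pd.Index([20, 10], name="x"))
print(ds["b"].values)
```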
    

Bug fixes

  • Fixes for several issues found on DataArray objects with the same name as one of their coordinates (see Breaking changes for more details).
  • DataArray.to_masked_array always returns a masked array whose mask is an array (not a scalar value) (GH684).
  • Allows for (imperfect) repr of Coords when underlying index is PeriodIndex (GH645).
  • Attempting to assign a Dataset or DataArray variable/attribute using attribute-style syntax (e.g., ds.foo = 42) now raises an error rather than silently failing (GH656, GH714).
  • You can now pass pandas objects with non-numpy dtypes (e.g., categorical or datetime64 with a timezone) into xray without an error (GH716).

Acknowledgments

The following individuals contributed to this release:

  • Antony Lee
  • Fabien Maussion
  • Joe Hamman
  • Maximilian Roos
  • Stephan Hoyer
  • Takeshi Kanmae
  • femtotrader

v0.6.1 (21 October 2015)

This release contains a number of bug and compatibility fixes, as well as enhancements to plotting, indexing and writing files to disk.

Note that the minimum required version of dask for use with xray is now version 0.6.

API Changes

  • The handling of colormaps and discrete color lists for 2D plots in plot() was changed to provide more compatibility with matplotlib’s contour and contourf functions (GH538). Now discrete lists of colors should be specified using colors keyword, rather than cmap.

Enhancements

  • Faceted plotting through FacetGrid and the plot() method. See Faceting for more details and examples.

  • sel() and reindex() now support the tolerance argument for controlling nearest-neighbor selection (GH629):

    In [38]: array = xray.DataArray([1, 2, 3], dims='x')
    
    In [39]: array.reindex(x=[0.9, 1.5], method='nearest', tolerance=0.2)
    Out[39]: 
    <xray.DataArray (x: 2)>
    array([  2.,  nan])
    Coordinates:
      * x        (x) float64 0.9 1.5
    

    This feature requires pandas v0.17 or newer.

  • New encoding argument in to_netcdf() for writing netCDF files with compression, as described in the new documentation section on Writing encoded data.

  • Add real and imag attributes to Dataset and DataArray (GH553).

  • More informative error message with from_dataframe() if the frame has duplicate columns.

  • xray now uses deterministic names for dask arrays it creates or opens from disk. This allows xray users to take advantage of dask’s nascent support for caching intermediate computation results. See GH555 for an example.
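
Two of these additions sketched against a recent xarray release (an assumption; the arrays are illustrative). Unlike the reindex example above, the array here has explicit x coordinates, since nearest-neighbor reindexing needs an index to compare against:

```python
import numpy as np
import xarray as xr

# tolerance: 0.9 is within 0.2 of label 1, but 1.5 is not near any label
arr = xr.DataArray([1, 2, 3], dims="x", coords={"x": [0, 1, 2]})
near = arr.reindex(x=[0.9, 1.5], method="nearest", tolerance=0.2)

# real and imag attributes on complex data
z = xr.DataArray([1 + 2j, 3 - 4j], dims="x")
real_part = z.real
imag_part = z.imag
```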

Bug fixes

  • Forwards compatibility with the latest pandas release (v0.17.0). We were using some internal pandas routines for datetime conversion, which unfortunately have now changed upstream (GH569).
  • Aggregation functions now correctly skip NaN for data with complex128 dtype (GH554).
  • Fixed indexing 0d arrays with unicode dtype (GH568).
  • name() and Dataset keys must be a string or None to be written to netCDF (GH533).
  • where() now uses dask instead of numpy if either the array or other is a dask array. Previously, if other was a numpy array the method was evaluated eagerly.
  • Global attributes are now handled more consistently when loading remote datasets using engine='pydap' (GH574).
  • It is now possible to assign to the .data attribute of DataArray objects.
  • coordinates attribute is now kept in the encoding dictionary after decoding (GH610).
  • Compatibility with numpy 1.10 (GH617).

Acknowledgments

The following individuals contributed to this release:

  • Ryan Abernathey
  • Pete Cable
  • Clark Fitzgerald
  • Joe Hamman
  • Stephan Hoyer
  • Scott Sinclair

v0.6.0 (21 August 2015)

This release includes numerous bug fixes and enhancements. Highlights include the introduction of a plotting module and the new Dataset and DataArray methods isel_points(), sel_points(), where() and diff(). There are no breaking changes from v0.5.2.

Enhancements

  • Plotting methods have been implemented on DataArray objects via plot(), through integration with matplotlib (GH185). For an introduction, see Plotting.

  • Variables in netCDF files with multiple missing values are now decoded as NaN after issuing a warning if open_dataset is called with mask_and_scale=True.

  • We clarified our rules for when the result from an xray operation is a copy vs. a view (see copies vs views for more details).

  • Dataset variables are now written to netCDF files in order of appearance when using the netcdf4 backend (GH479).

  • Added isel_points() and sel_points() to support pointwise indexing of Datasets and DataArrays (GH475).

    In [40]: da = xray.DataArray(np.arange(56).reshape((7, 8)),
       ....:                     coords={'x': list('abcdefg'),
       ....:                             'y': 10 * np.arange(8)},
       ....:                     dims=['x', 'y'])
       ....: 
    
    In [41]: da
    Out[41]: 
    <xray.DataArray (x: 7, y: 8)>
    array([[ 0,  1,  2,  3,  4,  5,  6,  7],
           [ 8,  9, 10, 11, 12, 13, 14, 15],
           [16, 17, 18, 19, 20, 21, 22, 23],
           [24, 25, 26, 27, 28, 29, 30, 31],
           [32, 33, 34, 35, 36, 37, 38, 39],
           [40, 41, 42, 43, 44, 45, 46, 47],
           [48, 49, 50, 51, 52, 53, 54, 55]])
    Coordinates:
      * y        (y) int64 0 10 20 30 40 50 60 70
      * x        (x) |S1 'a' 'b' 'c' 'd' 'e' 'f' 'g'
    
    # we can index by position along each dimension
    In [42]: da.isel_points(x=[0, 1, 6], y=[0, 1, 0], dim='points')
    Out[42]: 
    <xray.DataArray (points: 3)>
    array([ 0,  9, 48])
    Coordinates:
        y        (points) int64 0 10 0
        x        (points) |S1 'a' 'b' 'g'
      * points   (points) int64 0 1 2
    
    # or equivalently by label
    In [43]: da.sel_points(x=['a', 'b', 'g'], y=[0, 10, 0], dim='points')
    Out[43]: 
    <xray.DataArray (points: 3)>
    array([ 0,  9, 48])
    Coordinates:
        y        (points) int64 0 10 0
        x        (points) |S1 'a' 'b' 'g'
      * points   (points) int64 0 1 2
    
  • New where() method for masking xray objects according to some criteria. This works particularly well with multi-dimensional data:

    In [44]: ds = xray.Dataset(coords={'x': range(100), 'y': range(100)})
    
    In [45]: ds['distance'] = np.sqrt(ds.x ** 2 + ds.y ** 2)
    
    In [46]: ds.distance.where(ds.distance < 100).plot()
    Out[46]: <matplotlib.collections.QuadMesh at 0x7f60e209a358>
    
  • Added new methods DataArray.diff and Dataset.diff for finite difference calculations along a given axis.

  • New to_masked_array() convenience method for returning a numpy.ma.MaskedArray.

    In [47]: da = xray.DataArray(np.random.random_sample(size=(5, 4)))
    
    In [48]: da.where(da < 0.5)
    Out[48]: 
    <xarray.DataArray (dim_0: 5, dim_1: 4)>
    array([[0.12697 ,      nan, 0.260476,      nan],
           [0.37675 , 0.336222, 0.451376,      nan],
           [0.123102,      nan, 0.373012, 0.447997],
           [0.129441,      nan,      nan, 0.352054],
           [0.228887,      nan,      nan, 0.137554]])
    Dimensions without coordinates: dim_0, dim_1
    
    In [49]: da.where(da < 0.5).to_masked_array(copy=True)
    Out[49]: 
    masked_array(
      data=[[0.12696983303810094, --, 0.26047600586578334, --],
            [0.37674971618967135, 0.33622174433445307, 0.45137647047539964, --],
            [0.12310214428849964, --, 0.37301222522143085, 0.4479968246859435],
            [0.12944067971751294, --, --, 0.35205353914802473],
            [0.2288873043216132, --, --, 0.1375535565632705]],
      mask=[[False,  True, False,  True],
            [False, False, False,  True],
            [False,  True, False, False],
            [False,  True,  True, False],
            [False,  True,  True, False]],
      fill_value=1e+20)
    
  • Added new flag “drop_variables” to open_dataset() for excluding variables from being parsed. This may be useful to drop variables with problems or inconsistent values.
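
isel_points() and sel_points() were later deprecated in favor of vectorized indexing, where indexers that share a new dimension select points pairwise. A sketch of the equivalent of the pointwise example above in a recent xarray release (an assumption), followed by diff():

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(56).reshape((7, 8)),
    coords={"x": list("abcdefg"), "y": 10 * np.arange(8)},
    dims=["x", "y"],
)

# Indexers sharing the new "points" dimension pick (x[i], y[i]) pairs
# instead of an outer product
by_position = da.isel(
    x=xr.DataArray([0, 1, 6], dims="points"),
    y=xr.DataArray([0, 1, 0], dims="points"),
)
by_label = da.sel(
    x=xr.DataArray(["a", "b", "g"], dims="points"),
    y=xr.DataArray([0, 10, 0], dims="points"),
)

# First-order finite differences along a dimension
squares = xr.DataArray([1, 4, 9, 16], dims="t")
first_diff = squares.diff("t")
```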

Bug fixes

  • Fixed aggregation functions (e.g., sum and mean) on big-endian arrays when bottleneck is installed (GH489).
  • Dataset aggregation functions dropped variables with unsigned integer dtype (GH505).
  • .any() and .all() were not lazy when used on xray objects containing dask arrays.
  • Fixed an error when attempting to save datetime64 variables to netCDF files when the first element is NaT (GH528).
  • Fix pickle on DataArray objects (GH515).
  • Fixed unnecessary coercion of float64 to float32 when using netcdf3 and netcdf4_classic formats (GH526).

v0.5.2 (16 July 2015)

This release contains bug fixes, several additional options for opening and saving netCDF files, and a backwards incompatible rewrite of the advanced options for xray.concat.

Backwards incompatible changes

  • The optional arguments concat_over and mode in concat() have been removed and replaced by data_vars and coords. The new arguments are both more easily understood and more robustly implemented, and allowed us to fix a bug where concat accidentally loaded data into memory. If you set values for these optional arguments manually, you will need to update your code. The default behavior should be unchanged.

Enhancements

  • open_mfdataset() now supports a preprocess argument for preprocessing datasets prior to concatenation. This is useful if datasets cannot be otherwise merged automatically, e.g., if the original datasets have conflicting index coordinates (GH443).

  • open_dataset() and open_mfdataset() now use a global thread lock by default for reading from netCDF files with dask. This avoids possible segmentation faults for reading from netCDF4 files when HDF5 is not configured properly for concurrent access (GH444).

  • Added support for serializing arrays of complex numbers with engine='h5netcdf'.

  • The new save_mfdataset() function allows for saving multiple datasets to disk simultaneously. This is useful when processing large datasets with dask.array. For example, to save a dataset too big to fit into memory to one file per year, we could write:

    In [50]: years, datasets = zip(*ds.groupby('time.year'))
    
    In [51]: paths = ['%s.nc' % y for y in years]
    
    In [52]: xray.save_mfdataset(datasets, paths)
    

Bug fixes

  • Fixed min, max, argmin and argmax for arrays with string or unicode types (GH453).
  • open_dataset() and open_mfdataset() support supplying chunks as a single integer.
  • Fixed a bug in serializing a scalar datetime variable to netCDF.
  • Fixed a bug that could occur in serialization of 0-dimensional integer arrays.
  • Fixed a bug where concatenating DataArrays was not always lazy (GH464).
  • When reading datasets with h5netcdf, bytes attributes are decoded to strings. This allows conventions decoding to work properly on Python 3 (GH451).

v0.5.1 (15 June 2015)

This minor release fixes a few bugs and an inconsistency with pandas. It also adds the pipe method, copied from pandas.

Enhancements

  • Added pipe(), replicating the new pandas method in version 0.16.2. See Transforming datasets for more details.
  • assign() and assign_coords() now assign new variables in sorted (alphabetical) order, mirroring the behavior in pandas. Previously, the order was arbitrary.
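
A minimal sketch of pipe(), assuming a recent xarray release; scale_and_shift is a hypothetical helper. da.pipe(f, *args, **kwargs) calls f(da, *args, **kwargs), which keeps chains of custom functions readable:

```python
import xarray as xr


def scale_and_shift(obj, factor, offset=0):
    # Hypothetical helper: any function taking an xarray object first works
    return obj * factor + offset


da = xr.DataArray([1, 2, 3], dims="x")

# Equivalent to scale_and_shift(da, 10, offset=1).sum()
result = da.pipe(scale_and_shift, 10, offset=1).pipe(lambda o: o.sum())
```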

Bug fixes

  • xray.concat fails in an edge case involving identical coordinate variables (GH425)
  • We now decode variables loaded from netCDF3 files with the scipy engine using native endianness (GH416). This resolves an issue when aggregating these arrays with bottleneck installed.

v0.5 (1 June 2015)

Highlights

The headline feature in this release is experimental support for out-of-core computing (data that doesn’t fit into memory) with dask. This includes a new top-level function open_mfdataset() that makes it easy to open a collection of netCDF files (using dask) as a single xray.Dataset object. For more on dask, read the blog post introducing xray + dask and the new documentation section Parallel computing with dask.

Dask makes it possible to harness parallelism and manipulate gigantic datasets with xray. It is currently an optional dependency, but it may become required in the future.

Backwards incompatible changes

  • The logic used for choosing which variables are concatenated with concat() has changed. Previously, by default any variables which were equal across a dimension were not concatenated. This led to some surprising behavior, where the behavior of groupby and concat operations could depend on runtime values (GH268). For example:

    In [53]: ds = xray.Dataset({'x': 0})
    
    In [54]: xray.concat([ds, ds], dim='y')
    Out[54]: 
    <xray.Dataset>
    Dimensions:  ()
    Coordinates:
        *empty*
    Data variables:
        x        int64 0
    

    Now, the default always concatenates data variables:

    In [55]: xray.concat([ds, ds], dim='y')
    Out[55]: 
    <xarray.Dataset>
    Dimensions:  (y: 2)
    Dimensions without coordinates: y
    Data variables:
        x        (y) int64 0 0
    

    To obtain the old behavior, supply the argument concat_over=[].

Enhancements

  • New to_array() and enhanced to_dataset() methods make it easy to switch back and forth between arrays and datasets:

    In [56]: ds = xray.Dataset({'a': 1, 'b': ('x', [1, 2, 3])},
       ....:                   coords={'c': 42}, attrs={'Conventions': 'None'})
       ....: 
    
    In [57]: ds.to_array()
    Out[57]: 
    <xarray.DataArray (variable: 2, x: 3)>
    array([[1, 1, 1],
           [1, 2, 3]])
    Coordinates:
        c         int64 42
      * variable  (variable) <U1 'a' 'b'
    Dimensions without coordinates: x
    Attributes:
        Conventions:  None
    
    In [58]: ds.to_array().to_dataset(dim='variable')
    Out[58]: 
    <xarray.Dataset>
    Dimensions:  (x: 3)
    Coordinates:
        c        int64 42
    Dimensions without coordinates: x
    Data variables:
        a        (x) int64 1 1 1
        b        (x) int64 1 2 3
    Attributes:
        Conventions:  None
    
  • New fillna() method to fill missing values, modeled off the pandas method of the same name:

    In [59]: array = xray.DataArray([np.nan, 1, np.nan, 3], dims='x')
    
    In [60]: array.fillna(0)
    Out[60]: 
    <xarray.DataArray (x: 4)>
    array([0., 1., 0., 3.])
    Dimensions without coordinates: x
    

    fillna works on both Dataset and DataArray objects and uses index-based alignment and broadcasting like standard binary operations. It can also be applied by group, as illustrated in Fill missing values with climatology.

  • New assign() and assign_coords() methods patterned off the new DataFrame.assign method in pandas:

    In [61]: ds = xray.Dataset({'y': ('x', [1, 2, 3])})
    
    In [62]: ds.assign(z = lambda ds: ds.y ** 2)
    Out[62]: 
    <xarray.Dataset>
    Dimensions:  (x: 3)
    Dimensions without coordinates: x
    Data variables:
        y        (x) int64 1 2 3
        z        (x) int64 1 4 9
    
    In [63]: ds.assign_coords(z = ('x', ['a', 'b', 'c']))
    Out[63]: 
    <xarray.Dataset>
    Dimensions:  (x: 3)
    Coordinates:
        z        (x) <U1 'a' 'b' 'c'
    Dimensions without coordinates: x
    Data variables:
        y        (x) int64 1 2 3
    

    These methods return a new Dataset (or DataArray) with updated data or coordinate variables.

  • sel() now supports the method parameter, which works like the parameter of the same name on reindex(). It provides a simple interface for doing nearest-neighbor interpolation:

    In [64]: ds.sel(x=1.1, method='nearest')
    Out[64]: 
    <xray.Dataset>
    Dimensions:  ()
    Coordinates:
        x        int64 1
    Data variables:
        y        int64 2
    
    In [65]: ds.sel(x=[1.1, 2.1], method='pad')
    Out[65]: 
    <xray.Dataset>
    Dimensions:  (x: 2)
    Coordinates:
      * x        (x) int64 1 2
    Data variables:
        y        (x) int64 2 3
    

    See Nearest neighbor lookups for more details.

  • You can now control the underlying backend used for accessing remote datasets (via OPeNDAP) by specifying engine='netcdf4' or engine='pydap'.

  • xray now provides experimental support for reading and writing netCDF4 files directly via h5py with the h5netcdf package, avoiding the netCDF4-Python package. You will need to install h5netcdf and specify engine='h5netcdf' to try this feature.

  • Accessing data from remote datasets now has retrying logic (with exponential backoff) that should make it robust to occasional bad responses from DAP servers.

  • You can control the width of the Dataset repr with xray.set_options. It can be used either as a context manager, in which case the default is restored outside the context:

    In [66]: ds = xray.Dataset({'x': np.arange(1000)})
    
    In [67]: with xray.set_options(display_width=40):
       ....:     print(ds)
       ....: 
    <xarray.Dataset>
    Dimensions:  (x: 1000)
    Coordinates:
      * x        (x) int64 0 1 2 ... 998 999
    Data variables:
        *empty*
    

    Or to set a global option:

    In [68]: xray.set_options(display_width=80)
    

    The default value for the display_width option is 80.
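The retry logic for remote datasets follows the usual exponential-backoff pattern: after each failed request, wait, then double the wait before trying again. The sketch below shows that pattern in plain Python. The robust_get name, the retry count, the initial delay and the choice of OSError as the retryable error are illustrative assumptions, not xray's actual parameters.

```python
import time


def robust_get(fetch, max_retries=5, initial_delay=0.5,
               retry_on=(OSError,), sleep=time.sleep):
    """Call fetch(), retrying with exponential backoff on failure.

    Hypothetical helper: the defaults are illustrative, not xray's.
    """
    delay = initial_delay
    for attempt in range(max_retries):
        try:
            return fetch()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(delay)
            delay *= 2  # double the wait after every failed attempt


calls = {"n": 0}


def flaky():
    # Fails twice, then succeeds, like a server with occasional bad responses.
    calls["n"] += 1
    if calls["n"] < 3:
        raise OSError("bad response")
    return "data"


waits = []
print(robust_get(flaky, sleep=waits.append))  # data
print(waits)                                  # [0.5, 1.0]
```

Injecting the sleep function keeps the sketch testable without actually waiting.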

Deprecations

  • The method load_data() has been renamed to the more succinct load().

v0.4.1 (18 March 2015)

This release contains bug fixes and several new features. All changes should be fully backwards compatible.

Enhancements

  • New documentation sections on Time series data and Combining multiple files.

  • resample() lets you resample a dataset or data array to a new temporal resolution. The syntax is the same as pandas, except you need to supply the time dimension explicitly:

    In [69]: time = pd.date_range('2000-01-01', freq='6H', periods=10)
    
    In [70]: array = xray.DataArray(np.arange(10), [('time', time)])
    
    In [71]: array.resample('1D', dim='time')
    Out[71]: 
    <xarray.DataArray (time: 3)>
    array([1.5, 5.5, 8.5])
    Coordinates:
      * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
    

    Use the how argument to choose the resampling method; other options such as closed and label control which side of each interval is closed and how the resulting bins are labeled:

    In [72]: array.resample('1D', dim='time', how='sum', label='right')
    Out[72]: 
    <xarray.DataArray (time: 3)>
    array([ 6, 22, 17])
    Coordinates:
      * time     (time) datetime64[ns] 2000-01-02 2000-01-03 2000-01-04
    

    If the desired temporal resolution is higher than the original data (upsampling), xray will insert missing values:

    In [73]: array.resample('3H', 'time')
    Out[73]: 
    <xarray.DataArray (time: 19)>
    array([ 0., nan,  1., nan,  2., nan,  3., nan,  4., nan,  5., nan,  6., nan,
            7., nan,  8., nan,  9.])
    Coordinates:
      * time     (time) datetime64[ns] 2000-01-01 ... 2000-01-03T06:00:00
    
  • first and last methods on groupby objects let you take the first or last examples from each group along the grouped axis:

    In [74]: array.groupby('time.day').first()
    Out[74]: 
    <xarray.DataArray (day: 3)>
    array([0, 4, 8])
    Coordinates:
      * day      (day) int64 1 2 3
    

    These methods combine well with resample:

    In [75]: array.resample('1D', dim='time', how='first')
    Out[75]: 
    <xarray.DataArray (time: 3)>
    array([0, 4, 8])
    Coordinates:
      * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
    
  • swap_dims() allows for easily swapping one dimension out for another:

    In [76]: ds = xray.Dataset({'x': range(3), 'y': ('x', list('abc'))})
    
    In [77]: ds
    Out[77]: 
    <xarray.Dataset>
    Dimensions:  (x: 3)
    Coordinates:
      * x        (x) int64 0 1 2
    Data variables:
        y        (x) <U1 'a' 'b' 'c'
    
    In [78]: ds.swap_dims({'x': 'y'})
    Out[78]: 
    <xarray.Dataset>
    Dimensions:  (y: 3)
    Coordinates:
        x        (y) int64 0 1 2
      * y        (y) <U1 'a' 'b' 'c'
    Data variables:
        *empty*
    

    This was possible in earlier versions of xray, but required some contortions.

  • open_dataset() and to_netcdf() now accept an engine argument to explicitly select which underlying library (netcdf4 or scipy) is used for reading/writing a netCDF file.
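Downsampling amounts to grouping values into fixed time bins and reducing each bin. The plain-Python sketch below reproduces the numbers from the resample examples above (daily mean, daily sum, first value) for ten 6-hourly values. The resample_daily helper is hypothetical. It only handles daily bins and labels each bin by its start, like the default (left) labels, not label='right'.

```python
from datetime import datetime, timedelta
from statistics import mean


def resample_daily(times, values, how):
    """Bin (time, value) pairs by calendar day and reduce each bin.

    Hypothetical helper: bins are whole days labeled by their start.
    """
    bins = {}  # dicts keep insertion order (Python 3.7+), so days follow input order
    for t, v in zip(times, values):
        day = datetime(t.year, t.month, t.day)
        bins.setdefault(day, []).append(v)
    return {day: how(vals) for day, vals in bins.items()}


start = datetime(2000, 1, 1)
times = [start + timedelta(hours=6 * i) for i in range(10)]
values = list(range(10))

print(list(resample_daily(times, values, mean).values()))   # [1.5, 5.5, 8.5]
print(list(resample_daily(times, values, sum).values()))    # [6, 22, 17]
print(list(resample_daily(times, values, lambda vs: vs[0]).values()))  # [0, 4, 8]
```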

Bug fixes

  • Fixed a bug where data netCDF variables read from disk with engine='scipy' could still be associated with the file on disk, even after closing the file (GH341). This manifested itself in warnings about mmapped arrays and segmentation faults (if the data was accessed).
  • Silenced spurious warnings about all-NaN slices when using nan-aware aggregation methods (GH344).
  • Dataset aggregations with keep_attrs=True now preserve attributes on data variables, not just the dataset itself.
  • Tests for xray now pass when run on Windows (GH360).
  • Fixed a regression in v0.4 where saving to netCDF could fail with the error ValueError: could not automatically determine time units.

v0.4 (2 March, 2015)

This is one of the biggest releases yet for xray: it includes some major changes that may break existing code, along with the usual collection of minor enhancements and bug fixes. On the plus side, this release includes all hitherto planned breaking changes, so the upgrade path for xray should be smoother going forward.

Breaking changes

  • We now automatically align index labels in arithmetic, dataset construction, merging and updating. This means the need for manually invoking methods like align() and reindex_like() should be vastly reduced.

    For arithmetic, we align based on the intersection of labels:

    In [79]: lhs = xray.DataArray([1, 2, 3], [('x', [0, 1, 2])])
    
    In [80]: rhs = xray.DataArray([2, 3, 4], [('x', [1, 2, 3])])
    
    In [81]: lhs + rhs
    Out[81]: 
    <xarray.DataArray (x: 2)>
    array([4, 6])
    Coordinates:
      * x        (x) int64 1 2
    

    For dataset construction and merging, we align based on the union of labels:

    In [82]: xray.Dataset({'foo': lhs, 'bar': rhs})
    Out[82]: 
    <xarray.Dataset>
    Dimensions:  (x: 4)
    Coordinates:
      * x        (x) int64 0 1 2 3
    Data variables:
        foo      (x) float64 1.0 2.0 3.0 nan
        bar      (x) float64 nan 2.0 3.0 4.0
    

    For update and __setitem__, we align based on the original object:

    In [83]: lhs.coords['rhs'] = rhs
    
    In [84]: lhs
    Out[84]: 
    <xarray.DataArray (x: 3)>
    array([1, 2, 3])
    Coordinates:
      * x        (x) int64 0 1 2
        rhs      (x) float64 nan 2.0 3.0
    
  • Aggregations like mean or median now skip missing values by default:

    In [85]: xray.DataArray([1, 2, np.nan, 3]).mean()
    Out[85]: 
    <xarray.DataArray ()>
    array(2.)
    

    You can turn this behavior off by supplying the keyword argument skipna=False.

    These operations are lightning fast thanks to integration with bottleneck, which is a new optional dependency for xray (numpy is used if bottleneck is not installed).

  • Scalar coordinates no longer conflict with constant arrays with the same value (e.g., in arithmetic, merging datasets and concat), even if they have different shapes (GH243). For example, the coordinate c here persists through arithmetic, even though it has different shapes on each DataArray:

    In [86]: a = xray.DataArray([1, 2], coords={'c': 0}, dims='x')
    
    In [87]: b = xray.DataArray([1, 2], coords={'c': ('x', [0, 0])}, dims='x')
    
    In [88]: (a + b).coords
    Out[88]: 
    Coordinates:
        c        (x) int64 0 0
    

    This functionality can be controlled through the compat option, which has also been added to the Dataset constructor.

  • Datetime shortcuts such as 'time.month' now return a DataArray with the name 'month', not 'time.month' (GH345). This makes it easier to index the resulting arrays when they are used with groupby:

    In [89]: time = xray.DataArray(pd.date_range('2000-01-01', periods=365),
       ....:                       dims='time', name='time')
       ....: 
    
    In [90]: counts = time.groupby('time.month').count()
    
    In [91]: counts.sel(month=2)
    Out[91]: 
    <xarray.DataArray 'time' ()>
    array(29)
    Coordinates:
        month    int64 2
    

    Previously, you would need to use something like counts.sel(**{'time.month': 2}), which is much more awkward.

  • The season datetime shortcut now returns an array of string labels such as ‘DJF’:

    In [92]: ds = xray.Dataset({'t': pd.date_range('2000-01-01', periods=12, freq='M')})
    
    In [93]: ds['t.season']
    Out[93]: 
    <xarray.DataArray 'season' (t: 12)>
    array(['DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'JJA', 'JJA', 'JJA', 'SON', 'SON',
           'SON', 'DJF'], dtype='<U3')
    Coordinates:
      * t        (t) datetime64[ns] 2000-01-31 2000-02-29 ... 2000-11-30 2000-12-31
    

    Previously, it returned numbered seasons 1 through 4.

  • We have updated our use of the terms “coordinates” and “variables”. What were known in previous versions of xray as “coordinates” and “variables” are now referred to throughout the documentation as “coordinate variables” and “data variables”. This brings xray into closer alignment with CF Conventions. The only visible change besides the documentation is that Dataset.vars has been renamed Dataset.data_vars.

  • You will need to update your code if you have been ignoring deprecation warnings: methods and attributes that were deprecated in xray v0.3 or earlier (e.g., dimensions, attributes) have been removed.
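The alignment rules above reduce to three joins: an inner join for arithmetic, an outer join for construction and merging, and a join onto the original object's labels for update and __setitem__. The dictionary-based sketch below reproduces the lhs/rhs results from the examples above. The helpers are hypothetical and only illustrate the join logic; unlike xray, they do not upcast integers to float when NaN is inserted.

```python
import math


def add_aligned(a, b):
    """Arithmetic: inner join, keeping only labels present in both operands."""
    return {k: a[k] + b[k] for k in sorted(a.keys() & b.keys())}


def outer_align(*arrays):
    """Construction/merging: outer join on the union of labels, NaN for gaps."""
    labels = sorted(set().union(*arrays))
    return labels, [[arr.get(k, math.nan) for k in labels] for arr in arrays]


def align_to(target, other):
    """update/__setitem__: conform `other` to the labels of `target`."""
    return {k: other.get(k, math.nan) for k in target}


# Label -> value mappings standing in for the 1-D arrays lhs and rhs above.
lhs = {0: 1, 1: 2, 2: 3}
rhs = {1: 2, 2: 3, 3: 4}

print(add_aligned(lhs, rhs))   # {1: 4, 2: 6}
labels, (foo, bar) = outer_align(lhs, rhs)
print(labels, foo, bar)        # [0, 1, 2, 3] [1, 2, 3, nan] [nan, 2, 3, 4]
print(align_to(lhs, rhs))      # {0: nan, 1: 2, 2: 3}
```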

Enhancements

  • Support for reindex() with a fill method. This provides a useful shortcut for upsampling:

    In [94]: data = xray.DataArray([1, 2, 3], [('x', range(3))])
    
    In [95]: data.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
    Out[95]: 
    <xarray.DataArray (x: 5)>
    array([1, 2, 2, 3, 3])
    Coordinates:
      * x        (x) float64 0.5 1.0 1.5 2.0 2.5
    

    This will be especially useful once pandas 0.16 is released, at which point xray will immediately support reindexing with method='nearest'.

  • You can now use functions that return generic ndarrays with DataArray.groupby.apply and Dataset.apply (GH327 and GH329). Thanks Jeff Gerard!

  • Consolidated the functionality of dumps (writing a dataset to a netCDF3 bytestring) into to_netcdf() (GH333).

  • to_netcdf() now supports writing to groups in netCDF4 files (GH333). It also finally has a full docstring – you should read it!

  • open_dataset() and to_netcdf() now work on netCDF3 files when netcdf4-python is not installed as long as scipy is available (GH333).

  • The new Dataset.drop and DataArray.drop methods make it easy to drop explicitly listed variables or index labels:

    # drop variables
    In [96]: ds = xray.Dataset({'x': 0, 'y': 1})
    
    In [97]: ds.drop('x')
    Out[97]: 
    <xarray.Dataset>
    Dimensions:  ()
    Data variables:
        y        int64 1
    
    # drop index labels
    In [98]: arr = xray.DataArray([1, 2, 3], coords=[('x', list('abc'))])
    
    In [99]: arr.drop(['a', 'c'], dim='x')
    Out[99]: 
    <xarray.DataArray (x: 1)>
    array([2])
    Coordinates:
      * x        (x) <U1 'b'
    
  • broadcast_equals() has been added to correspond to the new compat option.

  • Long attributes are now truncated at 500 characters when printing a dataset (GH338). This should make things more convenient for working with datasets interactively.

  • Added a new documentation example, Calculating Seasonal Averages from Timeseries of Monthly Means. Thanks Joe Hamman!
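Reindexing with method='pad' (forward fill) carries the last known value forward to each new label, while 'nearest' picks the closest existing label. The bisect-based sketch below reproduces the reindex example above for a sorted, numeric index. The reindex helper is hypothetical. It returns None where no earlier label exists to pad from, and it breaks ties in favor of the lower label.

```python
import bisect


def reindex(index, values, new_index, method):
    """Look up each new label in a sorted index (hypothetical helper)."""
    out = []
    for label in new_index:
        # Position of the last existing label <= `label` (-1 if none).
        pos = bisect.bisect_right(index, label) - 1
        if method == "pad":
            out.append(values[pos] if pos >= 0 else None)
        elif method == "nearest":
            # Compare the neighbors on either side; ties go to the lower label.
            candidates = [p for p in (pos, pos + 1) if 0 <= p < len(index)]
            best = min(candidates, key=lambda p: abs(index[p] - label))
            out.append(values[best])
        else:
            raise ValueError("method must be 'pad' or 'nearest'")
    return out


x = [0, 1, 2]
data = [1, 2, 3]
print(reindex(x, data, [0.5, 1, 1.5, 2, 2.5], "pad"))  # [1, 2, 2, 3, 3]
print(reindex(x, data, [1.1, 1.8], "nearest"))         # [2, 3]
```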

Bug fixes

  • Several bug fixes related to decoding time units from netCDF files (GH316, GH330). Thanks Stefan Pfenninger!
  • xray no longer requires decode_coords=False when reading datasets with unparseable coordinate attributes (GH308).
  • Fixed DataArray.loc indexing with ... (GH318).
  • Fixed an edge case that resulted in an error when reindexing multi-dimensional variables (GH315).
  • Fixed slicing with negative step sizes (GH312).
  • Fixed invalid conversion of string arrays to numeric dtype (GH305).
  • Fixed repr() on dataset objects with non-standard dates (GH347).

Deprecations

  • dump and dumps have been deprecated in favor of to_netcdf().
  • drop_vars has been deprecated in favor of drop().

Future plans

The biggest feature I’m excited about working toward in the immediate future is supporting out-of-core operations in xray using Dask, a part of the Blaze project. For a preview of using Dask with weather data, read this blog post by Matthew Rocklin. See GH328 for more details.

v0.3.2 (23 December, 2014)

This release focused on bug-fixes, speedups and resolving some niggling inconsistencies.

There are a few cases where the behavior of xray differs from the previous version. However, I expect that in almost all cases your code will continue to run unmodified.

Warning

xray now requires pandas v0.15.0 or later. This was necessary for supporting TimedeltaIndex without too many painful hacks.

Backwards incompatible changes

  • Arrays of datetime.datetime objects are now automatically cast to datetime64[ns] arrays when stored in an xray object, using machinery borrowed from pandas:

    In [100]: from datetime import datetime
    
    In [101]: xray.Dataset({'t': [datetime(2000, 1, 1)]})
    Out[101]: 
    <xarray.Dataset>
    Dimensions:  (t: 1)
    Coordinates:
      * t        (t) datetime64[ns] 2000-01-01
    Data variables:
        *empty*
    
  • xray now has support (including serialization to netCDF) for TimedeltaIndex. datetime.timedelta objects are thus accordingly cast to timedelta64[ns] objects when appropriate.

  • Masked arrays are now properly coerced to use NaN as a sentinel value (GH259).

Enhancements

  • Due to popular demand, we have added experimental attribute style access as a shortcut for dataset variables, coordinates and attributes:

    In [102]: ds = xray.Dataset({'tmin': ([], 25, {'units': 'celsius'})})
    
    In [103]: ds.tmin.units
    Out[103]: 'celsius'
    

    Tab-completion for these variables should work in interactive environments such as IPython. However, setting variables or attributes in this fashion is not yet supported because there are some unresolved ambiguities (GH300).

  • You can now use a dictionary for indexing with labeled dimensions. This provides a safe way to do assignment with labeled dimensions:

    In [104]: array = xray.DataArray(np.zeros(5), dims=['x'])
    
    In [105]: array[dict(x=slice(3))] = 1
    
    In [106]: array
    Out[106]: 
    <xarray.DataArray (x: 5)>
    array([1., 1., 1., 0., 0.])
    Dimensions without coordinates: x
    
  • Non-index coordinates can now be faithfully written to and restored from netCDF files. This is done according to CF conventions when possible by using the coordinates attribute on a data variable. When not possible, xray defines a global coordinates attribute.

  • Preliminary support for converting xray.DataArray objects to and from CDAT cdms2 variables.

  • We sped up any operation that involves creating a new Dataset or DataArray (e.g., indexing, aggregation, arithmetic) by 30 to 50%. The full speedup requires cyordereddict to be installed.
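Dictionary indexing works by translating dimension names into positions before indexing, which is why it is safe regardless of dimension order. The sketch below shows that translation. The labeled_key helper is hypothetical, not part of xray. Dimensions missing from the dict default to a full slice, and unknown names raise a KeyError.

```python
def labeled_key(dims, indexers):
    """Turn {dim_name: indexer} into a positional tuple (hypothetical helper)."""
    unknown = set(indexers) - set(dims)
    if unknown:
        raise KeyError(f"unknown dimensions: {sorted(unknown)}")
    # Dimensions not mentioned in the dict are selected in full.
    return tuple(indexers.get(dim, slice(None)) for dim in dims)


print(labeled_key(("x", "y"), {"y": 2}))  # (slice(None, None, None), 2)

# 1-D assignment mirroring array[dict(x=slice(3))] = 1 above.
values = [0.0] * 5
(key,) = labeled_key(("x",), dict(x=slice(3)))
values[key] = [1.0] * len(values[key])
print(values)  # [1.0, 1.0, 1.0, 0.0, 0.0]
```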

Bug fixes

  • Fix for to_dataframe() with 0d string/object coordinates (GH287)
  • Fix for to_netcdf with 0d string variable (GH284)
  • Fix writing datetime64 arrays to netcdf if NaT is present (GH270)
  • Fix align silently upcasting data arrays when NaNs are inserted (GH264)

Future plans

  • I am contemplating switching to the terms “coordinate variables” and “data variables” instead of the (currently used) “coordinates” and “variables”, following their use in CF Conventions (GH293). This would mostly have implications for the documentation, but I would also change the Dataset attribute vars to data.
  • I am no longer certain that automatic label alignment for arithmetic would be a good idea for xray – it is a feature from pandas that I have not missed (GH186).
  • The main API breakage that I do anticipate in the next release is finally making all aggregation operations skip missing values by default (GH130). I’m pretty sick of writing ds.reduce(np.nanmean, 'time').
  • The next version of xray (0.4) will remove deprecated features and aliases whose use currently raises a warning.

If you have opinions about any of these anticipated changes, I would love to hear them – please add a note to any of the referenced GitHub issues.

v0.3.1 (22 October, 2014)

This is mostly a bug-fix release to make xray compatible with the latest release of pandas (v0.15).

We added several features to better support working with missing values and exporting xray objects to pandas. We also reorganized the internal API for serializing and deserializing datasets, but this change should be almost entirely transparent to users.

Other than breaking the experimental DataStore API, there should be no backwards incompatible changes.

New features

  • Added count() and dropna() methods, copied from pandas, for working with missing values (GH247, GH58).
  • Added DataArray.to_pandas for converting a data array into the pandas object with the same dimensionality (1D to Series, 2D to DataFrame, etc.) (GH255).
  • Support for reading gzipped netCDF3 files (GH239).
  • Reduced memory usage when writing netCDF files (GH251).
  • ‘missing_value’ is now supported as an alias for the ‘_FillValue’ attribute on netCDF variables (GH245).
  • Trivial indexes, equivalent to range(n) where n is the length of the dimension, are no longer written to disk (GH245).
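count() and dropna() both depend on identifying missing values, which for floating-point data means NaN. The plain-Python sketch below shows those semantics on a 1-D list with labels. The helper names mirror the methods, but they are hypothetical stand-ins, not xray's implementation.

```python
import math


def is_missing(value):
    # Treat floating-point NaN as missing.
    return isinstance(value, float) and math.isnan(value)


def count(values):
    """Number of non-missing entries."""
    return sum(not is_missing(v) for v in values)


def dropna(labels, values):
    """Drop positions whose value is missing, keeping labels in sync."""
    kept = [(lab, v) for lab, v in zip(labels, values) if not is_missing(v)]
    return [lab for lab, _ in kept], [v for _, v in kept]


data = [math.nan, 1.0, math.nan, 3.0]
print(count(data))                          # 2
print(dropna(["a", "b", "c", "d"], data))   # (['b', 'd'], [1.0, 3.0])
```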

Bug fixes

  • Compatibility fixes for pandas v0.15 (GH262).
  • Fixes for display and indexing of NaT (not-a-time) (GH238, GH240)
  • Fix slicing by label when an argument is a data array (GH250).
  • Test data is now shipped with the source distribution (GH253).
  • Ensure order does not matter when doing arithmetic with scalar data arrays (GH254).
  • Order of dimensions preserved with DataArray.to_dataframe (GH260).

v0.3 (21 September 2014)

New features

  • Revamped coordinates: “coordinates” now refer to all arrays that are not used to index a dimension. Coordinates are intended to allow for keeping track of arrays of metadata that describe the grid on which the points in “variable” arrays lie. They are preserved (when unambiguous) even through mathematical operations.
  • Dataset math: Dataset objects now support all arithmetic operations directly. Dataset-array operations map across all dataset variables; dataset-dataset operations act on each pair of variables with the same name.
  • GroupBy math: arithmetic with GroupBy objects provides a convenient shortcut for normalizing by the average value of a group.
  • The dataset __repr__ method has been entirely overhauled; dataset objects now show their values when printed.
  • You can now index a dataset with a list of variables to return a new dataset: ds[['foo', 'bar']].
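Normalizing by a group's average means computing one statistic per group and then broadcasting it back to every member of that group. The sketch below computes anomalies relative to each group's mean in plain Python. The group_anomalies helper is hypothetical, and the month numbers stand in for whatever coordinate the data is grouped by.

```python
from collections import defaultdict
from statistics import mean


def group_anomalies(keys, values):
    """Subtract each group's mean from its members (hypothetical helper)."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    # One mean per group, then broadcast it back to every member.
    group_means = {k: mean(vs) for k, vs in groups.items()}
    return [v - group_means[k] for k, v in zip(keys, values)]


months = [1, 1, 2, 2, 1]
temps = [2.0, 4.0, 10.0, 14.0, 3.0]
print(group_anomalies(months, temps))  # [-1.0, 1.0, -2.0, 2.0, 0.0]
```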

Backwards incompatible changes

  • Dataset.__eq__ and Dataset.__ne__ are now element-wise operations instead of comparing all values to obtain a single boolean. Use the method equals() instead.

Deprecations

  • Dataset.noncoords is deprecated: use Dataset.vars instead.
  • Dataset.select_vars deprecated: index a Dataset with a list of variable names instead.
  • DataArray.select_vars and DataArray.drop_vars deprecated: use reset_coords() instead.

v0.2 (14 August 2014)

This is a major release that includes some new features and quite a few bug fixes. Here are the highlights:

  • There is now a direct constructor for DataArray objects, which makes it possible to create a DataArray without using a Dataset. This is highlighted in the refreshed tutorial.
  • You can perform aggregation operations like mean directly on Dataset objects, thanks to Joe Hamman. These aggregation methods also work on grouped datasets.
  • xray now works on Python 2.6, thanks to Anna Kuznetsova.
  • A number of methods and attributes were given more sensible (usually shorter) names: labeled -> sel, indexed -> isel, select -> select_vars, unselect -> drop_vars, dimensions -> dims, coordinates -> coords, attributes -> attrs.
  • New load_data() and close() methods for datasets provide lower-level control over data loaded from disk.

v0.1.1 (20 May 2014)

xray 0.1.1 is a bug-fix release that includes changes that should be almost entirely backwards compatible with v0.1:

  • Python 3 support (GH53)
  • Required numpy version relaxed to 1.7 (GH129)
  • Return numpy.datetime64 arrays for non-standard calendars (GH126)
  • Support for opening datasets associated with NetCDF4 groups (GH127)
  • Bug-fixes for concatenating datetime arrays (GH134)

Special thanks to new contributors Thomas Kluyver, Joe Hamman and Alistair Miles.

v0.1 (2 May 2014)

Initial release.