Xarray: Crash with Tensorflow when using "to_netcdf"

Created on 29 May 2020 · 6 Comments · Source: pydata/xarray

I'm not sure why issue #3828 was closed (@max-sixty @sjh11556). I am getting the same error with exactly the same test code as @sjh11556, so I am opening a new issue with the same title as before.

Test Code

import tensorflow as tf
import xarray as xr
import numpy as np
data=xr.DataArray(data=np.zeros([4,5]),dims=['lat','lon'])
data.to_netcdf("test.nc")
print("data has been written to test.nc")

Expected Output

data has been written to test.nc

Problem

>>> import tensorflow as tf
>>> import xarray as xr
>>> import numpy as np
>>> data=xr.DataArray(data=np.zeros([4,5]),dims=['lat','lon'])
>>> data.to_netcdf("test.nc")
Traceback (most recent call last):
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/backends/api.py", line 1089, in to_netcdf
    dataset, store, writer, encoding=encoding, unlimited_dims=unlimited_dims
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/backends/api.py", line 1135, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/backends/common.py", line 298, in store
    variables, check_encoding_set, writer, unlimited_dims=unlimited_dims
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/backends/common.py", line 339, in set_variables
    writer.add(source, target)
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/backends/common.py", line 188, in add
    target[...] = source
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 51, in __setitem__
    data[key] = value
  File "netCDF4/_netCDF4.pyx", line 4950, in netCDF4._netCDF4.Variable.__setitem__
  File "netCDF4/_netCDF4.pyx", line 5229, in netCDF4._netCDF4.Variable._put
  File "netCDF4/_netCDF4.pyx", line 1887, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  **File "<stdin>", line 1, in <module>**
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/core/dataarray.py", line 2353, in to_netcdf
    return dataset.to_netcdf(*args, **kwargs)
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/core/dataset.py", line 1545, in to_netcdf
    invalid_netcdf=invalid_netcdf,
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/backends/api.py", line 1104, in to_netcdf
    store.close()
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 492, in close
    self._manager.close(**kwargs)
  File "/home/box/cleanenv/lib/python3.7/site-packages/xarray/backends/file_manager.py", line 221, in close
    file.close()
  File "netCDF4/_netCDF4.pyx", line 2485, in **netCDF4._netCDF4.Dataset.close**
  File "netCDF4/_netCDF4.pyx", line 2449, in **netCDF4._netCDF4.Dataset._close**
  File "netCDF4/_netCDF4.pyx", line 1887, in **netCDF4._netCDF4._ensure_nc_success**
**RuntimeError: NetCDF: HDF error**

Some observations:

  1. Unlike what @sjh11556 reported, using tensorflow==2.1.0 doesn't give any error for me.
  2. The code works fine if TensorFlow is not imported.
  3. Tested in a clean environment by installing only tensorflow==2.0.0, xarray==0.15.0 and netCDF4==1.5.3.
  4. Here's the pip freeze list of my clean environment.

pip freeze

absl-py==0.9.0
astor==0.8.1
cachetools==4.1.0
certifi==2020.4.5.1
cftime==1.1.3
chardet==3.0.4
gast==0.2.2
google-auth==1.16.0
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.29.0
h5py==2.10.0
idna==2.9
importlib-metadata==1.6.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
Markdown==3.2.2
netCDF4==1.5.3
numpy==1.18.4
oauthlib==3.1.0
opt-einsum==3.2.1
pandas==1.0.4
protobuf==3.12.2
pyasn1==0.4.8
pyasn1-modules==0.2.8
python-dateutil==2.8.1
pytz==2020.1
requests==2.23.0
requests-oauthlib==1.3.0
rsa==4.0
six==1.15.0
tensorboard==2.0.2
tensorflow==2.0.0
tensorflow-estimator==2.0.1
termcolor==1.1.0
urllib3==1.25.9
Werkzeug==1.0.1
wrapt==1.12.1
xarray==0.15.0
zipp==3.1.0

UPDATE

Versions

Output of xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.6 (default, Feb 15 2020, 17:41:03)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 2.6.32-573.12.1.el6.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: C
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.15.0
pandas: 1.0.4
numpy: 1.18.4
scipy: 1.4.1
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: 2.10.0
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
setuptools: 41.2.0
pip: 19.2.3
conda: None
pytest: None
IPython: None
sphinx: None

All 6 comments

@aakash30jan The other issue was closed because @max-sixty could not reproduce it due to missing information. The issue template describes what information should be supplied.

Here, the output of xr.show_versions() is missing. Please add all the missing information requested in the issue template.

There's not too much missing apart from the show_versions output, but we need more information about how your environment was set up to be able to reproduce this. Did you use conda, virtualenv / pip, or something else (pipenv, poetry, system package manager, etc.)? What are the versions of the non-Python libraries? The most important ones are libhdf5 and libnetcdf. As pointed out above, we usually ask for the output of xr.show_versions() because it provides all that information (it doesn't collect information about tensorflow, though).

Anyway, about the issue: if importing tensorflow changes the way xarray and netCDF4 work without tensorflow actually being involved in the write (does it fail if you don't import tensorflow?), I suspect that either tensorflow's package alters the hdf5 or netcdf modules or the libhdf5 / libnetcdf libraries, or the import of tensorflow messes with these libraries.

@kmuehlbauer @keewis
I have updated the issue with the output of show_versions.

I simply used python3 -m venv mycleanenv to create an environment and then used pip to install the packages. If I don't import TensorFlow, the rest of my sample code works without any issue. For the libraries that are probably problematic, the versions are libhdf5==1.10.4 and libnetcdf==4.6.3. Is there any way I can get more debugging information on this?

Thanks, @aakash30jan, with this I can reproduce your issue. I modified your code sample to

import xarray as xr, numpy as np
import tensorflow
import netCDF4  # if it is imported before tensorflow, the error does not occur
xr.DataArray(data=np.zeros([4,5]),dims=['lat','lon']).to_netcdf("test.nc")

which confirms that tensorflow does something weird to either netCDF4 or its dependencies.

To confirm, let's eliminate both xarray and numpy:

import netCDF4 as nc
import tensorflow

filename = "test.nc"
rootgrp = nc.Dataset(filename, "w")
dim = rootgrp.createDimension("dim", 3)
data = rootgrp.createVariable("test", "i4", ("dim",))
data[:] = [0, 1, 2]
rootgrp.close()

and again the error comes up if we import tensorflow before netCDF4. This means that this is actually a bug in tensorflow (or their package on PyPI), and I think you should report it on their issue tracker. Feel free to reuse / modify my code sample if that helps.

I'm closing this issue for now but feel free to reopen if you still have any questions about this.
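
The import-order observation above can be turned into a small guard. This is only a minimal sketch and not part of xarray or netCDF4: `h5py_loaded_before_netcdf4` is a hypothetical helper, and it assumes (based solely on this thread) that the risky situation is h5py already being loaded while netCDF4 has not yet been imported.

```python
import sys
import warnings


def h5py_loaded_before_netcdf4(modules=None):
    """Return True if h5py is already imported but netCDF4 is not.

    Hypothetical helper: per this thread, that import order is the one
    that led to "NetCDF: HDF error" with the PyPI wheels.
    """
    # Allow a custom mapping so the check can be exercised without real imports.
    modules = sys.modules if modules is None else modules
    risky = "h5py" in modules and "netCDF4" not in modules
    if risky:
        warnings.warn(
            "h5py was imported before netCDF4; writing netCDF files may fail. "
            "Consider importing netCDF4 first.",
            RuntimeWarning,
            stacklevel=2,
        )
    return risky


# Call this just before the first netCDF write to learn whether another
# package (for example tensorflow) has already pulled in h5py.
if h5py_loaded_before_netcdf4():
    print("h5py is loaded and netCDF4 is not: import netCDF4 before tensorflow")
```

The check only inspects `sys.modules`, so it does not import anything itself and cannot change the import order it is reporting on.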

TensorFlow pulls in h5py and imports it by default, but the PyPI builds of h5py and netcdf4-python are incompatible (see https://github.com/Unidata/netcdf4-python/issues/694 and the other issues linked there). A workaround is to run pip uninstall h5py if you don't need it, or perhaps to use conda.
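
To get more debugging information about a possible mismatch, one option is to compare the HDF5 version each binding reports. The sketch below is an illustration only: `hdf5_versions` is a hypothetical helper name, while `h5py.version.hdf5_version` and `netCDF4.__hdf5libversion__` are the attributes those packages expose. Differing versions would hint at two separately bundled HDF5 libraries, but matching versions do not prove the builds are compatible.

```python
import importlib


def hdf5_versions():
    """Map each binding to the HDF5 version it reports, or None if unavailable."""
    # Module name -> attribute path leading to the HDF5 version string.
    sources = {
        "h5py": ("version", "hdf5_version"),
        "netCDF4": ("__hdf5libversion__",),
    }
    report = {}
    for name, attr_path in sources.items():
        try:
            obj = importlib.import_module(name)
        except Exception:
            # Not installed, or a broken binary install that fails on import.
            report[name] = None
            continue
        for attr in attr_path:
            obj = getattr(obj, attr, None)
        report[name] = obj
    return report


if __name__ == "__main__":
    for name, version in hdf5_versions().items():
        print(f"{name}: HDF5 {version}")
```

Because this helper imports both packages, it is best run in a separate interpreter session rather than inside the program that writes the netCDF file.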

Thanks @keewis.
Thanks @ihnorton
