I tried tonight to test the recent xgboost.dask changes on a dask-kubernetes cluster on EKS (per https://github.com/dmlc/xgboost/pull/6343#issuecomment-722494918). Unfortunately, I ran into this error:
AttributeError: /opt/conda/envs/saturn/lib/libxgboost.so: undefined symbol: XGDMatrixSetDenseInfo
training code
I omitted the code I used to create my client (...[CLIENT CODE]...) because it uses a dask-kubernetes cluster provisioned with a commercial product. I can see that work is getting scheduled onto that cluster when the DaskDMatrix is set up and when training starts, so I'm confident that that isn't the issue.
import os
import time
import dask.array as da
import xgboost as xgb
from dask.distributed import Client, wait
from dask_ml.metrics import mean_absolute_error
from dask_saturn import SaturnCluster
# .....[CLIENT CODE]..... (omitted; creates `client`)
num_obs = 100_000  # an int rather than 1e5, since array shapes should be integers
num_features = 50
# random features and target, split into 1000-row chunks
X = da.random.random(
    size=(num_obs, num_features),
    chunks=(1000, num_features)
)
y = da.random.random(
    size=(num_obs, 1),
    chunks=(1000, 1)
)
# materialize the chunks on the workers before building the DMatrix
X = X.persist()
_ = wait(X)
y = y.persist()
_ = wait(y)
dtrain = xgb.dask.DaskDMatrix(
    client=client,
    data=X,
    label=y
)
bst = xgb.dask.train(
    client=client,
    params={
        "verbosity": 2,
        "tree_method": "hist",
        "objective": "reg:squarederror"
    },
    dtrain=dtrain,
    num_boost_round=10,
)
I installed xgboost by cloning from latest master (https://github.com/dmlc/xgboost/tree/fcfeb4959c6e361f2fd1cd18c3b61b598dc205ae).
sudo apt update
sudo apt-get install -y cmake build-essential
git clone https://github.com/dmlc/xgboost.git /tmp/xgboost
pushd /tmp/xgboost/python-package
git submodule init
git submodule update
python setup.py install
popd
full stacktrace
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-5-462f078606cf> in <module>
8 dtrain=dtrain,
9 num_boost_round=10,
---> 10 evals=[(dtrain, 'train')]
11 )
/srv/conda/envs/saturn/lib/python3.7/site-packages/xgboost/dask.py in train(client, params, dtrain, evals, early_stopping_rounds, *args, **kwargs)
742 return client.sync(
743 _train_async, client, params, dtrain=dtrain, *args, evals=evals,
--> 744 early_stopping_rounds=early_stopping_rounds, **kwargs)
745
746
/srv/conda/envs/saturn/lib/python3.7/site-packages/distributed/client.py in sync(self, func, asynchronous, callback_timeout, *args, **kwargs)
831 else:
832 return sync(
--> 833 self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
834 )
835
/srv/conda/envs/saturn/lib/python3.7/site-packages/distributed/utils.py in sync(loop, func, callback_timeout, *args, **kwargs)
338 if error[0]:
339 typ, exc, tb = error[0]
--> 340 raise exc.with_traceback(tb)
341 else:
342 return result[0]
/srv/conda/envs/saturn/lib/python3.7/site-packages/distributed/utils.py in f()
322 if callback_timeout is not None:
323 future = asyncio.wait_for(future, callback_timeout)
--> 324 result[0] = yield future
325 except Exception as exc:
326 error[0] = sys.exc_info()
/srv/conda/envs/saturn/lib/python3.7/site-packages/tornado/gen.py in run(self)
733
734 try:
--> 735 value = future.result()
736 except Exception:
737 exc_info = sys.exc_info()
/srv/conda/envs/saturn/lib/python3.7/site-packages/xgboost/dask.py in _train_async(client, params, dtrain, evals, early_stopping_rounds, *args, **kwargs)
705 futures.append(f)
706
--> 707 results = await client.gather(futures)
708 return list(filter(lambda ret: ret is not None, results))[0]
709
/srv/conda/envs/saturn/lib/python3.7/site-packages/distributed/client.py in _gather(self, futures, errors, direct, local_worker)
1849 exc = CancelledError(key)
1850 else:
-> 1851 raise exception.with_traceback(traceback)
1852 raise exc
1853 if errors == "skip":
/srv/conda/envs/saturn/lib/python3.7/site-packages/xgboost/dask.py in dispatched_train()
654 worker = distributed.get_worker()
655 with RabitContext(rabit_args):
--> 656 local_dtrain = _dmatrix_from_list_of_parts(**dtrain_ref)
657 local_evals = []
658 if evals_ref:
/opt/conda/envs/saturn/lib/python3.7/site-packages/xgboost/dask.py in _dmatrix_from_list_of_parts()
605 if is_quantile:
606 return _create_device_quantile_dmatrix(**kwargs)
--> 607 return _create_dmatrix(**kwargs)
608
609
/opt/conda/envs/saturn/lib/python3.7/site-packages/xgboost/dask.py in _create_dmatrix()
595 feature_names=feature_names,
596 feature_types=feature_types,
--> 597 nthread=worker.nthreads)
598 dmatrix.set_info(base_margin=base_margin, weight=weights,
599 label_lower_bound=label_lower_bound,
/opt/conda/envs/saturn/lib/python3.7/site-packages/xgboost/core.py in __init__()
506 self.handle = handle
507
--> 508 self.set_info(label=label, weight=weight, base_margin=base_margin)
509
510 self.feature_names = feature_names
/opt/conda/envs/saturn/lib/python3.7/site-packages/xgboost/core.py in inner_f()
419 for k, arg in zip(sig.parameters, args):
420 kwargs[k] = arg
--> 421 return f(**kwargs)
422
423 return inner_f
/opt/conda/envs/saturn/lib/python3.7/site-packages/xgboost/core.py in set_info()
527 '''Set meta info for DMatrix.'''
528 if label is not None:
--> 529 self.set_label(label)
530 if weight is not None:
531 self.set_weight(weight)
/opt/conda/envs/saturn/lib/python3.7/site-packages/xgboost/core.py in set_label()
656 """
657 from .data import dispatch_meta_backend
--> 658 dispatch_meta_backend(self, label, 'label', 'float')
659
660 def set_weight(self, weight):
/opt/conda/envs/saturn/lib/python3.7/site-packages/xgboost/data.py in dispatch_meta_backend()
663 return
664 if _is_numpy_array(data):
--> 665 _meta_from_numpy(data, name, dtype, handle)
666 return
667 if _is_pandas_df(data):
/opt/conda/envs/saturn/lib/python3.7/site-packages/xgboost/data.py in _meta_from_numpy()
597 ptr = interface['data'][0]
598 ptr = ctypes.c_void_p(ptr)
--> 599 _check_call(_LIB.XGDMatrixSetDenseInfo(
600 handle,
601 c_str(field),
/opt/conda/envs/saturn/lib/python3.7/ctypes/__init__.py in __getattr__()
375 if name.startswith('__') and name.endswith('__'):
376 raise AttributeError(name)
--> 377 func = self.__getitem__(name)
378 setattr(self, name, func)
379 return func
/opt/conda/envs/saturn/lib/python3.7/ctypes/__init__.py in __getitem__()
380
381 def __getitem__(self, name_or_ordinal):
--> 382 func = self._FuncPtr((name_or_ordinal, self))
383 if not isinstance(name_or_ordinal, int):
384 func.__name__ = name_or_ordinal
AttributeError: /opt/conda/envs/saturn/lib/libxgboost.so: undefined symbol: XGDMatrixSetDenseInfo
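For context on where this error comes from: as the last two frames show, ctypes looks up a C function lazily, the first time an attribute such as `_LIB.XGDMatrixSetDenseInfo` is accessed, and raises AttributeError ("undefined symbol") if the shared object that was actually loaded does not export that name. A minimal sketch, using only the standard library, that checks a library for a symbol up front; `exports_symbol` is a hypothetical helper for illustration, not part of xgboost:

```python
import ctypes
import ctypes.util
from typing import Optional


def exports_symbol(lib_path: Optional[str], symbol: str) -> bool:
    """Return True if the shared library at lib_path exports `symbol`.

    Passing None loads the symbols already linked into the running
    process (POSIX only), which is convenient for a quick self-test.
    """
    lib = ctypes.CDLL(lib_path)
    # CDLL.__getattr__ -> __getitem__ -> _FuncPtr raises AttributeError
    # when the name is missing, as in the traceback; hasattr() turns
    # that into a boolean instead of an exception.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # e.g. "libc.so.6" on glibc Linux; None falls back to the process itself
    libc = ctypes.util.find_library("c")
    print(exports_symbol(libc, "printf"))                 # libc exports this
    print(exports_symbol(libc, "XGDMatrixSetDenseInfo"))  # libc does not
```

On an affected machine you would pass the path from the error message instead, e.g. `exports_symbol("/opt/conda/envs/saturn/lib/libxgboost.so", "XGDMatrixSetDenseInfo")`; given the error above, that stale build would report False.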
output of conda info
active environment : saturn
active env location : /opt/conda/envs/saturn
shell level : 2
user config file : /home/jovyan/.condarc
populated config files : /opt/conda/.condarc
conda version : 4.8.2
conda-build version : not installed
python version : 3.7.7.final.0
virtual packages : __glibc=2.28
base environment : /opt/conda (writable)
channel URLs : https://conda.saturncloud.io/pkgs/linux-64
https://conda.saturncloud.io/pkgs/noarch
https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /opt/conda/pkgs
/home/jovyan/.conda/pkgs
envs directories : /opt/conda/envs
/home/jovyan/.conda/envs
platform : linux-64
user-agent : conda/4.8.2 requests/2.22.0 CPython/3.7.7 Linux/4.14.193-149.317.amzn2.x86_64 debian/10 glibc/2.28
UID:GID : 1000:100
netrc file : None
offline mode : False
I'll try to come up with a reproducible example using dask-cloudprovider so that it's 100% reproducible (no redacted code).
I think you have an outdated libxgboost.so. Could you please check your image?
Yep, you're right. I ran find / -name "libxgboost.so" and found that even the following doesn't remove an old libxgboost.so that I have on my path:
conda uninstall -y xgboost
pip uninstall -y xgboost
I'll remove the old library from my image and try again, thanks.
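A rough Python equivalent of `find / -name "libxgboost.so"`, in case it helps someone auditing an image for leftover copies. This is a sketch under stated assumptions: `find_library_copies` is a hypothetical helper, and scanning only the conda prefix (`/opt/conda`, taken from the traceback) is an assumption made to keep the walk fast; pass `["/"]` to search everything.

```python
import os
from typing import Iterable, List


def find_library_copies(roots: Iterable[str], name: str = "libxgboost.so") -> List[str]:
    """Return every file called `name` under the given root directories,
    similar to running `find <root> -name <name>` for each root."""
    matches = []
    for root in roots:
        # os.walk skips unreadable directories by default (onerror=None);
        # followlinks=False avoids descending into symlinked directory loops.
        for dirpath, _dirnames, filenames in os.walk(root, followlinks=False):
            if name in filenames:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    # Assumed search root: the conda prefix that appears in the traceback.
    for path in find_library_copies(["/opt/conda"]):
        print(path)
```

Any copy this turns up outside the freshly installed package is a candidate for the stale library that uninstalling via conda or pip left behind.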
Thanks for testing! Feel free to let me know if there's anything I can help with. I will close this one now as this specific issue is resolved.
@trivialfis I'm very happy to tell you that after I cleared out my old libxgboost.so files, training worked on dask-kubernetes + EKS (using the code snippet I shared above)!
Thanks for all the great work!!! :tada: