rasa_nlu.train - missing package sklearn

Created on 11 May 2017 · 24 Comments · Source: RasaHQ/rasa

rasa NLU version (e.g. 0.7.3): 0.8.5

Used backend / pipeline (mitie, spacy_sklearn, ...): spacy_sklearn

Operating system (windows, osx, ...): Windows 7 64 bit/Python 2.7

Issue:

The training module fails to run and complains that sklearn is not installed:
C:\dev\rasa-nlu>python -m rasa_nlu.train -c config_spacy.json
Traceback (most recent call last):
  File "C:\Python27\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\Python27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\Python27\lib\site-packages\rasa_nlu\train.py", line 83, in <module>
    do_train(config)
  File "C:\Python27\lib\site-packages\rasa_nlu\train.py", line 70, in do_train
    trainer = Trainer(config, component_builder)
  File "C:\Python27\lib\site-packages\rasa_nlu\model.py", line 125, in __init__
    components.validate_requirements(config.pipeline)
  File "C:\Python27\lib\site-packages\rasa_nlu\components.py", line 103, in validate_requirements
    " ".join(failed_imports)))
Exception: Not all required packages are installed. Please install sklearn

I do have scikit-learn (0.18.1) installed.

Running pip install sklearn installed sklearn 0.0, which didn't resolve the problem.

Content of configuration file (if used & relevant):

stock config_spacy.json
windows type

All 24 comments

This error indicates that the application couldn't import sklearn. Can you run import sklearn in the same environment that you use to run rasa NLU? If so, could you please post your pip list?
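One way to run that check is to attempt the imports from the interpreter that launches rasa NLU and keep the error text, since the message often names the dependency that is actually missing. The following is a minimal sketch written for Python 3, not rasa NLU's own validation code: the helper name `check_imports` and the module list are assumptions chosen for illustration.

```python
import importlib
import sys


def check_imports(module_names):
    """Try to import each module and return {name: error message or None}."""
    results = {}
    for name in module_names:
        try:
            importlib.import_module(name)
            results[name] = None
        except Exception as exc:  # ImportError in the usual case
            # Keep the message: it often names the real missing dependency.
            results[name] = "{}: {}".format(type(exc).__name__, exc)
    return results


if __name__ == "__main__":
    # Shows which Python installation is doing the importing.
    print("Interpreter:", sys.executable)
    # Hypothetical list; adjust it to the pipeline you use.
    for name, error in check_imports(["sklearn", "scipy", "spacy"]).items():
        print(name, "OK" if error is None else "FAILED (" + error + ")")
```

Run it with the same `python` command used for `python -m rasa_nlu.train`, so the result reflects the environment that training will see.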

My pip list:
appdirs (1.4.3)
boto3 (1.4.4)
botocore (1.5.47)
click (6.7)
cloudpickle (0.2.2)
cymem (1.31.2)
cytoolz (0.8.2)
dill (0.2.6)
docutils (0.13.1)
en-core-web-sm (1.2.0)
Flask (0.12.1)
ftfy (4.4.2)
functools32 (3.2.3.post2)
future (0.16.0)
futures (3.1.1)
gevent (1.2.1)
greenlet (0.4.12)
html5lib (0.999999999)
itsdangerous (0.24)
Jinja2 (2.9.6)
jmespath (0.9.2)
jsonschema (2.6.0)
MarkupSafe (1.0)
murmurhash (0.26.4)
numpy (1.12.1)
packaging (16.8)
pathlib (1.0.1)
pip (9.0.1)
plac (0.9.6)
preshed (1.0.0)
pyparsing (2.2.0)
pyreadline (2.1)
python-crfsuite (0.9.2)
python-dateutil (2.6.0)
rasa-nlu (0.8.5)
regex (2017.4.5)
requests (2.14.2)
s3transfer (0.1.10)
scikit-learn (0.18.1)
setuptools (35.0.2)
six (1.10.0)
sklearn (0.0)
spacy (1.8.2)
termcolor (1.1.0)
thinc (6.5.2)
toolz (0.8.2)
tqdm (4.11.2)
typing (3.6.1)
ujson (1.35)
wcwidth (0.1.7)
webencodings (0.5.1)
Werkzeug (0.12.1)
wrapt (1.10.10)

import sklearn shows an error:
" File "C:\Python27\lib\site-packages\sklearn\base.py", line 10, in <module>
from scipy import sparse
ImportError: No module named scipy"

I tried pip install scipy and got this error:
`
Collecting scipy
Using cached scipy-0.19.0.zip
Installing collected packages: scipy
Running setup.py install for scipy: started
Running setup.py install for scipy: finished with status 'error'
Complete output from command c:\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\users\MYUSER\appdata\local\temp\pip-build-lp_eox\scipy\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\MYUSER\appdata\local\temp\pip-l8rmty-record\install-record.txt --single-version-externally-managed --compile:

Note: if you need reliable uninstall behavior, then install
with pip instead of using setup.py install:

  - pip install .       (from a git repo or downloaded source
                           release)
  - pip install scipy   (last SciPy release on PyPI)


lapack_opt_info:
lapack_mkl_info:
  libraries mkl_rt not found in ['c:\\python27\\lib', 'C:\\', 'c:\\python27\\libs']
  NOT AVAILABLE

openblas_lapack_info:
  libraries openblas not found in ['c:\\python27\\lib', 'C:\\', 'c:\\python27\\libs']
  NOT AVAILABLE

atlas_3_10_threads_info:
Setting PTATLAS=ATLAS
c:\python27\lib\site-packages\numpy\distutils\system_info.py:1051: UserWarning: Specified path C:\projects\numpy-wheels\windows-wheel-builder\atlas-builds\atlas-3.10.1-sse2-32\lib is invalid.
  pre_dirs = system_info.get_paths(self, section, key)
<class 'numpy.distutils.system_info.atlas_3_10_threads_info'>
  NOT AVAILABLE

atlas_3_10_info:
<class 'numpy.distutils.system_info.atlas_3_10_info'>
  NOT AVAILABLE

atlas_threads_info:
Setting PTATLAS=ATLAS
<class 'numpy.distutils.system_info.atlas_threads_info'>
  NOT AVAILABLE

atlas_info:
<class 'numpy.distutils.system_info.atlas_info'>
  NOT AVAILABLE

c:\python27\lib\site-packages\numpy\distutils\system_info.py:572: UserWarning:
    Atlas (http://math-atlas.sourceforge.net/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [atlas]) or by setting
    the ATLAS environment variable.
  self.calc_info()
lapack_info:
  libraries lapack not found in ['c:\\python27\\lib', 'C:\\', 'c:\\python27\\libs']
  NOT AVAILABLE

c:\python27\lib\site-packages\numpy\distutils\system_info.py:572: UserWarning:
    Lapack (http://www.netlib.org/lapack/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [lapack]) or by setting
    the LAPACK environment variable.
  self.calc_info()
lapack_src_info:
  NOT AVAILABLE

c:\python27\lib\site-packages\numpy\distutils\system_info.py:572: UserWarning:
    Lapack (http://www.netlib.org/lapack/) sources not found.
    Directories to search for the sources can be specified in the
    numpy/distutils/site.cfg file (section [lapack_src]) or by setting
    the LAPACK_SRC environment variable.
  self.calc_info()
  NOT AVAILABLE

Running from scipy source directory.
non-existing path in 'scipy\\integrate': 'quadpack.h'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\users\MYUSER\appdata\local\temp\pip-build-lp_eox\scipy\setup.py", line 416, in <module>
    setup_package()
  File "c:\users\MYUSER\appdata\local\temp\pip-build-lp_eox\scipy\setup.py", line 412, in setup_package
    setup(**metadata)
  File "c:\python27\lib\site-packages\numpy\distutils\core.py", line 135, in setup
    config = configuration()
  File "c:\users\MYUSER\appdata\local\temp\pip-build-lp_eox\scipy\setup.py", line 336, in configuration
    config.add_subpackage('scipy')
  File "c:\python27\lib\site-packages\numpy\distutils\misc_util.py", line 1001, in add_subpackage
    caller_level = 2)
  File "c:\python27\lib\site-packages\numpy\distutils\misc_util.py", line 970, in get_subpackage
    caller_level = caller_level + 1)
  File "c:\python27\lib\site-packages\numpy\distutils\misc_util.py", line 907, in _get_configuration_from_setup_py
    config = setup_module.configuration(*args)
  File "scipy\setup.py", line 15, in configuration
    config.add_subpackage('linalg')
  File "c:\python27\lib\site-packages\numpy\distutils\misc_util.py", line 1001, in add_subpackage
    caller_level = 2)
  File "c:\python27\lib\site-packages\numpy\distutils\misc_util.py", line 970, in get_subpackage
    caller_level = caller_level + 1)
  File "c:\python27\lib\site-packages\numpy\distutils\misc_util.py", line 907, in _get_configuration_from_setup_py
    config = setup_module.configuration(*args)
  File "scipy\linalg\setup.py", line 20, in configuration
    raise NotFoundError('no lapack/blas resources found')
numpy.distutils.system_info.NotFoundError: no lapack/blas resources found

----------------------------------------"`

On Windows you should use Anaconda; otherwise you need to install a LAPACK/BLAS package yourself, which scipy needs for its computations.

I installed Anaconda and ran "conda install scikit-learn". I can see it listed with "conda list".
Yet the original command fails again with the same error. Python fails to import scipy:
"ImportError: No module named scipy"

I have never used conda. How does the conda scipy installation affect the Python runtime?
Should I reinstall Python, or is there a Python inside conda?

Yes, Anaconda is a Python distribution and comes with its own Python. When you conda install, the package goes into a specific environment (probably root in your case), and that environment's Python is the one you need to run (probably at ~/anaconda3/bin/python or something like that). Check the Anaconda docs for setting up the environment.
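To see which Python is actually being used, it can help to print where the running interpreter lives. This is a small sketch under stated assumptions: the helper name `describe_interpreter` is made up for illustration, and it relies on conda setting the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import os
import sys


def describe_interpreter():
    """Collect facts that identify which Python installation is running."""
    return {
        "executable": sys.executable,  # full path of the running interpreter
        "prefix": sys.prefix,          # root of the installation or environment
        # conda sets CONDA_DEFAULT_ENV when an environment is activated
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, "=", value)
```

If the executable path points at the old C:\Python27 installation rather than the Anaconda directory, packages installed with conda will not be visible to it.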

Got it.

I installed a py27 conda environment:
conda create -n py27 python=2.7 anaconda

Then activated it:
activate py27

And verified this with:
conda info --envs

Next I installed spacy using pip (inside the activated conda env):
pip install -U spacy

Downloaded the spacy English model:
python -m spacy download en

EDIT: That fails with a permission error too:
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda2\envs\py27\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\spacy\__main__.py", line 133, in <module>
    plac.Interpreter.call(CLI)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\plac_ext.py", line 1142, in call
    print(out)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\plac_ext.py", line 914, in __exit__
    self.close(exctype, exc, tb)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\plac_ext.py", line 952, in close
    self._interpreter.throw(exctype, exc, tb)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\plac_ext.py", line 964, in _make_interpreter
    arglist = yield task
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\plac_ext.py", line 1139, in call
    raise_(task.etype, task.exc, task.tb)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\plac_ext.py", line 380, in _wrap
    for value in genobj:
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\plac_ext.py", line 95, in gen_exc
    raise_(etype, exc, tb)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\plac_ext.py", line 966, in _make_interpreter
    cmd, result = self.parser.consume(arglist)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\plac_core.py", line 207, in consume
    return cmd, self.func(*(args + varargs + extraopts), **kwargs)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\spacy\__main__.py", line 33, in download
    cli_download(model, direct)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\spacy\cli\download.py", line 24, in download
    link_package(model_name, model, force=True)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\spacy\cli\link.py", line 27, in link_package
    symlink(model_path, link_name, force)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\spacy\cli\link.py", line 44, in symlink
    link_path.unlink()
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\pathlib.py", line 1131, in unlink
    self._accessor.unlink(self)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\pathlib.py", line 346, in wrapped
    return strfunc(str(pathobj), *args)
WindowsError: [Error 5] Access is denied: 'C:\\ProgramData\\Anaconda2\\envs\\py27\\lib\\site-packages\\spacy\\data\\en'

and installed rasa_nlu:
pip install rasa_nlu

At this point the NLU server is running on port 5000 and returning hello:
python -m rasa_nlu.server

However the training still fails:
Exception: Not all required packages are installed. Please install pycrfsuite

So I ran the following command:
conda install -c conda-forge python-crfsuite

And now the training runs but fails with an exception:
"(py27) C:\dev\rasa-nlu>python -m rasa_nlu.train -c config_spacy.json
INFO:root:Trying to load spacy model with name 'en'
INFO:root:Added 'nlp_spacy' to component cache. Key 'nlp_spacy-en'.
INFO:root:Training data format at ./data/demo-rasa.json is rasa_nlu
INFO:root:Training data stats:
- intent examples: 39 (4 distinct intents)
- found intents: affirm, goodbye, greet, restaurant_search
- entity examples: 7 (2 distinct entities)
- found entities: cuisine, location

INFO:root:Starting to train component nlp_spacy
INFO:root:Finished training component.
INFO:root:Starting to train component ner_crf
Traceback (most recent call last):
  File "C:\ProgramData\Anaconda2\envs\py27\lib\runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\runpy.py", line 72, in _run_code
    exec code in run_globals
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\rasa_nlu\train.py", line 83, in <module>
    do_train(config)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\rasa_nlu\train.py", line 73, in do_train
    interpreter = trainer.train(training_data)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\rasa_nlu\model.py", line 157, in train
    updates = component.train(*args)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\rasa_nlu\extractors\crf_entity_extractor.py", line 80, in train
    self._train_model(dataset)
  File "C:\ProgramData\Anaconda2\envs\py27\lib\site-packages\rasa_nlu\extractors\crf_entity_extractor.py", line 312, in _train_model
    self.ent_tagger.open(self.crf_file.name)
  File "pycrfsuite/_pycrfsuite.pyx", line 571, in pycrfsuite._pycrfsuite.Tagger.open (pycrfsuite/_pycrfsuite.cpp:7731)
  File "pycrfsuite/_pycrfsuite.pyx", line 717, in pycrfsuite._pycrfsuite.Tagger._check_model (pycrfsuite/_pycrfsuite.cpp:10037)
IOError: [Errno 13] Permission denied: 'c:\users\USERNAME\appdata\local\temp\tmpttlwte'"

So a few things:

  1. The current documentation is very partial, making assumptions and missing many parts
  2. It seems that there are missing dependencies in the rasa_nlu packages

I don't mind modifying and fixing the documentation to actually show a noob rasa user how to install it correctly step by step.

How do I resolve this permission issue please?

@bigman73 If you could modify and improve the documentation that would be awesome and especially since I am not using windows it is hard for me / us to make the documentation really good there. So any improvement there would be really appreciated :+1:

Concerning the permission denied error, this is a known issue #357 that needs fixing.
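For context, a common cause of this kind of error on Windows is opening a `tempfile.NamedTemporaryFile` by name a second time while the first handle is still open; the Python documentation notes that this works on Unix but not on Windows. Whether that is exactly what happens inside `crf_entity_extractor.py` is an assumption based on the `self.crf_file.name` call in the traceback. A minimal sketch of the usual workaround, with a hypothetical helper name:

```python
import os
import tempfile


def write_then_reopen(data):
    """Write bytes to a temp file, close it, then reopen it by name."""
    # delete=False keeps the file on disk after the handle is closed.
    tmp = tempfile.NamedTemporaryFile(suffix=".tmp", delete=False)
    try:
        tmp.write(data)
        tmp.close()  # release the handle before anything reopens the path
        with open(tmp.name, "rb") as reopened:
            return reopened.read()
    finally:
        tmp.close()  # closing an already closed file is harmless
        os.remove(tmp.name)  # manual cleanup, since delete=False
```

The trade-off is that cleanup becomes the caller's responsibility, which is why the removal sits in a `finally` block.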

I tried to switch to sklearn-crfsuite (uninstalled python-crfsuite first), but python-crfsuite is a dependency and it got installed again. The error remains.
The spacy download of the en model also has a permission issue (see above).
I understand that some of these issues are purely third-party, but the bottom line is that on Windows it is impossible for me to run rasa_nlu, which is a show stopper.
Yes, I could probably create a Linux VM just for rasa, but that is not a viable solution beyond the development phase.

Right, so the best solution is to use Docker. I will have a look at crfsuite later, but it is not as simple as installing the other package, since this is a code dependency.

If there is an issue with the installation of spacy models, please get in contact with the spacy guys as that is nothing that is specific to rasa NLU.

Thanks, I'll try Docker.

Got the same problem on a RPi3 with python 3.6.1 after installing everything according to the manual but I installed rasa_nlu in a virtual environment. I chose spaCy and sklearn and used conda to install it.
I used the demo config.json and put it into ~. The path in the config file has been changed to ./demo-rasa.json as I run it from ~

I tried installing the packages with pip which changed the missing packages until I ended up with this:
Exception: Not all required packages are installed. Please install pycrfsuite

@tedstriker did you try to install the pycrfsuite package?

To install sklearn with pip, scipy is needed.
For scipy to be installed, several packages and libraries need to exist first: sudo apt-get install liblapack-dev gfortran
After installing them manually, scipy could be installed as well and the training went through successfully.
For some unknown reason, pycrfsuite was no longer reported as missing...

So up and running, @tedstriker? Would be awesome if you amend the documentation where you feel it has shortcomings :+1:

@tmbo the demo is running like a charm now. Thanks for that.
It's actually the Anaconda recommendation I'd omit, as it did not work as expected.
Installing scipy, spacy, and sklearn via pip seemed to do the trick, but I don't remember the sequence.
sudo apt-get install liblapack-dev gfortran are the packages I had to install beforehand, but I can't tell if this is all that's necessary. I might have had other packages installed already.

Despite getting it up and running, I don't know which changes or installations finally made it work.

But when I do a fresh install on a new Pi (rasa uses way too much ram to run it next to other services) I'll note what I installed and when and propose an update for the installation docs.

@tedstriker concerning the resources: there is a rather small language model from spacy that should reduce the amount of memory used (but it's still Python, so there is always some resource overhead).

Hey @tmbo, can you tell me how I can solve this problem? I installed spacy, rasa-nlu, scikit and all the requirements, but I get an error when I run my Python file trainer.py. Here it is:
from rasa_nlu.converters import load_data
from rasa_nlu.config import RasaNLUConfig
from rasa_nlu.model import Trainer

# Raw strings keep backslashes in Windows paths from being read as escape sequences (e.g. \r)
training_data = load_data(r'C:\Users\ABC\Desktop\project\rasa_nlu-master\rasa_nlu-master\data\examples\rasa\demo-rasa.json')
trainer = Trainer(RasaNLUConfig(r"C:\Users\ABC\Desktop\project\rasa_nlu-master\rasa_nlu-master\config_spacy.json"))
trainer.train(training_data)
model_directory = trainer.persist('./models/')  # Returns the directory the model is stored in
and the error is:
C:\Users\ABC\Desktop\project>python trainer.py
Traceback (most recent call last):
File "trainer.py", line 6, in <module>
trainer = Trainer(RasaNLUConfig("C:\Users\ABC\Desktop\project\rasa_nlu-master\rasa_nlu-master\config_spacy.json"))
File "C:\Users\ABC\Miniconda3\lib\site-packages\rasa_nlu\model.py", line 125, in __init__
components.validate_requirements(config.pipeline)
File "C:\Users\ABC\Miniconda3\lib\site-packages\rasa_nlu\components.py", line 103, in validate_requirements
" ".join(failed_imports)))
Exception: Not all required packages are installed. Please install spacy

This worked for me. I'm not using conda.

pip install python-crfsuite scipy

I am moving my working model from my Mac to a virtualenv and I face the exact same problem:
Exception: Failed to validate at component 'ner_crf'. Missing property: 'tokens'
I tried all sorts of installation orders and nothing helped.

Does any of you have any idea?

Thank you

You have probably trained the model with an older version (you can check the version in the model folder in the metadata.json). The easiest solution is to either install the version the model was trained with or retrain the model with the new version.
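The version check mentioned above can be scripted. The exact key name used in metadata.json is not stated in this thread, so this sketch searches for any top-level key containing "version" rather than assuming one; the function name is hypothetical and it assumes the file holds a JSON object.

```python
import json
import os


def find_version_fields(model_dir):
    """Return every top-level metadata.json entry whose key mentions 'version'."""
    path = os.path.join(model_dir, "metadata.json")
    with open(path) as handle:
        metadata = json.load(handle)
    # Search instead of hard-coding a key, since the name is an unknown here.
    return {key: value for key, value in metadata.items() if "version" in key.lower()}
```

Comparing the result with the rasa-nlu entry in `pip list` shows whether the model and the installed package match.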

@tmbo
Hi,
I am trying to re-run the code that trains the model.
I cloned the most updated version from github and I can not run it

If you provide me with a bit more information I might be able to help you. This includes:

  • your previous Rasa NLU version
  • the error
  • what you are executing (code, command, ...)

And please create a separate issue.
