Rasa version: 1.10.0
Python version: Python 3.7.3
Operating system (windows, osx, ...): osx
Issue: HFTransformersNLP does not work with pretrained Japanese BERT models.
Error (including full traceback):
Training NLU model...
2020-04-29 14:00:19 INFO transformers.file_utils - TensorFlow version 2.1.0 available.
2020-04-29 14:00:20 INFO transformers.tokenization_utils - Model name 'bert-base-japanese-whole-word-masking' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). Assuming 'bert-base-japanese-whole-word-masking' is a path, a model identifier, or url to a directory containing tokenizer files.
Traceback (most recent call last):
File "/Users/atsushiharada/.pyenv/versions/transformer-on-rasa/bin/rasa", line 10, in <module>
sys.exit(main())
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/__main__.py", line 91, in main
cmdline_arguments.func(cmdline_arguments)
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/cli/train.py", line 140, in train_nlu
persist_nlu_training_data=args.persist_nlu_data,
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/train.py", line 414, in train_nlu
persist_nlu_training_data,
File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/train.py", line 453, in _train_nlu_async
persist_nlu_training_data=persist_nlu_training_data,
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/train.py", line 482, in _train_nlu_with_validated_data
persist_nlu_training_data=persist_nlu_training_data,
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/train.py", line 75, in train
trainer = Trainer(nlu_config, component_builder)
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/model.py", line 145, in __init__
self.pipeline = self._build_pipeline(cfg, component_builder)
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/model.py", line 157, in _build_pipeline
component = component_builder.create_component(component_cfg, cfg)
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/components.py", line 769, in create_component
component = registry.create_component_by_config(component_config, cfg)
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/registry.py", line 246, in create_component_by_config
return component_class.create(component_config, config)
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/components.py", line 483, in create
return cls(component_config)
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 47, in __init__
self._load_model()
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 81, in _load_model
self.model_weights, cache_dir=self.cache_dir
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 393, in from_pretrained
return cls._from_pretrained(*inputs, **kwargs)
File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 497, in _from_pretrained
list(cls.vocab_files_names.values()),
OSError: Model name 'bert-base-japanese-whole-word-masking' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed 'bert-base-japanese-whole-word-masking' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
Command or request that led to error:
$ rasa train nlu
Content of configuration file (config.yml) (if relevant):
# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: ja_ginza

pipeline:
#  - name: "SpacyNLP"
#  - name: "SpacyTokenizer"
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "bert-base-japanese-whole-word-masking"
  - name: "LanguageModelTokenizer"
  - name: "CRFEntityExtractor"

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: MappingPolicy
There is a discussion of this issue in the linked PR.