Rasa: HFTransformersNLP does not work with pretrained Japanese BERT models

Created on 29 Apr 2020 · 2 comments · Source: RasaHQ/rasa

Rasa version: 1.10.0

Python version: Python 3.7.3

Operating system (windows, osx, ...): osx

Issue: HFTransformersNLP does not work with pretrained Japanese BERT models.

Error (including full traceback):

Training NLU model...
2020-04-29 14:00:19 INFO     transformers.file_utils  - TensorFlow version 2.1.0 available.
2020-04-29 14:00:20 INFO     transformers.tokenization_utils  - Model name 'bert-base-japanese-whole-word-masking' not found in model shortcut name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). Assuming 'bert-base-japanese-whole-word-masking' is a path, a model identifier, or url to a directory containing tokenizer files.
Traceback (most recent call last):
  File "/Users/atsushiharada/.pyenv/versions/transformer-on-rasa/bin/rasa", line 10, in <module>
    sys.exit(main())
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/__main__.py", line 91, in main
    cmdline_arguments.func(cmdline_arguments)
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/cli/train.py", line 140, in train_nlu
    persist_nlu_training_data=args.persist_nlu_data,
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/train.py", line 414, in train_nlu
    persist_nlu_training_data,
  File "uvloop/loop.pyx", line 1456, in uvloop.loop.Loop.run_until_complete
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/train.py", line 453, in _train_nlu_async
    persist_nlu_training_data=persist_nlu_training_data,
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/train.py", line 482, in _train_nlu_with_validated_data
    persist_nlu_training_data=persist_nlu_training_data,
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/train.py", line 75, in train
    trainer = Trainer(nlu_config, component_builder)
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/model.py", line 145, in __init__
    self.pipeline = self._build_pipeline(cfg, component_builder)
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/model.py", line 157, in _build_pipeline
    component = component_builder.create_component(component_cfg, cfg)
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/components.py", line 769, in create_component
    component = registry.create_component_by_config(component_config, cfg)
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/registry.py", line 246, in create_component_by_config
    return component_class.create(component_config, config)
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/components.py", line 483, in create
    return cls(component_config)
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 47, in __init__
    self._load_model()
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/rasa/nlu/utils/hugging_face/hf_transformers.py", line 81, in _load_model
    self.model_weights, cache_dir=self.cache_dir
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 393, in from_pretrained
    return cls._from_pretrained(*inputs, **kwargs)
  File "/Users/atsushiharada/.pyenv/versions/3.7.3/envs/transformer-on-rasa/lib/python3.7/site-packages/transformers/tokenization_utils.py", line 497, in _from_pretrained
    list(cls.vocab_files_names.values()),
OSError: Model name 'bert-base-japanese-whole-word-masking' was not found in tokenizers model name list (bert-base-uncased, bert-large-uncased, bert-base-cased, bert-large-cased, bert-base-multilingual-uncased, bert-base-multilingual-cased, bert-base-chinese, bert-base-german-cased, bert-large-uncased-whole-word-masking, bert-large-cased-whole-word-masking, bert-large-uncased-whole-word-masking-finetuned-squad, bert-large-cased-whole-word-masking-finetuned-squad, bert-base-cased-finetuned-mrpc, bert-base-german-dbmdz-cased, bert-base-german-dbmdz-uncased, bert-base-finnish-cased-v1, bert-base-finnish-uncased-v1, bert-base-dutch-cased). We assumed 'bert-base-japanese-whole-word-masking' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.txt'] but couldn't find such vocabulary files at this path or url.
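
For reference, the failure can be reproduced outside Rasa by loading the tokenizer the same way HFTransformersNLP does for model_name: "bert". This is a minimal sketch, assuming the transformers 2.x release installed alongside Rasa 1.10; class names and behaviour may differ in other versions:

# Minimal reproduction sketch (assumes the transformers 2.x API used by Rasa 1.10).
# model_name: "bert" resolves to the plain BertTokenizer, whose shortcut list does
# not include the Japanese models, so from_pretrained falls back to treating the
# name as a local path or URL and raises the same OSError as above.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")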

Command or request that led to error:

$ rasa train nlu

Content of configuration file (config.yml) (if relevant):

# Configuration for Rasa NLU.
# https://rasa.com/docs/rasa/nlu/components/
language: ja_ginza
pipeline:
  # - name: "SpacyNLP"
  # - name: "SpacyTokenizer"
  - name: HFTransformersNLP
    model_name: "bert"
    model_weights: "bert-base-japanese-whole-word-masking"
  - name: "LanguageModelTokenizer"
  - name: "CRFEntityExtractor"

# Configuration for Rasa Core.
# https://rasa.com/docs/rasa/core/policies/
policies:
  - name: MemoizationPolicy
  - name: MappingPolicy
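
Note that the model_weights value above refers to weights that transformers publishes for its BertJapaneseTokenizer, not for the plain BertTokenizer that model_name: "bert" resolves to (the shortcut list printed in the traceback is BertTokenizer's). As a point of comparison rather than a confirmed fix, the Japanese tokenizer can be loaded directly; this hedged sketch assumes transformers 2.x with the MeCab bindings it needs (mecab-python3 in 2.x; newer releases use fugashi/ipadic instead):

# Hypothetical check outside Rasa: load the pretrained Japanese tokenizer via
# BertJapaneseTokenizer, which does register this shortcut name. Requires MeCab
# bindings to be installed (mecab-python3 for transformers 2.x).
from transformers import BertJapaneseTokenizer

tokenizer = BertJapaneseTokenizer.from_pretrained("bert-base-japanese-whole-word-masking")
print(tokenizer.tokenize("天気がいいから、散歩しましょう。"))

Since HFTransformersNLP picks the tokenizer class from model_name, this does not appear to be something config.yml alone can change in Rasa 1.10.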

All 2 comments

Thanks for the issue; @Ghostvv will get back to you about it soon!

You may find help in the docs and the forum, too!

There is a discussion in the linked PR.
