Issue: I am trying to create an Arabic chatbot, but my model can't extract entities. I get this error when I test the model on an utterance that contains entities to extract:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-4: ordinal not in range(128)
This is my configuration file:
pipeline:
- name: "nlp_spacy"
- name: "tokenizer_spacy"
- name: "intent_entity_featurizer_regex"
- name: "intent_featurizer_spacy"
- name: "ner_crf"
- name: "ner_synonyms"
- name: "intent_featurizer_count_vectors"
- name: "intent_classifier_tensorflow_embedding"
  intent_tokenization_flag: true
  intent_split_symbol: "+"
Could you post what NLU version you're using and the full error trace please?
I actually no longer get that error (I have no idea what happened), but I am getting a different one now.
I still have the issue with entity extraction. My NLU model works fine; the problem is now with my dialogue management model, which I am trying to train by providing the domain file.
My NLU version is 0.12.3
My domain file is:
```
slots:
  horo:
    type:text
intents:
  - سلام
  - طلب
  - برج
  - وداع+شكر
entities:
  - horo
actions:
  - actions.Selem
  - actions.Goodbye
  - actions.Horoscope
  - actions.AskSign
```
and the error I get is:
```
Traceback (most recent call last):
  File "C:/Users/Asus/Desktop/eng_intern/arabicBot/train_init.py", line 21, in <module>
    agent = Agent('domain.yml', policies=[MemoizationPolicy(max_history=2), KerasPolicy()])
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_core\agent.py", line 51, in __init__
    self.domain = self._create_domain(domain)
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_core\agent.py", line 390, in _create_domain
    return TemplateDomain.load(domain)
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_core\domain.py", line 402, in load
    return cls.load_from_yaml(read_file(filename), action_factory=action_factory)
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_core\domain.py", line 411, in load_from_yaml
    slots = cls.collect_slots(data.get("slots", {}))
  File "C:\Users\Asus\AppData\Local\Programs\Python\Python36\lib\site-packages\rasa_core\domain.py", line 452, in collect_slots
    slot_class = Slot.resolve_by_type(slot_dict[slot_name].get("type"))
AttributeError: 'str' object has no attribute 'get'
```
I think it may be a problem with the slot type I specified in my domain file, but I can't think of what else I could put there.
I'd also suggest naming your intents in English, but I can't really see anything wrong with your domain file otherwise. What version of rasa_core are you using?
My Rasa core version is 0.10.1
I had no problems with the intents, I have actually tried removing the entity and working only with intents, the chatbot worked perfectly.
I am using ner_crf to extract the entities; maybe it doesn't work with Arabic?
After a lot of research, I finally found something. I tried adding - name: "ner_duckling" to my configuration file, but I get this error when training my NLU model:
jpype._jvmfinder.JVMNotFoundException: No JVM shared library file (jvm.dll) found. Try setting up the JAVA_HOME environment variable properly.
What does Java have to do with this?!
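The exception comes from jpype, a Python-to-Java bridge, which suggests that the ner_duckling component was trying to start a JVM under the hood. As a rough diagnostic only, a sketch like the following checks whether JAVA_HOME is set and whether a JVM shared library can be found beneath it. The helper name `find_jvm_library` and the list of file names are assumptions made for illustration; they are not part of Rasa or jpype.

```python
import os
from pathlib import Path

# File names a JVM shared library commonly has on Windows, Linux and macOS.
# This list is an assumption for illustration, not jpype's actual lookup logic.
JVM_LIBRARY_NAMES = ("jvm.dll", "libjvm.so", "libjvm.dylib")


def find_jvm_library(java_home):
    """Return every path under java_home that looks like a JVM shared library."""
    if not java_home:
        return []
    root = Path(java_home)
    if not root.is_dir():
        return []
    # Search recursively, because the library usually sits in a subfolder.
    return [path for name in JVM_LIBRARY_NAMES for path in root.rglob(name)]


if __name__ == "__main__":
    java_home = os.environ.get("JAVA_HOME")
    print("JAVA_HOME =", java_home)
    print("JVM libraries found:", find_jvm_library(java_home))
```

If the list comes back empty, either no Java runtime is installed or JAVA_HOME points to the wrong folder, which would be consistent with the message above.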
SOLVED, it was a very silly mistake (a missing space in my domain file cost me a week of searching!)
what was the mistake?
I should have put a space before `text` in my domain file, so it should have been:
type: text
As silly as that ;)
It works perfectly now, no errors
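For anyone hitting the same AttributeError, here is a minimal sketch of why the missing space matters. It uses PyYAML as a stand-in parser, which is an assumption: the thread does not show which YAML library rasa_core uses internally. Without a space after the colon, YAML reads `type:text` as one plain string rather than a key/value pair, so the slot entry is a `str` and calling `.get("type")` on it fails.

```python
import yaml  # PyYAML, used here only as a stand-in parser (assumption)

# The slot definition as originally posted: no space after the colon.
BROKEN = """\
slots:
  horo:
    type:text
"""

# The corrected slot definition.
FIXED = """\
slots:
  horo:
    type: text
"""

broken = yaml.safe_load(BROKEN)
fixed = yaml.safe_load(FIXED)

# Without the space, the whole value is a single string.
print(repr(broken["slots"]["horo"]))  # 'type:text'

# With the space, it is a mapping that has a "type" key.
print(repr(fixed["slots"]["horo"]))   # {'type': 'text'}

# Reproduce the error from the traceback on the broken version.
try:
    broken["slots"]["horo"].get("type")
except AttributeError as err:
    print(err)  # 'str' object has no attribute 'get'
```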
ah cool!