Hi! The documentation here:
https://rasa-nlu.readthedocs.io/en/latest/dataformat.html#markdown-format
says that "training data can be used in the following markdown format":
## intent:check_balance
- what is my balance <!-- no entity -->
- how much do I have on my [savings](source_account) <!-- entity "source_account" has value "savings" -->
- how much do I have on my [my savings account](source_account:savings) <!-- synonyms, method 1-->
## intent:greet
- hey
- hello
## synonym:savings <!-- synonyms, method 2 -->
- pink pig
However, it's not clear to me how to train using this data. I tried:
python -m rasa_nlu.train -c config_spacy_test.json
using the following config file:
{
  "pipeline": "spacy_sklearn",
  "path": "./models",
  "data": "./data/examples/rasa/test.md"
}
and got:
INFO:rasa_nlu.utils.spacy_utils:Trying to load spacy model with name 'en'
INFO:rasa_nlu.components:Added 'nlp_spacy' to component cache. Key 'nlp_spacy-en'.
Traceback (most recent call last):
  File "/home/ax02211/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ax02211/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ax02211/anaconda3/lib/python3.6/site-packages/rasa_nlu/train.py", line 88, in <module>
    do_train(config)
  File "/home/ax02211/anaconda3/lib/python3.6/site-packages/rasa_nlu/train.py", line 77, in do_train
    training_data = load_data(config['data'])
  File "/home/ax02211/anaconda3/lib/python3.6/site-packages/rasa_nlu/converters.py", line 288, in load_data
    fformat = guess_format(files)
  File "/home/ax02211/anaconda3/lib/python3.6/site-packages/rasa_nlu/converters.py", line 258, in guess_format
    file_data = json.loads(f.read())
  File "/home/ax02211/anaconda3/lib/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/home/ax02211/anaconda3/lib/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/ax02211/anaconda3/lib/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
How can I train using data in markdown format? Thanks!
Did you install from GitHub or pip? What's your Rasa NLU version?
pip, if I recall correctly. Why should it matter?
The markdown training format is part of latest, which is only available when installing directly from GitHub. I don't know when it will be published to PyPI.
We need to fix a couple more things before we release the next version, but if you install from GitHub you should be good to go with the markdown format.
For the benefit of anyone else who encounters this behaviour: I can confirm that installing from GitHub did indeed resolve the issue. Thanks!
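For anyone wondering why the error is a JSONDecodeError: the traceback above shows guess_format in converters.py calling json.loads(f.read()), so the installed release evidently tries to parse the data file as JSON. The sketch below is only an illustration using the standard library, not rasa_nlu code, and json_error is a hypothetical helper name. It reproduces the same message by feeding a shortened copy of the markdown data to the JSON parser:

```python
import json

# Shortened copy of the markdown training data from the question.
MARKDOWN_TRAINING_DATA = """## intent:check_balance
- what is my balance
- how much do I have on my [savings](source_account)
"""


def json_error(text):
    """Return the JSON parser's error message for text, or None if it parses.

    Hypothetical helper for illustration; it is not part of rasa_nlu.
    """
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # Markdown starts with '#', which is not a valid JSON value.
    print(json_error(MARKDOWN_TRAINING_DATA))
    # Valid JSON parses cleanly, so no error message is returned.
    print(json_error('{"pipeline": "spacy_sklearn"}'))
```

The first call returns "Expecting value: line 1 column 1 (char 0)", matching the last line of the traceback, because the parser rejects the leading '#' of the markdown heading.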