Rasa: JSON-formatted data for NLU training using the HTTP API

Created on 12 Jul 2019 · 12 comments · Source: RasaHQ/rasa

Description of Problem:
The current default API (--enable-api) doesn't accept JSON as input (only Markdown) for NLU training data, while the old one (0.x) did.

Labels: help wanted, difficulty: low

All 12 comments

Thanks for raising this issue, we'll look into it!

You can train with JSON data using the API. The server will save your data as nlu.md, but the loader doesn't rely on the file extension to determine the content type, so it will work.

I would like to work on this

Please go ahead! Looking forward to seeing your contribution

Hi, can someone provide more details regarding this issue? As far as I know, --enable-api will spawn a server once the bot is trained and ready to run. How is it related to JSON-formatted data? Does training require JSON-formatted data, and if so, how does that relate to --enable-api? Can someone elaborate?
Thanks :)

Hi @Archish27, once the server is up you have access to an HTTP API. One of the endpoints is /model/train, where you can include training data and configs in the payload of your POST request. The documentation asks for the nlu and stories data to be in Markdown format. This is in line with the server code, which saves the data in files with an .md extension.
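As a rough sketch of such a request (the endpoint path and payload fields are taken from this thread; the localhost URL assumes a default local server, and the Markdown snippet is a made-up minimal example):

```python
import json
from urllib import request

# Training payload as described above: a YAML config string plus NLU data.
# Per the 1.x docs, the "nlu" field is a Markdown-formatted string.
payload = {
    "config": "language: en\npipeline: supervised_embeddings",
    "nlu": "## intent:greet\n- hey\n- hello\n",
}

req = request.Request(
    "http://localhost:5005/model/train",  # default local server (assumption)
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# response = request.urlopen(req)  # requires a running Rasa server
```

The outer payload is always JSON; the question in this issue is only about the format of the string inside the "nlu" field.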

If you like we can team up on this one.

I tried hitting the endpoint http://localhost:5005/model/train with the sample payload:

{
"config": "language: en\npipeline: supervised_embeddings\npolicies:\n  - name: MemoizationPolicy\n  - name: KerasPolicy",
"nlu": "{\n  \"rasa_nlu_data\": {\n    \"regex_features\": [\n      {\n        \"name\": \"greet\",\n        \"pattern\": \"hey[^\\\\s]*\"\n      }\n    ],\n    \"entity_synonyms\": [\n      {\n        \"value\": \"chinese\",\n        \"synonyms\": [\"Chinese\", \"Chines\", \"chines\"]\n      },\n      {\n        \"value\": \"vegetarian\",\n        \"synonyms\": [\"veggie\", \"vegg\"]\n      }\n    ],\n    \"common_examples\": [\n      {\n        \"text\": \"i m looking for a place to eat\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": []\n      },\n      {\n        \"text\": \"I want to grab lunch\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": []\n      },\n      {\n        \"text\": \"I am searching for a dinner spot\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": []\n      },\n      {\n        \"text\": \"i m looking for a place in the north of town\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 31,\n            \"end\": 36,\n            \"value\": \"north\",\n            \"entity\": \"location\"\n          }\n        ]\n      },\n      {\n        \"text\": \"show me chinese restaurants\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 8,\n            \"end\": 15,\n            \"value\": \"chinese\",\n            \"entity\": \"cuisine\"\n          }\n        ]\n      },\n      {\n        \"text\": \"show me chines restaurants in the north\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 8,\n            \"end\": 14,\n            \"value\": \"chinese\",\n            \"entity\": \"cuisine\"\n          },\n          {\n            \"start\": 34,\n            \"end\": 39,\n            \"value\": \"north\",\n            \"entity\": \"location\"\n          }\n        ]\n      },\n      {\n        \"text\": \"show me a mexican 
place in the centre\", \n        \"intent\": \"restaurant_search\", \n        \"entities\": [\n          {\n            \"start\": 31, \n            \"end\": 37, \n            \"value\": \"centre\", \n            \"entity\": \"location\"\n          }, \n          {\n            \"start\": 10, \n            \"end\": 17, \n            \"value\": \"mexican\", \n            \"entity\": \"cuisine\"\n          }\n        ]\n      },\n      {\n        \"text\": \"i am looking for an indian spot called olaolaolaolaolaola\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 20,\n            \"end\": 26,\n            \"value\": \"indian\",\n            \"entity\": \"cuisine\"\n          }\n        ]\n      },     {\n        \"text\": \"search for restaurants\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": []\n      },\n      {\n        \"text\": \"anywhere in the west\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 16,\n            \"end\": 20,\n            \"value\": \"west\",\n            \"entity\": \"location\"\n          }\n        ]\n      },\n      {\n        \"text\": \"anywhere near 18328\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 14,\n            \"end\": 19,\n            \"value\": \"18328\",\n            \"entity\": \"location\"\n          }\n        ]\n      },\n      {\n        \"text\": \"I am looking for asian fusion food\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 17,\n            \"end\": 29,\n            \"value\": \"asian fusion\",\n            \"entity\": \"cuisine\"\n          }\n        ]\n      },\n      {\n        \"text\": \"I am looking a restaurant in 29432\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 29,\n            
\"end\": 34,\n            \"value\": \"29432\",\n            \"entity\": \"location\"\n          }\n        ]\n      },\n      {\n        \"text\": \"I am looking for mexican indian fusion\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 17,\n            \"end\": 38,\n            \"value\": \"mexican indian fusion\",\n            \"entity\": \"cuisine\"\n          }\n        ]\n      },\n      {\n        \"text\": \"central indian restaurant\",\n        \"intent\": \"restaurant_search\",\n        \"entities\": [\n          {\n            \"start\": 0,\n            \"end\": 7,\n            \"value\": \"central\",\n            \"entity\": \"location\"\n          },\n          {\n            \"start\": 8,\n            \"end\": 14,\n            \"value\": \"indian\",\n            \"entity\": \"cuisine\"\n          }\n        ]\n      }\n    ]\n  }\n}",
"force": false,
"save_to_default_model_directory": true
}

NLU got successfully trained.

Is it that, instead of an .md file being generated in temp, a JSON file should be generated?

@PierreFG could you clarify the description of the problem and if it persists?

@erohmensing
The issue is with the /model/train API endpoint.

In 1.x it requires the NLU training data (in the nlu property) to be specified as a Markdown-formatted string.

The equivalent 0.x API endpoint allowed NLU training data to be provided in JSON format. Anyone upgrading from 0.x to 1.x therefore needs to completely rewrite their client-side API code, which is made harder by the fact that Markdown is not a typical data format for APIs; JSON is far more convenient and conventional for HTTP APIs.
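For anyone stuck on that upgrade, a minimal client-side converter from the 0.x rasa_nlu_data JSON to 1.x Markdown could look like the sketch below. It is illustrative only: it handles common_examples with inline entity annotations, and ignores regex_features and entity_synonyms.

```python
import json
from collections import defaultdict


def nlu_json_to_markdown(nlu_json: str) -> str:
    """Convert 0.x-style rasa_nlu_data JSON to 1.x Markdown (sketch only)."""
    data = json.loads(nlu_json)["rasa_nlu_data"]
    by_intent = defaultdict(list)
    for ex in data.get("common_examples", []):
        text = ex["text"]
        # Annotate entities inline, right to left, so earlier offsets stay valid.
        for ent in sorted(ex.get("entities", []), key=lambda e: e["start"], reverse=True):
            s, e = ent["start"], ent["end"]
            text = "{}[{}]({}){}".format(text[:s], text[s:e], ent["entity"], text[e:])
        by_intent[ex["intent"]].append(text)
    lines = []
    for intent, examples in by_intent.items():
        lines.append("## intent:{}".format(intent))
        lines.extend("- {}".format(t) for t in examples)
        lines.append("")
    return "\n".join(lines)
```

With the "anywhere in the west" example from the payload above, this produces a `- anywhere in the [west](location)` line under `## intent:restaurant_search`.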

@Polarisation that was my interpretation as well, but @Archish27 said that providing it in JSON format seems to have worked fine. I haven't had time to try it out myself, but that's why I wanted to ask @PierreFG if the problem still persisted.

@erohmensing It might be that passing JSON already works, as reported by @Archish27 (I haven't tried it myself), but even so I think it should be documented here before the issue is closed.

I faced the same problem and thought it came from the JSON. I dug into the code and found that my actual problems were a wrong config format and then a malformed JSON string: the string contained raw '\t' characters instead of spaces.
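A quick way to pinpoint that kind of payload problem is to parse the string yourself and look at the position json.loads reports. This is a generic debugging helper, not part of Rasa; note that a raw tab inside a JSON string literal is an invalid control character and must be escaped as \t.

```python
import json


def check_json(payload: str) -> str:
    """Return 'ok', or a message locating the first JSON parse error."""
    try:
        json.loads(payload)
        return "ok"
    except json.JSONDecodeError as err:
        return "line {} column {}: {}".format(err.lineno, err.colno, err.msg)


bad = '{"text": "hello\tworld"}'    # raw tab inside the string: invalid JSON
good = '{"text": "hello\\tworld"}'  # escaped tab: valid JSON
```

Running check_json on the nlu string before sending the request surfaces this class of error with a line and column number instead of an opaque training failure.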

_json_format_heuristics = {
    WIT: lambda js, fn: "data" in js and isinstance(js.get("data"), list),
    LUIS: lambda js, fn: "luis_schema_version" in js,
    RASA: lambda js, fn: "rasa_nlu_data" in js,
    DIALOGFLOW_AGENT: lambda js, fn: "supportedLanguages" in js,
    DIALOGFLOW_PACKAGE: lambda js, fn: "version" in js and len(js) == 1,
    DIALOGFLOW_INTENT: lambda js, fn: "responses" in js,
    DIALOGFLOW_ENTITIES: lambda js, fn: "isEnum" in js,
    DIALOGFLOW_INTENT_EXAMPLES: lambda js, fn: "_usersays_" in fn,
    DIALOGFLOW_ENTITY_ENTRIES: lambda js, fn: "_entries_" in fn,
}
def guess_format(filename: Text) -> Text:
    """Applies heuristics to guess the data format of a file.

    Args:
        filename: file whose type should be guessed

    Returns:
        Guessed file format.
    """
    guess = UNK

    content = ""
    try:
        content = io_utils.read_file(filename)
        js = json.loads(content)
    except ValueError:
        if any([marker in content for marker in _markdown_section_markers]):
            guess = MARKDOWN
        elif _is_nlg_story_format(content):
            guess = MARKDOWN_NLG
    else:
        for fformat, format_heuristic in _json_format_heuristics.items():
            if format_heuristic(js, filename):
                guess = fformat
                break

    logger.debug("Training data format of '{}' is '{}'.".format(filename, guess))

    return guess

That's the code that guesses the format of the NLU data; it does load JSON fine.
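Standalone, the content-based sniffing that snippet relies on can be illustrated like this. It is a simplified re-implementation for demonstration, not the actual Rasa code; only the RASA JSON case and a single Markdown marker are shown.

```python
import json


def guess_nlu_format(content: str) -> str:
    """Guess the training-data format from content alone (simplified sketch)."""
    try:
        js = json.loads(content)
    except ValueError:
        # Not JSON: fall back to a Markdown section-marker check.
        if "## intent:" in content:
            return "md"
        return "unk"
    # Mirrors the RASA heuristic above: look for the top-level key.
    if isinstance(js, dict) and "rasa_nlu_data" in js:
        return "rasa_nlu"
    return "unk"
```

The point the thread converges on is visible here: the filename (e.g. nlu.md) never enters the decision, only the content does.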
