Rasa: Trying to understand if RASA is a good fit for the FAQBot I would like to develop

Created on 5 Apr 2018 · 13 Comments · Source: RasaHQ/rasa

Hi, the application my team is developing is a so-called FAQBot. Based on the questions customers will ask, we should be able to recognize 44 different intents, each mapping to the answer to a frequently asked question. As of now there are around 200 sentences in total mapping to these 44 intents. The default scikit-learn pipeline of RASA_NLU is performing very poorly. By launch I expect to have around 500 sentences mapping to around 30 intents.
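With roughly 200 sentences for 44 intents there are only about 4.5 examples per intent (about 17 per intent at launch, with 500 sentences and 30 intents). A minimal sketch, assuming the training data can be loaded as (sentence, intent) pairs, for counting examples per intent and flagging sparse ones; the example pairs and the `min_examples` threshold are hypothetical.

```python
from collections import Counter

# Hypothetical (sentence, intent) pairs; replace with your own training data.
examples = [
    ("how do I reset my password", "reset_password"),
    ("I forgot my password", "reset_password"),
    ("what are your opening hours", "opening_hours"),
    ("when are you open", "opening_hours"),
    ("when do you open on sundays", "opening_hours"),
    ("can I pay by credit card", "payment_methods"),
]


def sparse_intents(pairs, min_examples=10):
    """Return {intent: count} for intents with fewer than min_examples sentences."""
    counts = Counter(intent for _, intent in pairs)
    return {intent: n for intent, n in counts.items() if n < min_examples}


counts = Counter(intent for _, intent in examples)
print(len(examples) / len(counts))  # average sentences per intent -> 2.0
print(sparse_intents(examples, min_examples=2))  # -> {'payment_methods': 1}
```

Intents that stay sparse are candidates for merging, for more data collection, or for handling with keyword rules instead of a statistical model.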
In the meantime I am trying to come up with a pipeline that performs better than the default one of RASA_NLU. There is already a knowledge base of keyword-matching rules that I would like to integrate into the pipeline. I was thinking of creating a pipeline like this, for instance:
Original Sentence --> Sentence Tokenizer --> Tokenizer --> Spell Checker --> Replace word synonyms --> Remove irrelevant Parts Of Speech --> Lemmatizer --> Keyword Classifier --> Feature Extractor --> Scikit Intent Classifier --> Intent Found
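The chain above can be sketched as plain function composition, with the keyword classifier short-circuiting before any statistical model. This is a minimal sketch under stated assumptions: the synonym table, stopword list and keyword rules are hypothetical, stopword removal stands in for part-of-speech filtering, naive suffix stripping stands in for a real lemmatizer, sentence splitting and spell checking are left out, and `fallback` is a placeholder for whatever ML classifier follows.

```python
import re

# Hypothetical resources; in practice these would come from your own knowledge base.
SYNONYMS = {"pwd": "password", "passwd": "password", "login": "account"}
STOPWORDS = {"i", "my", "the", "a", "an", "to", "do", "how", "can", "is"}
KEYWORD_RULES = [
    ({"reset", "password"}, "reset_password"),
    ({"opening", "hour"}, "opening_hours"),
]


def tokenize(sentence):
    """Lowercase and split on runs of letters/digits (stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def replace_synonyms(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]


def remove_stopwords(tokens):
    # Stand-in for "remove irrelevant parts of speech"; a POS tagger would go here.
    return [t for t in tokens if t not in STOPWORDS]


def lemmatize(tokens):
    # Naive plural stripping as a placeholder for a real lemmatizer (e.g. spaCy).
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def preprocess(sentence, steps=(tokenize, replace_synonyms, remove_stopwords, lemmatize)):
    result = sentence
    for step in steps:  # the first step turns the string into a token list
        result = step(result)
    return result


def classify(sentence, fallback):
    """Return the first keyword rule whose keywords are all present, else ask fallback."""
    tokens = preprocess(sentence)
    for keywords, intent in KEYWORD_RULES:
        if keywords <= set(tokens):
            return intent
    return fallback(tokens)


print(classify("How do I reset my pwd?", fallback=lambda toks: "unknown"))
```

Because the keyword rules run on preprocessed tokens, "pwd" and "hours" still match the rules written for "password" and "hour".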
I would like to try feature extractors and classifiers other than the ones RASA uses by default, which may not be the best fit for my dataset. I suspect a simple TF-IDF vectorizer with Latent Semantic Indexing would work better, and I would also like to try classifiers other than SVM. I would like to have the keyword classifier in the pipeline as well, so it can benefit from the preprocessing.
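In scikit-learn, TF-IDF followed by Latent Semantic Indexing is usually built as `TfidfVectorizer` feeding `TruncatedSVD`, and the classifier at the end is interchangeable. A minimal sketch, assuming scikit-learn is installed; the toy sentences, `n_components=5` and the random forest settings are illustrative choices, not tuned values.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# Toy data for illustration only.
sentences = [
    "how do I reset my password",
    "I forgot my password",
    "password reset link not working",
    "what are your opening hours",
    "when are you open",
    "are you open on sunday",
    "can I pay by credit card",
    "which payment methods do you accept",
    "do you take paypal",
]
intents = ["reset_password"] * 3 + ["opening_hours"] * 3 + ["payment_methods"] * 3

lsi_forest = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
    # TruncatedSVD on a TF-IDF matrix is scikit-learn's usual way to do LSI/LSA.
    ("lsi", TruncatedSVD(n_components=5, random_state=0)),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
lsi_forest.fit(sentences, intents)
print(lsi_forest.predict(["I need to reset my password"]))
```

Since each step is named, another classifier can be tried by replacing the last step, e.g. `lsi_forest.set_params(clf=LogisticRegression())`, without touching the feature extraction.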
There are some features in RASA_NLU that are convenient for developing a chatbot (integration with Rasa Core, a clear way to train and evaluate a model, easy integration with AWS). On the other hand, it seems to be a very opinionated framework that does not make it easy at all to create custom pipelines. In fact, the documentation says that creating custom components is currently not supported. Do I understand that correctly? Isn't there a simple way to plug a TF-IDF vectorizer with Latent Semantic Indexing, possibly followed by a Random Forest classifier, into RASA_NLU? Considering that most of my project is ultimately text classification, that is, intent recognition, I find it hard to justify sticking with a framework that does not offer a clear path to tuning pipelines. As of now, I would rather be inclined to build and tune a text classifier using scikit-learn, perhaps integrating some gensim and spaCy.
Could you give me your opinion on how I could approach this problem and whether RASA_NLU is a good fit for it?
Thank you in advance


All 13 comments

Please try out the newly merged tensorflow_embedding pipeline (you will have to clone the master branch); it will likely do better at the task you have in mind.

Where in the docs did you read that it's not possible to have custom components? As long as your components implement the correct APIs, you can simply add them to your pipeline by their module path, e.g.
(in the new YAML config format):

pipeline:
  - tokenizer_whitespace
  - custom_components.MyCustomClassifier
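A minimal sketch of the kind of class that could sit behind a module path like `custom_components.MyCustomClassifier`: a keyword-rule classifier with `train` and `process` methods. The class name, rule format and dict-shaped message are hypothetical, and it deliberately imports nothing from rasa_nlu; check the component base class and method signatures of the Rasa NLU version you install before adapting it.

```python
class KeywordIntentClassifier:
    """Hypothetical keyword-rule component; NOT the rasa_nlu Component base class."""

    def __init__(self, rules=None):
        # Each rule is (set_of_keywords, intent_name); it fires when all keywords are present.
        self.rules = list(rules or [])

    def train(self, training_examples):
        # Rules come from an existing knowledge base, so nothing is learned here.
        return self

    def process(self, message):
        # `message` is assumed to be a dict that earlier components filled with "tokens".
        tokens = set(message.get("tokens", []))
        for keywords, intent in self.rules:
            if keywords <= tokens:
                message["intent"] = {"name": intent, "confidence": 1.0}
                break
        return message


classifier = KeywordIntentClassifier(rules=[({"reset", "password"}, "reset_password")])
print(classifier.process({"tokens": ["reset", "my", "password"]}))
```

If no rule fires, the message is returned without an `intent`, leaving the decision to a statistical classifier later in the pipeline.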

Thank you, amn41. I tried the tensorflow_embedding pipeline and it did in fact work much better. Thanks for the tip. I will also go ahead and see whether I can implement custom components.

The documentation I am referring to is here: https://github.com/RasaHQ/rasa_nlu/blob/master/docs/pipeline.rst. It says "Currently you need to rely on the components that are shipped with rasa NLU, but we plan to add the possibility to create your own components in your code." I guess the documentation needs to be updated.

Let me rephrase, as I think I may have come across too strongly. Unfortunately, I have to decide very soon whether to use RASA_NLU for an FAQ bot that is supposed to go into production in two months. My gut feeling right now is that RASA_NLU is a cutting-edge project with great potential that I would also like to contribute to, but it may not be stable and mature enough for a production project. The fact that I have to check out the master branch to get the feature I need, the undocumented features, and the changes in the config format are putting me off using RASA_NLU for a project at work, and I am rather keen on building a text classification engine on somewhat more "stable" libraries like scikit-learn and/or TensorFlow, spaCy and/or gensim. On the other hand, I am planning to use RASA_NLU for a hobby project of mine and to offer my help improving the tool.

Any thoughts? Thank you in advance.

Great to hear that the new pipeline works much better! I will open an issue about documenting custom components.

I understand your concerns. The feature you want is in the master branch and not yet in a stable release, although we're working hard to push out v0.12 soon.

This feature notwithstanding, Rasa NLU is quite a stable product and is run in production by a number of large companies. As you can see, there is quite comprehensive test coverage, and the web server API has been stable for a long time. You can rely on a specific version to work well, but since we are pre-1.0, we do not guarantee backwards compatibility when you upgrade. If you want to keep using the latest models and features, you will occasionally have to endure breaking changes, such as the new config format. Trust me when I say that none of these features are developed 'just for fun': we have limited resources and can only focus on building features that significantly enhance the product.

Thank you, amn41. Given how quick your feedback was, I am now more inclined to stick with RASA_NLU, but I will have to think it over for a while.
Regards

@diegoami I am going to close this, but please let me know if it needs to be reopened. The Rasa team and community are more than happy to help in any way we can!

@diegoami Did you end up using Rasa NLU for your FAQBot? Did you end up writing a custom component, e.g. a TF-IDF vectorizer? I am curious about your findings and insights, since I am facing the same challenge at the moment.

Regrettably I found out that the tensorflow_embedding pipeline tended to overfit and as of now we have moved to a rule-based approach, without RASA_NLU. I think that the hard reality is that I don't have enough data to make machine learning work, no matter what model or framework I choose or how much I bend the pipeline.
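One way to check whether a model is overfitting a small dataset is to compare accuracy on the training sentences with cross-validated accuracy; a large gap suggests the model is memorizing rather than generalizing. A minimal sketch with scikit-learn, assuming toy data, a TF-IDF plus LogisticRegression stand-in model and 3 stratified folds; it does not reproduce the tensorflow_embedding pipeline, only the diagnostic.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Toy data: three intents with three sentences each.
sentences = [
    "how do I reset my password", "I forgot my password", "password reset link not working",
    "what are your opening hours", "when are you open", "are you open on sunday",
    "can I pay by credit card", "which payment methods do you accept", "do you take paypal",
]
intents = ["reset_password"] * 3 + ["opening_hours"] * 3 + ["payment_methods"] * 3

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Accuracy on the data the model was trained on (optimistic by construction).
train_acc = model.fit(sentences, intents).score(sentences, intents)

# Accuracy on held-out folds; each fold keeps one sentence per intent here.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
cv_acc = cross_val_score(model, sentences, intents, cv=cv).mean()

print(f"train accuracy {train_acc:.2f}, cross-validated accuracy {cv_acc:.2f}")
```

With only a handful of sentences per intent, the cross-validated figure is itself noisy, which is one more symptom of the data shortage described above.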

@diegoami would you be willing to share your dataset? (privately) - I wonder if increasing the regularisation helps here
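The idea of trying stronger regularisation can be illustrated with scikit-learn by cross-validating over the regularisation strength. A minimal sketch, assuming toy data and a TF-IDF plus LogisticRegression model, where a smaller `C` means stronger L2 regularisation; it does not show the tensorflow_embedding pipeline's own regularisation settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Toy data: three intents with three sentences each.
sentences = [
    "how do I reset my password", "I forgot my password", "password reset link not working",
    "what are your opening hours", "when are you open", "are you open on sunday",
    "can I pay by credit card", "which payment methods do you accept", "do you take paypal",
]
intents = ["reset_password"] * 3 + ["opening_hours"] * 3 + ["payment_methods"] * 3

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Smaller C = stronger regularisation; cross-validation picks the value that generalizes best.
search = GridSearchCV(
    pipe,
    param_grid={"clf__C": [0.01, 0.1, 1.0, 10.0]},
    cv=StratifiedKFold(n_splits=3, shuffle=True, random_state=0),
)
search.fit(sentences, intents)
print(search.best_params_, round(search.best_score_, 2))
```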

C:\Users\Sivasai\AppData\Local\Continuum\anaconda3\lib\site-packages\twisted\internet\defer.py:1386 in _inlineCallbacks
1385     else:
1386         result = g.send(result)
1387 except StopIteration as e:

I am getting this error in HttpWebResponse. Can you please tell me where I made a mistake?

@karthikhadoop unfortunately without more details it's hard to say. What command are you running? Which python version? (3.x it looks like)

The best would be if you could create a minimal example which lets someone else reproduce this exact issue.

@amn41 Hey, I have a similar problem. I am developing an FAQ bot with around 250 unique Q&A pairs, and I am open to providing the dataset privately to find the best possible pipeline. I am also not sure whether Rasa is a good fit for FAQ bots, but I am a big advocate of the Rasa framework in general, so I would really like to make this work. Any help will be greatly appreciated. Thanks.

Please don't cross-post your issue; we'll keep the discussion in the new issue you created.
