Description of Problem:
Rasa currently has no native support for composite entities (entities that consist of multiple sub-entities). This feature is available in competitors like wit.ai and dialogflow.
Overview of the Solution:
A while ago, I implemented composite entities as a custom component. This has worked for my use cases. However, having this functionality as a separate component has some serious drawbacks:
My proposal is now that I take the functionality from this component and put it in a pull request to make it available throughout rasa. Definitions of composite entities would then be first class training data, meaning that they are defined the same way as e.g. lookup tables.
For example a markdown file containing NLU training data could contain another category defining composites of entities (a "@" is used to mark entity names).
## composite:car
- @color @brand
I'm open to discussion about the specifics of my implementation. Is there any argument against proceeding with this? One argument I could imagine is not wanting to increase the number of fields in the training files further.
Hi @BeWe11 thanks for the suggestion! We actually decided against merging a feature like this a while ago: https://github.com/RasaHQ/rasa/pull/1475
That's not to say we won't consider merging yours -- could you provide some more details to your approach?
Hey @akelad, thank you for your response and the link, I'm aware of the previous pull request.
The approach of the closed pull request had various assumptions about the training data built in, e.g. about time and date entities, it used lookup tables to find entities etc.
My approach is a simple regex check that only kicks in after all other entity extractors have been applied. It just checks whether a predefined pattern (say @color @product with @extras) matches a parsed sentence structure (say red shoes with shoelaces, where red, shoes and shoelaces have been recognized as entities of type color, product and extras). I'd like to refer you to the GitHub page that I've linked for a full example.
This is very similar to the dialogflow implementation. There is no complicated interaction to other extractors and it doesn't require any restructuring of training data. The only thing required is a definition of some named patterns. The way I'm imagining it, these definitions would be placed alongside other "extra" definitions like lookup tables, for example:
{
"rasa_nlu_data": {
"common_examples": [],
"regex_features" : [],
"lookup_tables" : [],
"entity_synonyms": [],
"composite_patterns": [
{
"name": "product",
"patterns": [
"@color @brand with @extras"
]
}
]
}
}
I can understand that you might not want to include yet another category in the training data. I'd like to argue that the feature of composite entities is so basic that there might be a case for making the data available natively.
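A minimal sketch of the regex check described above, not the actual component code. It assumes entities arrive as dicts with `entity`, `value`, `start` and `end` keys and that patterns use the `composite_patterns` layout shown earlier; the function name `find_composites` is hypothetical.

```python
import re


def find_composites(text, entities, composite_patterns):
    """Group already-extracted entities into composite entities.

    entities: dicts with "entity", "value", "start", "end" (assumed format).
    composite_patterns: list of {"name": ..., "patterns": [...]}.
    """
    # Rebuild the sentence with each entity span replaced by "@<type>",
    # remembering where every placeholder ends up in the new string.
    parts, spans = [], []
    cursor = offset = 0
    for ent in sorted(entities, key=lambda e: e["start"]):
        gap = text[cursor:ent["start"]]
        placeholder = "@" + ent["entity"]
        parts += [gap, placeholder]
        offset += len(gap)
        spans.append((offset, offset + len(placeholder), ent))
        offset += len(placeholder)
        cursor = ent["end"]
    parts.append(text[cursor:])
    templated = "".join(parts)

    # Every pattern is treated as a plain regex over the templated sentence.
    composites = []
    for composite in composite_patterns:
        for pattern in composite["patterns"]:
            for match in re.finditer(pattern, templated):
                members = [
                    ent for start, end, ent in spans
                    if match.start() <= start and end <= match.end()
                ]
                composites.append(
                    {"entity": composite["name"], "value": members}
                )
    return composites


text = "I want red shoes with shoelaces"
entities = [
    {"entity": "color", "value": "red", "start": 7, "end": 10},
    {"entity": "product", "value": "shoes", "start": 11, "end": 16},
    {"entity": "extras", "value": "shoelaces", "start": 22, "end": 31},
]
patterns = [{"name": "product", "patterns": ["@color @product with @extras"]}]
result = find_composites(text, entities, patterns)
```

Here the sentence is first rewritten to `I want @color @product with @extras`, so the pattern matches and the three entities are grouped under one composite. Because the check only reads the output of earlier extractors, it would not need to know how those entities were found.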
Ok cool, thanks for the detailed info! We'll discuss and let you know
Hi @BeWe11,
I think your composite_patterns definition is a very intuitive implementation, however does it generalise? i.e. in this example does every composite entity need to contain the with token in order to be recognised?
Hi @MetcalfeTom ,
I have been using @BeWe11's composite_entity_extractor for a while and everything works great for me. From my understanding and experience, the with token is not necessary for every composite entity. The patterns field lists every combination of entities and literal text that should be recognized as the composite given by name. In this example
{
"name": "product_with_attributes",
"patterns": [
"@color @product with @pattern",
"@pattern @color @product"
]
}
Both patterns, @color @product with @pattern and @pattern @color @product, will be recognized as product_with_attributes; in the first one, "with" is a literal string. So it is also possible to add another pattern such as @color @product of @pattern, and it will be recognized as the same composite entity.
Thanks for considering this repo. Composite entity is a feature that we used a lot so really hope rasa could support it natively.
Hey @MetcalfeTom,
with patterns being regexes, the composites generalize to everything that can be expressed as a regex. For example, the with could be made optional by using
@color @product (?:with )?@pattern
Alternatively, you could just define multiple patterns for the same composite entity as in @BrianYing's example.
Using raw regexes for pattern definitions provides a lot of flexibility. The downside is that defining regexes for complex use-cases can be quite tricky. Defining patterns is kind of a "set-and-forget" task though, so the trade-off might be justified.
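To illustrate the two options, here is a small sketch using Python's `re` module on sentences whose entities have already been replaced by `@<type>` placeholders. The helper `matches_any` is hypothetical and only serves the illustration.

```python
import re


def matches_any(patterns, templated):
    """Return True if any pattern matches the whole templated sentence."""
    return any(re.fullmatch(p, templated) for p in patterns)


# Option 1: a single pattern with an optional literal "with ".
optional_with = [r"@color @product (?:with )?@pattern"]

# Option 2: several patterns listed for the same composite entity.
multiple = [
    r"@color @product with @pattern",
    r"@pattern @color @product",
]

with_token = matches_any(optional_with, "@color @product with @pattern")
without_token = matches_any(optional_with, "@color @product @pattern")
reordered_single = matches_any(optional_with, "@pattern @color @product")
reordered_multi = matches_any(multiple, "@pattern @color @product")
```

The optional group covers both the version with and without the literal token, while a different ordering of entities still needs its own pattern (or a more involved regex).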
hey @BeWe11 - thanks for that. I created a draft PR with an alternative approach where we train a second CRF. This might be more flexible because it could allow for entity roles as well as compound entities. Very much WIP though https://github.com/RasaHQ/rasa/pull/3889
Hey @amn41,
from skimming over your code, I'm not quite sure what these entity roles are and how they are implemented. Would roles be given in the training examples, i.e. one would have to tag individual entities as well as groups of entities per example sentence?
haven't thought too much about the data format yet, but added some notes about the general idea to the PR description
I think next step is to better evaluate @amn41's proposal. @BeWe11 / @BrianYing do you have any data we could use to test this approach? (if you do not want to share it publicly but could share it under NDA, please write me a mail)
decision: we want this, question is how the detection is implemented (e.g. does an ML based approach like the one proposed by @amn41 work).
Hi @tmbo @JustinaPetr can my organization leverage our existing partnership with rasa to work together on this for a custom solution?
Hey @cyrilthank which organization are you with?
Hi @akelad I am with http://aiware.ai/ and rasa partnership with http://cleareye.ai/
@cyrilthank as far as I'm aware we don't have any partnerships with either of those companies. In any case though, the next step at the moment is still to evaluate Alan's approach to this - @amn41 do we have any updates on that?
Thank you @akelad for your reply
Can you please advise the mail id i can send the partnership credentials/documentation to confirm that we do have a rasa partnership
my mail id is cyril.[email protected]
Hi @cyrilthank yes my bad, there was some miscommunication about that. Please contact us via email if you want to discuss anything regarding our partnership, as we would prefer not to do this in the community
Thank you @akelad for patiently checking this. I understand completely 'multiple entities' causing......
Appreciate it if you could drop me a mail at cyril.[email protected] on how i may contact rasa-partnership team by email
Thank you for all your help
cyril
Hi @cyrilthank, @akelad, @amn41,
actually I am interested in solving this too. If additional resources are needed, you might want to count me/us in, in case you don't want to do it internally.
Regards
Julian
Hi @akelad please share the partnership related engagement information to
since we want to explore other use-cases too where we can collaborate to best leverage the partnership agreement for both our organizations
@cyrilthank you can get in touch with us via: [email protected]
@amn41 what's the update with your PR? Would be good to see if it makes sense for a community member to work on this :)
sorry - I have an idea for a simpler way to achieve this but need to work out the details. Will discuss with @tabergma @Ghostvv
Thanks @akelad i have reached out to [email protected]
Hi team, is it something you are still working on? Our organization really needs this and I saw that the other PR #3889 was closed.
yes, we are! waiting on a couple of other NLU pieces to come together but working on a good solution
Can you please share which version this is targeted for?
Thank you
Cyril
yes, we are! waiting on a couple of other NLU pieces to come together but working on a good solution
That's awesome :-) Looking forward to it! In the meantime, we'll use @BeWe11's solution as a workaround
@nbeuchat @BeWe11 We have a first working version of the composite entity feature ready. If you are still interested, I can share the feature with you so that you can test it before the actual release. Just let me know. Thanks.
Thanks for the update @tabergma ! I'd love to check it out :-)
I am interested as well, @tabergma !
@tabergma I am struggling with a similar use case too! If possible I would love to try using it and test it out
@shubhamnatraj We actually released the feature yesterday, see here. Would be great if you could share some feedback with us once you tested it. Thanks.
@tabergma Thank you! I will try it out and share my feedback on the thread itself.
I would close this issue - feel free to still leave any feedback you have here or on the forum. And if there's any issues/enhancement requests with the current feature, please open a new issue