Description of Problem:
DIET uses a Transformer encoder; however, it is perfectly possible to use a different encoder instead. In my experiments (see below) both encoders achieve comparable performance, with the LSTM typically being slightly weaker than the Transformer. Nonetheless, our experiments are not exhaustive, and some users might find the LSTM encoder performing better than the Transformer. In terms of training time, the Transformer-based model trains slightly faster than the LSTM-based one, but the difference isn't dramatic. A minimal sketch of what such an encoder swap could look like is below.
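For context, here is a rough sketch of an LSTM encoder that could stand in for the Transformer, assuming DIET's TensorFlow stack; `build_lstm_encoder` and its parameters are illustrative names, not Rasa's actual API:

```python
import tensorflow as tf

# Hypothetical drop-in for DIET's Transformer encoder: a stack of
# bidirectional LSTMs that, like the Transformer, maps a
# [batch, seq_len, units] sequence of featurized tokens to a
# [batch, seq_len, units] sequence of contextual embeddings, so the
# downstream intent/entity heads can stay unchanged.
def build_lstm_encoder(units: int, num_layers: int, dropout: float) -> tf.keras.Model:
    return tf.keras.Sequential(
        [
            tf.keras.layers.Bidirectional(
                # units // 2 per direction so the concatenated forward/backward
                # output matches the Transformer encoder's output size
                tf.keras.layers.LSTM(
                    units // 2, return_sequences=True, dropout=dropout
                )
            )
            for _ in range(num_layers)
        ]
    )
```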
Preliminary experimental results:
| Dataset           | Metric       | LSTM             | Transformer      |
|-------------------|--------------|------------------|------------------|
| ATIS - Intent     | Accuracy     | 95.53            | 96.61            |
| ATIS - Entities   | micro-avg F1 | 94.83            | 95.37            |
| SNIPS - Intent    | Accuracy     | 97.71            | 98.03            |
| SNIPS - Entities  | micro-avg F1 | 95.62            | 95.10            |
| HERMIT - Intent   | micro-avg F1 | 90.77 (+/- 0.72) | 89.89 (+/- 0.43) |
| HERMIT - Entities | micro-avg F1 | 81.47 (+/- 1.27) | 87.38 (+/- 0.64) |
Overview of the Solution:
There are some upsides and some downsides to adding this option to rasa OSS.

_Pro_:

- Config keys with shared semantics (e.g. `num_transformer_layers` vs. `num_lstm_layers`) could become `num_encoder_layers` (see the sketch after this list).

_Con_:

- More encoder-specific config keys to maintain (`num_transformer_layers` vs. `num_lstm_layers`).

An initial implementation is done; integration into rasa OSS will involve at least a little bit of refactoring in the DIET model, plus optionally harmonising some of the config keys.
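To make the harmonisation concrete, a sketch of what a DIET component config with shared keys could look like; `encoder_type` and `num_encoder_layers` are assumed new keys, not existing Rasa options:

```python
# Hypothetical DIET component config with harmonised keys: "encoder_type"
# selects the encoder, and "num_encoder_layers" replaces the encoder-specific
# "num_transformer_layers" / "num_lstm_layers" pair discussed above.
DIET_CONFIG = {
    "encoder_type": "lstm",   # assumed option; "transformer" would be the default
    "num_encoder_layers": 2,  # shared semantics across both encoders
    "epochs": 300,
}
```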
What are your thoughts, @tabergma @Ghostvv @amn41 @tmbo?
Definition of Done:
Which version of Rasa is it? Is it from before I reintroduced scale_loss?
Yes, it's branched off 1.8, so it doesn't include that fix yet.
@tttthomasssss The current results above don't give a lot of motivation to include the LSTM as a configurable option IMO. Do you have any results with pipelines that don't have the ConveRT featurizer in them? Curious whether the LSTM is better for other feature combinations.
Results with sparse features have been substantially worse. I haven't run many experiments with GloVe, though I can certainly give it a go.
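For reference, a sketch of a ConveRT-free pipeline (sparse count vectors plus spaCy's pre-trained dense vectors) that could be used for such a comparison, written as the Python equivalent of the YAML pipeline; the component names are real Rasa components, while `encoder_type` is the hypothetical option from above:

```python
# Pipeline without the ConveRT featurizer: sparse bag-of-words features from
# CountVectorsFeaturizer plus dense pre-trained word vectors from spaCy.
pipeline = [
    {"name": "SpacyNLP", "model": "en_core_web_md"},
    {"name": "SpacyTokenizer"},
    {"name": "SpacyFeaturizer"},         # dense GloVe-style word vectors
    {"name": "CountVectorsFeaturizer"},  # sparse features
    {"name": "DIETClassifier", "encoder_type": "lstm", "epochs": 300},
]
```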