Rasa: Add (bi-)LSTM encoder as an alternative to the Transformer encoder for DIET

Created on 1 Apr 2020 · 4 comments · Source: RasaHQ/rasa

Description of Problem:

DIET uses a Transformer encoder, but it is perfectly possible to swap in a different encoder. In my experiments (see below), both encoders achieve comparable performance, with the LSTM typically slightly weaker than the Transformer. Our experiments are not exhaustive, however, and some users may find the LSTM encoder performs better than the Transformer on their data. In terms of training time, the Transformer-based model trains slightly faster than the LSTM-based one, but the difference isn't dramatic.
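For illustration, here is a minimal sketch of what such a drop-in sequence encoder could look like, assuming a plain TF2/Keras setup. This is not the actual DIET implementation; the layer count, hidden size, and feature dimension are all illustrative:

```python
import tensorflow as tf

def build_bilstm_encoder(
    units: int = 128, num_layers: int = 2, dropout: float = 0.2
) -> tf.keras.Model:
    """Stack of bi-LSTM layers mapping a sequence of token features to a
    sequence of encoded features, i.e. the role the Transformer encoder
    plays inside DIET (sketch only, not the DIET code itself)."""
    inputs = tf.keras.Input(shape=(None, 256))  # (seq_len, feature_dim); 256 is illustrative
    x = inputs
    for _ in range(num_layers):
        x = tf.keras.layers.Bidirectional(
            tf.keras.layers.LSTM(units, return_sequences=True, dropout=dropout)
        )(x)
    return tf.keras.Model(inputs, x)

encoder = build_bilstm_encoder()
encoded = encoder(tf.random.normal((4, 20, 256)))  # -> shape (4, 20, 2 * units)
```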

Preliminary experimental results:

| Dataset           | Metric       | LSTM           | Transformer    |
|-------------------|--------------|----------------|----------------|
| ATIS - Intent     | Accuracy     | 95.53          | 96.61          |
| ATIS - Entities   | micro-avg F1 | 94.83          | 95.37          |
| SNIPS - Intent    | Accuracy     | 97.71          | 98.03          |
| SNIPS - Entities  | micro-avg F1 | 95.62          | 95.10          |
| HERMIT - Intent   | micro-avg F1 | 90.77 (± 0.72) | 89.89 (± 0.43) |
| HERMIT - Entities | micro-avg F1 | 81.47 (± 1.27) | 87.38 (± 0.64) |

Overview of the Solution:

There are some upsides and some downsides to adding this option to rasa OSS:

_Pro_:

  • An additional option for users
  • Some of the DIET code would get refactored, making future changes (e.g. replacing the Transformer with whatever comes next) easier
  • Some of the pipeline keys could be harmonised: parameters with shared semantics (currently num_transformer_layers vs. num_lstm_layers) could become a single num_encoder_layers

_Con_:

  • No clear benefit result-wise (on the datasets we've used) or runtime-wise
  • More code = more code to maintain
  • Additional complexity in pipeline configuration if the keys are not harmonised (e.g. num_transformer_layers vs. num_lstm_layers); see the sketch below this list
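As a sketch of the harmonisation idea: one encoder-agnostic set of keys could replace the per-encoder ones. All key names below are hypothetical, not existing Rasa parameters:

```python
# Hypothetical harmonised encoder config for DIET. None of these key names
# exist in Rasa today; they illustrate what shared semantics could look like.
HARMONISED_ENCODER_DEFAULTS = {
    "encoder_type": "transformer",  # "transformer" or "lstm"
    "num_encoder_layers": 2,        # subsumes num_transformer_layers / num_lstm_layers
    "encoder_size": 256,            # hidden size, meaningful for either encoder
    "encoder_dropout": 0.2,         # dropout applied inside the encoder
}
```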

An initial implementation is done; integrating it into rasa OSS will involve at least a little refactoring of the DIET model, plus optionally harmonising some of the config keys.

What are your thoughts, @tabergma @Ghostvv @amn41 @tmbo?

Definition of Done:


All 4 comments

Which version of Rasa is it? Is it from before I reintroduced scale_loss?

Yes, it's branched off 1.8, so it doesn't include that fix yet.

@tttthomasssss The current results above don't give a lot of motivation to include the LSTM as a configurable option, IMO. Do you have any results with pipelines that don't have the ConveRT featurizer in them? Curious whether the LSTM is better for other feature combinations.

Results with sparse features have been substantially worse. I haven't run many experiments with GloVe, though I can certainly give it a go.
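For reference, a sketch of a pipeline without the ConveRT featurizer, combining dense spaCy (GloVe-style) word vectors with sparse features. Component names follow Rasa 1.8; the epoch count is illustrative, and the Python dict mirrors the usual YAML pipeline file:

```python
# Sketch of an NLU pipeline config without ConveRTFeaturizer (Rasa 1.8-era
# component names); equivalent to the usual YAML config file.
config = {
    "language": "en",
    "pipeline": [
        {"name": "SpacyNLP"},                # loads a spaCy model with GloVe-style vectors
        {"name": "SpacyTokenizer"},
        {"name": "SpacyFeaturizer"},         # dense word-vector features
        {"name": "CountVectorsFeaturizer"},  # sparse bag-of-words features
        {"name": "DIETClassifier", "epochs": 100},
    ],
}
```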
