Rasa: Add set.seed ability to CLI when splitting NLU data

Created on 21 Oct 2019  ·  4 Comments  ·  Source: RasaHQ/rasa

Description of Problem:
The user cannot set a seed when using the CLI to split NLU data into training and test sets (the _rasa data split nlu_ command). Adding this option would make the split reproducible when required.

Overview of the Solution:
Add set.seed() to split_nlu_data.py or train_test_split.py
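
Note that set.seed() is R's name for this; in Python the usual equivalent is seeding a random number generator. A minimal sketch of the idea, assuming the data is a plain list of examples and using a hypothetical helper `split_examples` (this is not Rasa's actual splitting code):

```python
import random
from typing import List, Optional, Tuple


def split_examples(
    examples: List[str],
    train_frac: float = 0.8,
    random_seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: shuffle a copy of the examples, then cut it in two."""
    # A local RNG avoids touching the global random state.
    # With random_seed=None the RNG is seeded non-deterministically.
    rng = random.Random(random_seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


examples = [f"utterance {i}" for i in range(10)]
first = split_examples(examples, random_seed=42)
second = split_examples(examples, random_seed=42)
# Same seed, same input order: both calls return the same split.
```

Using a dedicated `random.Random` instance rather than the module-level `random.seed()` keeps the seed from affecting other code that relies on the global generator.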

Definition of Done:

  • [ ] Tests are added
  • [ ] Feature described in the docs
  • [ ] Feature mentioned in the changelog

All 4 comments

Thanks for submitting this feature request 🚀 @Ghostvv will get back to you about it soon! ✨

Caw! 🐧

Great idea 👍 We could add an argument like --random-seed to rasa data split nlu to set the seed. It needs to be added here: https://github.com/RasaHQ/rasa/blob/master/rasa/cli/data.py. The seed should then be forwarded to https://github.com/RasaHQ/rasa/blob/master/rasa/nlu/training_data/training_data.py#L400 so that it is set there.
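
A rough sketch of how such a flag could be wired through, for illustration only. The `--random-seed` name comes from the suggestion above; `build_parser`, `train_test_split`, and `run` are hypothetical stand-ins and do not reflect the real code in rasa/cli/data.py or training_data.py:

```python
import argparse
import random
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the argument parser of the split command.
    parser = argparse.ArgumentParser(prog="split-nlu-sketch")
    parser.add_argument(
        "--random-seed",
        type=int,
        default=None,
        help="Seed used to make the train/test split reproducible.",
    )
    return parser


def train_test_split(
    examples: List[str],
    train_frac: float = 0.8,
    random_seed: Optional[int] = None,
) -> Tuple[List[str], List[str]]:
    # Hypothetical stand-in for the method that performs the split.
    rng = random.Random(random_seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def run(argv: List[str], examples: List[str]) -> Tuple[List[str], List[str]]:
    args = build_parser().parse_args(argv)
    # argparse exposes --random-seed as args.random_seed; forward it to the split.
    return train_test_split(examples, random_seed=args.random_seed)


data = [f"utterance {i}" for i in range(20)]
split_a = run(["--random-seed", "7"], data)
split_b = run(["--random-seed", "7"], data)
```

Defaulting the flag to `None` keeps the current behaviour (a different split on each run) unless the user explicitly asks for a seed.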

@neelkes Do you want to work on this feature yourself and submit a PR?

I find this enhancement pretty useful. Can I be assigned to this issue? (Assuming that @neelkes is not working on this feature anymore.)

@joaorobson Sure, feel free to work on it. Thanks! Let me know if you need help/if you have questions.
