Rasa version:
Rasa Core 0.14.0
Rasa NLU 0.14.4
Python version:
3.6.8
Operating system (windows, osx, ...):
osx
Issue:
Rasa NLU model training takes the Duckling URL from the config.yml file and writes it into the metadata.json file of the trained model.
We use docker-compose for local testing and Kubernetes (k8s) for cloud test/prod.
docker-compose and k8s network between containers differently: docker-compose uses service names (e.g. duckling) while our k8s setup uses localhost, so we need a different Duckling URL for local vs. cloud testing.
We've separated the URLs into environment files, but Rasa training writes the URL into the model's metadata.json. This means the model has to be retrained between local (docker-compose) and cloud (k8s) testing. It would make more sense to keep the URL outside the model, in a config file controlled by the environment and build process, so the trained model can be copied rather than retrained for no reason other than a URL change.
e.g.
for docker-compose: "url": "http://duckling:8000"
for k8s: "url": "http://localhost:8000"
Content of configuration file (config.yml):
for docker-compose:
```yaml
pipeline:
# other stuff
- name: ner_duckling_http
  url: http://duckling:8000
```
for cloud k8s:
```yaml
pipeline:
# other stuff
- name: ner_duckling_http
  url: http://localhost:8000
```
Content of domain file (domain.yml) (if used & relevant):
not relevant
Thanks for raising this issue, @MetcalfeTom will get back to you about it soon.
Hey @cmcc13, does this help?
> In addition to setting the default ``url`` of your duckling server in the configuration, you can also change the url of your duckling server (without needing to re-train your model) by setting the ``RASA_DUCKLING_HTTP_URL`` environment variable.
See relevant issue here
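If that environment variable works as described, the URL can be set per environment at deploy time instead of at training time. A sketch for the docker-compose side (the service name and URL value are assumptions):

```yaml
# docker-compose override sketch (illustrative; service name and URL are assumed)
services:
  rasa:
    environment:
      # overrides whatever Duckling URL was written into the model's metadata.json at training time
      - RASA_DUCKLING_HTTP_URL=http://duckling:8000
```

The same variable could be set via `env` in the Kubernetes Deployment spec to supply the prod value.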
This might be a workaround, but I think the URL should not be put into the model in the first place. The URL should be read from a YAML file (config, .env or endpoints). We'll give the environment variable a go. Thanks.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.
I found the same problem - if I change the Duckling HTTP URL in the config file, it requires a complete retrain. Please consider fixing this, as it's very unintuitive - I spent a lot of time trying to figure out why the URL change was not getting picked up before stumbling on this issue.
I agree. I'm not sure how best to handle this: either the URL should be part of endpoints.yml (but then it would still need to be definable in the config for NLU-only models?) or its value shouldn't influence the fingerprinting.
@wochinge in progress but no assignee?
Thanks, fixed it :-)
@wochinge why are we not fixing that?
Because
So basically the ratio of benefit to effort is very poor.
We have to change the Duckling URL regularly because the dev and prod environments are different, so in practice we need to change it daily.
for docker-compose "url": "http://duckling:8000",
for k8s "url": "http://localhost:8000"
Actually @cmcc13, the prod URL is now "duckling.default.svc.cluster.local.:8000", and that could change later if we do more advanced GKE service work. But in dev we just want to docker-compose up and let Docker sort out all the networking. So @wochinge, it's a bit of a headache for our team not to have all of this in a config file.
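For reference, that prod hostname follows Kubernetes' service DNS scheme (`<service>.<namespace>.svc.cluster.local`). A Service manifest sketch that would expose Duckling at that address (the selector labels are assumptions, not taken from the actual cluster):

```yaml
# Kubernetes Service sketch (illustrative; selector labels are assumed)
apiVersion: v1
kind: Service
metadata:
  name: duckling        # resolves as duckling.default.svc.cluster.local inside the cluster
  namespace: default
spec:
  selector:
    app: duckling       # must match the labels on the Duckling pods
  ports:
    - port: 8000        # port exposed by the service
      targetPort: 8000  # container port on the Duckling pod
```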
@Yoomtah
As far as I understand it, you have two setups, correct? One docker-compose and one K8s? And are they completely separate or are you sharing the trained models between them? Because if you are not sharing the models between these two deployments, then you have to retrain either way.
For each chatbot that we have (I believe it's 5), we have two model files: model-dev and model-prod. These models are identical except that they were trained with different Duckling URLs. Depending on our environment, we then build a Rasa Docker container with one of these files.
The Duckling URL is the only thing necessitating two training runs and two model files per bot. We have a different action server URL for dev and prod as well, but that is easily changed in the endpoints.yml file.
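For comparison, the action server URL can already be swapped per environment without retraining because it lives in endpoints.yml; a minimal sketch (the hostname here is an assumption):

```yaml
# endpoints.yml sketch (illustrative; hostname is assumed)
action_endpoint:
  url: "http://action-server:5055/webhook"   # point at the dev or prod action server per environment
```

An equivalent entry for the Duckling URL is essentially what this issue is asking for.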
Ah, I think I'm getting it now.
1) You train a model in the dev environment and decide it's worth promoting to the prod environment
2) You can't promote it to the prod environment because the Duckling URL is different, and then you have to retrain it, right?
Would an easy workaround be to add an alias for duckling to your hosts file? (https://en.wikipedia.org/wiki/Hosts_(file))
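As a sketch of that suggestion (the IP is a placeholder for wherever Duckling is actually reachable in a given environment), the hosts file entry would map the hostname baked into the model to the right address:

```
# hosts file entry sketch (placeholder IP; substitute the address Duckling is reachable at)
127.0.0.1    duckling
```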
This problem still exists and is very unintuitive. Every endpoint can be configured in endpoints.yml except the Duckling one. It makes automated deployment, e.g. via Helm, very messy.