Rasa version: 1.1.6-full (docker image)
Rasa X version (if used & relevant):
Python version:
Operating system (windows, osx, ...): docker
Issue: seems to timeout between rasa and action server. My action takes around 10sec. I don't know why it's timeout, because DEFAULT_REQUEST_TIMEOUT is around 5 minutes.
My action is well executed from the actions server.
Error (including full traceback):
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/core/processor.py", line 439, in _run_action
events = await action.run(output_channel, nlg, tracker, self.domain)
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/core/actions/action.py", line 399, in run
json=json_body, method="post", timeout=DEFAULT_REQUEST_TIMEOUT
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/utils/endpoints.py", line 144, in request
**kwargs
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 1005, in __aenter__
self._resp = await self._coro
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 497, in _request
await resp.start(conn)
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 844, in start
message, payload = await self._protocol.read() # type: ignore # noqa
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/streams.py", line 588, in read
await self._waiter
concurrent.futures._base.CancelledError
Command or request that led to error:
Content of configuration file (config.yml) (if relevant):
Content of domain file (domain.yml) (if relevant):
Thanks for raising this issue, @federicotdn will get back to you about it soon.
Hey @tormath1, could you provide some more details on how to reproduce this issue? Did you use any specific configuration? I would also need to know what command you ran exactly. Thanks.
Hey @federicotdn , thanks for your time.
To reproduce this issue,
def run(self, dispatcher, tracker, domain):
...
time.sleep(10)
dispatcher.utter_message(f"action is over")
return []
action-server) is running into a container (inside a kubernetes cluster) with this command: command:
- rasa
- run
- actions
- -vv
command:
- rasa
- run
- --enable-api
- --model
- 20190718-144925.tar.gz
- --log-file
- out.log
- --remote-storage
- gcs
- --credentials
- credentials.yaml
- --debug
and this config
action_endpoint:
url: http://action-server:5055/webhook
Configure Slack as a channel AND this is the "most important": it works by using rasa shell but not by using this configuration.
Let's chat.
Check logs:
$ kubectl logs -f pod -c action-server
...
127.0.0.1 - - [2019-07-19 08:26:38] "POST /webhook HTTP/1.1" 200 382 0.005504
2019-07-19 08:26:38 DEBUG rasa_sdk.executor - Received request to run 'my_action'
GOT MY HTTP LOGS HERE
2019-07-19 08:26:58 DEBUG rasa_sdk.executor - Finished running 'my_action'
$ kubectl logs -f pod -c chabot
2019-07-19 08:26:38 DEBUG rasa.core.processor - Predicted next action 'my_action' with confidence 1.00.
2019-07-19 08:26:38 DEBUG rasa.core.actions.action - Calling action endpoint to run action 'my_action'.
2019-07-19 08:26:41 ERROR rasa.core.processor - Encountered an exception while running action 'my_action Bot will continue, but the actions events are lost. Make sure to fix the exception in your custom code.
2019-07-19 08:26:41 DEBUG rasa.core.processor -
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/core/processor.py", line 439, in _run_action
events = await action.run(output_channel, nlg, tracker, self.domain)
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/core/actions/action.py", line 399, in run
json=json_body, method="post", timeout=DEFAULT_REQUEST_TIMEOUT
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/utils/endpoints.py", line 144, in request
**kwargs
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 1005, in __aenter__
self._resp = await self._coro
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 497, in _request
await resp.start(conn)
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 844, in start
message, payload = await self._protocol.read() # type: ignore # noqa
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/streams.py", line 588, in read
await self._waiter
concurrent.futures._base.CancelledError
2019-07-19 08:26:41 DEBUG rasa.core.processor - Action 'my_action' ended with events '[]'
Hey there @tormath1, with regards to
Configure Slack as a channel AND this is the "most important": it works by using rasa shell but not by using this configuration.
Do you know if your whole kubernetes set up works on another channel other than slack? If so, I think the error is coming from the Slack issue you've already commented on -- although your DEFAULT_REQUEST_TIMEOUT is 5 minutes, we've already noticed that there's currently a slack timeout issue that a community member is working on fixing (and if that fix doesn't come quickly enough I will take over) -- this is because slack has its own timeout and number of retries, which are likely getting all used up within your 5 minutes and then throwing an error. What do you think?
Hi @erohmensing ,
If you mention this: https://github.com/RasaHQ/rasa/issues/3882#issuecomment-510117975, my container is running with this fix (from my fork: https://github.com/tormath1/rasa/commit/7b68556233667fe96c9be0671436563d3ac20ec1).
Do you know if your whole kubernetes set up works on another channel other than slack?
Good point, I'm going to try it.
@erohmensing, I've tried with RestInput connector and it works like a charm. No errors and expected behavior.
Well, I suppose I'm going to ask to people to send a POST request to the action server when they want to perform this custom action.
Great UX :joy:
Hahaha. I will make sure that slack bug gets sorted, it's been causing lots of problems 馃槄If that's it, then I say we close this one
Well, I don't know. As I said, my current rasa version has this fix https://github.com/RasaHQ/rasa/issues/3882#issuecomment-510117975 and it does not seem to work any better. I mean, yes: Rasa is ignoring Slack retry so we do not have duplicate messages. But we have this error on the other hand :disappointed_relieved:
What's your credentials file look like? do you have an errors_ignore_retry?
This is my credentials.yaml
rest:
slack:
slack_token: "xoxb-mytoken"
rasa:
url: "http://localhost:5002/api"
And I don't have an errors_ignore_retry because I'm using the default one.
I see. And do you run into any DEFAULT_REQUEST_TIMEOUT issues when running your fork locally, or does this just happen when deployed in kubernetes?
Thanks for all your debugging help btw 馃槂
Thanks for your precious time too.
RestInput connectorDEFAULT_REQUEST_TIMEOUT)To me, the DEFAULT_REQUEST_TIMEOUT is present between the chatbot and the action server. This is why I do not undestand the differents behavior between RestInput and slack.
Okay yeah, this is pretty perplexing. I think the best route here is to get that guy鈥檚 fix merged in, and then I鈥檒l go in and check if everything works with the timeout stuff. Once we know that slack is working again, then we can correctly debug this issue
To me, the DEFAULT_REQUEST_TIMEOUT is present between the chatbot and the action server.
That鈥檚 what I understand too. Very weird.
Hi @erohmensing,
If you check my logs:
2019-07-24 09:22:50 WARNING rasa.core.channels.slack - Received retry #1 request from slack due to http_timeout
You can see that I'm using https://github.com/RasaHQ/rasa/pull/4081 patch. IMO the error is deeper, around aiohttp, but I still can't figure out why there is no errors with RestInput.
edit: Oh, maybe I have something: with RestInput, I'm using curl to send message to the connector. When arrive the long action, curl freezes, because it needs a return from its request.
With Slack, we do not have this behavior, I mean, Slack does not wait for a response. So there is a cancelledasync somewhere.
According to the doc, CancelledError occurs when asyncio Tasks are cancelled.
I got this error too. This is creating headache for our Twilio & Slack integration :(
@visopsys this happens with twilio too? Or just slack?
@erohmensing it happens with both.
Hi, I got this issue with twilio too. This is the log
rasa-production_1 | 2019-08-07 15:58:38 ERROR rasa.core.channels.twilio - Exception when trying to handle message.
rasa-production_1 | 2019-08-07 15:58:38 DEBUG rasa.core.channels.twilio -
rasa-production_1 | Traceback (most recent call last):
rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/channels/twilio.py", line 114, in message
rasa-production_1 | text, out_channel, sender, input_channel=self.name()
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
rasa-production_1 | return self.gen.throw(type, value, traceback)
rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/channels/channel.py", line 75, in handler
rasa-production_1 | await app.agent.handle_message(*args, **kwargs)
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
rasa-production_1 | return self.gen.throw(type, value, traceback)
rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/agent.py", line 432, in handle_message
rasa-production_1 | async with lock:
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/locks.py", line 79, in __aenter__
rasa-production_1 | yield from self.acquire()
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
rasa-production_1 | return self.gen.throw(type, value, traceback)
rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/utils.py", line 474, in acquire
rasa-production_1 | return await super(LockCounter, self).acquire() # type: ignore
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/locks.py", line 181, in acquire
rasa-production_1 | yield from fut
rasa-production_1 | concurrent.futures._base.CancelledError
rasa-production_1 | 2019-08-07 15:58:38 DEBUG rasa.core.agent - Deleted lock for conversation '+16263191836' (unused)
rasa-production_1 | 2019-08-07 15:58:38 ERROR rasa.core.channels.twilio - Exception when trying to handle message.
rasa-production_1 | 2019-08-07 15:58:38 DEBUG rasa.core.channels.twilio -
rasa-production_1 | Traceback (most recent call last):
rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/channels/twilio.py", line 114, in message
rasa-production_1 | text, out_channel, sender, input_channel=self.name()
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
rasa-production_1 | return self.gen.throw(type, value, traceback)
rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/channels/channel.py", line 75, in handler
rasa-production_1 | await app.agent.handle_message(*args, **kwargs)
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
rasa-production_1 | return self.gen.throw(type, value, traceback)
rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/agent.py", line 432, in handle_message
rasa-production_1 | async with lock:
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/locks.py", line 79, in __aenter__
rasa-production_1 | yield from self.acquire()
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
rasa-production_1 | return self.gen.throw(type, value, traceback)
rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/utils.py", line 474, in acquire
rasa-production_1 | return await super(LockCounter, self).acquire() # type: ignore
rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/locks.py", line 181, in acquire
rasa-production_1 | yield from fut
rasa-production_1 | concurrent.futures._base.CancelledError
Any update on this issue? Any estimate on when it will be fixed or any workaround in the meantime?
@erohmensing any more ideas about this?
I've tried to replicate it on slack with no luck (the timeout ignoring itself is working fine with an action of a 30s sleep). Since it seems to be a cancelled async thing (or maybe another request coming in while one is still processing), maybe @tmbo has an idea?
@erohmensing Hi, my question maybe stupid, but can you explain why do we use "async" and "await" here? What is the reason, benefit from it? Some explanation of how you guys design this function could give us some clues to think about, or a work-around solution maybe?
Any update? We are considering moving away from Rasa because of this bug.
We've just replaced the lock implementation with a new solution from @ricwo in https://github.com/RasaHQ/rasa/pull/3933 . I am not sure if this issue is fixed by this (not released yet, only on master) but the error message should at least be different.
I am going to close this for the moment. Until confirmed that this is still an issue.
When do you plan to release this bug fix?
I dont see any changes in the twilio integration. Does the changes that you mentioned will fix the twilio integration too?
we are planning to cut the next minor later this week
Hi @tmbo , I tested with the current master branch.
I still have the same error message:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/rasa-1.3.0a2-py3.6.egg/rasa/core/processor.py", line 461, in _run_action
events = await action.run(output_channel, nlg, tracker, self.domain)
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/rasa-1.3.0a2-py3.6.egg/rasa/core/actions/action.py", line 457, in run
json=json_body, method="post", timeout=DEFAULT_REQUEST_TIMEOUT
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/rasa-1.3.0a2-py3.6.egg/rasa/utils/endpoints.py", line 144, in request
**kwargs
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 1005, in __aenter__
self._resp = await self._coro
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 497, in _request
await resp.start(conn)
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 844, in start
message, payload = await self._protocol.read() # type: ignore # noqa
File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
return self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.6/site-packages/aiohttp/streams.py", line 588, in read
await self._waiter
concurrent.futures._base.CancelledError
:disappointed_relieved:
For what it is worth: although similar, it is a different error (happens during custom action execution, previously it happened in the input channel). can you please create a new issue with this stacktrace and some more detail on when exactly this happens?
rasa-x is still pointing to rasa 1.2.8. This rasa version does not have the fix mentioned above. We have to wait till new version of rasa-x to be release to use it.
Seems to be still an issue: https://forum.rasa.com/t/cancellederror-in-random-calls-throughout-the-day/19966
i would guess we reopen this then?
Agreed. This should be opened. The bug is not fixed yet
Hello, we have got the same problem:
Rasa 1.4.0
Tested with WRK utility:
./wrk --latency -t2 -c20 -d20s -s ./scripts/post.lua http://localhost:5005/webhooks/rest/webhook
Lua script:
math.randomseed(os.clock()*100000000000)
function request()
math.randomseed(os.clock()*100000000000)
wrk.method = "POST"
wrk.body = '{"sender": "' .. math.random(1, 1000000) .. '", "message": "/videos"}'
wrk.headers["Content-Type"] = "application/json"
return wrk.format()
end
Results:
Running 20s test @ http://localhost:5005/webhooks/rest/webhook
2 threads and 10 connections
^C Thread Stats Avg Stdev Max +/- Stdev
Latency 350.92ms 120.48ms 1.43s 95.68%
Req/Sec 15.56 9.07 39.00 64.71%
Latency Distribution
50% 333.49ms
75% 360.45ms
90% 388.25ms
99% 1.04s
137 requests in 4.98s, 360.88KB read
Requests/sec: 27.51
Transfer/sec: 72.47KB
Rasa server log:
2019-10-23 12:02:57 DEBUG rasa.core.policies.memoization - Current tracker state [{'prev_action_videos_show': 1.0, 'intent_videos': 1.0}]
2019-10-23 12:02:57 DEBUG rasa.core.policies.memoization - There is no memorised next action
2019-10-23 12:02:57 DEBUG rasa.core.policies.mapping_policy - The mapped action, 'action_videos_show', for this intent, 'videos', was executed last so MappingPolicy is returning to action_listen.
2019-10-23 12:02:57 DEBUG rasa.core.policies.form_policy - There is no active form
2019-10-23 12:02:57 DEBUG rasa.core.policies.ensemble - Predicted next action using policy_2_MappingPolicy
2019-10-23 12:02:57 DEBUG rasa.core.processor - Predicted next action 'action_listen' with confidence 1.00.
2019-10-23 12:02:57 DEBUG rasa.core.processor - Action 'action_listen' ended with events '[]'
2019-10-23 12:02:57 DEBUG rasa.core.lock_store - Deleted lock for conversation '944515'.
2019-10-23 12:02:57 ERROR rasa.core.processor - Encountered an exception while running action 'action_videos_show'. Bot will continue, but the actions events are lost. Please check the logs of your action server for more information.
2019-10-23 12:02:57 DEBUG rasa.core.processor -
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/rasa/core/processor.py", line 495, in _run_action
events = await action.run(output_channel, nlg, tracker, self.domain)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/rasa/core/actions/action.py", line 457, in run
json=json_body, method="post", timeout=DEFAULT_REQUEST_TIMEOUT
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/rasa/utils/endpoints.py", line 144, in request
**kwargs,
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client.py", line 1012, in __aenter__
self._resp = await self._coro
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client.py", line 504, in _request
await resp.start(conn)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 847, in start
message, payload = await self._protocol.read() # type: ignore # noqa
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/streams.py", line 591, in read
await self._waiter
concurrent.futures._base.CancelledError
2019-10-23 12:02:57 DEBUG rasa.core.processor - Action 'action_videos_show' ended with events '[]'
2019-10-23 12:02:57 DEBUG rasa.core.processor - Current slot values:
answer: None
car: None
city: None
name: None
2019-10-23 12:02:57 DEBUG rasa.core.policies.memoization - Current tracker state [{'prev_action_videos_show': 1.0, 'intent_videos': 1.0}]
2019-10-23 12:02:57 DEBUG rasa.core.policies.memoization - There is no memorised next action
I had a same problem and in my case we had wrong proxy config. It wasn't an aiohttp problem.
my schema
web-client <-> nginx proxy <-> our.aiohttp.app <-> external.web.service
after set this parameters to nginx.conf it started work correct and CancelledError didn't come again
keepalive_timeout 3600;
proxy_read_timeout 3600;
And check your http-client request timeout
I meet the same problem.I write my custom channel for wechat and the wechat client will disconnect me if it doesn't receive message after 5 seconds.If my action takes longer than 5s,I got the problem above.
I was facing the same error and after spending some time found out that Slack is not disconnecting the session but Rasa server is doing it after 60s.
So I have increased Sanic Server RESPONSE_TIMEOUT and REQUEST_TIMEOUT. Now it is working fine (tested with 120s sleep in custom action).
Easy way to set those env variable is -
export SANIC_RESPONSE_TIMEOUT=360
export SANIC_ REQUEST_TIMEOUT=360
@tormath1 , please try this if you haven't resolved the issue and let me know.
Hey @pritampan, I moved to another things since the time I opened this issue. I hope your answer will be helfpful for other users :wink:
Facing the same issue through the RestInput channel as well. Was anyone able to solve this?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.
This is still an open issue reported here
In my case this apprated when using botframework channel, Rasa 1.10.8 running on docker images in Centos OS virtual machine in VirtualBox on Windows 10.
Problem is also with the latest 1.10.10-full release
Any update on this issue? Can the timeout be configured? or strategies to avoid action timeouts?
Most helpful comment
We've just replaced the lock implementation with a new solution from @ricwo in https://github.com/RasaHQ/rasa/pull/3933 . I am not sure if this issue is fixed by this (not released yet, only on master) but the error message should at least be different.