Rasa: got concurrent.futures._base.CancelledError

Created on 19 Jul 2019  路  45Comments  路  Source: RasaHQ/rasa

Rasa version: 1.1.6-full (docker image)

Rasa X version (if used & relevant):

Python version:

Operating system (windows, osx, ...): docker

Issue: seems to timeout between rasa and action server. My action takes around 10sec. I don't know why it's timeout, because DEFAULT_REQUEST_TIMEOUT is around 5 minutes.

My action is well executed from the actions server.

Error (including full traceback):

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/core/processor.py", line 439, in _run_action
    events = await action.run(output_channel, nlg, tracker, self.domain)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/core/actions/action.py", line 399, in run
    json=json_body, method="post", timeout=DEFAULT_REQUEST_TIMEOUT
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/utils/endpoints.py", line 144, in request
    **kwargs
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 1005, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 497, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 844, in start
    message, payload = await self._protocol.read()  # type: ignore  # noqa
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/streams.py", line 588, in read
    await self._waiter
concurrent.futures._base.CancelledError

Command or request that led to error:


Content of configuration file (config.yml) (if relevant):


Content of domain file (domain.yml) (if relevant):


rasa-oschannels 馃摓 type

Most helpful comment

We've just replaced the lock implementation with a new solution from @ricwo in https://github.com/RasaHQ/rasa/pull/3933 . I am not sure if this issue is fixed by this (not released yet, only on master) but the error message should at least be different.

All 45 comments

Thanks for raising this issue, @federicotdn will get back to you about it soon.

Hey @tormath1, could you provide some more details on how to reproduce this issue? Did you use any specific configuration? I would also need to know what command you ran exactly. Thanks.

Hey @federicotdn , thanks for your time.

To reproduce this issue,

  1. Create a custom action with a "long" action, in my case it's a HTTP post request:
    def run(self, dispatcher, tracker, domain):
        ...
        time.sleep(10)
        dispatcher.utter_message(f"action is over")
        return []
  1. My action server (action-server) is running into a container (inside a kubernetes cluster) with this command:
        command:
          - rasa
          - run
          - actions
          - -vv
  1. Chatbot is running inside another container, with this command:
        command:
          - rasa
          - run
          - --enable-api 
          - --model 
          - 20190718-144925.tar.gz
          - --log-file 
          - out.log 
          - --remote-storage
          - gcs
          - --credentials 
          - credentials.yaml
          - --debug

and this config

    action_endpoint:
      url: http://action-server:5055/webhook
  1. Configure Slack as a channel AND this is the "most important": it works by using rasa shell but not by using this configuration.

  2. Let's chat.

  3. Check logs:

  • from the action server:
$ kubectl logs -f pod -c action-server
...
127.0.0.1 - - [2019-07-19 08:26:38] "POST /webhook HTTP/1.1" 200 382 0.005504
2019-07-19 08:26:38 DEBUG    rasa_sdk.executor  - Received request to run 'my_action'
GOT MY HTTP LOGS HERE
2019-07-19 08:26:58 DEBUG    rasa_sdk.executor  - Finished running 'my_action'
  • from the chatbot:
$ kubectl logs -f pod -c chabot
2019-07-19 08:26:38 DEBUG    rasa.core.processor  - Predicted next action 'my_action' with confidence 1.00.
2019-07-19 08:26:38 DEBUG    rasa.core.actions.action  - Calling action endpoint to run action 'my_action'.
2019-07-19 08:26:41 ERROR    rasa.core.processor  - Encountered an exception while running action 'my_action Bot will continue, but the actions events are lost. Make sure to fix the exception in your custom code.
2019-07-19 08:26:41 DEBUG    rasa.core.processor  - 
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/core/processor.py", line 439, in _run_action
    events = await action.run(output_channel, nlg, tracker, self.domain)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/core/actions/action.py", line 399, in run
    json=json_body, method="post", timeout=DEFAULT_REQUEST_TIMEOUT
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/rasa-1.1.6-py3.6.egg/rasa/utils/endpoints.py", line 144, in request
    **kwargs
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 1005, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 497, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 844, in start
    message, payload = await self._protocol.read()  # type: ignore  # noqa
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/streams.py", line 588, in read
    await self._waiter
concurrent.futures._base.CancelledError
2019-07-19 08:26:41 DEBUG    rasa.core.processor  - Action 'my_action' ended with events '[]'

Hey there @tormath1, with regards to

Configure Slack as a channel AND this is the "most important": it works by using rasa shell but not by using this configuration.

Do you know if your whole kubernetes set up works on another channel other than slack? If so, I think the error is coming from the Slack issue you've already commented on -- although your DEFAULT_REQUEST_TIMEOUT is 5 minutes, we've already noticed that there's currently a slack timeout issue that a community member is working on fixing (and if that fix doesn't come quickly enough I will take over) -- this is because slack has its own timeout and number of retries, which are likely getting all used up within your 5 minutes and then throwing an error. What do you think?

Hi @erohmensing ,

If you mention this: https://github.com/RasaHQ/rasa/issues/3882#issuecomment-510117975, my container is running with this fix (from my fork: https://github.com/tormath1/rasa/commit/7b68556233667fe96c9be0671436563d3ac20ec1).

Do you know if your whole kubernetes set up works on another channel other than slack?

Good point, I'm going to try it.

@erohmensing, I've tried with RestInput connector and it works like a charm. No errors and expected behavior.

Well, I suppose I'm going to ask to people to send a POST request to the action server when they want to perform this custom action.
Great UX :joy:

Hahaha. I will make sure that slack bug gets sorted, it's been causing lots of problems 馃槄If that's it, then I say we close this one

Well, I don't know. As I said, my current rasa version has this fix https://github.com/RasaHQ/rasa/issues/3882#issuecomment-510117975 and it does not seem to work any better. I mean, yes: Rasa is ignoring Slack retry so we do not have duplicate messages. But we have this error on the other hand :disappointed_relieved:

What's your credentials file look like? do you have an errors_ignore_retry?

This is my credentials.yaml

rest:
slack:
  slack_token: "xoxb-mytoken"
rasa:
  url: "http://localhost:5002/api"

And I don't have an errors_ignore_retry because I'm using the default one.

I see. And do you run into any DEFAULT_REQUEST_TIMEOUT issues when running your fork locally, or does this just happen when deployed in kubernetes?

Thanks for all your debugging help btw 馃槂

Thanks for your precious time too.

  • fork locally: it works
  • fork remotely (kubernetes):

    • it works by using RestInput connector

    • it does not work with Slack connect (DEFAULT_REQUEST_TIMEOUT)

To me, the DEFAULT_REQUEST_TIMEOUT is present between the chatbot and the action server. This is why I do not undestand the differents behavior between RestInput and slack.

Okay yeah, this is pretty perplexing. I think the best route here is to get that guy鈥檚 fix merged in, and then I鈥檒l go in and check if everything works with the timeout stuff. Once we know that slack is working again, then we can correctly debug this issue

To me, the DEFAULT_REQUEST_TIMEOUT is present between the chatbot and the action server.

That鈥檚 what I understand too. Very weird.

Hi @erohmensing,

If you check my logs:

2019-07-24 09:22:50 WARNING  rasa.core.channels.slack  - Received retry #1 request from slack due to http_timeout

You can see that I'm using https://github.com/RasaHQ/rasa/pull/4081 patch. IMO the error is deeper, around aiohttp, but I still can't figure out why there is no errors with RestInput.

edit: Oh, maybe I have something: with RestInput, I'm using curl to send message to the connector. When arrive the long action, curl freezes, because it needs a return from its request.
With Slack, we do not have this behavior, I mean, Slack does not wait for a response. So there is a cancelledasync somewhere.

According to the doc, CancelledError occurs when asyncio Tasks are cancelled.

I got this error too. This is creating headache for our Twilio & Slack integration :(

@visopsys this happens with twilio too? Or just slack?

@erohmensing it happens with both.

Hi, I got this issue with twilio too. This is the log

rasa-production_1 | 2019-08-07 15:58:38 ERROR rasa.core.channels.twilio - Exception when trying to handle message. rasa-production_1 | 2019-08-07 15:58:38 DEBUG rasa.core.channels.twilio - rasa-production_1 | Traceback (most recent call last): rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/channels/twilio.py", line 114, in message rasa-production_1 | text, out_channel, sender, input_channel=self.name() rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw rasa-production_1 | return self.gen.throw(type, value, traceback) rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/channels/channel.py", line 75, in handler rasa-production_1 | await app.agent.handle_message(*args, **kwargs) rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw rasa-production_1 | return self.gen.throw(type, value, traceback) rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/agent.py", line 432, in handle_message rasa-production_1 | async with lock: rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/locks.py", line 79, in __aenter__ rasa-production_1 | yield from self.acquire() rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw rasa-production_1 | return self.gen.throw(type, value, traceback) rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/utils.py", line 474, in acquire rasa-production_1 | return await super(LockCounter, self).acquire() # type: ignore rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/locks.py", line 181, in acquire rasa-production_1 | yield from fut rasa-production_1 | concurrent.futures._base.CancelledError rasa-production_1 | 2019-08-07 15:58:38 DEBUG rasa.core.agent - Deleted lock for conversation '+16263191836' (unused) rasa-production_1 | 2019-08-07 15:58:38 ERROR rasa.core.channels.twilio - Exception when trying to handle message. rasa-production_1 | 2019-08-07 15:58:38 DEBUG rasa.core.channels.twilio - rasa-production_1 | Traceback (most recent call last): rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/channels/twilio.py", line 114, in message rasa-production_1 | text, out_channel, sender, input_channel=self.name() rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw rasa-production_1 | return self.gen.throw(type, value, traceback) rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/channels/channel.py", line 75, in handler rasa-production_1 | await app.agent.handle_message(*args, **kwargs) rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw rasa-production_1 | return self.gen.throw(type, value, traceback) rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/agent.py", line 432, in handle_message rasa-production_1 | async with lock: rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/locks.py", line 79, in __aenter__ rasa-production_1 | yield from self.acquire() rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw rasa-production_1 | return self.gen.throw(type, value, traceback) rasa-production_1 | File "/usr/local/lib/python3.6/site-packages/rasa/core/utils.py", line 474, in acquire rasa-production_1 | return await super(LockCounter, self).acquire() # type: ignore rasa-production_1 | File "/usr/local/lib/python3.6/asyncio/locks.py", line 181, in acquire rasa-production_1 | yield from fut rasa-production_1 | concurrent.futures._base.CancelledError

Any update on this issue? Any estimate on when it will be fixed or any workaround in the meantime?

@erohmensing any more ideas about this?

I've tried to replicate it on slack with no luck (the timeout ignoring itself is working fine with an action of a 30s sleep). Since it seems to be a cancelled async thing (or maybe another request coming in while one is still processing), maybe @tmbo has an idea?

@erohmensing Hi, my question maybe stupid, but can you explain why do we use "async" and "await" here? What is the reason, benefit from it? Some explanation of how you guys design this function could give us some clues to think about, or a work-around solution maybe?

Any update? We are considering moving away from Rasa because of this bug.

We've just replaced the lock implementation with a new solution from @ricwo in https://github.com/RasaHQ/rasa/pull/3933 . I am not sure if this issue is fixed by this (not released yet, only on master) but the error message should at least be different.

I am going to close this for the moment. Until confirmed that this is still an issue.

When do you plan to release this bug fix?
I dont see any changes in the twilio integration. Does the changes that you mentioned will fix the twilio integration too?

we are planning to cut the next minor later this week

Hi @tmbo , I tested with the current master branch.
I still have the same error message:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/rasa-1.3.0a2-py3.6.egg/rasa/core/processor.py", line 461, in _run_action
    events = await action.run(output_channel, nlg, tracker, self.domain)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/rasa-1.3.0a2-py3.6.egg/rasa/core/actions/action.py", line 457, in run
    json=json_body, method="post", timeout=DEFAULT_REQUEST_TIMEOUT
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/rasa-1.3.0a2-py3.6.egg/rasa/utils/endpoints.py", line 144, in request
    **kwargs
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 1005, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client.py", line 497, in _request
    await resp.start(conn)
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/client_reqrep.py", line 844, in start
    message, payload = await self._protocol.read()  # type: ignore  # noqa
  File "/usr/local/lib/python3.6/asyncio/coroutines.py", line 129, in throw
    return self.gen.throw(type, value, traceback)
  File "/usr/local/lib/python3.6/site-packages/aiohttp/streams.py", line 588, in read
    await self._waiter
concurrent.futures._base.CancelledError

:disappointed_relieved:

For what it is worth: although similar, it is a different error (happens during custom action execution, previously it happened in the input channel). can you please create a new issue with this stacktrace and some more detail on when exactly this happens?

rasa-x is still pointing to rasa 1.2.8. This rasa version does not have the fix mentioned above. We have to wait till new version of rasa-x to be release to use it.

i would guess we reopen this then?

Agreed. This should be opened. The bug is not fixed yet

Hello, we have got the same problem:

Rasa 1.4.0

Tested with WRK utility:
./wrk --latency -t2 -c20 -d20s -s ./scripts/post.lua http://localhost:5005/webhooks/rest/webhook

Lua script:

math.randomseed(os.clock()*100000000000)

function request()
    math.randomseed(os.clock()*100000000000)
    wrk.method = "POST"
    wrk.body   = '{"sender": "' .. math.random(1, 1000000) .. '", "message": "/videos"}'
    wrk.headers["Content-Type"] = "application/json"
    return wrk.format()
 end

Results:

Running 20s test @ http://localhost:5005/webhooks/rest/webhook
  2 threads and 10 connections
^C  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   350.92ms  120.48ms   1.43s    95.68%
    Req/Sec    15.56      9.07    39.00     64.71%
  Latency Distribution
     50%  333.49ms
     75%  360.45ms
     90%  388.25ms
     99%    1.04s
  137 requests in 4.98s, 360.88KB read
Requests/sec:     27.51
Transfer/sec:     72.47KB

Rasa server log:

2019-10-23 12:02:57 DEBUG    rasa.core.policies.memoization  - Current tracker state [{'prev_action_videos_show': 1.0, 'intent_videos': 1.0}]
2019-10-23 12:02:57 DEBUG    rasa.core.policies.memoization  - There is no memorised next action
2019-10-23 12:02:57 DEBUG    rasa.core.policies.mapping_policy  - The mapped action, 'action_videos_show', for this intent, 'videos', was executed last so MappingPolicy is returning to action_listen.
2019-10-23 12:02:57 DEBUG    rasa.core.policies.form_policy  - There is no active form
2019-10-23 12:02:57 DEBUG    rasa.core.policies.ensemble  - Predicted next action using policy_2_MappingPolicy
2019-10-23 12:02:57 DEBUG    rasa.core.processor  - Predicted next action 'action_listen' with confidence 1.00.
2019-10-23 12:02:57 DEBUG    rasa.core.processor  - Action 'action_listen' ended with events '[]'
2019-10-23 12:02:57 DEBUG    rasa.core.lock_store  - Deleted lock for conversation '944515'.
2019-10-23 12:02:57 ERROR    rasa.core.processor  - Encountered an exception while running action 'action_videos_show'. Bot will continue, but the actions events are lost. Please check the logs of your action server for more information.
2019-10-23 12:02:57 DEBUG    rasa.core.processor  -
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/rasa/core/processor.py", line 495, in _run_action
    events = await action.run(output_channel, nlg, tracker, self.domain)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/rasa/core/actions/action.py", line 457, in run
    json=json_body, method="post", timeout=DEFAULT_REQUEST_TIMEOUT
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/rasa/utils/endpoints.py", line 144, in request
    **kwargs,
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client.py", line 1012, in __aenter__
    self._resp = await self._coro
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client.py", line 504, in _request
    await resp.start(conn)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 847, in start
    message, payload = await self._protocol.read()  # type: ignore  # noqa
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/aiohttp/streams.py", line 591, in read
    await self._waiter
concurrent.futures._base.CancelledError
2019-10-23 12:02:57 DEBUG    rasa.core.processor  - Action 'action_videos_show' ended with events '[]'
2019-10-23 12:02:57 DEBUG    rasa.core.processor  - Current slot values:
    answer: None
    car: None
    city: None
    name: None
2019-10-23 12:02:57 DEBUG    rasa.core.policies.memoization  - Current tracker state [{'prev_action_videos_show': 1.0, 'intent_videos': 1.0}]
2019-10-23 12:02:57 DEBUG    rasa.core.policies.memoization  - There is no memorised next action

I had a same problem and in my case we had wrong proxy config. It wasn't an aiohttp problem.

my schema

web-client <-> nginx proxy <-> our.aiohttp.app <-> external.web.service

after set this parameters to nginx.conf it started work correct and CancelledError didn't come again

keepalive_timeout 3600;
proxy_read_timeout 3600;

And check your http-client request timeout

I meet the same problem.I write my custom channel for wechat and the wechat client will disconnect me if it doesn't receive message after 5 seconds.If my action takes longer than 5s,I got the problem above.

I was facing the same error and after spending some time found out that Slack is not disconnecting the session but Rasa server is doing it after 60s.

So I have increased Sanic Server RESPONSE_TIMEOUT and REQUEST_TIMEOUT. Now it is working fine (tested with 120s sleep in custom action).

Easy way to set those env variable is -

export SANIC_RESPONSE_TIMEOUT=360
export SANIC_ REQUEST_TIMEOUT=360

@tormath1 , please try this if you haven't resolved the issue and let me know.

Hey @pritampan, I moved to another things since the time I opened this issue. I hope your answer will be helfpful for other users :wink:

Facing the same issue through the RestInput channel as well. Was anyone able to solve this?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically closed due to inactivity. Please create a new issue if you need more help.

This is still an open issue reported here

In my case this apprated when using botframework channel, Rasa 1.10.8 running on docker images in Centos OS virtual machine in VirtualBox on Windows 10.

Problem is also with the latest 1.10.10-full release

Any update on this issue? Can the timeout be configured? or strategies to avoid action timeouts?

Was this page helpful?
0 / 5 - 0 ratings