Rasa version: 1.1.4
Rasa X version: 0.20.0
Issue:
If the user clicks on the Train button and the training fails, there is no indication that the training failed nor any error message to help the user track down what went wrong. The only indication that there was a problem is that the model doesn't show up.
Running under the docker setup, I spent some time trying to determine what went wrong and eventually found the error in the browser's network tab.
There's also no indication when the training has completed. A progress bar would be nice.
The Rasa X user interface needs to display training error status messages. It would also be nice to have a log page to view the Rasa X and underlying docker logs if the user is running under a docker setup.
Same issue here.
It would be great to have access to the training logs on a dedicated page. It could give details on the current status (while training) and clues if the training fails.
@abhilasharoy do we already have this as part of any design?
@tmbo In the past, we designed toast options for each of the stages of training.
A lot of this was moved out of priority, and we have only implemented toasts for the "training in progress" and "training complete" scenarios. cc @Saladdin
Here are some of the options:

This is meant to be tackled by our websockets implementation, but we've deprioritised it for other things at the moment. We do have firm plans for this, though.
I've encountered this same issue in the Stories UI where an API error response is not reported. I add a new story, click Save and nothing happens. The Save & Cancel options are still available.
I open the network tab in the browser and see that the /api/stories endpoint returned an error 500.
I then go to the server log and find this message:
rasa.core.domain.InvalidDomain: Duplicate actions in domain. These actions occur more than once in the domain: 'dynamic_form'
There are really two issues going on here. One is the UI not reporting the error but there's also an issue with the API and the 500 error. To troubleshoot this error, the user needs the error message that can be found only in the server side log. The API needs to return that error message.
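A minimal Python sketch of the second point, the API returning the server-side error text instead of a bare 500. The field names mirror the failure payloads quoted in this thread; `error_response` and the local `InvalidDomain` class are hypothetical stand-ins for illustration, not Rasa X code.

```python
from typing import Any, Dict, Tuple


def error_response(
    exc: Exception, version: str, code: int = 500
) -> Tuple[Dict[str, Any], int]:
    """Build a JSON-serialisable error body that carries the exception text."""
    body = {
        "version": version,
        "status": "failure",
        # Fall back to a generic message if the exception has no text.
        "message": str(exc) or "An unexpected error occurred.",
        "reason": type(exc).__name__,
        "details": {},
        "help": None,
        "code": code,
    }
    return body, code


class InvalidDomain(Exception):
    """Stand-in for rasa.core.domain.InvalidDomain, used only in this example."""


body, status = error_response(
    InvalidDomain("Duplicate actions in domain."), version="0.21.3"
)
print(status, body["reason"], body["message"])
```

With a body like this, the client has something meaningful to show without anyone having to open the server log.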
I'm also running into a new training issue. When I start training in the UI it fails silently. I check the network tab and see the 500 but at least the API gave details of the failure. I think this message is telling me that I'm out of memory.
{"version":"0.21.3","status":"failure","message":"Failed to train a Rasa model.","reason":"StackTrainingFailed","details":{"args":["500, Internal Server Error, body='b'{\"version\":\"1.3.9\",\"status\":\"failure\",\"message\":\"An unexpected error occurred during training. Error: MemoryError:
...
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.\\n\\n\\t [[IteratorGetNext]]\\nHint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.\\n\",\"reason\":\"TrainingError\",\"details\":{},\"help\":null,\"code\":500}"},"help":null,"code":500}
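On the client side, a payload like the one above already contains enough to show the user something. A rough Python sketch of that logic follows, assuming the payload keeps the `message`, `reason`, and `details.args` fields seen in this thread. `describe_failure` is a hypothetical helper and the sample payload is abbreviated for illustration.

```python
from typing import Any, Dict


def describe_failure(payload: Dict[str, Any], max_len: int = 200) -> str:
    """Turn a failure payload into one line suitable for a toast or banner."""
    message = payload.get("message") or "Request failed."
    reason = payload.get("reason")
    details = payload.get("details")
    args = details.get("args") if isinstance(details, dict) else None
    text = f"{message} ({reason})" if reason else message
    if args:
        # Show only the first argument and cap its length; traces can be long.
        first = str(args[0])
        if len(first) > max_len:
            first = first[:max_len] + "..."
        text += ": " + first
    return text


# Abbreviated example payload; the real one carries a much longer args entry.
payload = {
    "status": "failure",
    "message": "Failed to train a Rasa model.",
    "reason": "StackTrainingFailed",
    "details": {"args": ["500, Internal Server Error"]},
    "code": 500,
}
print(describe_failure(payload))
```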
Hi @rgstephens ,
I am also facing the same issue while doing training from Rasa X UI.
{version: "0.21.5", status: "failure", message: "Failed to train a Rasa model.",…}
code: 500
details: {args: [], message: null}
args: []
message: null
help: null
message: "Failed to train a Rasa model."
reason: "StackTrainingFailed"
status: "failure"
version: "0.21.5"
Have you found a solution to this?
Thanks and Regards
Harsh
@kapoorh Have you looked at the Rasa log for your worker to determine the root cause of the training failure?
docker-compose logs rasa-worker
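The worker log can be long, so it may help to filter it down to error lines. Below is a small Python sketch, assuming the log lines follow a `timestamp LEVEL logger - message` layout, optionally prefixed by the container name. `errors_only` is a hypothetical helper, and the second sample line (including its logger name) is made up to show a non-error entry.

```python
import re
from typing import Iterable, Iterator, Tuple

# Matches e.g. "2020-10-15 15:15:23 ERROR pika.connection - some message",
# with or without a leading "rasa-worker_1 | " prefix from docker-compose.
LOG_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+)\s+-\s+(?P<msg>.*)"
)


def errors_only(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, logger, message) for ERROR and CRITICAL lines."""
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("level") in {"ERROR", "CRITICAL"}:
            yield match.group("ts"), match.group("logger"), match.group("msg").strip()


sample = [
    "rasa-worker_1 | 2020-10-15 15:15:23 ERROR pika.adapters.utils.connection_workflow"
    " - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError",
    "rasa-worker_1 | 2020-10-15 15:15:24 INFO example.logger - not an error",
]
rows = list(errors_only(sample))
for ts, logger, msg in rows:
    print(f"{ts} [{logger}] {msg}")
```

To run it against real output, pass `sys.stdin` to `errors_only` and pipe the `docker-compose logs rasa-worker` output into the script.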
@Saladdin can you update the issue / test
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@Saladdin Can you please check if this is still a thing or if it can be closed?
@Saladdin Maybe we should include this one with the Global UI issues?
Let's do it 👍
Is this still an issue? As far as I know, it should be fixed by https://github.com/RasaHQ/rasa-x/pull/2998
Yes, thanks. Failed training is now reported. Will open a separate enhancement request to provide failure details.
{version: "0.31.0", status: "failure", message: "Failed to train a Rasa model.",…}
code: 500
details: {}
help: null
message: "Failed to train a Rasa model."
reason: "StackTrainingFailed"
status: "failure"
version: "0.31.0"
And worker container logs:
/opt/venv/lib/python3.7/site-packages/rasa/core/brokers/pika.py:294: FutureWarning: Your Pika event broker config contains the deprecated `queue` key. Please use the `queues` key instead.
  self.queues = self._get_queues_from_args(queues, kwargs)
2020-09-16 03:58:53 ERROR pika.adapters.utils.io_services_utils - Socket failed to connect: ; error=111 (Connection refused)
2020-09-16 03:58:53 ERROR pika.adapters.utils.connection_workflow - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(, , 6, '', ('172.20.0.4', 5672))
2020-09-16 03:58:53 ERROR pika.adapters.utils.connection_workflow - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
2020-09-16 03:58:58 ERROR pika.connection - Connection closed while authenticating indicating a probable authentication error
2020-09-16 03:58:58 WARNING rasa.core.brokers.pika - Connecting to 'rabbit' failed with error 'ConnectionClosedByBroker: (403) 'ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN. For details see the broker logfile.''. Trying again.
2020-09-16 03:58:58 ERROR pika.adapters.utils.connection_workflow - AMQPConnector - reporting failure: AMQPConnectorAMQPHandshakeError: ProbableAuthenticationError: Client was disconnected at a connection stage indicating a probable authentication error: ("ConnectionClosedByBroker: (403) 'ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN. For details see the broker logfile.'",)
2020-09-16 03:58:58 ERROR pika.adapters.utils.connection_workflow - AMQPConnectionWorkflow - reporting failure: AMQPConnectionWorkflowFailed: 2 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: ProbableAuthenticationError: Client was disconnected at a connection stage indicating a probable authentication error: ("ConnectionClosedByBroker: (403) 'ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN. For details see the broker logfile.'",); first exception - AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
2020-09-16 03:58:58 ERROR pika.adapters.base_connection - Full-stack connection workflow failed: AMQPConnectionWorkflowFailed: 2 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: ProbableAuthenticationError: Client was disconnected at a connection stage indicating a probable authentication error: ("ConnectionClosedByBroker: (403) 'ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN. For details see the broker logfile.'",); first exception - AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
2020-09-16 03:58:58 ERROR pika.adapters.base_connection - Self-initiated stack bring-up failed: AMQPConnectionWorkflowFailed: 2 exceptions in all; last exception - AMQPConnectorAMQPHandshakeError: ProbableAuthenticationError: Client was disconnected at a connection stage indicating a probable authentication error: ("ConnectionClosedByBroker: (403) 'ACCESS_REFUSED - Login was refused using authentication mechanism PLAIN. For details see the broker logfile.'",); first exception - AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
2020-09-16 05:28:49.051080: E tensorflow/stream_executor/cuda/cuda_driver.cc:351] failed call to cuInit: UNKNOWN ERROR (303)
0.32.2 here. Does anyone know how to solve this?
/opt/venv/lib/python3.7/site-packages/rasa/core/brokers/pika.py:294: FutureWarning: Your Pika event broker config contains the deprecated `queue` key. Please use the `queues` key instead.
rasa-worker_1 | self.queues = self._get_queues_from_args(queues, kwargs)
rasa-worker_1 | 2020-10-15 15:15:23 ERROR pika.adapters.utils.io_services_utils - Socket failed to connect: <socket.socket fd=21, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('172.26.0.7', 58092)>; error=111 (Connection refused)
rasa-worker_1 | 2020-10-15 15:15:23 ERROR pika.adapters.utils.connection_workflow - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.26.0.3', 5672))
rasa-worker_1 | 2020-10-15 15:15:23 ERROR pika.adapters.utils.connection_workflow - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-worker_1 | 2020-10-15 15:15:28 ERROR pika.adapters.utils.io_services_utils - Socket failed to connect: <socket.socket fd=25, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('172.26.0.7', 58114)>; error=111 (Connection refused)
rasa-worker_1 | 2020-10-15 15:15:28 ERROR pika.adapters.utils.connection_workflow - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.26.0.3', 5672))
rasa-worker_1 | 2020-10-15 15:15:28 ERROR pika.adapters.utils.connection_workflow - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')
rasa-worker_1 | 2020-10-15 15:15:33 ERROR pika.adapters.utils.io_services_utils - Socket failed to connect: <socket.socket fd=25, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('172.26.0.7', 58120)>; error=111 (Connection refused)
rasa-worker_1 | 2020-10-15 15:15:33 ERROR pika.adapters.utils.connection_workflow - TCP Connection attempt failed: ConnectionRefusedError(111, 'Connection refused'); dest=(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('172.26.0.3', 5672))
rasa-worker_1 | 2020-10-15 15:15:33 ERROR pika.adapters.utils.connection_workflow - AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: ConnectionRefusedError(111, 'Connection refused')