Minimal setup to reproduce the issue:
Observations/ Issues to be addressed:
ERROR: tcp accept error: Too many open files (#1119)
Sample (short) trace of stdout when the stalling happens:
...
================================================================================
[2019.01.08-22.02.55:704][ 0]LogInit: Display: Game Engine Initialized.
[2019.01.08-22.02.55:704][ 0]LogGameplayTags: Display: UGameplayTagsManager::DoneAddingNativeTags. DelegateIsBound: 0
[2019.01.08-22.02.55:704][ 0]LogStats: UGameplayTagsManager::ConstructGameplayTagTree: Construct from data asset - 0.000 s
[2019.01.08-22.02.55:704][ 0]LogStats: UGameplayTagsManager::ConstructGameplayTagTree: GameplayTagTreeChangedEvent.Broadcast - 0.000 s
[2019.01.08-22.02.55:718][ 0]LogInit: Display: Starting Game.
[2019.01.08-22.02.55:718][ 0]LogNet: Browse: /Game/Carla/Maps/Town01?Name=Player
[2019.01.08-22.02.55:731][ 0]LogLoad: LoadMap: /Game/Carla/Maps/Town01?Name=Player
[2019.01.08-22.02.58:337][ 0]LogAIModule: Creating AISystem for world Town01
[2019.01.08-22.02.59:354][ 0]LogLoad: Game class is 'TheNewCarlaGameMode_C'
[2019.01.08-22.02.59:768][ 0]LogWorld: Bringing World /Game/Carla/Maps/Town01.Town01 up for play (max tick rate 0) at 2019.01.08-17.02.59
[2019.01.08-22.02.59:795][ 0]LogWorld: Bringing up level for play took: 0.437414
[2019.01.08-22.02.59:813][ 0]LogCarlaServer: Initializing rpc-server at port 37382
[2019.01.08-22.02.59:821][ 0]LogCarlaServer: New episode 'Town01' started
[2019.01.08-22.02.59:826][ 0]LogLoad: Took 4.095626 seconds to LoadMap(/Game/Carla/Maps/Town01)
[2019.01.08-22.03.00:880][ 0]LogLoad: (Engine Initialization) Total time: 6.40 seconds
[2019.01.08-22.03.01:057][ 0]LogRenderer: Reallocating scene render targets to support 856x640 Format 10 NumSamples 1 (Frame:1).
[2019.01.08-22.03.01:651][ 1]LogLinux: Setting swap interval to 'Immediate'
[2019.01.08-22.03.01:651][ 1]LogLinux: Warning: Unable to set desired swap interval 'Immediate'
[2019.01.08-22.03.01:652][ 1]LogCarla: Starting AWorldObserver sensor
[2019.01.08-22.03.56:994][626]LogHttp: Warning: 0x7fbb68fd5c80: request failed, libcurl error: 6 (Couldn't resolve host name)
[2019.01.08-22.03.56:994][626]LogHttp: Warning: 0x7fbb68fd5c80: libcurl info message cache 0 (Could not resolve host: datarouter.ol.epicgames.com)
[2019.01.08-22.03.56:994][626]LogHttp: Warning: 0x7fbb68fd5c80: libcurl info message cache 1 (Closing connection 0)
Related Issues: #1
CARLA Version: 0.9.2
OS: Ubuntu
Another reason for a crash:
terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >: bind: Address already in use
Signal 6 caught.
Malloc Size=131076 LargeMemoryPoolOffset=131092
Malloc Size=65535 LargeMemoryPoolOffset=196655
Malloc Size=51811 LargeMemoryPoolOffset=248483
Aborted (core dumped)
I noticed another run-time error (it may be happening after the server crashes). It looks relevant to #1119:
rpc::timeout: Timeout of 60000ms while calling RPC function 'destroy_actor'
Another one: RuntimeError: rpc::rpc_error during call in function spawn_actor
This seems irrecoverable once it happens. It would be better if it were raised with some traceable information or reason.
rpc::timeout: Timeout of 60000ms while calling RPC function 'destroy_actor'
This happens because our scripts try to destroy the actors on exit. If the server has crashed, they still try to connect, but the server is no longer there.
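A minimal sketch of a more tolerant cleanup step, assuming the timeout surfaces in the Python client as a RuntimeError (as the spawn_actor error quoted above does). The helper name destroy_actors_safely is hypothetical and not part of the CARLA API; it only relies on each actor having a destroy() method.

```python
def destroy_actors_safely(actors):
    """Destroy each actor, collecting failures instead of aborting.

    Assumption: a dead or unreachable server makes destroy() raise
    RuntimeError (e.g. an rpc::timeout). Returns (actor, error) pairs
    for every actor that could not be destroyed.
    """
    failed = []
    for actor in actors:
        try:
            actor.destroy()
        except RuntimeError as err:
            # The server is probably gone; record the failure and move on.
            failed.append((actor, err))
    return failed
```

A script could call this on exit and log the returned list rather than letting the first timeout propagate. Note that each failed call may still block for the full RPC timeout.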
RuntimeError: rpc::rpc_error during call in function spawn_actor
This is usually due to a collision at the spawn position. If that's the case, it's safe to try to spawn again somewhere else or, alternatively, to use try_spawn_actor, which returns None instead of raising an exception. The lack of information is a known issue: the spawn function returns a string with the cause of failure, but there is a problem retrieving this message on the client side. I'll open another issue about this for people looking for this same message (#1095).
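The retry approach described above could be sketched roughly as follows. This assumes a world object exposing try_spawn_actor(blueprint, transform) that returns None on failure, as stated in the comment; the helper spawn_with_retries and its max_attempts parameter are hypothetical names for illustration.

```python
import random


def spawn_with_retries(world, blueprint, spawn_points, max_attempts=10):
    """Try candidate spawn points in random order until one succeeds.

    Assumption: world.try_spawn_actor returns None (rather than raising)
    when the spawn fails, e.g. because of a collision. Returns the
    spawned actor, or None if every attempt failed.
    """
    candidates = list(spawn_points)
    random.shuffle(candidates)
    for transform in candidates[:max_attempts]:
        actor = world.try_spawn_actor(blueprint, transform)
        if actor is not None:
            return actor
    return None
```

In a real client the spawn points would typically come from the map's recommended spawn points; the caller still has to handle the None case when every candidate position is occupied.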
Another reason for a crash:
terminating with uncaught exception of type boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >: bind: Address already in use
The bind: Address already in use issue is still present in the 0.9.3 release. The process might just need to try to bind to an unused port or, even better, it can bind() to port 0, in which case the OS will allocate an unused port.
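To illustrate the port 0 behaviour, here is a small sketch using Python's standard socket module. It is not CARLA's server code; the function name listen_on_any_port is hypothetical.

```python
import socket


def listen_on_any_port(host="127.0.0.1"):
    """Bind a TCP listener to port 0 so the OS picks an unused port.

    Returns the listening socket and the port number the OS assigned.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 means "any free port"
    sock.listen()
    assigned_port = sock.getsockname()[1]
    return sock, assigned_port
```

The socket is kept open and returned so the chosen port stays reserved. Picking a free port, closing the socket, and then passing the number to another process leaves a window in which something else can take the port.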
@nsubiron If you have a fix in place for the bind: Address already in use issue, can you please push it to a branch (some other branch if not master)?
@praveen-palanisamy You can find the latest build here. Now you can launch the simulator with -carla-streaming-port= to select the streaming port (and -carla-rpc-port= for the main port). If the streaming port is set to 0, a random available port is chosen.
@nsubiron Nice! Thanks for the fix. It seems to be working well.
We can probably close this issue once it is merged into master.
Great :)