Apollo: ROS: rostopic list shows nothing & diagnostics.sh does not update while rosbag playing

Created on 19 Jun 2018  ·  9 Comments  ·  Source: ApolloAuto/apollo

song@in_dev_docker:/apollo$ bash apollo.sh build_gpu
System check passed. Build continue ...
[WARNING] ESD CAN library supplied by ESD Electronics does not exist. If you need ESD CAN, please refer to third_party/can_card_library/esd_can/README.md.
[INFO] Start building, please wait ...
INFO: Reading 'startup' options from /apollo/tools/bazel.rc: --batch_cpu_scheduling
[INFO] Building on x86_64...
INFO: Reading 'startup' options from /apollo/tools/bazel.rc: --batch_cpu_scheduling
INFO: Analysed 1470 targets (12 packages loaded).
INFO: Found 1470 targets...
INFO: From Executing genrule //modules/drivers/pandora:pandora_genrule:
<command-line>:0:15: warning: ISO C99 requires whitespace after the macro name [enabled by default]
<command-line>:0:15: warning: ISO C99 requires whitespace after the macro name [enabled by default]
INFO: From Executing genrule //modules/perception/cuda_util:cuda_util_genrule:
-- Configuring done
-- Generating done
-- Build files have been written to: /home/song/.cache/bazel/_bazel_song/540135163923dd7d5820f3ee4b306b32/execroot/apollo/modules/perception/cuda_util/cmake_build
[100%] Built target cuda_util
/home/song/.cache/bazel/_bazel_song/540135163923dd7d5820f3ee4b306b32/execroot/apollo
INFO: From Compiling modules/perception/obstacle/camera/visualizer/glfw_fusion_viewer.cc:
modules/perception/obstacle/camera/visualizer/glfw_fusion_viewer.cc: In member function 'bool apollo::perception::lowcostvisualizer::GLFWFusionViewer::draw_analysis_curve()':
modules/perception/obstacle/camera/visualizer/glfw_fusion_viewer.cc:1327:1: warning: no return statement in function returning non-void [-Wreturn-type]
 }
 ^
INFO: Elapsed time: 1918.030s, Critical Path: 1458.57s
INFO: Build completed successfully, 7253 total actions
============================
[ OK ] Build passed!
[INFO] Took 1922 seconds
============================
song@in_dev_docker:/apollo$ bash scripts/bootstrap.sh 
Started supervisord with dev conf
Start roscore...
voice_detector: started
Dreamview is running at http://localhost:8888
song@in_dev_docker:/apollo$ rostopic list

-------NOTHING SHOWS HERE----

song@in_dev_docker:/apollo$ rosnode list

-------NOTHING SHOWS HERE----

song@in_dev_docker:/apollo$ env
CPLUS_INCLUDE_PATH=/usr/local/cuda-8.0/include:
HOSTNAME=in_dev_docker
TERM=xterm
ROS_ROOT=/home/tmp/ros/share/ros
ROS_PACKAGE_PATH=/home/tmp/ros/share:/home/tmp/ros/stacks
APOLLO_BIN_PREFIX=/apollo/bazel-bin
ROS_MASTER_URI=http://localhost:11311
USER=song
LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/home/tmp/ros/lib:/apollo/lib:/apollo/bazel-genfiles/external/caffe/lib:/home/caros/secure_upgrade/depend_lib
LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lz=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:
APOLLO_IN_DOCKER=true
CPATH=/home/tmp/ros/include
ROS_DOMAIN_ID=68321777
PATH=/usr/local/cuda-8.0/bin:/home/tmp/ros/bin:/apollo/scripts:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
C_INCLUDE_PATH=/usr/local/cuda-8.0/include:
DOCKER_GRP=song
PWD=/apollo
DOCKER_GRP_ID=1000
ROSLISP_PACKAGE_DIRECTORIES=
DOCKER_USER_ID=1000
DOCKER_USER=song
SHLVL=1
HOME=/home/song
ROS_DISTRO=indigo
PYTHONPATH=/usr/local/lib/python2.7/dist-packages:/apollo/py_proto:/usr/local/apollo/snowboy/Python:/apollo/modules/tools:/home/tmp/ros/lib/python2.7/dist-packages
PKG_CONFIG_PATH=/home/tmp/ros/lib/pkgconfig
LESSOPEN=| /usr/bin/lesspipe %s
DOCKER_IMG=registry.docker-cn.com/apolloauto/apollo:dev-x86_64-20180413_2000
CMAKE_PREFIX_PATH=/home/tmp/ros
DISPLAY=:0.0
LESSCLOSE=/usr/bin/lesspipe %s %s
APOLLO_BASE_SOURCED=1
ROS_ETC_DIR=/home/tmp/ros/etc/ros
_=/usr/bin/env
OLDPWD=/apollo/third_party/ros_x86_64
song@in_dev_docker:/apollo$ ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  18240   280 pts/23   Ss+  08:47   0:00 /bin/bash
song        51  0.0  0.0  20972  3116 pts/24   Ss   08:47   0:00 /bin/bash
song       119  0.3  0.1  50472  9196 ?        Ss   08:48   0:12 /usr/bin/python /usr/local/bin/supervisord -c /apollo/modules/tools/supervisord/dev.conf
song       127  0.1  0.0 542808  3560 pts/24   Sl   08:48   0:05 /usr/bin/python /home/tmp/ros/bin/roscore
song       133  0.7  0.0 1039604 7372 ?        Sl   08:48   0:28 /apollo/bazel-bin/modules/monitor/monitor --flagfile=/apollo/modules/monitor/conf/monitor.conf
song       152  0.1  0.0 747584  4060 ?        Ssl  08:48   0:06 /usr/bin/python /home/tmp/ros/bin/rosmaster --core -p 11311 -w 3 __log:=/home/song/.ros/log/738498b2-735a-11e8-b711-509a4c2d2f83/master.log
song       169  0.3  0.0 499240   368 ?        Ssl  08:48   0:14 /home/tmp/ros/lib/rosout/rosout __name:=rosout __log:=/home/song/.ros/log/738498b2-735a-11e8-b711-509a4c2d2f83/rosout-1.log
song       185  0.1  0.0 711548  1064 pts/24   Sl   08:48   0:05 python modules/tools/voice_detection/snowboy_detector.py
song       191  2.1  0.0 5045096 1444 ?        Sl   08:48   1:26 /apollo/bazel-bin/modules/dreamview/dreamview --flagfile=/apollo/modules/dreamview/conf/dreamview.conf
song      2422  0.1  0.5 711548 45284 pts/24   Sl   09:47   0:00 python modules/tools/voice_detection/snowboy_detector.py
song      2467  0.0  0.0  15584  2012 pts/24   R+   09:55   0:00 ps aux

CarOS Docker Question

All 9 comments

A week ago, Apollo and ROS both worked fine. I have not committed anything to the Apollo Docker images since then.

bootstrap.sh redirects the stdout/stderr of what it runs to various files, so errors do not show up on the console.

Can you please paste the contents of these files (from within the dev docker)?
/apollo/data/log/roscore.out
/tmp/supervisord.start.log

That will give us clues as to what is failing.

BTW, I see roscore as part of your running processes, but /rosout isn't part of your registered topics. This is already pointing to a potential problem with your ROS because /rosout should have appeared under both topics and nodes. If you just run
roscore
manually you should see this:
...
process[master]: started with pid [504]
ROS_MASTER_URI=http://in_dev_docker:11311/

setting /run_id to aebeff98-73e7-11e8-a890-7085c2287315
process[rosout-1]: started with pid [524]
started core service [/rosout]
...
Please share with us what you see.
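
The check described above (that /rosout should be registered while roscore is running) can be scripted. This is only a minimal sketch: the function name has_rosout is made up for illustration, and it does nothing more than look for an exact "/rosout" line in whatever text is piped into it.

```shell
# Hypothetical helper (not part of Apollo or ROS): succeeds if stdin
# contains a line that is exactly "/rosout".
has_rosout() {
    grep -qx '/rosout'
}

# Usage inside the dev container (needs a running ROS master):
#   rostopic list | has_rosout && echo "topic /rosout is registered"
#   rosnode list  | has_rosout && echo "node /rosout is registered"
```

If both commands print nothing while roscore is running, the master is up but the nodes cannot reach each other, which matches the symptom in the original report.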

thanks a lot.

[ OK ] Enjoy!
song@in_dev_docker:/apollo$ uname -a
Linux in_dev_docker 4.2.0-27-generic #32~14.04.1-Ubuntu SMP Fri Jan 22 15:32:26 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
song@in_dev_docker:/apollo$ bash scripts/bootstrap.sh 
Started supervisord with dev conf
Start roscore...
voice_detector: started
Dreamview is running at http://localhost:8888
song@in_dev_docker:/apollo$ cat /tmp/supervisord.start.log 
song@in_dev_docker:/apollo$ cat /apollo/data/log/roscore.out 
song@in_dev_docker:/apollo$ roscore
... logging to /home/song/.ros/log/6d4d2904-7437-11e8-bfa5-509a4c2d2f83/roslaunch-in_dev_docker-452.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://in_dev_docker:36205/
ros_comm version 1.11.21


SUMMARY
========

PARAMETERS
 * /rosdistro: indigo
 * /rosversion: 1.11.21

NODES

roscore cannot run as another roscore/master is already running. 
Please kill other roscore/master processes before relaunching.
The ROS_MASTER_URI is http://in_dev_docker:11311/
The traceback for the exception was written to the log file
song@in_dev_docker:/apollo$ bash scripts/bootstrap.sh stop
dreamview: stopped
voice_detector: stopped
monitor: stopped
roscore: stopped
song@in_dev_docker:/apollo$ roscore
... logging to /home/song/.ros/log/d520bbd6-7437-11e8-8e62-509a4c2d2f83/roslaunch-in_dev_docker-506.log
Checking log directory for disk usage. This may take awhile.
Press Ctrl-C to interrupt
Done checking log file disk usage. Usage is <1GB.

started roslaunch server http://in_dev_docker:40673/
ros_comm version 1.11.21


SUMMARY
========

PARAMETERS
 * /rosdistro: indigo
 * /rosversion: 1.11.21

NODES

auto-starting new master
process[master]: started with pid [521]
ROS_MASTER_URI=http://in_dev_docker:11311/

setting /run_id to d520bbd6-7437-11e8-8e62-509a4c2d2f83
process[rosout-1]: started with pid [538]
started core service [/rosout]


-----IN ANOTHER TERMINAL------
song@in_dev_docker:/apollo$ rosnode list; rostopic list


song@in_dev_docker:/apollo$ exit
exit
song@songPC:~$ docker ps
CONTAINER ID        IMAGE                                                                           COMMAND             CREATED             STATUS              PORTS               NAMES
f9e1e92aa48e        registry.docker-cn.com/apolloauto/apollo:dev-x86_64-20180413_2000               "/bin/bash"         8 minutes ago       Up 8 minutes                            apollo_dev
a9ea6ec560ab        registry.docker-cn.com/apolloauto/apollo:yolo3d_volume-x86_64-latest            "/bin/sh"           8 minutes ago       Up 8 minutes                            apollo_yolo3d_volume
d36b77463c6e        registry.docker-cn.com/apolloauto/apollo:localization_volume-x86_64-latest      "/bin/sh"           8 minutes ago       Up 8 minutes                            apollo_localization_volume
b82bd9b04fd9        registry.docker-cn.com/apolloauto/apollo:map_volume-sunnyvale_loop-latest       "/bin/bash"         8 minutes ago       Up 8 minutes                            apollo_map_volume-sunnyvale_loop
a4a259dbf978        registry.docker-cn.com/apolloauto/apollo:map_volume-sunnyvale_big_loop-latest   "/bin/sh"           8 minutes ago       Up 8 minutes                            apollo_map_volume-sunnyvale_big_loop
song@songPC:~$ 

I ran into the same problem. Deleting the Docker image and the Bazel cache files did not fix it.

Solved. The cause was a ROS_DOMAIN_ID conflict.

Solution 1:
[in_dev_docker] In every bash terminal that takes part in ROS, first check the last two digits of the current ROS_DOMAIN_ID:
env |grep -i domain|cut -c21-22
Then set a new ID:
export ROS_DOMAIN_ID=`hostname -I | sed 's/[^0-9]//g' | cut -c5-10`"XX"; env |grep -i domain_id

ATTENTION: replace "XX" with a two-digit number such as "88" or "99" that differs from the output of env |grep -i domain|cut -c21-22.

Solution 2:
[in_dev_docker] Open /home/tmp/ros/setup.sh. Its last line reads:

export ROS_DOMAIN_ID=`hostname -I | sed 's/[^0-9]//g' | cut -c5-10`"77"

Change 77 to another two-digit number such as "88" or "99".
Commit the Docker image with the tag "YOURIMAGETAG", then restart Docker with the "local" option:
bash docker/scripts/dev_start.sh -t YOURIMAGETAG -l

Either solution may still fail. If it does, change the number XX and retry.

To the Apollo team:
ROS_DOMAIN_ID=`hostname -I | sed 's/[^0-9]//g' | cut -c5-10`"77" strips everything but the digits from the host IPs and takes characters 5 to 10 as the ROS_DOMAIN_ID, so two hosts on the same LAN can end up with the same ROS_DOMAIN_ID.
For example, "hostname -I" may return IPs like
192.168.3.2 172.17.0.1
or 192.168.3.217 172.17.0.1
Both give the same ROS_DOMAIN_ID, 68321777.
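
The collision can be reproduced without touching the network. The sketch below mirrors the pipeline quoted from setup.sh, but reads the address list from an argument instead of calling hostname -I. The function name derive_domain_id and the optional suffix argument are hypothetical, added only for this illustration.

```shell
# Hypothetical helper mirroring the setup.sh pipeline.
#   $1: space-separated IPs, as `hostname -I` would print them
#   $2: two-digit suffix (defaults to 77, the value used in setup.sh)
derive_domain_id() {
    digits=$(printf '%s\n' "$1" | sed 's/[^0-9]//g' | cut -c5-10)
    printf '%s%s\n' "$digits" "${2:-77}"
}

derive_domain_id "192.168.3.2 172.17.0.1"      # prints 68321777
derive_domain_id "192.168.3.217 172.17.0.1"    # prints 68321777
derive_domain_id "192.168.3.2 172.17.0.1" 88   # prints 68321788
```

The first two hosts have different addresses but share digits 5 to 10 of the stripped string, so only the suffix tells them apart.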
Moreover, in apollo-platform/ros/third_party/fast-rtps_x86_64/include/fastrtps/rtps/attributes/RTPSParticipantAttributes.h:

class PortParameters
{
    public:
        PortParameters()
        {
            portBase = 7400;
            participantIDGain = 2;
            domainIDGain = 250;
            offsetd0 = 0;
            offsetd1 = 10;
            offsetd2 = 1;
            offsetd3 = 11;
        };
        virtual ~PortParameters(){}
        /**
         * Get a multicast port based on the domain ID.
         *
         * @param domainId Domain ID.
         * @return Multicast port
         */
        inline uint32_t getMulticastPort(uint32_t domainId)
        {
            return portBase+ domainIDGain * domainId+ offsetd0;
         }
............
}

getMulticastPort() is not guaranteed to return a distinct multicast port for each host on the LAN, even when the domain IDs are distinct.
A permanent fix could be to modify the source code mentioned above and republish the Docker image.
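
To see why distinct domain IDs need not give distinct ports, the arithmetic of getMulticastPort() can be replayed in the shell. The constants come from the PortParameters constructor quoted above. The reduction modulo 2^32 imitates the uint32_t return type. The further reduction to 16 bits is an assumption on my part (UDP port numbers are 16 bits wide), not something taken from the Fast-RTPS source. The function names are hypothetical.

```shell
# Constants copied from the PortParameters constructor quoted above.
PORT_BASE=7400
DOMAIN_ID_GAIN=250
OFFSET_D0=0

# Same formula as getMulticastPort(), reduced modulo 2^32 like a uint32_t.
multicast_port_u32() {
    echo $(( (PORT_BASE + DOMAIN_ID_GAIN * $1 + OFFSET_D0) % 4294967296 ))
}

# ASSUMPTION: the value is later narrowed to a 16-bit UDP port number.
multicast_port_u16() {
    echo $(( $(multicast_port_u32 "$1") % 65536 ))
}

multicast_port_u32 68321777   # prints 4195549762, already wrapped once
multicast_port_u16 1          # prints 7650
multicast_port_u16 32769      # prints 7650, same port for a different ID
```

Under that assumption, any two domain IDs that differ by a multiple of 32768 map to the same port, because 250 * 32768 is a multiple of 65536.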

I guess it could work better if the output were formatted as follows.

export ROS_DOMAIN_ID=`hostname -I | sed 's/[^0-9]//g' | cut -c5-10`"77"  # replace this line so that it produces the following
            ->192.168.3.2 172.17.0.1  # find the index of the first space, then take the 6 digits before it
           ->216832
           ->216832+randNumber+same_randNumber

To the Apollo team:
We have also had trouble with this equation for deriving ROS_DOMAIN_ID.

We had the opposite problem: the machines were on the same subnet but ended up with different ROS_DOMAIN_IDs.

E.g. if the two nodes have short IPs such as:
192.168.1.13
192.168.1.18
The ROS_DOMAIN_IDs using the Apollo original equation end up being:
68113177
68118177
and the devices didn't connect.

It looks like several people are affected by this problem in different ways. To address Song's point: if two machines are on the same subnet (their IPs are equal except for the last number), we expect them to have the same ROS_DOMAIN_ID, whereas he expects them to be different. So the intent needs to be clarified first, and then the equation can be fixed to match it.

Thanks
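
If the intent is "same subnet, same ID", one possible derivation is sketched below. This is a hypothetical alternative, not Apollo's method: it assumes a /24 subnet, uses only the first three octets of the first address, and folds a cksum CRC of that prefix into the range 0 to 232 so that 7400 + 250 * id stays below 65536. The function name subnet_domain_id is made up for this example.

```shell
# Hypothetical alternative, not Apollo's method.
#   $1: an IPv4 address (or the output of `hostname -I`; only the
#       first three octets of the first address are used)
# ASSUMPTION: hosts that should talk to each other share a /24 prefix.
subnet_domain_id() {
    prefix=$(printf '%s\n' "$1" | cut -d. -f1-3)
    crc=$(printf '%s' "$prefix" | cksum | cut -d' ' -f1)
    echo $(( crc % 233 ))
}

subnet_domain_id 192.168.1.13
subnet_domain_id 192.168.1.18   # same value as the line above
```

With only 233 possible values, unrelated subnets can still collide, so this trades Song's requirement (different IDs) for ours (matching IDs); it does not satisfy both.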

@songhanchen @osaman88 Thank you for reporting this issue and providing a detailed analysis. RTPS selects its communication port based on the domain ID; sometimes the selected port is already occupied, and communication fails. We are looking for a more effective mechanism for setting the domain ID that avoids port conflicts.
In addition, if communication fails, you can set the domain ID manually as follows.

export ROS_DOMAIN_ID=212
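
Before exporting a manual value, it may help to check that its port fits in a 16-bit UDP port number. The sketch below reuses portBase and domainIDGain from the PortParameters constructor quoted earlier in this thread. It covers only the numeric range, which is my own assumption about what makes an ID safe; it cannot tell whether the port is already occupied. The function name is hypothetical.

```shell
# Constants from the PortParameters constructor quoted earlier in this thread.
PORT_BASE=7400
DOMAIN_ID_GAIN=250
MAX_UDP_PORT=65535

# Hypothetical helper: succeeds if the multicast port computed for
# domain ID $1 fits in a 16-bit UDP port without wrapping.
domain_id_fits_udp() {
    [ $(( PORT_BASE + DOMAIN_ID_GAIN * $1 )) -le "$MAX_UDP_PORT" ]
}

domain_id_fits_udp 212      && echo "212 fits"                # port 60400
domain_id_fits_udp 68321777 || echo "68321777 does not fit"
```

By this measure the suggested value 212 is in range, while the eight-digit IDs produced by setup.sh are far outside it.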

Closed. Reopen if you still have questions.
