Apollo: Problem of performance testing between CyberRT and ROS

Created on 12 Mar 2019 · 16 Comments · Source: ApolloAuto/apollo

I followed the answer of #7152 to test the performance between CyberRT and ROS.

I ran these two similar examples, one on Apollo ROS and one on Cyber RT, in the Apollo 3.5 docker container:
https://github.com/ApolloAuto/apollo-platform/tree/master/ros/ros_tutorials/roscpp_tutorials
https://github.com/ApolloAuto/apollo/tree/master/cyber/examples

Every 100 milliseconds, the talker publishes a "Hello World" string message with a timestamp.
When the listener receives the message, it prints a timestamp too.
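
A minimal sketch of how the delay can be computed from the two timestamps, assuming each message carries its send time in nanoseconds and the listener records its receive time from the same clock on the same host. The function name summarize_latency and the sample values are hypothetical; they are not part of Cyber RT or ROS.

```python
import statistics


def summarize_latency(send_ns, recv_ns):
    """Return (mean, median, max) one-way latency in microseconds.

    send_ns[i] is the timestamp stamped by the talker and recv_ns[i] the
    timestamp taken by the listener for the same message. Both are assumed
    to come from the same clock on the same host.
    """
    if len(send_ns) != len(recv_ns) or not send_ns:
        raise ValueError("need equally sized, non-empty timestamp lists")
    deltas_us = [(r - s) / 1000.0 for s, r in zip(send_ns, recv_ns)]
    return (statistics.mean(deltas_us),
            statistics.median(deltas_us),
            max(deltas_us))


# Hypothetical samples: three messages sent 100 ms apart.
send = [0, 100_000_000, 200_000_000]
recv = [170_000, 100_250_000, 200_180_000]
mean_us, median_us, max_us = summarize_latency(send, recv)
print(f"mean={mean_us:.1f}us median={median_us:.1f}us max={max_us:.1f}us")
```

Reporting the median and the maximum alongside the mean helps show whether a few slow messages dominate the average.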

For Apollo ROS, the transport delay is about 0.17 ms.
For CyberRT, the transport delay is about 0.25 ms.
I tested many times and the result did not meet my expectation, which is odd.

The Cyber RT FAQ says it is highly optimized for performance, latency, concurrency and data throughput.
How did the Apollo team reach this conclusion? Or am I using the wrong testing method?
How can I test latency, concurrency and data throughput? I need your suggestions. @quning78 @natashadsouza

Cyber Question

All 16 comments

Thanks for providing the results, we will take a look and get back to you soon.

Again, it took a lot of effort to prepare CyberRT for open source and it is still relatively new in our GitHub repo, so there might be some hiccups or things left to polish in the current release. Thanks for your interest and understanding.

Hi @bssung , could you tell me which command you used to compile Cyber RT, bash apollo.sh build or bash apollo.sh build_opt?

Hello @fengqikai1414, I used bazel build //cyber/...

Hello @bssung, when testing performance, we should use bazel build -c opt //cyber/... to build Cyber RT, which enables compiler optimization at level 2, the same as -O2.
Additionally, make sure the message frequency and size are the same in both setups before comparing, since both affect the delay.
Thanks again for your interest.

Hello @fengqikai1414, I get a segmentation fault when I run the talker/listener example after building with bazel build -c opt //cyber/... . The listener crashes and receives no messages.
But when I build without -c opt, it works.

Thanks for your help. Here is the gdb log:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3670ff1700 (LWP 8681)]
0x0000000000000002 in ?? ()
(gdb) bt
#0  0x0000000000000002 in ?? ()
#1  0x00000000019f4688 in ?? ()
#2  0x00007f367dcc64dc in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00000000004bac40 in apollo::cyber::scheduler::ClassicContext::Wait() ()
#4  0x00000000004bb833 in apollo::cyber::scheduler::Processor::Run() ()
#5  0x00007f367dcc9a60 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007f367e22a184 in start_thread (arg=0x7f36707f0700) at pthread_create.c:312
#7  0x00007f367d73703d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

@bssung, try this command: bazel build -c opt --copt=-fpic //cyber/... . I think it will be OK.

Thanks @fengqikai1414, the build passed and it became faster, but the delay of CyberRT is still slightly larger than that of Apollo ROS.
Maybe I should test with larger messages, such as image or point cloud messages.
Do you have any ideas for testing latency, concurrency and data throughput? Thanks a lot.

@bssung , can you provide more detailed information, such as frequency, type and size of the messages?

@fengqikai1414, the ROS message is defined in this file, unmodified: the rate is 10, the message is "hello world", and the type is a std msg.
https://github.com/ApolloAuto/apollo-platform/blob/master/ros/ros_tutorials/roscpp_tutorials/talker/talker.cpp
The Cyber message is defined in this file: the type is Chatter, the rate is changed to 10 and the message content is changed to "hello world".
https://github.com/ApolloAuto/apollo/blob/master/cyber/examples/talker.cc

I just tested a 480*640 image message in the talker/listener example and got the same result. In Cyber, I used the Image message from /apollo/modules/drivers/proto/sensor_image.proto, the counterpart of ROS's sensor_msgs::Image. The delays are 0.6ms (Cyber) and 0.5ms (ROS), so ROS wins by a bit.

Hi bssung, thanks for your test!

To decrease the number of threads, the readable-notification mechanism of shared memory was changed in CyberRT. The default mechanism is UDP multicast, and its sendto system call adds some latency.

So, to decrease the latency, you can change the mechanism and try again. The steps are as follows:

  1. update CyberRT to the latest version;
  2. uncomment the transport_conf in https://github.com/ApolloAuto/apollo/blob/master/cyber/conf/cyber.pb.conf;
  3. change notifier_type of shm_conf from "multicast" to "condition";
  4. build CyberRT with opt;
  5. run the talker and the listener.
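
Steps 2 and 3 would leave a fragment along these lines in cyber.pb.conf. This is only a sketch built from the field names mentioned in the steps; the other fields of transport_conf and shm_conf are omitted, and the exact layout may differ between versions, so compare it with the file in your own checkout.

```
transport_conf {
  shm_conf {
    # was "multicast"
    notifier_type: "condition"
  }
}
```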

On my computer, after changing the notifier_type, the average latency decreased from 317us to 153us.
Thank you again, and have a happy weekend!

Thank you @gruminions. I have tested based on your reply.
In the "hello world" string talker/listener example, the latency of CyberRT is 115us and that of ROS is 134us. This is a great improvement.
But in the 480*640 image talker/listener example, which I adapted from the string example, the latency of CyberRT is still 100us longer than that of ROS. I think the image transport scenario is important too, so this result may be worth a look.

@bssung could you share the number from each test, not just the difference?

To add more information, both Apollo ROS and Cyber RT were developed by our team, and their transport layers share the same idea and a similar design. So it is not surprising that they show similar transport performance in basic benchmarks.

However, Cyber RT brings more than that: to name a few, a configurable user-level scheduler, user-level coroutine-based tasks, data fusion and other high-performance libraries.

Thanks for your interest!

Thanks @quning78, I would appreciate your suggestions on the questions below.
"could you share the number from each test, not just the difference?"
-- Yes. The latency of ROS sensor_msgs::Image is 683us, and the latency of CyberRT sensor_image.proto::Image is 914us. The frequency is 10 Hz and the image is 480*640 bgr8. These numbers were measured on an Apollo IPC.
"To add more information, both Apollo ROS and Cyber RT were developed by our team, and their transport layers share the same idea and a similar design. So it is not surprising that they show similar transport performance in basic benchmarks."
-- Do you mean that transport latency does not show the advantage of Cyber RT compared with ROS?
"However, Cyber RT brings more than that: to name a few, a configurable user-level scheduler, user-level coroutine-based tasks, data fusion and other high-performance libraries."
-- 1. Configurable user-level scheduler
I found some config files in apollo-r3.5.0\cyber\conf. Is there any document that explains these config files, for example the rules for setting prio?
-- 2. User-level coroutine-based tasks
As I understand it, Cyber RT uses coroutine-based tasks because switching between coroutines is faster than switching between threads.
Is the "high concurrency" of Cyber RT based on this?
Do you have test numbers showing that the concurrency of Cyber RT is better than that of ROS?
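
To illustrate the general idea of user-level task switching raised in the question, here is a minimal sketch of cooperative tasks scheduled round-robin on a single thread. It is not Cyber RT's scheduler; the names task and run_round_robin are hypothetical, and Python generators stand in for coroutines.

```python
from collections import deque


def task(name, steps, log):
    """A cooperative task: does one unit of work, then yields control."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler, no OS thread switch


def run_round_robin(tasks):
    """Run generator-based tasks on a single thread until all finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)
            ready.append(current)  # still has work, queue it again
        except StopIteration:
            pass  # task finished, drop it


log = []
run_round_robin([task("a", 2, log), task("b", 3, log)])
print(log)
```

Each switch here is an ordinary function return and resume inside one thread, which is why user-level switching can be cheaper than an OS thread context switch. Whether that translates into better end-to-end numbers still has to be measured.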

Hi bssung, thanks for trying!
As you mentioned, you compared sensor_image.proto with sensor_msgs::Image; their serialization and deserialization may interfere with the measurement.
So, to eliminate the influence of protobuf, I compared apollo::cyber::message::RawMessage (a simple wrapper of std::string) with std_msgs::String. The source code is listed below:
1. CyberRT:
talker: https://github.com/gruminions/apollo/blob/record/cyber/examples/talker.cc
listener: https://github.com/gruminions/apollo/blob/record/cyber/examples/listener.cc
conf: https://github.com/gruminions/apollo/blob/record/cyber/conf/cyber.pb.conf

2. ApolloROS:
talker: https://github.com/gruminions/apollo-platform/blob/master/ros/ros_tutorials/roscpp_tutorials/talker/talker.cpp
listener: https://github.com/gruminions/apollo-platform/blob/master/ros/ros_tutorials/roscpp_tutorials/listener/listener.cpp

System Information
Machine: memory: 7.7GiB CPU: Intel Core i5-4460 3.2GHz * 4
OS: Linux 4.4.0-119-generic 143~14.04.1-Ubuntu

Testing Conditions
Message Content: a foo string(307200 bytes) + timestamp
Message Frequency: 10Hz
Message Number: 10000

Testing Results
Average Latency of CyberRT: 360us
Average Latency of ApolloROS: 924us
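
As a quick sanity check on these figures, the sketch below works out the payload rate and the latency ratio. All inputs are the numbers quoted in the test report; the few bytes of the timestamp are ignored, and the variable names are only illustrative.

```python
# Figures quoted in the test report above.
payload_bytes = 307200   # size of the test string
rate_hz = 10             # publish frequency
cyber_us = 360           # average latency of CyberRT
ros_us = 924             # average latency of ApolloROS

# 480 * 640 = 307200, i.e. one byte per pixel of a 480*640 frame.
assert payload_bytes == 480 * 640

throughput_mb_s = payload_bytes * rate_hz / 1e6
ratio = ros_us / cyber_us

print(f"payload throughput: {throughput_mb_s:.3f} MB/s")
print(f"ApolloROS / CyberRT latency ratio: {ratio:.2f}")
```

Note that a 480*640 bgr8 image uses three bytes per pixel, so this string payload is a third of the size of the image used in the earlier image test, and the two sets of latency numbers are not directly comparable.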

Welcome to continue the discussion, thank you again!

Thanks @gruminions, I agree that it is a more reasonable test method than testing Image messages.
