Javacpp-presets: Need the RTP/NTP time stamps provided in the RTP packets of H.264-encoded video stream

Created on 2 Feb 2017 · 43 Comments · Source: bytedeco/javacpp-presets

I have created an RTSP/RTCP/RTP client written in Java, communicating with a remote video server serving H.264-encoded MPEG video through RTP. I do not have any audio tracks to stream. I see in the FFmpeg source code that in libavformat/rtpdec.c::rtp_parse_packet_internal(RTPDemuxContext *s, AVPacket *pkt, const uint8_t *buf, int len), it reads out the RTP time stamp from the RTP packet. This time stamp, when combined with the NTP time stamp sent in the RTCP protocol in the Sender Reports, can be used to determine the exact time the image frame was sampled and sent by the server. Then, in finalize_packet(RTPDemuxContext *s, AVPacket *pkt, uint32_t timestamp), this time stamp is added to the RTPDemuxContext object and used to calculate some other time stamp values.
The question is: How do I get access to this RTPDemuxContext->timestamp from the Java side of the JNI? I DON'T want the time provided by javacpp.avutil.av_frame_get_best_effort_timestamp(avFrame), because that is simply a calculation of the time elapsed since the streaming started, based on the frame rate. How can I get the server's RTP time from within Java?
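
For a client that already receives the raw RTP packets itself, the timestamp can also be read straight from the fixed RTP header, without going through FFmpeg. A minimal sketch, assuming the standard 12-byte fixed header from RFC 3550 (timestamp in bytes 4 to 7, big-endian). The class and method names are hypothetical and are not part of JavaCPP or FFmpeg:

```java
// Hypothetical helper, not part of JavaCPP or FFmpeg: reads fields from a raw RTP packet.
final class RtpHeader {
    private RtpHeader() {}

    /** Reads an unsigned 32-bit big-endian integer starting at the given offset. */
    static long readUint32(byte[] buf, int offset) {
        return ((buf[offset] & 0xFFL) << 24)
             | ((buf[offset + 1] & 0xFFL) << 16)
             | ((buf[offset + 2] & 0xFFL) << 8)
             |  (buf[offset + 3] & 0xFFL);
    }

    /** Checks that the buffer holds at least the fixed header of an RTP version 2 packet. */
    private static void checkHeader(byte[] packet) {
        if (packet.length < 12) {
            throw new IllegalArgumentException("RTP packet shorter than the 12-byte fixed header");
        }
        int version = (packet[0] & 0xC0) >> 6;
        if (version != 2) {
            throw new IllegalArgumentException("Unexpected RTP version: " + version);
        }
    }

    /** RTP timestamp: bytes 4..7 of the fixed header, returned as an unsigned value in a long. */
    static long timestamp(byte[] packet) {
        checkHeader(packet);
        return readUint32(packet, 4);
    }

    /** Sequence number: bytes 2..3 of the fixed header. */
    static int sequenceNumber(byte[] packet) {
        checkHeader(packet);
        return ((packet[2] & 0xFF) << 8) | (packet[3] & 0xFF);
    }
}
```

The value is kept in a long because Java has no unsigned 32-bit integer type, and the RTP timestamp regularly exceeds Integer.MAX_VALUE.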

enhancement help wanted

All 43 comments

This is not part of the API of FFmpeg. We would first need to update that...

Is this something that you think others would appreciate? As in, should we request this API change in the FFmpeg project? Or should I attempt to create my own patch to the FFmpeg library and then build it and JavaCV around it?

You would have to modify FFmpeg in any case, and when that's done, you might as well try to submit the patch upstream to the developers of FFmpeg. In the meantime, we could always have a branch of the presets with your patch, sure.

To include the modified FFmpeg into JavaCV, should I just replace the downloaded tar.gz of the FFmpeg source code with my own tar.gz of the patched FFmpeg? Then I have to build javacpp and javacpp-presets before building JavaCV, correct?

Just add a patch in the directory here:
https://github.com/bytedeco/javacpp-presets/tree/master/ffmpeg
And apply it at build time from inside the cppbuild.sh script.

How do you calculate the image creation time from the RTCP NTP timestamp and the RTP timestamp? I currently have the same problem.

The first packet of RTCP should contain both an NTP timestamp (64bit) and the RTP timestamp (middle 32 bits of the 64-bit NTP timestamp) representing the same moment in time. From then on, every RTP packet contains just the RTP timestamp, and you use the synchronization from the first RTCP packet to calculate the full NTP timestamp from that. Make sense?

Thanks for the explanation. Does it make sense? Yes and no, or maybe I have not understood the RFC correctly. The RFC mentions that the RTP timestamp in the RTCP packet is not the same as the one in the RTP data packet. So I see a gap in how to get a real reference from the RTCP packet to the RTP data packets, or am I missing something?

Every RTCP packet should come with the NTP timestamp and the RTP timestamp. The RTCP packets come at the beginning of the video stream (Source Description packet) and regularly during the stream (Sender Reports). The RTP timestamp in the RTCP packet is the same as in the RTP packets.

When I compare the RTP timestamp of the RTCP packet and the RTP timestamp of the RTP data packet in Wireshark, I see a difference between the two. So how do I synchronize the RTP data and the RTCP packet?

The packets arrived at different times, so their timestamps should be slightly different. But you use the difference between the NTP and RTP timestamps in the RTCP packets to determine the offset between RTP timestamps and the NTP time.
image
In Wireshark, you can see the "Timestamp, MSW" and "Timestamp, LSW" - these are the NTP timestamp at the time the RTCP packet was sent. You can also see the "RTP timestamp". Now, to determine the RTP offset, just shift the RTP timestamp left 16 bytes and then subtract from the NTP timestamp. Every RTP packet that comes later will have the RTP timestamp with the same offset from the NTP timestamp.
Here is the next RTP timestamp that I got in Wireshark after the RTCP packet shown above:
image
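
The three Wireshark fields named above sit at fixed offsets in an RTCP Sender Report, so a Java client can pull them out directly. A minimal sketch, assuming the RFC 3550 Sender Report layout (packet type 200, NTP MSW at byte 8, NTP LSW at byte 12, RTP timestamp at byte 16). The class name is hypothetical:

```java
// Hypothetical helper for illustration; offsets follow the RFC 3550 Sender Report layout.
final class SenderReport {
    final long ntpSeconds;   // "Timestamp, MSW": seconds since 1 Jan 1900
    final long ntpFraction;  // "Timestamp, LSW": fraction of a second in units of 2^-32 s
    final long rtpTimestamp; // the same instant expressed in RTP clock units

    private SenderReport(long ntpSeconds, long ntpFraction, long rtpTimestamp) {
        this.ntpSeconds = ntpSeconds;
        this.ntpFraction = ntpFraction;
        this.rtpTimestamp = rtpTimestamp;
    }

    /** Parses the sender info of a Sender Report located at the start of the buffer. */
    static SenderReport parse(byte[] packet) {
        // 8 bytes of header and SSRC plus 20 bytes of sender info.
        if (packet.length < 28) {
            throw new IllegalArgumentException("Too short for an RTCP Sender Report");
        }
        int packetType = packet[1] & 0xFF;
        if (packetType != 200) {
            throw new IllegalArgumentException("Not a Sender Report, packet type " + packetType);
        }
        return new SenderReport(
                readUint32(packet, 8),
                readUint32(packet, 12),
                readUint32(packet, 16));
    }

    /** Reads an unsigned 32-bit big-endian integer starting at the given offset. */
    private static long readUint32(byte[] buf, int offset) {
        return ((buf[offset] & 0xFFL) << 24)
             | ((buf[offset + 1] & 0xFFL) << 16)
             | ((buf[offset + 2] & 0xFFL) << 8)
             |  (buf[offset + 3] & 0xFFL);
    }
}
```

RTCP packets usually travel as compound packets. This sketch only looks at the first one in the buffer, which is where a sender normally places its Sender Report.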

If you're using Java, I suggest using the Apache Commons TimeStamp object:
https://commons.apache.org/proper/commons-net/apidocs/src-html/org/apache/commons/net/ntp/TimeStamp.html
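
For readers who would rather not pull in Commons Net, the conversion from a 64-bit NTP timestamp to Java epoch milliseconds is small enough to write by hand. A sketch, assuming timestamps in NTP era 0 (before the rollover in 2036). The class name is hypothetical:

```java
// Hypothetical helper: converts the two 32-bit halves of an NTP timestamp to Unix milliseconds.
final class NtpTime {
    private NtpTime() {}

    /** Seconds between the NTP epoch (1 Jan 1900) and the Unix epoch (1 Jan 1970). */
    static final long NTP_TO_UNIX_SECONDS = 2208988800L;

    /**
     * @param ntpSeconds  most significant word, seconds since 1900 (unsigned, held in a long)
     * @param ntpFraction least significant word, fraction of a second in units of 2^-32 s
     * @return milliseconds since the Unix epoch, rounded to the nearest millisecond
     */
    static long toUnixMillis(long ntpSeconds, long ntpFraction) {
        long unixSeconds = ntpSeconds - NTP_TO_UNIX_SECONDS;
        // Scale the 32-bit fraction to milliseconds, adding half a unit for rounding.
        long millis = (ntpFraction * 1000L + (1L << 31)) >>> 32;
        return unixSeconds * 1000L + millis;
    }
}
```

The result can be handed to java.time.Instant.ofEpochMilli(...) for display or comparison with local clock readings.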

@ryantheseer:

How can I get the server's RTP time?
Is this something that you think others would appreciate? As in, should we request this API change in the FFmpeg project?

This is something we're very interested in too.

Your support issue https://github.com/caprica/vlcj/issues/536 nearly exactly describes what we're looking to do, but we're doing it in C++. We too have looked into OpenCV, FFmpeg, and libVLC, and there wasn't a clear solution. We also just started looking into live555.

Our preference would be to stick with OpenCV and add the ability to get the server RTP time for the current frame position using something like cv::VideoCapture.get(cv::CAP_PROP_POS_SERVER_RTP_MSEC). Since OpenCV wraps FFmpeg, it would require modifying FFmpeg too, as you and @saudet already pointed out.

Did you end up modifying FFmpeg to do what you want?

One problem remains. Since the first RTCP Sender Report is sent after 5 seconds, there is no reference NTP time for the first images. How can one overcome this problem?

@tandresen77 : When you set up the stream using RTCP handshakes, you should request an RTCP Description packet. That will provide the NTP timestamp before you issue the start command to the stream.

@jrobble Yes, at one point I modified FFmpeg, OpenCV, and the OpenCV Java wrappers in order to extract the NTP timestamp through all the layers. We found that OpenCV was very slow with TCP (buffering up to 30 seconds after a minute of streaming RTSP), and UDP streaming was very low quality for some reason, so I ended up modifying libVLC and live555 instead (and Java wrappers for libVLC) in order to accomplish the same thing. Either way, it's a bit of effort. When we went the LibVLC direction, the fix to FFmpeg wasn't necessary, because it ended up using Live555 for RTSP streaming of H264 encoded MP4s. You might be in a different situation, though!

Do you mean the SDES packet? That does not contain a timestamp as far as I can see, and the RTCP SR packets are typically sent after the first RTP packet has been sent.

@tandresen77 Sorry, I was wrong about the SDES packet. If you don't want to display or record video until you receive an NTP-RTP sync packet (in a Sender Report), you could just buffer the video and throw away frames until the first Sender Report is received.
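
The buffering idea can be sketched as a small gate in front of the decoder or recorder. The version below is a variation on the suggestion above: rather than discarding every early frame, it holds a bounded number of them and releases them once the first Sender Report has been seen. All names are hypothetical, and the frame payload is left as a plain byte array:

```java
// Hypothetical gate that holds frames back until the first RTCP Sender Report arrives.
final class SyncGate {
    /** A frame tagged with the RTP timestamp it arrived with. */
    static final class Frame {
        final long rtpTimestamp;
        final byte[] data;

        Frame(long rtpTimestamp, byte[] data) {
            this.rtpTimestamp = rtpTimestamp;
            this.data = data;
        }
    }

    private final java.util.ArrayDeque<Frame> pending = new java.util.ArrayDeque<>();
    private final int maxPending;
    private boolean synced;

    SyncGate(int maxPending) {
        if (maxPending < 1) {
            throw new IllegalArgumentException("maxPending must be at least 1");
        }
        this.maxPending = maxPending;
    }

    /**
     * Offers a frame to the gate.
     * @return true if the frame may be passed downstream now, false if it was held back
     */
    boolean onFrame(Frame frame) {
        if (synced) {
            return true;
        }
        if (pending.size() == maxPending) {
            pending.removeFirst(); // drop the oldest frame to bound memory use
        }
        pending.addLast(frame);
        return false;
    }

    /** Call when the first Sender Report arrives; returns the held frames in arrival order. */
    java.util.List<Frame> onSenderReport() {
        synced = true;
        java.util.List<Frame> released = new java.util.ArrayList<>(pending);
        pending.clear();
        return released;
    }
}
```

Releasing rather than dropping works on the assumption that the RTP-to-NTP relationship established by the Sender Report also holds for RTP timestamps slightly earlier than the report. A caller that prefers the stricter behaviour described above can simply ignore the list returned by onSenderReport().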

@ryantheseer:

Yes, at one point I modified FFmpeg, OpenCV, and the OpenCV Java wrappers in order to extract the NTP timestamp through all the layers

If you would direct us to those FFmpeg and OpenCV modifications we would greatly appreciate it!

so I ended up modifying libVLC and live555 instead

If you would direct us to those modifications too it would save us a lot of work with our OpenMPF project. We too are trying to determine which solution is right for us.

@jrobble Sorry it's taking so long to respond. I have not submitted any of this to be pulled into the git repositories, so I have to look through the changes I made and compile them somehow. Here's a start for the FFmpeg/OpenCV modifications. Use at your own risk - this is a hack and I make no guarantees that it doesn't break some other aspect of the open source software.
OpenCV FFmpeg Changes.docx
And here's a document outlining my changes to VLC and VLCJ (and some dependencies) to use VLC with Java:
VLC Changes.docx

@ryantheseer, very much appreciated! I can see that you put some time into preparing these docs. I'll look them over in the next few weeks to month and get back to you if we have any questions. If we end up going with one of these solutions we may want to consider a way to make them publicly available to the rest of the world via a fork, branch, or pull request to the respective code bases.

@jrobble That would be awesome if you could make an official patch! I would love to hear about it, if you go that route.

@ryantheseer : why shift the RTP timestamp by 16 bytes?
Suppose I have a video packet with the clock at 90 kHz, and the same RTP and NTP timestamps as in the example above.
Aren't the units of the RTP timestamp different from those of the NTP timestamp? In that case, would just shifting work? How would the conversion from RTP to NTP be done here?

@venkatesh-kuppan : You should check out the RTP Standard, section 5.1.
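
Section 5.1 of RFC 3550 defines the RTP timestamp in units of the media clock (90 kHz for H.264 video) with a random initial offset, so the usual conversion goes through the clock rate rather than through a bit shift. A sketch of that calculation, assuming one Sender Report has supplied a matching pair of wall-clock time and RTP timestamp. The class name and the choice of Unix milliseconds for the reference time are made up for this illustration:

```java
// Hypothetical mapper from RTP timestamps to wall-clock time, anchored on one Sender Report.
final class RtpClockMapper {
    private final long srUnixMillis;   // wall clock taken from the SR's NTP timestamp, as Unix ms
    private final long srRtpTimestamp; // RTP timestamp from the same SR (unsigned 32-bit in a long)
    private final int clockRateHz;     // media clock rate, e.g. 90000 for H.264 video

    RtpClockMapper(long srUnixMillis, long srRtpTimestamp, int clockRateHz) {
        if (clockRateHz <= 0) {
            throw new IllegalArgumentException("clockRateHz must be positive");
        }
        this.srUnixMillis = srUnixMillis;
        this.srRtpTimestamp = srRtpTimestamp;
        this.clockRateHz = clockRateHz;
    }

    /** Estimates the wall-clock time (Unix ms) at which a packet with this RTP timestamp was sampled. */
    long toUnixMillis(long rtpTimestamp) {
        // The cast keeps the low 32 bits as a signed value, which handles both
        // 32-bit wraparound and packets that are slightly older than the report.
        int deltaTicks = (int) (rtpTimestamp - srRtpTimestamp);
        return srUnixMillis + Math.round(deltaTicks * 1000.0 / clockRateHz);
    }
}
```

For example, with a 90 kHz clock a packet whose RTP timestamp is 90000 ticks past the one in the Sender Report maps to one second after the report's NTP time. Refreshing the mapper on every new Sender Report limits the effect of clock drift between the two timelines.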

Is this method still viable for ffmpeg 4 and the Python bindings for OpenCV? I am in a bit of a predicament trying to obtain the timestamps through OpenCV's VideoCapture, like @jrobble.

You could try it with that toolchain, for sure. The concepts should be applicable. The Python and Java bindings would differ, of course, but the idea is probably similar.
Another totally different approach that has worked well for me is to use gstreamer instead of FFMPEG and OpenCV. There's a bit of a learning curve, but gstreamer is much more modular and allows you access to any section of the data pipeline from your application.

Yes, I am finding it a bit difficult to see how the Python bindings interact differently. I found the correct version of OpenCV serendipitously. I went the opposite way, though, from gstreamer to ffmpeg.
Actually, the funny thing is that I was trying to get some answers about the DTS, the PTS and the buffer. I couldn't quite understand how they interact with each other when I tried designing my own plugin, and I failed with gstreamer.
Mind if I ask you a gstreamer question (unrelated to this issue)? I just wanted to know how you would actually calculate the offsets, e.g. between when the camera actually starts recording and when the frames are captured.

I think we normally only worry about the PTS because that's when you want to "show" the video with meta data streams at the right time. The DTS would only be important if you wanted to inject something into the decoding module.
Your question on time stamps in gstreamer depends on how the server-side gstreamer code is written. Usually I think the camera would write the time stamp right before it writes it to the RTP packets. But it may be written before that. I might not quite understand your question, though.

Oh, this makes perfect sense. One factor is that my camera isn't hooked up to an NTP server or any other time server (it keeps failing to test and connect), so its time isn't that accurate. As for my question, apparently there is a slight delay after the camera turns on before it actually captures frames. I was just wondering if you could give any advice, or point me to tutorials I can read, on coding such a thing in gstreamer. (I am basically a noob.)

Edit: Managed to resolve the problem! I had to fix the calculation a bit here and there, but it worked! Thanks a lot @ryantheseer.

Would love to get this timestamp as well. What's currently the most usable implementation for getting this RTP timestamp? ffmpeg in C++? Or has this not yet been integrated officially into any library?

@gerardsimons As far as I know, the best option is gstreamer, because you can enter the pipeline at any point and write your own logic. Ffmpeg only uses the RTP timestamp to align the packets, and then replaces the NTP/UTC time stamp with the "time since the beginning of the stream". I was able to hack ffmpeg to do it, but I had to learn how to compile and build ffmpeg from source myself. It does not support this as-is.

No problem! Sorry I wasn't able to provide more on gstreamer development. It's a bit of a learning curve, for sure! Very powerful, though.

Thanks @ryantheseer ! Greatly appreciate it. An OpenCV solution would be the nicest for our system right now. If you would make the changes in ffmpeg (I guess that's what you outlined in your docx?) would it be trivial to then have OpenCV access these new ffmpeg attributes or is that not easy at all? Was there no interest from the ffmpeg community to integrate your changes somehow?

If you would make the changes in ffmpeg (I guess that's what you outlined in your docx?) would it be trivial to then have OpenCV access these new ffmpeg attributes or is that not easy at all? Was there no interest from the ffmpeg community to integrate your changes somehow?

Yes, I successfully accessed the RTP timestamps using the OpenCV VideoIO component with ffmpeg after I made those changes. I did not try to submit the changes to ffmpeg, because 1) it's a hack and 2) I chose not to go that route once I found that OpenCV VideoIO was very slow and laggy over TCP/IP to the point of not being usable. UDP might have worked, but gstreamer and VLC were still better.

No problem @ryantheseer. I was actually in a pickle when I was testing the feasibility of your hack. It turned out to be a no-go, actually. I used gstreamer and got the timestamps from the buffer instead, but I don't know if the buffer gives random timestamps or timestamps from some clock. I might have to explore that some more.

It really lagged when I was using an OCR to read timestamps from a monitor to actually see if I could get the actual time. Do you mean using libVLC with gstreamer in the other document you posted? I thought it was with java opencv and libVLC? I am really interested in trying to sync some cameras using these RTP timestamps. Sounds like an interesting task to do.

Do you mean using libVLC with gstreamer in the other document you posted? I thought it was with java opencv and libVLC? I am really interested in trying to sync some cameras using these RTP timestamps. Sounds like an interesting task to do.

No, sorry, I meant that I tried both gstreamer and VLC separately, not combined together. In one effort, I was able to hack libVLC (and some open source Java wrappers for it) to get the RTP time stamps and successfully synchronize using them. When I realized that gstreamer allowed access to the RTP time stamps without hacking, I stopped using libVLC and switched to gstreamer. Part of the reason I tried hacking VLC after OpenCV/ffmpeg was that I found that the gstreamer java bindings available online were not up-to-date at the time. Since then, the java bindings were updated and I was able to use the latest gstreamer with them.
Let me know if that makes more sense.

Oh, it does make sense. The problem I have with Gstreamer mainly is how do you know what clock gstreamer uses/initializes when you're accessing the buffer? I could not really correlate it to the exact absolute time at which the frame was captured in gstreamer, which really confused me.

the problem I have with Gstreamer mainly is how do you know what clock gstreamer uses/initializes when you're accessing the buffer?

In my case, the GStreamer client was easier to develop because we (as a product team) are in charge of both the camera/server-side implementation and the app/client-side implementation. In the camera GStreamer code, we are replacing the GST pipeline time with the system time, and in fact we're using a separate KLV payload stream to give the system clock time instead of the RTP time. If you don't have access to the server side, you may have to find another way to match up the RTP time to NTP time.

@ryantheseer : that sounds interesting, any chance you could share something about your changes to the GStreamer code / the pipeline?

I can't provide all the details, but here is some of the GStreamer server-side pipeline. The time stamp is being written into a KLV payload to be sent as a separate RTP stream. In our case, we're using something called the V4L2 source in our Linux environment, which just provides the image frames.
image
And part of the client-side pipeline, which decodes the KLV payload to get the timestamp.
image

In the camera GStreamer code, we are replacing the GST pipeline time with the system time

Oh wow, this is very interesting. Might I ask how would you put the system time in the GST pipeline? or can you hint what function I can use to accomplish this?

in fact we're using a separate KLV payload stream to give the system clock time instead of the RTP time. If you don't have access to the server side, you may have to find another way to match up the RTP time to NTP time.

Do you mind explaining on what is a KLV payload stream?

Do you mind explaining on what is a KLV payload stream?

It's a type of RTP payload that is defined as a "Key-length-value" payload. We used the "Precision Time Stamp" key defined by "MISB ST 1603.1"
https://tools.ietf.org/id/draft-ietf-avt-rtp-klv-01.html
https://gwg.nga.mil/misb/st_pubs.html
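
To make the key-length-value idea concrete, here is a sketch that packs a 64-bit timestamp into a single KLV triplet and reads it back. It assumes the SMPTE 336M style of a 16-byte key followed by a BER length, and it only uses the short-form length (values under 128 bytes). The key bytes are a zero-filled placeholder, not the real MISB key, which should be taken from the standard itself. The microsecond unit and the class name are also assumptions made for this illustration:

```java
// Hypothetical KLV helper for illustration; the key is a placeholder, not a registered MISB key.
final class KlvTimestamp {
    private KlvTimestamp() {}

    /** Placeholder 16-byte key (all zeros). Replace with the key defined by the relevant standard. */
    static final byte[] PLACEHOLDER_KEY = new byte[16];

    private static final int KEY_LENGTH = 16;
    private static final int VALUE_LENGTH = 8;

    /** Builds one triplet: 16-byte key, 1-byte short-form BER length, 8-byte big-endian value. */
    static byte[] encode(byte[] key, long timestampMicros) {
        if (key.length != KEY_LENGTH) {
            throw new IllegalArgumentException("Key must be 16 bytes");
        }
        byte[] out = new byte[KEY_LENGTH + 1 + VALUE_LENGTH];
        System.arraycopy(key, 0, out, 0, KEY_LENGTH);
        out[KEY_LENGTH] = (byte) VALUE_LENGTH; // short-form BER length: high bit clear
        for (int i = 0; i < VALUE_LENGTH; i++) {
            out[KEY_LENGTH + 1 + i] = (byte) (timestampMicros >>> (56 - 8 * i));
        }
        return out;
    }

    /** Reads the 8-byte value back out of a triplet produced by encode(). */
    static long decode(byte[] klv) {
        if (klv.length < KEY_LENGTH + 1 + VALUE_LENGTH) {
            throw new IllegalArgumentException("Triplet too short");
        }
        int length = klv[KEY_LENGTH] & 0xFF;
        if (length != VALUE_LENGTH) {
            throw new IllegalArgumentException("Unexpected value length: " + length);
        }
        long value = 0;
        for (int i = 0; i < VALUE_LENGTH; i++) {
            value = (value << 8) | (klv[KEY_LENGTH + 1 + i] & 0xFFL);
        }
        return value;
    }
}
```

A real receiver would also compare the first 16 bytes against the expected key and handle long-form BER lengths before trusting the value.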

Might I ask how would you put the system time in the GST pipeline? or can you hint what function I can use to accomplish this?

You should be able to add a KLV RTP stream as shown in the example pipeline, and put in whatever extra information you want into the key-length-value triplet packets.
