Latency comparisons on RTP, SMPTE FEC, and RIST

It’s been a while since the last time I did some latency measurements of different codecs, and I had some time and resources recently to do some tests on some new encoders I haven’t officially measured before.

The encoder and decoder I had available were the latest version of the OBE C200 encoder and decoder, with a Blackmagic card in both.

These tests were all done with 1080i25 inputs. Marking latency down to milliseconds is possible with specialist equipment, however I’m not very interested in whether a given encode/decode is 426ms or 432ms, within a frame or two is fine.

Method

Play out a looped video which counts down, with frame and field numbers. The following 625i25 dv mov (scaled upto 1080i25 on playout) does just that. The age of the mov shows how long since I last did this measuring!

I then set up the following setup, with no genlock, and started playing the loop. Measurements were not taken at the top of the loop.

The countdown appears on monitor A, then a few seconds later on monitor B.

To measure the latency, take a photo of both frames.

Due to local wiring in my test environment, Monitor A (the input) was on the right of Monitor B. The monitors themselves were identical — Blackmagic SmartView Duos. As the monitors are the same, the delay will be the same.

Taking a photo of the monitor shows the encoder frame, and the decoder frame. You then subtract one from the other and you get a latency in frames. The display (showing both fields at once), coupled with the exposure time of the camera, will mean it’s slightly blurred, but it will give you a latency plus-or-minus a frame.

Above we can see the input was at 2 seconds and 19 frames, and the output was still at 3 seconds and 16 frames. This means the output was 22 frames behind the input. There’s no network delay (both encoder and decoder were on the same vlan on the same switch)

Results

I tested various different error corrections to ensure that the reality matched up with the theory. By and large it did.

The base result for an OBE-OBD chain is 9 frames when set to lowest mode. Not all decoders support lowest mode – which I believe is a 1 frame PPP mode.

Our general field target bitrate is 20mbits, this gives enough bandwidth to not see obvious artefacts on our payloads. Where possible we will aim for 45mbit of video, which is the DPP standard.

20mbit – FEC

Set up a stream with 20mbit of video, in a 20,884kbit MUX, containing 422 10bit video, with no error correction. The decoder was in ‘RTP/FEC’ mode, but no latency dialed in. Encoder latency set to lowest, Decoder latency set to lowest. This came out with the expected 9 frame end to end delay.

Adding in an FEC 20×5 matrix — 20 columns, 5 rows, adds 4 frames. The same applies with a 5×20 matrix, and block aligned vs non-block-aligned makes no difference.

Does this make sense? At 20,884kbit, via an RTP stream of 1,316 bytes per packet, it’s about 2000 packets per second, or 80 packets per frame. A 5×20 matrix will create a 100 packet matrix size, or just over a frame. To decode that matrix then requries another 100 packets of delay, which is upto 2.5 frames, so it’s a little higher than I’d expect.

FEC with a 5×5 matrix only added 3 frames, which makes sense as it’s a smaller matrix size.

20mbit – Dual Streaming

OBE/Ds have the ability to dual stream, via two separate networks, or indeed by timeshifting a repeated packet on a single stream on a single network. This is really helpful in places where bandwidth is cheap, like the far east, but the international links are not neccersarilly reliable. Timeshifting the packets helps when your only connectivity is via a single provider, who may have core network problems which cause networks streams to drop for sub-second outages, which FEC can’t cope with, but timeshifting can.

They do this by adding a receive delay, during which they de-duplicate packets.

The receive delay adds exactly as much latency as you would expect. On top of the 9 frames, adding a 100ms buffer adds 2.5 frames, a 300ms buffer adds 5 frames.

Currently my two links from the US to the UK are running at 78ms and 125ms rtt, so a dual stream buffer would need to be 23.5ms — assuming that the round trip time is symetrical. The actual delay in packets is 110, which at 4,407 packets a second is 24.96ms. With network failures on the routing though, packets may well be sent via different paths, so it’s a judgement call on how much delay to put in to cope with dual streaming. Despite not using FEC or RIST, the decoder much be set to FEC or ARQ mode to ensure the stream works.

20mbit – RIST

OBE has two RIST buffers, one on the encoder, one on the decoder. The encoder buffer has no affect on latency – it’s just how many already-sent packets are kept in memory if a retransmission is asked for. This may wish to be higher than a given decoder’s latency in the case of multicast RIST (haven’t tested that yet)

With no RIST, it’s a 9 frame delay. A 100ms decoder buffer should add 2.5 frames, a 600ms buffer should add 15 frames, an 1100ms should add 27.5 frames. The actual measured differences are 3, 15, and 28 frames, which matches the theory well.

How much delay you need is a much more complex question. I believe the theoretical minimum would be enough time for the retransmit request to occur, which would be just over 1x rtt, however the general feeling I get is that 3xrtt is a good number to aim for, behaviour in various situations will be another investigation.

5mbit and 45mbit

45Mbit makes no difference to latency in RIST, Dual Streaming, or normal streaming. With FEC though, it does make a difference. Again with 20×5 FEC, at 45mbit of video it’s 3 frames of delay.

5Mbit does make a difference – even in no-fec mode. I only have 2 results though so I’m unwilling to believe those figures without retesting.

OBE and OBD video settings

I thought it would be useful to measure how the OBE and OBD video settings differ.

Profile: 422 10bit vs main. Made no difference

Decoder latency: Normal adds 4 frames over Lowest

Encoder latency:

  • Lowest = 0 frames
  • Low(PPP), VBV of 1.2 frames (1/20th bitrate) = 1 frame
  • Normal, VBV of 1.2 frames = 16 frames
  • Normal, VBV of 1 seconds = 61 frames

Summary

Even assuming 3xrtt, at 20mbit, RIST adds less delay at 20mbit than 20×5 FEC for any link upto a 30ms rtt, which is pretty much any fixed link in the UK, it’s only once you go transcontinental that FEC may win out.

It’s early days in our use but being able to assign a different DSCP tag to the retransmits may be helpful in future – that way we can shift them through a queue quickly, but drop them if they start building up too much to ensure they don’t impact the main body of the stream, need to think, model and measure more on that.

Next steps

My next investigation as time allows will be remeasuring latencies – both normal and FEC, on to NTT, Ateme, and Evertz encoders and decoders. The FEC delta should be the same as under an OBE/OBD. I ran measurements on NTTs in 2015, and while they shouldn’t have changed, I don’t have the original figures from those days. The summary put the typical end-to-end latency at 11-12 frames.

References

DV Mov with frame/field number https://newweaver.com/vid/clock10tone.mov

Join the conversation

1 Comment

  1. Actually you are meant to buffer two matrices (the SMPTE spec is very poorly written) and that’s why the latency is higher than you expect.

Leave a comment

Your email address will not be published. Required fields are marked *