Re: [Moq] Latency @ Twitch

Justin said:

"As others have noted, BBR does not work great out of the box for realtime
scenarios."

[BA] At the ICCRG meeting on Monday, there was an update on BBR2:
https://datatracker.ietf.org/meeting/112/materials/slides-112-iccrg-bbrv2-update-00.pdf

While there are some improvements, issues such as "PROBE_RTT" and rapid
rampup after loss remain, and overall, it doesn't seem like BBR2 is going
to help much with realtime scenarios.  Is that fair?
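
To put rough numbers on the PROBE_RTT concern: the sketch below assumes the BBRv1 constants from the BBR draft (a cwnd floor of 4 packets, held for about 200 ms) and estimates how much media backs up at the sender while the cwnd is clamped. The 1200-byte packet size, 50 ms RTT and 6 Mbps stream are illustrative assumptions, not measurements, and the function names are made up for this example.

```python
# Rough arithmetic only: how much media queues up at the sender during one
# BBRv1 PROBE_RTT episode. The two constants follow the BBR draft; packet
# size, RTT and video bitrate are assumptions picked for illustration.

PROBE_RTT_CWND_PACKETS = 4      # BBRv1 minimum cwnd used during PROBE_RTT
PROBE_RTT_DURATION_S = 0.2      # cwnd is held down for at least ~200 ms


def probe_rtt_rate_cap_bps(rtt_s: float, packet_bytes: int = 1200) -> float:
    """Upper bound on send rate with at most 4 packets in flight per RTT."""
    return PROBE_RTT_CWND_PACKETS * packet_bytes * 8 / rtt_s


def probe_rtt_backlog_bytes(video_bps: float, rtt_s: float,
                            packet_bytes: int = 1200) -> float:
    """Bytes the encoder produces but the transport cannot send."""
    cap = probe_rtt_rate_cap_bps(rtt_s, packet_bytes)
    deficit_bps = max(0.0, video_bps - cap)
    return deficit_bps * PROBE_RTT_DURATION_S / 8


if __name__ == "__main__":
    cap = probe_rtt_rate_cap_bps(rtt_s=0.05)
    backlog = probe_rtt_backlog_bytes(video_bps=6_000_000, rtt_s=0.05)
    print(f"rate cap during PROBE_RTT: {cap / 1e6:.3f} Mbps")
    print(f"backlog after one episode: {backlog / 1000:.1f} kB")
```

Under these assumptions the cap is about 0.77 Mbps and roughly 130 kB of media backs up per episode, all of which the player buffer has to absorb.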

On Tue, Nov 9, 2021 at 12:46 PM Justin Uberti <
juberti@alphaexplorationco.com> wrote:

> Ultimately we found that it wasn't necessary to standardize the CC as long
> as the behavior needed from the remote side (e.g., feedback messaging)
> could be standardized.
>
> As others have noted, BBR does not work great out of the box for realtime
> scenarios. The last time this was discussed, the prevailing idea was to
> allow the QUIC CC to be used as a sort of circuit-breaker, but within that
> envelope the application could use whatever realtime algorithm it preferred
> (e.g., goog-cc).
>
> On Thu, Nov 4, 2021 at 3:58 AM Piers O'Hanlon <piers.ohanlon@bbc.co.uk>
> wrote:
>
>>
>> On 3 Nov 2021, at 21:46, Luke Curley <kixelated@gmail.com> wrote:
>>
>> Yeah, there's definitely some funky behavior in BBR when application
>> limited but it's nowhere near as bad as Cubic/Reno. With those
>> algorithms you need to burst enough packets to fully utilize the congestion
>> window before it can be grown. With BBR I believe you need to burst just
>> enough to fully utilize the pacer, and even then this condition
>> <https://source.chromium.org/chromium/chromium/src/+/master:net/third_party/quiche/src/quic/core/congestion_control/bbr_sender.cc;l=393> lets
>> you use application-limited samples if they would increase the send rate.
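
The condition linked above can be sketched roughly as follows. This is a simplified illustration, not the Chromium code: `BandwidthFilter`, `window_rounds` and the sample fields are hypothetical names, and a real sender uses a more elaborate windowed max filter.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class BandwidthSample:
    round_count: int        # round trip in which the sample was taken
    bandwidth_bps: float    # measured delivery rate for this sample
    is_app_limited: bool    # sender had no more data to send at the time


class BandwidthFilter:
    """Max of recent bandwidth samples over a window of round trips."""

    def __init__(self, window_rounds: int = 10):
        self.window_rounds = window_rounds
        self.samples = deque()  # entries are (round_count, bandwidth_bps)

    def max_bandwidth(self) -> float:
        return max((bw for _, bw in self.samples), default=0.0)

    def on_sample(self, sample: BandwidthSample) -> None:
        # Expire samples that have fallen out of the window.
        cutoff = sample.round_count - self.window_rounds
        while self.samples and self.samples[0][0] <= cutoff:
            self.samples.popleft()
        # An app-limited sample may understate the path, so it is only
        # trusted when it would raise the current estimate.
        if sample.is_app_limited and sample.bandwidth_bps <= self.max_bandwidth():
            return
        self.samples.append((sample.round_count, sample.bandwidth_bps))
```

The effect is that an application-limited sender can still ratchet the estimate upward whenever a burst happens to measure a higher rate, but a quiet period never drags it down.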
>>
>> And there’s also the idle cwnd collapse/reset behaviour to consider if
>> you’re sending a number of frames together and their inter-data gap exceeds
>> the RTO - I’m not quite sure how the various QUIC stacks have translated
>> RFC2861/7661 advice on this…?
>>
>> I started with BBR first because it's simpler, but I'm going to try out
>> BBR2 at some point because of the aforementioned PROBE_RTT issue. I don't
>> follow the congestion control space closely enough; are there any notable
>> algorithms that would better fit the live video use-case?
>>
>> I guess Google’s Goog_CC appears to be well used in the WebRTC space
>> (e.g. WEBRTC
>> <https://webrtc.googlesource.com/src/+/refs/heads/main/modules/congestion_controller/goog_cc>
>>  and aiortc
>> <https://github.com/aiortc/aiortc/blob/1a192386b721861f27b0476dae23686f8f9bb2bc/src/aiortc/rate.py#L271>)
>> despite the draft
>> <https://datatracker.ietf.org/doc/html/draft-ietf-rmcat-gcc> never
>> making it to RFC status… There's also SCREAM
>> <https://datatracker.ietf.org/doc/rfc8298/> which has an open
>> source implementation <https://github.com/EricssonResearch/scream> but
>> not sure how widely deployed it is.
>>
>>
>> On Wed, Nov 3, 2021 at 2:12 PM Ian Swett <ianswett@google.com> wrote:
>>
>>> From personal experience, BBR has some issues with application limited
>>> behavior, but it is still able to grow the congestion window, at least
>>> slightly, so it's likely an improvement over Cubic or Reno.
>>>
>>> On Wed, Nov 3, 2021 at 4:40 PM Luke Curley <kixelated@gmail.com> wrote:
>>>
>>>> I think resync points are an interesting idea although we haven't
>>>> evaluated them. Twitch did push for S-frames in AV1 which will be another
>>>> option in the future instead of encoding a full IDR frame at these resync
>>>> boundaries.
>>>>
>>>> An issue is you have to make the hard decision to abort the current
>>>> download and frantically try to pick up the pieces before the buffer
>>>> depletes. It's a one-way door (maybe your algorithm overreacted) and you're
>>>> going to be throwing out some media just to redownload it at a lower
>>>> bitrate.
>>>>
>>>> Ideally, you could download segments in parallel without causing
>>>> contention. The idea is to spend any available bandwidth on the new segment
>>>> to fix the problem, and any excess bandwidth on the old segment in
>>>> the event it arrives before the player buffer actually depletes. That's
>>>> more or less the core concept for what we've built using QUIC, and it's
>>>> compatible with resync points if we later go down that route.
>>>>
>>>>
>>>> And you're exactly right Piers. The fundamental issue is that a web
>>>> player lacks the low level timing information required to infer
>>>> the delivery rate. You would want something like BBR's rate estimation
>>>> <https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation> which
>>>> inspects the time delta between packets to determine the send rate. That
>>>> gets really difficult when the OS and browser delay flushing data to the
>>>> application, be it for performance reasons or due to packet loss (to
>>>> maintain in-order delivery, which causes head-of-line blocking).
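
As a rough illustration of what the player is missing: the delivery-rate draft linked above computes each sample from sender-side state, dividing the bytes delivered over an interval by the longer of the send interval and the ACK interval. A minimal sketch with hypothetical names (`RateSample`, `delivery_rate_bps`) and none of the draft's per-packet bookkeeping:

```python
from dataclasses import dataclass


@dataclass
class RateSample:
    delivered_bytes: int   # bytes newly acknowledged over the interval
    send_elapsed_s: float  # time over which those bytes were sent
    ack_elapsed_s: float   # time over which their ACKs arrived


def delivery_rate_bps(sample: RateSample) -> float:
    """Delivery rate = delivered / max(send interval, ACK interval)."""
    interval = max(sample.send_elapsed_s, sample.ack_elapsed_s)
    if interval <= 0:
        raise ValueError("interval must be positive")
    return sample.delivered_bytes * 8 / interval
```

Taking the longer interval keeps send bursts or compressed ACKs from inflating the estimate. A browser player sees neither interval, only the moment the OS and browser hand data to JavaScript.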
>>>>
>>>> I did run into CUBIC/Reno not being able to grow the congestion window
>>>> when frames are sent one at a time (application limited). I don't believe
>>>> BBR suffers from the same problem though due to the aforementioned rate
>>>> estimator.
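
The Cubic/Reno behavior described above is consistent with the guidance in RFC 9002 that a sender should not grow the congestion window while it is underutilized. A minimal Reno-style sketch, assuming a 1200-byte packet size and a simplified "room for one more packet" test for underutilization; `on_ack_reno` and its parameters are names invented for this example:

```python
MAX_DATAGRAM_SIZE = 1200  # bytes; assumed packet size


def on_ack_reno(cwnd: int, ssthresh: int, acked_bytes: int,
                bytes_in_flight_before_ack: int) -> int:
    """Return the new congestion window (bytes) after an ACK."""
    # Application-limited: the window still had room for another packet,
    # so the ACK says nothing about whether a larger window is safe.
    if bytes_in_flight_before_ack + MAX_DATAGRAM_SIZE <= cwnd:
        return cwnd
    if cwnd < ssthresh:
        # Slow start: grow by the number of bytes acknowledged.
        return cwnd + acked_bytes
    # Congestion avoidance: about one packet of growth per window of ACKs.
    return cwnd + MAX_DATAGRAM_SIZE * acked_bytes // cwnd
```

With a 12000-byte window and only one small frame in flight, every ACK takes the first branch and the window never moves, which matches the one-frame-at-a-time observation.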
>>>>
>>>> On Wed, Nov 3, 2021 at 10:05 AM Ali C. Begen <ali.begen@networked.media>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> > On Nov 3, 2021, at 6:50 PM, Piers O'Hanlon <piers.ohanlon@bbc.co.uk>
>>>>> wrote:
>>>>> >
>>>>> >
>>>>> >
>>>>> >> On 2 Nov 2021, at 20:39, Ali C. Begen <ali.begen=
>>>>> 40networked.media@dmarc.ietf.org> wrote:
>>>>> >>
>>>>> >>
>>>>> >>
>>>>> >>> On Nov 2, 2021, at 3:39 AM, Luke Curley <kixelated@gmail.com>
>>>>> wrote:
>>>>> >>>
>>>>> >>> Hey folks, I wanted to quickly summarize the problems we've run
>>>>> into at Twitch that have led us to QUIC.
>>>>> >>>
>>>>> >>>
>>>>> >>> Twitch is a live one-to-many product. We primarily focus on video
>>>>> quality due to the graphical fidelity of video games. Viewers can
>>>>> participate in a chat room, which the broadcaster reads and can respond to
>>>>> via video. This means that latency is also somewhat important to facilitate
>>>>> this social interaction.
>>>>> >>>
>>>>> >>> A looong time ago we were using RTMP for both ingest and
>>>>> distribution (Flash player). We switched to HLS for distribution to gain
>>>>> the benefit of 3rd party CDNs, at the cost of dramatically increasing
>>>>> latency. A later project lowered the latency of HLS using chunked-transfer
>>>>> delivery, very similar to LL-DASH (and not LL-HLS). We're still using RTMP
>>>>> for contribution.
>>>>> >>>
>>>>> > I guess Apple do also have their BYTERANGE/CTE mode for LL-HLS which
>>>>> is pretty similar to LL-DASH.
>>>>>
>>>>> Yes, Apple can list the parts (chunks in LL-DASH) as byteranges in the
>>>>> playlist but the frequent playlist refresh and part retrieval process is
>>>>> inevitable in LL-HLS, which is one of the main differences from LL-DASH (no
>>>>> need for manifest refresh and request per segment not chunk).
>>>>>
>>>>> >
>>>>> >>>
>>>>> >>> To summarize the issues with our current distribution system:
>>>>> >>>
>>>>> >>> 1. HLS suffers from head-of-line blocking.
>>>>> >>> During congestion, the current segment stalls and is delivered
slower than the encoded bitrate. The player has no recourse but to wait
>>>>> for the segment to finish downloading, risking depleting the buffer. It can
>>>>> switch down to a lower rendition at segment boundaries, but these
>>>>> boundaries occur too infrequently (every 2s) to handle sudden congestion.
>>>>> Trying to switch earlier, either by canceling the current segment or
>>>>> downloading the lower rendition in parallel, only exacerbates the issue.
>>>>> >>
>>>>> > Isn't the HoL limitation more down to the use of HTTP/1.1?
>>>>> >
>>>>> >> DASH has the concept of Resync points that were designed exactly
>>>>> for this purpose (allowing you to emergency downshift in the middle of a
>>>>> segment).
>>>>> >>
>>>>> > I was curious if there are any studies or experience of how resync
>>>>> points perform in practice?
>>>>>
>>>>> Resync points are pretty fresh out of the oven. dash.js has it in the
>>>>> roadmap but not yet implemented (and we also need to generate test
>>>>> streams). So, there is no data available yet with the real clients. But, I
>>>>> suppose you can imagine how in-segment switching can help in sudden bw
>>>>> drops especially for long segments.
>>>>>
>>>>> >
>>>>> >>> 2. HLS has poor "auto" quality (ABR).
>>>>> >>> The player is responsible for choosing the rendition to download.
This is a problem when media is delivered frame-by-frame (i.e., HTTP
>>>>> chunked-transfer), as we're effectively application-limited by the encoder
>>>>> bitrate. The player can only measure the arrival timestamp of data and does
>>>>> not know when the network can sustain a higher bitrate without just trying
>>>>> it. We hosted an ACM challenge for this issue in particular.
>>>>> >>
>>>>> > The limitation here may also be down to the lack of access to
>>>>> sufficiently accurate timing information about data arrivals in the browser
>>>>> - unfortunately the Streams API, which provides data from the fetch API,
>>>>> doesn’t directly timestamp the data arrivals so the JS app has to timestamp
>>>>> it which can suffer from noise such as scheduling etc - especially a
>>>>> problem for small/fast data arrivals.
>>>>>
>>>>> Yes, you need to get rid of that noise (see LoL+).
>>>>>
>>>>> > I guess another issue could be that if the system is only sending
>>>>> single frames then the network transport may be operating in application
>>>>> limited mode so the cwnd doesn’t grow sufficiently to take advantage of the
>>>>> available capacity.
>>>>>
>>>>> Unless the video bitrate is too low, this should not be an issue most
>>>>> of the time.
>>>>>
>>>>> >
>>>>> >> That exact challenge had three competing solutions, two of which
are now part of the official dash.js code. And yes, the player can figure out
>>>>> what the network can sustain *without* trying higher bitrate renditions.
>>>>> >>
>>>>> https://github.com/Dash-Industry-Forum/dash.js/wiki/Low-Latency-streaming
>>>>> >> Or read the paper that even had “twitch” in its title here:
>>>>> https://ieeexplore.ieee.org/document/9429986
>>>>> >>
>>>>> > There was a recent study that seems to show that none of the current
>>>>> algorithms are that great for low latency, and the two new dash.js ones
>>>>> appear to lead to much higher levels of rebuffering:
>>>>> > https://dl.acm.org/doi/pdf/10.1145/3458305.3478442
>>>>>
>>>>> Brightcove’s paper uses the LoL and L2A algorithms from the challenge
>>>>> where low latency was the primary goal. For Twitch’s own evaluation, I
>>>>> suggest you watch:
>>>>> https://www.youtube.com/watch?v=rcXFVDotpy4
>>>>> We later addressed the rebuffering issue, developed LoL+, which is the
>>>>> version included in dash.js now and explained at the ieeexplore link I gave
>>>>> above.
>>>>>
>>>>> Copying the authors in case they want to add anything for the paper
>>>>> you cited.
>>>>>
>>>>> -acbegen
>>>>>
>>>>>
>>>>> >
>>>>> > Piers
>>>>> >
>>>>> >>> I believe this is why LL-HLS opts to burst small chunks of data
>>>>> (sub-segments) at the cost of higher latency.
>>>>> >>>
>>>>> >>>
>>>>> >>> Both of these necessitate a larger player buffer, which increases
latency. The contribution system has its own problems, but let me sync up with
>>>>> that team first before I try to enumerate them.
>>>>> >>> --
>>>>> >>> Moq mailing list
>>>>> >>> Moq@ietf.org
>>>>> >>> https://www.ietf.org/mailman/listinfo/moq