Re: [Moq] Latency @ Twitch

Luke Curley <kixelated@gmail.com> Tue, 09 November 2021 22:58 UTC

To: "Mo Zanaty (mzanaty)" <mzanaty=40cisco.com@dmarc.ietf.org>
Cc: Bernard Aboba <bernard.aboba@gmail.com>, Justin Uberti <juberti@alphaexplorationco.com>, Ian Swett <ianswett@google.com>, "Ali C. Begen" <ali.begen@networked.media>, MOQ Mailing List <moq@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/moq/FGNVhgUkL9m4DhEP_bfn-cbBU08>

Maybe a dumb thought, but is the PROBE_RTT phase required when sufficiently
application limited, as is primarily the case for live video? If I
understand correctly, it's meant to drain the queue to remeasure the
minimum RTT, but that doesn't seem necessary when the queue is constantly
being drained due to a lack of data to send.
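
To make that concrete, here is the kind of check I have in mind (a rough
sketch with made-up names, not taken from any real QUIC stack):

    # Only bother entering PROBE_RTT when the min_rtt sample is stale AND the
    # queue has not already been drained by application-limited sending.
    def should_enter_probe_rtt(now, min_rtt_timestamp, bytes_in_flight, bdp,
                               min_rtt_expiry=10.0, drain_fraction=0.5):
        min_rtt_stale = (now - min_rtt_timestamp) > min_rtt_expiry
        already_drained = bytes_in_flight < drain_fraction * bdp
        return min_rtt_stale and not already_drained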

Either way, the issue is that existing TCP algorithms don't care about the
live video use-case, and those are the ones that have been ported to QUIC
thus far. But as Justin mentioned, this doesn't actually matter for the
sake of standardizing a video-over-QUIC protocol, provided the building
blocks are in place.

The real question is: do QUIC ACKs contain enough signal to implement an
adequate live video congestion control algorithm? If not, how can we
increase that signal, potentially taking cues from RMCAT (e.g., RTT on a
per-packet basis)?
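
For instance, the kind of per-ACK signal I mean (illustrative only; the
field names below are assumptions, not actual QUIC frame fields):

    # An RTT sample corrected for the receiver's reported ack delay, plus a
    # BBR-style delivery-rate sample, handed to a live-video CC on each ACK.
    def on_ack(acked, ack_time, ack_delay, cc):
        rtt_sample = ack_time - acked.send_time - ack_delay
        # Bytes delivered since this packet was sent, divided by how long
        # that delivery took.
        elapsed = ack_time - acked.delivered_time_at_send
        bw_sample = ((acked.delivered_at_ack - acked.delivered_at_send) / elapsed
                     if elapsed > 0 else None)
        cc.on_sample(rtt_sample, bw_sample, app_limited=acked.is_app_limited)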

On Tue, Nov 9, 2021, 10:27 AM Mo Zanaty (mzanaty) <mzanaty=
40cisco.com@dmarc.ietf.org> wrote:

> All current QUIC CCs (BBRv1/2, CUBIC, NewReno, etc.) are not well suited
> for real-time media, even for a rough “envelope” or “circuit-breaker”.
> RMCAT CCs are explicitly designed for real-time media, but, of course, rely
> on RTCP feedback, so must be adapted to QUIC feedback.
>
>
>
> Mo
>
>
>
>
>
> On 11/9/21, 1:13 PM, "Bernard Aboba" <bernard.aboba@gmail.com> wrote:
>
>
>
> Justin said:
>
>
>
> "As others have noted, BBR does not work great out of the box for realtime
> scenarios."
>
>
>
> [BA] At the ICCRG meeting on Monday, there was an update on BBR2:
>
>
> https://datatracker.ietf.org/meeting/112/materials/slides-112-iccrg-bbrv2-update-00.pdf
>
>
>
> While there are some improvements, issues such as "PROBE_RTT" and rapid
> ramp-up after loss remain, and overall it doesn't seem like BBR2 is going
> to help much with realtime scenarios. Is that fair?
>
>
>
> On Tue, Nov 9, 2021 at 12:46 PM Justin Uberti <
> juberti@alphaexplorationco.com> wrote:
>
> Ultimately we found that it wasn't necessary to standardize the CC as long
> as the behavior needed from the remote side (e.g., feedback messaging)
> could be standardized.
>
>
>
> As others have noted, BBR does not work great out of the box for realtime
> scenarios. The last time this was discussed, the prevailing idea was to
> allow the QUIC CC to be used as a sort of circuit-breaker, but within that
> envelope the application could use whatever realtime algorithm it preferred
> (e.g., goog-cc).
>
>
>
> On Thu, Nov 4, 2021 at 3:58 AM Piers O'Hanlon <piers.ohanlon@bbc.co.uk>
> wrote:
>
>
>
> On 3 Nov 2021, at 21:46, Luke Curley <kixelated@gmail.com> wrote:
>
>
>
> Yeah, there's definitely some funky behavior in BBR when application
> limited but it's nowhere near as bad as Cubic/Reno. With those
> algorithms you need to burst enough packets to fully utilize the congestion
> window before it can be grown. With BBR I believe you need to burst just
> enough to fully utilize the pacer, and even then this condition
> <https://source.chromium.org/chromium/chromium/src/+/master:net/third_party/quiche/src/quic/core/congestion_control/bbr_sender.cc;l=393> lets
> you use application-limited samples if they would increase the send rate.
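>
> Roughly, the check there is (paraphrasing in pseudocode, not the actual
> quiche source; max_filter stands for any windowed max-filter, and its
> interface is made up):
>
>     # A bandwidth sample taken while app-limited is only used if it would
>     # raise the current estimate; normal samples always feed the filter.
>     def maybe_update_max_bandwidth(sample_bw, is_app_limited, max_filter):
>         if not is_app_limited or sample_bw > max_filter.best():
>             max_filter.update(sample_bw)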
>
>
>
> And there’s also the idle cwnd collapse/reset behaviour to consider if
> you’re sending a number of frames together and their inter-data gap exceeds
> the RTO - I’m not quite sure how the various QUIC stacks have translated
> RFC2861/7661 advice on this…?
>
>
>
> I started with BBR first because it's simpler, but I'm going to try out
> BBR2 at some point because of the aforementioned PROBE_RTT issue. I don't
> follow the congestion control space closely enough; are there any notable
> algorithms that would better fit the live video use-case?
>
>
>
> I guess Google’s Goog_CC appears to be widely used in the WebRTC space (e.g.
> WebRTC
> <https://webrtc.googlesource.com/src/+/refs/heads/main/modules/congestion_controller/goog_cc>
> and aiortc
> <https://github.com/aiortc/aiortc/blob/1a192386b721861f27b0476dae23686f8f9bb2bc/src/aiortc/rate.py#L271>),
> despite the draft
> <https://datatracker.ietf.org/doc/html/draft-ietf-rmcat-gcc> never making
> it to RFC status… There's also SCReAM
> <https://datatracker.ietf.org/doc/rfc8298/>, which has an open-source
> implementation <https://github.com/EricssonResearch/scream>, but I'm not
> sure how widely deployed it is.
>
>
>
>
>
> On Wed, Nov 3, 2021 at 2:12 PM Ian Swett <ianswett@google.com> wrote:
>
> From personal experience, BBR has some issues with application limited
> behavior, but it is still able to grow the congestion window, at least
> slightly, so it's likely an improvement over Cubic or Reno.
>
>
>
> On Wed, Nov 3, 2021 at 4:40 PM Luke Curley <kixelated@gmail.com> wrote:
>
> I think resync points are an interesting idea although we haven't
> evaluated them. Twitch did push for S-frames in AV1 which will be another
> option in the future instead of encoding a full IDR frame at these resync
> boundaries.
>
>
>
> An issue is you have to make the hard decision to abort the current
> download and frantically try to pick up the pieces before the buffer
> depletes. It's a one-way door (maybe your algorithm overreacted) and you're
> going to be throwing out some media just to redownload it at a lower
> bitrate.
>
>
>
> Ideally, you could download segments in parallel without causing
> contention. The idea is to spend any available bandwidth on the new segment
> to fix the problem, and any excess bandwidth on the old segment in
> the event it arrives before the player buffer actually depletes. That's
> more or less the core concept for what we've built using QUIC, and it's
> compatible with resync points if we later go down that route.
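>
> As a toy sketch of that prioritization (made-up names, not our actual code):
>
>     from dataclasses import dataclass
>
>     @dataclass
>     class SegmentStream:
>         name: str
>         priority: int   # lower value = more urgent
>         pending: int    # bytes left to send
>
>     def next_stream(streams):
>         # Strict priority: the replacement segment preempts the stalled one;
>         # the old segment only gets whatever bandwidth is left over.
>         sendable = [s for s in streams if s.pending > 0]
>         return min(sendable, key=lambda s: s.priority, default=None)
>
>     streams = [SegmentStream("old-high-bitrate", priority=1, pending=800_000),
>                SegmentStream("new-low-bitrate", priority=0, pending=300_000)]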
>
>
>
>
>
> And you're exactly right, Piers. The fundamental issue is that a web player
> lacks the low-level timing information required to infer the delivery rate.
> You would want something like BBR's rate estimation
> <https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation>, which
> inspects the time delta between packets to determine the delivery rate. That
> gets really difficult when the OS and browser delay flushing data to the
> application, be it for performance reasons or due to packet loss (holding
> data back to preserve ordering, i.e. head-of-line blocking).
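>
> The gist of that estimator, as I read the draft (simplified; variable names
> are mine):
>
>     # delivery_rate = data_delivered / delivery_elapsed, where the elapsed
>     # time is the max of the send-side and ack-side intervals so that short
>     # bursts don't inflate the sample.
>     def delivery_rate_sample(prior_delivered, prior_delivered_time,
>                              first_sent_time, sent_time, now, now_delivered):
>         data_delivered = now_delivered - prior_delivered
>         send_elapsed = sent_time - first_sent_time
>         ack_elapsed = now - prior_delivered_time
>         delivery_elapsed = max(send_elapsed, ack_elapsed)
>         return (data_delivered / delivery_elapsed
>                 if delivery_elapsed > 0 else None)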
>
>
>
> I did run into CUBIC/Reno not being able to grow the congestion window
> when frames are sent one at a time (application-limited). I don't believe
> BBR suffers from the same problem though due to the aforementioned rate
> estimator.
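>
> (The CUBIC/Reno behavior I mean is roughly this gate on window growth; a
> simplification, not any particular stack:)
>
>     # Only grow cwnd when the window was actually the limiting factor, so
>     # frame-at-a-time (app-limited) sending stalls growth.
>     def on_ack(state, acked_bytes):
>         if state.bytes_in_flight + acked_bytes < state.cwnd:
>             return                                  # app-limited: no growth
>         if state.cwnd < state.ssthresh:
>             state.cwnd += acked_bytes               # slow start
>         else:
>             state.cwnd += state.mss * acked_bytes // state.cwnd  # cong. avoidance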
>
>
>
> On Wed, Nov 3, 2021 at 10:05 AM Ali C. Begen <ali.begen@networked.media>
> wrote:
>
>
>
> > On Nov 3, 2021, at 6:50 PM, Piers O'Hanlon <piers.ohanlon@bbc.co.uk>
> wrote:
> >
> >
> >
> >> On 2 Nov 2021, at 20:39, Ali C. Begen <ali.begen=
> 40networked.media@dmarc.ietf.org> wrote:
> >>
> >>
> >>
> >>> On Nov 2, 2021, at 3:39 AM, Luke Curley <kixelated@gmail.com> wrote:
> >>>
> >>> Hey folks, I wanted to quickly summarize the problems we've run into
> at Twitch that have led us to QUIC.
> >>>
> >>>
> >>> Twitch is a live one-to-many product. We primarily focus on video
> quality due to the graphical fidelity of video games. Viewers can
> participate in a chat room, which the broadcaster reads and can respond to
> via video. This means that latency is also somewhat important to facilitate
> this social interaction.
> >>>
> >>> A looong time ago we were using RTMP for both ingest and distribution
> (Flash player). We switched to HLS for distribution to gain the benefit of
> 3rd party CDNs, at the cost of dramatically increasing latency. A later
> project lowered the latency of HLS using chunked-transfer delivery, very
> similar to LL-DASH (and not LL-HLS). We're still using RTMP for
> contribution.
> >>>
> > I guess Apple do also have their BYTERANGE/CTE mode for LL-HLS which is
> pretty similar to LL-DASH.
>
> Yes, Apple can list the parts (chunks in LL-DASH) as byte ranges in the
> playlist, but the frequent playlist refresh and part retrieval process is
> unavoidable in LL-HLS, which is one of the main differences from LL-DASH
> (no manifest refresh needed, and one request per segment rather than per chunk).
>
> >
> >>>
> >>> To summarize the issues with our current distribution system:
> >>>
> >>> 1. HLS suffers from head-of-line blocking.
> >>> During congestion, the current segment stalls and is delivered slower
> than the encoded bitrate. The player has no recourse but to wait for the
> segment to finish downloading, risking depleting the buffer. It can switch
> down to a lower rendition at segment boundaries, but these boundaries occur
> too infrequently (every 2s) to handle sudden congestion. Trying to switch
> earlier, either by canceling the current segment or downloading the lower
> rendition in parallel, only exacerbates the issue.
> >>
> > Isn't the HoL limitation more down to the use of HTTP/1.1?
> >
> >> DASH has the concept of Resync points that were designed exactly for
> this purpose (allowing you to emergency downshift in the middle of a
> segment).
> >>
> > I was curious if there are any studies or experience of how resync
> points perform in practice?
>
> Resync points are pretty fresh out of the oven. dash.js has them on the
> roadmap but not yet implemented (and we also need to generate test
> streams), so there is no data from real clients yet. But I suppose you can
> imagine how in-segment switching can help during sudden bandwidth drops,
> especially for long segments.
>
> >
> >>> 2. HLS has poor "auto" quality (ABR).
> >>> The player is responsible for choosing the rendition to download. This
> is a problem when media is delivered frame-by-frame (i.e. HTTP
> chunked-transfer), as we're effectively application-limited by the encoder
> bitrate. The player can only measure the arrival timestamp of data and does
> not know when the network can sustain a higher bitrate without just trying
> it. We hosted an ACM challenge for this issue in particular.
> >>
> > The limitation here may also be down to the lack of access to
> sufficiently accurate timing information about data arrivals in the browser
> - unfortunately the Streams API, which provides data from the fetch API,
> doesn’t directly timestamp data arrivals, so the JS app has to timestamp
> them itself, which can suffer from noise (scheduling delays, etc.) -
> especially a problem for small/fast data arrivals.
>
> Yes, you need to get rid of that noise (see LoL+).
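>
> (As a toy illustration of the kind of filtering involved, not the actual
> LoL+ code: keep per-chunk throughput samples and discard implausibly small
> inter-arrival gaps that come from data being flushed in a burst.)
>
>     def filtered_throughput_kbps(chunks, min_gap_ms=1.0):
>         # chunks: list of (arrival_time_ms, size_bytes) for one segment
>         samples = []
>         for (t_prev, _), (t_now, size) in zip(chunks, chunks[1:]):
>             gap_ms = t_now - t_prev
>             if gap_ms >= min_gap_ms:        # discard batched flushes
>                 samples.append(size * 8 / gap_ms)   # bits/ms == kbit/s
>         return sum(samples) / len(samples) if samples else None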
>
> > I guess another issue could be that if the system is only sending single
> frames then the network transport may be operating in application limited
> mode so the cwnd doesn’t grow sufficiently to take advantage of the
> available capacity.
>
> Unless the video bitrate is too low, this should not be an issue most of
> the time.
>
> >
> >> That exact challenge had three competing solutions, two of which are
> now part of the official dash.js code. And yes, the player can figure out what
> the network can sustain *without* trying higher bitrate renditions.
> >>
> https://github.com/Dash-Industry-Forum/dash.js/wiki/Low-Latency-streaming
> >> Or read the paper that even had “twitch” in its title here:
> https://ieeexplore.ieee.org/document/9429986
> >>
> > There was a recent study that seems to show that none of the current
> algorithms are that great for low latency, and the two new dash.js ones
> appear to lead to much higher levels of rebuffering:
> > https://dl.acm.org/doi/pdf/10.1145/3458305.3478442
>
> Brightcove’s paper uses the LoL and L2A algorithms from the challenge,
> where low latency was the primary goal. For Twitch’s own evaluation, I
> suggest you watch:
> https://www.youtube.com/watch?v=rcXFVDotpy4
> We later addressed the rebuffering issue and developed LoL+, which is the
> version now included in dash.js and explained at the ieeexplore link I gave
> above.
>
> Copying the authors in case they want to add anything for the paper you
> cited.
>
> -acbegen
>
>
> >
> > Piers
> >
> >>> I believe this is why LL-HLS opts to burst small chunks of data
> (sub-segments) at the cost of higher latency.
> >>>
> >>>
> >>> Both of these necessitate a larger player buffer, which increases
> latency. The contribution system has its own problems, but let me sync up
> with that team first before I try to enumerate them.