Re: [Moq] Latency @ Twitch
Luke Curley <kixelated@gmail.com> Tue, 09 November 2021 22:58 UTC
From: Luke Curley <kixelated@gmail.com>
Date: Tue, 09 Nov 2021 14:58:30 -0800
Message-ID: <CAHVo=Zk3NmN8QwdiSweb6dC+ziVfKLUKn1f=JhcTU3sv1rh2VQ@mail.gmail.com>
To: "Mo Zanaty (mzanaty)" <mzanaty=40cisco.com@dmarc.ietf.org>
Cc: Bernard Aboba <bernard.aboba@gmail.com>, Justin Uberti <juberti@alphaexplorationco.com>, Ian Swett <ianswett@google.com>, "Ali C. Begen" <ali.begen@networked.media>, MOQ Mailing List <moq@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/moq/FGNVhgUkL9m4DhEP_bfn-cbBU08>
Maybe a dumb thought, but is the PROBE_RTT phase required when sufficiently application-limited, as is primarily the case for live video? If I understand correctly, it's meant to drain the queue to remeasure the minimum RTT, but that doesn't seem necessary when the queue is constantly being drained due to a lack of data to send.

Either way, the issue is that existing TCP algorithms don't care about the live video use case, and those are the ones that have been ported to QUIC thus far. But as Justin mentioned, this doesn't actually matter for the sake of standardizing a video-over-QUIC protocol, provided the building blocks are in place. The real question is: do QUIC ACKs contain enough signal to implement an adequate live video congestion control algorithm? If not, how can we increase that signal, potentially taking cues from RMCAT (e.g. RTT on a per-packet basis)?

On Tue, Nov 9, 2021, 10:27 AM Mo Zanaty (mzanaty) <mzanaty=40cisco.com@dmarc.ietf.org> wrote:

> All current QUIC CCs (BBRv1/2, CUBIC, NewReno, etc.) are not well suited for real-time media, even for a rough “envelope” or “circuit-breaker”. RMCAT CCs are explicitly designed for real-time media, but, of course, rely on RTCP feedback, so must be adapted to QUIC feedback.
>
> Mo
>
> On 11/9/21, 1:13 PM, "Bernard Aboba" <bernard.aboba@gmail.com> wrote:
>
> Justin said:
>
> "As others have noted, BBR does not work great out of the box for realtime scenarios."
>
> [BA] At the ICCRG meeting on Monday, there was an update on BBR2:
> https://datatracker.ietf.org/meeting/112/materials/slides-112-iccrg-bbrv2-update-00.pdf
>
> While there are some improvements, issues such as PROBE_RTT and rapid ramp-up after loss remain, and overall it doesn't seem like BBR2 is going to help much with realtime scenarios. Is that fair?
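To make the "enough signal" question concrete, here is a toy sketch of the kind of per-ACK sample a live-video CC could consume: a per-packet RTT plus a delivery-rate estimate loosely in the style of draft-cheng-iccrg-delivery-rate-estimation. The class and field names are my own and this is heavily simplified (real stacks also track app-limited flags and use the longer of the send-side/ack-side intervals), not code from any QUIC implementation:

```python
from dataclasses import dataclass

@dataclass
class SentPacket:
    send_time: float       # seconds when the packet was sent
    size: int              # bytes
    delivered: int         # total bytes delivered when this packet was sent
    delivered_time: float  # timestamp of that delivery tally

class AckSampler:
    """Toy per-ACK sampler: per-packet RTT plus a BBR-style rate sample."""

    def __init__(self):
        self.inflight = {}        # packet number -> SentPacket
        self.delivered = 0        # cumulative bytes acked so far
        self.delivered_time = 0.0

    def on_sent(self, pn, now, size):
        # Snapshot the delivery tally at send time; the rate sample later
        # measures progress over the interval this packet was in flight.
        self.inflight[pn] = SentPacket(now, size, self.delivered, self.delivered_time)

    def on_ack(self, pn, now):
        p = self.inflight.pop(pn)
        self.delivered += p.size
        self.delivered_time = now
        rtt = now - p.send_time
        # Simplified rate sample: bytes delivered between this packet's send
        # and its ack, over the ack-side interval (draft-cheng also considers
        # the send-side interval and takes the longer of the two).
        interval = max(now - p.delivered_time, 1e-9)
        rate = (self.delivered - p.delivered) / interval
        return rtt, rate
```

The point of the sketch is that both signals fall out of information the sender already has; the open question in the thread is whether the receiver-side feedback (ACK ranges and timestamps) is fine-grained enough to compute them accurately.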
> On Tue, Nov 9, 2021 at 12:46 PM Justin Uberti <juberti@alphaexplorationco.com> wrote:
>
> Ultimately we found that it wasn't necessary to standardize the CC as long as the behavior needed from the remote side (e.g., feedback messaging) could be standardized.
>
> As others have noted, BBR does not work great out of the box for realtime scenarios. The last time this was discussed, the prevailing idea was to allow the QUIC CC to be used as a sort of circuit-breaker, but within that envelope the application could use whatever realtime algorithm it preferred (e.g., goog-cc).
>
> On Thu, Nov 4, 2021 at 3:58 AM Piers O'Hanlon <piers.ohanlon@bbc.co.uk> wrote:
>
> > On 3 Nov 2021, at 21:46, Luke Curley <kixelated@gmail.com> wrote:
> >
> > Yeah, there's definitely some funky behavior in BBR when application-limited, but it's nowhere near as bad as Cubic/Reno. With those algorithms you need to burst enough packets to fully utilize the congestion window before it can be grown. With BBR I believe you need to burst just enough to fully utilize the pacer, and even then this condition <https://source.chromium.org/chromium/chromium/src/+/master:net/third_party/quiche/src/quic/core/congestion_control/bbr_sender.cc;l=393> lets you use application-limited samples if they would increase the send rate.
>
> And there's also the idle cwnd collapse/reset behaviour to consider if you're sending a number of frames together and their inter-data gap exceeds the RTO - I'm not quite sure how the various QUIC stacks have translated RFC 2861/7661 advice on this…?
>
> > I started with BBR first because it's simpler, but I'm going to try out BBR2 at some point because of the aforementioned PROBE_RTT issue. I don't follow the congestion control space closely enough; are there any notable algorithms that would better fit the live video use case?
>
> I guess Google's Goog_CC appears to be well used in the WebRTC space (e.g. WEBRTC <https://webrtc.googlesource.com/src/+/refs/heads/main/modules/congestion_controller/goog_cc> and aiortc <https://github.com/aiortc/aiortc/blob/1a192386b721861f27b0476dae23686f8f9bb2bc/src/aiortc/rate.py#L271>), despite the draft <https://datatracker.ietf.org/doc/html/draft-ietf-rmcat-gcc> never making it to RFC status… There's also SCReAM <https://datatracker.ietf.org/doc/rfc8298/>, which has an open source implementation <https://github.com/EricssonResearch/scream>, but I'm not sure how widely deployed it is.
>
> On Wed, Nov 3, 2021 at 2:12 PM Ian Swett <ianswett@google.com> wrote:
>
> From personal experience, BBR has some issues with application-limited behavior, but it is still able to grow the congestion window, at least slightly, so it's likely an improvement over Cubic or Reno.
>
> On Wed, Nov 3, 2021 at 4:40 PM Luke Curley <kixelated@gmail.com> wrote:
>
> I think resync points are an interesting idea, although we haven't evaluated them. Twitch did push for S-frames in AV1, which will be another option in the future instead of encoding a full IDR frame at these resync boundaries.
>
> An issue is you have to make the hard decision to abort the current download and frantically try to pick up the pieces before the buffer depletes. It's a one-way door (maybe your algorithm overreacted) and you're going to be throwing out some media just to redownload it at a lower bitrate.
>
> Ideally, you could download segments in parallel without causing contention. The idea is to spend any available bandwidth on the new segment to fix the problem, and any excess bandwidth on the old segment in the event it arrives before the player buffer actually depletes. That's more or less the core concept for what we've built using QUIC, and it's compatible with resync points if we later go down that route.
>
> And you're exactly right, Piers.
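The parallel-download idea quoted above (spend available bandwidth on the new segment, excess on the old) amounts to strict-priority scheduling between in-flight segments. A toy sketch of that policy, with illustrative names only (not Twitch's implementation):

```python
class SegmentScheduler:
    """Toy strict-priority byte scheduler for in-flight segments.

    Every send opportunity goes to the highest-priority segment that still
    has data, so the low-bitrate "rescue" segment starves the old one only
    while bandwidth is scarce; any surplus drains the old segment.
    """

    def __init__(self):
        self.queues = []  # [priority, name, bytes_remaining]; lower = more urgent

    def enqueue(self, name, nbytes, priority):
        self.queues.append([priority, name, nbytes])
        self.queues.sort(key=lambda q: q[0])

    def send(self, budget):
        """Spend up to `budget` bytes; returns [(name, bytes_sent), ...]."""
        sent = []
        for q in self.queues:
            if budget <= 0:
                break
            take = min(budget, q[2])
            if take:
                q[2] -= take
                budget -= take
                sent.append((q[1], take))
        # Drop segments that have finished downloading.
        self.queues = [q for q in self.queues if q[2] > 0]
        return sent
```

With QUIC, this maps naturally onto stream prioritization: the rescue segment's stream simply outranks the old one, and no media has to be thrown away.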
> The fundamental issue is that a web player lacks the low-level timing information required to infer the delivery rate. You would want something like BBR's rate estimation <https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation>, which inspects the time delta between packets to determine the send rate. That gets really difficult when the OS and browser delay flushing data to the application, be it for performance reasons or due to packet loss (to maintain ordered delivery, i.e. head-of-line blocking).
>
> I did run into CUBIC/Reno not being able to grow the congestion window when frames are sent one at a time (application-limited). I don't believe BBR suffers from the same problem, though, due to the aforementioned rate estimator.
>
> On Wed, Nov 3, 2021 at 10:05 AM Ali C. Begen <ali.begen@networked.media> wrote:
>
> > On Nov 3, 2021, at 6:50 PM, Piers O'Hanlon <piers.ohanlon@bbc.co.uk> wrote:
> >
> >> On 2 Nov 2021, at 20:39, Ali C. Begen <ali.begen=40networked.media@dmarc.ietf.org> wrote:
> >>
> >>> On Nov 2, 2021, at 3:39 AM, Luke Curley <kixelated@gmail.com> wrote:
> >>>
> >>> Hey folks, I wanted to quickly summarize the problems we've run into at Twitch that have led us to QUIC.
> >>>
> >>> Twitch is a live one-to-many product. We primarily focus on video quality due to the graphical fidelity of video games. Viewers can participate in a chat room, which the broadcaster reads and can respond to via video. This means that latency is also somewhat important to facilitate this social interaction.
> >>>
> >>> A looong time ago we were using RTMP for both ingest and distribution (Flash player). We switched to HLS for distribution to gain the benefit of 3rd-party CDNs, at the cost of dramatically increasing latency. A later project lowered the latency of HLS using chunked-transfer delivery, very similar to LL-DASH (and not LL-HLS). We're still using RTMP for contribution.
> >
> > I guess Apple do also have their BYTERANGE/CTE mode for LL-HLS, which is pretty similar to LL-DASH.
>
> Yes, Apple can list the parts (chunks in LL-DASH) as byteranges in the playlist, but the frequent playlist refresh and part retrieval process is inevitable in LL-HLS, which is one of the main differences from LL-DASH (no need for manifest refresh, and a request per segment, not per chunk).
>
> >>> To summarize the issues with our current distribution system:
> >>>
> >>> 1. HLS suffers from head-of-line blocking.
> >>> During congestion, the current segment stalls and is delivered slower than the encoded bitrate. The player has no recourse other than to wait for the segment to finish downloading, risking depleting the buffer. It can switch down to a lower rendition at segment boundaries, but these boundaries occur too infrequently (every 2s) to handle sudden congestion. Trying to switch earlier, either by canceling the current segment or downloading the lower rendition in parallel, only exacerbates the issue.
> >
> > Isn't the HoL limitation more down to the use of HTTP/1.1?
> >
> >> DASH has the concept of Resync points that were designed exactly for this purpose (allowing you to emergency downshift in the middle of a segment).
> >
> > I was curious if there are any studies or experience of how resync points perform in practice?
>
> Resync points are pretty fresh out of the oven. dash.js has it on the roadmap but not yet implemented (and we also need to generate test streams). So there is no data available yet with real clients. But I suppose you can imagine how in-segment switching can help in sudden bandwidth drops, especially for long segments.
>
> >>> 2. HLS has poor "auto" quality (ABR).
> >>> The player is responsible for choosing the rendition to download. This is a problem when media is delivered frame-by-frame (i.e. HTTP chunked transfer), as we're effectively application-limited by the encoder bitrate. The player can only measure the arrival timestamp of data and does not know when the network can sustain a higher bitrate without just trying it. We hosted an ACM challenge for this issue in particular.
> >
> > The limitation here may also be down to the lack of access to sufficiently accurate timing information about data arrivals in the browser - unfortunately the Streams API, which provides data from the fetch API, doesn't directly timestamp the data arrivals, so the JS app has to timestamp them, which can suffer from noise such as scheduling etc. - especially a problem for small/fast data arrivals.
>
> Yes, you need to get rid of that noise (see LoL+).
>
> > I guess another issue could be that if the system is only sending single frames then the network transport may be operating in application-limited mode, so the cwnd doesn't grow sufficiently to take advantage of the available capacity.
>
> Unless the video bitrate is too low, this should not be an issue most of the time.
>
> >> That exact challenge had three competing solutions, two of which are now part of the official dash.js code. And yes, the player can figure out what the network can sustain *without* trying higher bitrate renditions.
> >> https://github.com/Dash-Industry-Forum/dash.js/wiki/Low-Latency-streaming
> >> Or read the paper that even had “twitch” in its title here: https://ieeexplore.ieee.org/document/9429986
> >
> > There was a recent study that seems to show that none of the current algorithms are that great for low latency, and the two new dash.js ones appear to lead to much higher levels of rebuffering: https://dl.acm.org/doi/pdf/10.1145/3458305.3478442
>
> Brightcove's paper uses the LoL and L2A algorithms from the challenge, where low latency was the primary goal. For Twitch's own evaluation, I suggest you watch: https://www.youtube.com/watch?v=rcXFVDotpy4
> We later addressed the rebuffering issue and developed LoL+, which is the version included in dash.js now and explained at the ieeexplore link I gave above.
>
> Copying the authors in case they want to add anything for the paper you cited.
>
> -acbegen
>
> > Piers
> >
> >>> I believe this is why LL-HLS opts to burst small chunks of data (sub-segments) at the cost of higher latency.
> >>>
> >>> Both of these necessitate a larger player buffer, which increases latency. The contribution system has its own problems, but let me sync up with that team first before I try to enumerate them.
>
> --
> Moq mailing list
> Moq@ietf.org
> https://www.ietf.org/mailman/listinfo/moq