Re: [Moq] Latency @ Twitch

Ian Swett <ianswett@google.com> Wed, 03 November 2021 21:13 UTC

References: <CAHVo=ZnXNnT2uod6oxHXTRoyA58cpn35BrV6eOXXnGUOFbcvSQ@mail.gmail.com> <0ADDD7B3-B49E-40E1-99E9-278EF0EA9B85@networked.media> <AF32886D-0524-45D4-9577-FCEFD601A0A1@bbc.co.uk> <73C6FFEB-CE81-4DE7-B110-55892D746927@networked.media> <CAHVo=Znu7F18fj4Anxz3j1byM+9aQmJ6N4DdFjUZk9fGjG8iXg@mail.gmail.com>
In-Reply-To: <CAHVo=Znu7F18fj4Anxz3j1byM+9aQmJ6N4DdFjUZk9fGjG8iXg@mail.gmail.com>
From: Ian Swett <ianswett@google.com>
Date: Wed, 03 Nov 2021 17:12:42 -0400
Message-ID: <CAKcm_gM=bcALtqoLd8mYLdCiTK=ZfEF0RkXBkw17bPR6MjoMhA@mail.gmail.com>
To: Luke Curley <kixelated@gmail.com>
Cc: "Ali C. Begen" <ali.begen@networked.media>, Piers O'Hanlon <piers.ohanlon@bbc.co.uk>, Bo Zhang <bzgmuj@yahoo.com>, MOQ Mailing List <moq@ietf.org>, Yuriy Reznik <yreznik@brightcove.com>, Thiago Teixeira <tteixeira@brightcove.com>
Content-Type: multipart/alternative; boundary="000000000000ada96a05cfe8df74"
Archived-At: <https://mailarchive.ietf.org/arch/msg/moq/4kfBws44ZQVPgtk7wzz1cdaDM6I>
Subject: Re: [Moq] Latency @ Twitch

From personal experience, BBR has some issues with application-limited
behavior, but it is still able to grow the congestion window, at least
slightly, so it's likely an improvement over Cubic or Reno.
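
A rough sketch of why that works, assuming the delivery-rate sampling
described in draft-cheng-iccrg-delivery-rate-estimation: a sample taken
while the sender was application-limited can only raise the bandwidth
estimate, never lower it. The names below (RateSample,
update_bandwidth_estimate) are hypothetical, and a plain running maximum
stands in for BBR's windowed max filter.

```python
from dataclasses import dataclass


@dataclass
class RateSample:
    delivered_bytes: int   # bytes newly acknowledged over the sample interval
    send_elapsed_s: float  # time over which those bytes were sent
    ack_elapsed_s: float   # time over which their ACKs arrived
    is_app_limited: bool   # sender ran out of data during the interval

    def delivery_rate(self) -> float:
        # Use the longer interval so that send bursts or ACK compression
        # cannot inflate the estimate.
        interval = max(self.send_elapsed_s, self.ack_elapsed_s)
        return self.delivered_bytes / interval


def update_bandwidth_estimate(current_max: float, sample: RateSample) -> float:
    """Return the new bandwidth estimate in bytes per second.

    Simplification (assumption): a running maximum instead of a
    time-windowed max filter.
    """
    rate = sample.delivery_rate()
    # App-limited samples understate capacity, so they are only accepted
    # when they exceed the current estimate.
    if not sample.is_app_limited or rate >= current_max:
        return max(current_max, rate)
    return current_max
```

So even when every sample is app-limited, any burst that happens to go
out faster than the current estimate still nudges the estimate upward.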

On Wed, Nov 3, 2021 at 4:40 PM Luke Curley <kixelated@gmail.com> wrote:

> I think resync points are an interesting idea although we haven't
> evaluated them. Twitch did push for S-frames in AV1 which will be another
> option in the future instead of encoding a full IDR frame at these resync
> boundaries.
>
> An issue is you have to make the hard decision to abort the current
> download and frantically try to pick up the pieces before the buffer
> depletes. It's a one-way door (maybe your algorithm overreacted) and you're
> going to be throwing out some media just to redownload it at a lower
> bitrate.
>
> Ideally, you could download segments in parallel without causing
> contention. The idea is to spend any available bandwidth on the new segment
> to fix the problem, and any excess bandwidth on the old segment in
> the event it arrives before the player buffer actually depletes. That's
> more or less the core concept for what we've built using QUIC, and it's
> compatible with resync points if we later go down that route.
>
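
The "new segment first, old segment with the leftovers" idea quoted above
can be sketched as a strict-priority split of the available bandwidth.
This is only an illustration: allocate_by_priority is a hypothetical
helper, not Twitch's scheduler, and real QUIC stream prioritization is
implementation-specific.

```python
def allocate_by_priority(capacity_bps: int, demands_bps: list[int]) -> list[int]:
    """Split capacity across streams listed from highest to lowest priority.

    Each stream gets as much as it asks for until the capacity runs out;
    lower-priority streams only see what is left over.
    """
    allocations = []
    remaining = capacity_bps
    for demand in demands_bps:
        granted = min(demand, remaining)
        allocations.append(granted)
        remaining -= granted
    return allocations


# Hypothetical numbers: the new low-rendition segment needs 1.5 Mbps,
# the old high-rendition segment would take 6 Mbps if it could.
plenty = allocate_by_priority(4_000_000, [1_500_000, 6_000_000])
scarce = allocate_by_priority(1_000_000, [1_500_000, 6_000_000])
```

With 4 Mbps available the old segment still makes progress on the spare
2.5 Mbps; with 1 Mbps it is starved and the new segment gets everything.
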
>
> And you're exactly right Piers. The fundamental issue is that a web player
> lacks the low level timing information required to infer the delivery rate.
> You would want something like BBR's rate estimation
> <https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation> which
> inspects the time delta between packets to determine the send rate. That
> gets really difficult when the OS and browser delay flushing data to the
> application, be it for performance reasons or due to packet loss (to
> maintain in-order delivery, which causes head-of-line blocking).
>
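
To illustrate how delayed flushing skews an application-level estimate,
here is a small sketch that derives a rate from the gap between
consecutive chunk arrivals as the application sees them. The function
name and all timestamps are made up for illustration.

```python
def interarrival_rates(samples: list[tuple[int, int]]) -> list[float]:
    """Estimate bytes per second from each gap between observed chunks.

    samples: (timestamp_ms, size_bytes) pairs as seen by the application.
    Chunks observed in the same millisecond are skipped (zero gap).
    """
    rates = []
    for (t_prev, _), (t_cur, size) in zip(samples, samples[1:]):
        gap_ms = t_cur - t_prev
        if gap_ms > 0:
            rates.append(size * 1000 / gap_ms)
    return rates


# What the network delivered: 1000 bytes every 10 ms (100 kB/s).
paced = [(0, 1000), (10, 1000), (20, 1000), (30, 1000)]

# What the app might observe if the last three chunks were held back and
# then flushed back to back (hypothetical timings).
flushed = [(0, 1000), (30, 1000), (31, 1000), (32, 1000)]

paced_rates = interarrival_rates(paced)
flushed_rates = interarrival_rates(flushed)
```

The same data yields a steady 100 kB/s in the first case and estimates
ranging from about 33 kB/s to 1 MB/s in the second.
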
> I did run into CUBIC/Reno not being able to grow the congestion window
> when frames are sent one at a time (application limited). I don't believe
> BBR suffers from the same problem though due to the aforementioned rate
> estimator.
>
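
For reference, the reason a loss-based controller stalls here can be
sketched as follows, assuming a Reno-style window update that follows the
guidance in RFC 9002 not to grow an underutilized congestion window. The
function, the MSS value, and the simple "in flight below cwnd" test are
illustrative assumptions, not any particular stack's code.

```python
MSS = 1200  # assumed maximum segment size in bytes


def on_ack(cwnd: int, bytes_in_flight: int, acked_bytes: int, ssthresh: int) -> int:
    """Return the new congestion window after an ACK.

    bytes_in_flight is measured before the ACK is processed.
    """
    if bytes_in_flight < cwnd:
        # Application-limited: the window was not the bottleneck,
        # so there is no evidence that a larger window is safe.
        return cwnd
    if cwnd < ssthresh:
        # Slow start: grow by the number of bytes acknowledged.
        return cwnd + acked_bytes
    # Congestion avoidance: roughly one MSS per round trip.
    return cwnd + MSS * acked_bytes // cwnd
```

When frames are sent one at a time, bytes_in_flight rarely reaches cwnd,
so the first branch is taken on almost every ACK.
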
> On Wed, Nov 3, 2021 at 10:05 AM Ali C. Begen <ali.begen@networked.media>
> wrote:
>
>>
>>
>> > On Nov 3, 2021, at 6:50 PM, Piers O'Hanlon <piers.ohanlon@bbc.co.uk>
>> wrote:
>> >
>> >
>> >
>> >> On 2 Nov 2021, at 20:39, Ali C. Begen <ali.begen=40networked.media@dmarc.ietf.org> wrote:
>> >>
>> >>
>> >>
>> >>> On Nov 2, 2021, at 3:39 AM, Luke Curley <kixelated@gmail.com> wrote:
>> >>>
>> >>> Hey folks, I wanted to quickly summarize the problems we've run into
>> at Twitch that have led us to QUIC.
>> >>>
>> >>>
>> >>> Twitch is a live one-to-many product. We primarily focus on video
>> quality due to the graphical fidelity of video games. Viewers can
>> participate in a chat room, which the broadcaster reads and can respond to
>> via video. This means that latency is also somewhat important to facilitate
>> this social interaction.
>> >>>
>> >>> A looong time ago we were using RTMP for both ingest and distribution
>> (Flash player). We switched to HLS for distribution to gain the benefit of
>> 3rd party CDNs, at the cost of dramatically increasing latency. A later
>> project lowered the latency of HLS using chunked-transfer delivery, very
>> similar to LL-DASH (and not LL-HLS). We're still using RTMP for
>> contribution.
>> >>>
>> > I guess Apple do also have their BYTERANGE/CTE mode for LL-HLS which is
>> pretty similar to LL-DASH.
>>
>> Yes, Apple can list the parts (chunks in LL-DASH) as byteranges in the
>> playlist but the frequent playlist refresh and part retrieval process is
>> inevitable in LL-HLS, which is one of the main differences from LL-DASH (no
>> need for manifest refresh, and requests are made per segment rather than per chunk).
>>
>> >
>> >>>
>> >>> To summarize the issues with our current distribution system:
>> >>>
>> >>> 1. HLS suffers from head-of-line blocking.
>> >>> During congestion, the current segment stalls and is delivered slower
>> than the encoded bitrate. The player has no recourse other than to wait for the
>> segment to finish downloading, risking depleting the buffer. It can switch
>> down to a lower rendition at segment boundaries, but these boundaries occur
>> too infrequently (every 2s) to handle sudden congestion. Trying to switch
>> earlier, either by canceling the current segment or downloading the lower
>> rendition in parallel, only exacerbates the issue.
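
A back-of-the-envelope sketch of the stall described in point 1,
assuming chunked delivery where media is playable as it arrives. The
function names and the example numbers are hypothetical.

```python
def seconds_until_stall(buffer_s: float, encoded_bps: float, throughput_bps: float) -> float:
    """Time until the player buffer empties while media arrives too slowly."""
    if throughput_bps >= encoded_bps:
        return float("inf")  # media arrives at least as fast as it plays
    # Each wall-clock second plays 1 s of media but only receives
    # throughput/encoded seconds of it.
    drain_rate = 1 - throughput_bps / encoded_bps
    return buffer_s / drain_rate


def seconds_to_finish(remaining_media_s: float, encoded_bps: float, throughput_bps: float) -> float:
    """Time needed to download the rest of the current segment."""
    return remaining_media_s * encoded_bps / throughput_bps


# Hypothetical case: 1 s buffered, 6 Mbps rendition, network drops to 3 Mbps
# right at the start of a 2 s segment.
stall_in = seconds_until_stall(1.0, 6_000_000, 3_000_000)
finish_in = seconds_to_finish(2.0, 6_000_000, 3_000_000)
```

In this example the buffer runs dry well before the segment finishes
downloading, so waiting for the next segment boundary to switch down
comes too late.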
>> >>
>> > Isn't the HoL limitation more down to the use of HTTP/1.1?
>> >
>> >> DASH has the concept of Resync points that were designed exactly for
>> this purpose (allowing you to emergency downshift in the middle of a
>> segment).
>> >>
>> > I was curious if there are any studies or experience of how resync
>> points perform in practice?
>>
>> Resync points are pretty fresh out of the oven. dash.js has it in the
>> roadmap but not yet implemented (and we also need to generate test
>> streams). So, there is no data available yet with the real clients. But, I
>> suppose you can imagine how in-segment switching can help in sudden bw
>> drops especially for long segments.
>>
>> >
>> >>> 2. HLS has poor "auto" quality (ABR).
>> >>> The player is responsible for choosing the rendition to download.
>> This is a problem when media is delivered frame-by-frame (ie. HTTP
>> chunked-transfer), as we're effectively application-limited by the encoder
>> bitrate. The player can only measure the arrival timestamp of data and does
>> not know when the network can sustain a higher bitrate without just trying
>> it. We hosted an ACM challenge for this issue in particular.
>> >>
>> > The limitation here may also be down to the lack of access to
>> sufficiently accurate timing information about data arrivals in the browser
>> - unfortunately the Streams API, which provides data from the fetch API,
>> doesn’t directly timestamp the data arrivals so the JS app has to timestamp
>> it which can suffer from noise such as scheduling etc - especially a
>> problem for small/fast data arrivals.
>>
>> Yes, you need to get rid of that noise (see LoL+).
>>
>> > I guess another issue could be that if the system is only sending
>> single frames then the network transport may be operating in application
>> limited mode so the cwnd doesn’t grow sufficiently to take advantage of the
>> available capacity.
>>
>> Unless the video bitrate is too low, this should not be an issue most of
>> the time.
>>
>> >
>> >> That exact challenge had three competing solutions, two of which are
>> now part of the official dash.js code. And yes, the player can figure out what
>> the network can sustain *without* trying higher bitrate renditions.
>> >>
>> https://github.com/Dash-Industry-Forum/dash.js/wiki/Low-Latency-streaming
>> >> Or read the paper that even had “twitch” in its title here:
>> https://ieeexplore.ieee.org/document/9429986
>> >>
>> > There was a recent study that seems to show that none of the current
>> algorithms are that great for low latency, and the two new dash.js ones
>> appear to lead to much higher levels of rebuffering:
>> > https://dl.acm.org/doi/pdf/10.1145/3458305.3478442
>>
>> Brightcove’s paper uses the LoL and L2A algorithms from the challenge
>> where low latency was the primary goal. For Twitch’s own evaluation, I
>> suggest you watch:
>> https://www.youtube.com/watch?v=rcXFVDotpy4
>> We later addressed the rebuffering issue, developed LoL+, which is the
>> version included in dash.js now and explained at the ieeexplore link I gave
>> above.
>>
>> Copying the authors in case they want to add anything for the paper you
>> cited.
>>
>> -acbegen
>>
>>
>> >
>> > Piers
>> >
>> >>> I believe this is why LL-HLS opts to burst small chunks of data
>> (sub-segments) at the cost of higher latency.
>> >>>
>> >>>
>> >>> Both of these necessitate a larger player buffer, which increases
>> latency. The contribution system has its own problems, but let me sync up with
>> that team first before I try to enumerate them.
>> >>> --
>> >>> Moq mailing list
>> >>> Moq@ietf.org
>> >>> https://www.ietf.org/mailman/listinfo/moq
>> >>
>>
>