Re: [Moq] Warp

"Ali C. Begen" <ali.begen@networked.media> Tue, 15 February 2022 09:53 UTC

References: <bb5da866d00c4113b48386a0702b4565@huawei.com> <CAHVo=Znu3q-R5jHV+n9Jb8vPDqCThjJtUMdoLoJoY1aLkf2r8w@mail.gmail.com> <CAA4Mczt6eJ1hL+xMd2q3b80-UQoSOj1GsNGEVKo8hhqU3v74rw@mail.gmail.com> <CAHVo=ZmFcODXVhd7J=-Owuinfhh1i-pRcO4QfeMj0CPRb_a9uA@mail.gmail.com>
In-Reply-To: <CAHVo=ZmFcODXVhd7J=-Owuinfhh1i-pRcO4QfeMj0CPRb_a9uA@mail.gmail.com>
From: "Ali C. Begen" <ali.begen@networked.media>
Date: Tue, 15 Feb 2022 12:52:44 +0300
Message-ID: <CAA4Mczu16mZF9_Uv=rdTLD7rbUXfxdXaTurMSedgsgnOzsJyaQ@mail.gmail.com>
To: Luke Curley <kixelated@gmail.com>
Cc: "shihang (C)" <shihang9@huawei.com>, MOQ Mailing List <moq@ietf.org>, "Law, Will" <wilaw@akamai.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/moq/25XDXDjn3oD3u32KiIv_LctBhGM>
Subject: Re: [Moq] Warp

Hi Luke, I will reply inline (easier for me):

On Tue, Feb 15, 2022 at 8:24 AM Luke Curley <kixelated@gmail.com> wrote:

> Hey Ali, a few things to clear up. I like to reply in sections instead of
> inline, but please let me know if I didn't address one of your comments.
>
> Also I pushed the draft to GitHub
> <https://github.com/kixelated/warp-draft> if anybody has any suggestions
> or feedback. This is my first ID and I'm still learning how to effectively
> communicate these ideas.
>

Thanks.

*Warp supports B-frames.*
> The encoder outputs the bitstream in decode order, not presentation or
> encode order.
>

Decode order is always the same as encode order, but presentation order can
be different if you use B-frames.


> Put simply, frames can only reference frames earlier in the bitstream,
> even if the frames are to be displayed in a different order. This bitstream
> is sent over a QUIC stream and fed into the decoder with no modification or
> reordering. The tail of a segment can be dropped with no artifacting
> (missing reference frames) because of this decode order.
>

Part of the confusion was due to your use of "earlier" without saying which
order. Remember, encode and decode orders are the same. The use of B-frames
will cause you to delay some frames at the source because you cannot encode
a B-frame until some frames after it (in presentation order, such as the
P-frame this B-frame depends on) are encoded. This trades off encoding
quality against additional delay (increasing latency). But the nice thing is
that if you do this, you get enough data to burst, which will ease the
bandwidth measurement on the client side.


> In fact, one of our issues with WebRTC is that it does not support h.264
> B-frames. Like you mentioned, this meant we had to re-encode our content at
> a VQ loss, even though we do not care about the ~100ms of latency caused by
> b-frames.
>

Indeed, WebRTC does not care about quality as much as it does about latency.


> *Warp has a player buffer.*
> Frames should be flushed to the QUIC layer as soon as they are output by
> the encoder. However, that does not mean that frames are expected to arrive
> at the same rate. Network jitter is unavoidable.
>
> Warp uses an adjustable buffer just like HLS/DASH for smooth playback. The
> player can choose to increase the buffer size (rebuffer) if audio/video are
> not forthcoming, increasing latency. Warp also adds the ability to skip
> over video gaps in the buffer during moderate congestion instead of
> rebuffering.
>

And my point was that the player can adjust the playback speed instead of
skipping some frames. I am arguing this is likely to be more favorable to
most users.


> The theory is that this is a better user experience, although it will take
> some time to figure out the ideal breakdown between rebuffering and video
> skipping. We're serving a fraction of Twitch traffic using Warp and despite
> a few bugs, the results are positive.
>

Can you elaborate on "positive"? What do you actually do to figure out
which option is more favorable to the users?

One extreme is higher latency, no skipping and high quality; the other
extreme is lower latency, possible skips/jumps in audio/video, and somewhat
lower source encoding quality, too. The sweet spot for me is moderate
latency, no skipping/jumping around unless absolutely necessary, and
adjusting the playback speed to deal with anything from relatively minor to
major congestion.
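
As a rough sketch of the playback-speed idea (the mapping and all constants
below are illustrative assumptions, not from the draft or any player):

```python
def playback_rate(buffer_s, target_s=1.5, max_adjust=0.1):
    """Map buffer fullness (seconds) to a playback speed in
    [1 - max_adjust, 1 + max_adjust]: speed up slightly when the buffer
    is above target, slow down slightly when it drains, instead of
    skipping frames or rebuffering."""
    error = (buffer_s - target_s) / target_s   # signed, unitless
    error = max(-1.0, min(1.0, error))         # clamp to +/-100%
    return 1.0 + max_adjust * error

print(playback_rate(1.5))   # 1.0   -> on target, normal speed
print(playback_rate(0.75))  # ~0.95 -> buffer draining, slow down
print(playback_rate(3.0))   # ~1.1  -> buffer healthy, catch up
```

A +/-10% speed change is generally below what viewers notice, which is why
this can absorb minor congestion invisibly.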


> Yet again, this was an issue we ran into with WebRTC. We apparently have a
> lot of jitter in our video system: too much for the WebRTC jitter buffer to
> handle. The solution was to add a buffer in front of the WebRTC connection
> to smooth out some of the jitter, but that's not ideal.
>

I feel your pain, and so do many others on this thread. If you are talking
to someone, to keep the discussion going in a meaningful way, latency must
be bounded to a few hundred ms each way. Yet, if we are talking about
something that does not have to be this interactive, we get many more tools
in our toolbox.


> *Warp prioritizes segments*
> This is very difficult to visualize as time is a dimension. Here's my
> attempt via email...
>
> Newer segments are higher priority than older segments, so they are sent
> as they are generated. If there is excess bandwidth (as allowed by the
> congestion controller), then it is used to finish the delivery of older
> segments. It's slightly more complicated when you throw audio into the mix
> but the same concepts apply.
>

I get this part and feel like it is the right strategy. But there will be a
cliff effect here beyond which most segments will never be completed,
because the newcomers always take the larger share. I am not entirely sure
where that cliff will be, and based on your writing I suppose you are not,
either :)
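
To make the cliff concrete, here is a toy model (all numbers hypothetical,
one-second intervals; not from the draft):

```python
def backlog_after(n_intervals, encode_kbps, capacity_kbps):
    """One-second intervals. Each interval the encoder produces
    encode_kbps of new, highest-priority media; the link carries
    capacity_kbps. New media is always sent first; only leftover
    capacity backfills older, incomplete segments. When the encode
    rate reaches capacity, the leftover is zero and the backlog can
    only grow: the cliff."""
    backlog = 0.0
    for _ in range(n_intervals):
        sent_new = min(capacity_kbps, encode_kbps)
        leftover = capacity_kbps - sent_new
        backlog += encode_kbps - sent_new       # preempted tail
        backlog = max(0.0, backlog - leftover)  # backfill old segments
    return backlog

print(backlog_after(10, encode_kbps=800, capacity_kbps=1000))   # 0.0
print(backlog_after(10, encode_kbps=1200, capacity_kbps=1000))  # 2000.0
```

In this model the cliff sits exactly at encode rate == link capacity; real
networks are burstier, so it will sit somewhat below that.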


> This means that the player can have gaps in the buffer. The player chooses
> how long to wait for each segment to fill in these gaps. If the player
> chooses to skip a segment, it can cancel the stream just to free it up for
> flow control, or it could just wait for this low priority stream to
> eventually arrive if there is excess bandwidth (great for rewind or VOD
> recordings).
>

Backfilling is useful, agreed. Watching the content with gaps after a
rewind would kill the experience.


> So to address your comments, there's no explicit message or signal when a
> segment is skipped. The sender will always send the higher priority data
> first because that's what the low-latency live player "wants". This lack of
> coordination eliminates a lot of edge cases and means that no bandwidth is
> wasted delivering a segment while a cancel message is being (re)transmitted.
>

Got it. My motivation for the client-to-server message was the need for the
client to say "I don't like/need what you are sending; send me this instead."


> *Low-latency ABR is hard*
> In congestion control lingo, frame-based live media is "application
> limited".
>

We call it "source limited" but application limited also works.


> This means that there's often not enough data to fill the congestion
> window or to saturate the pacer, so you don't know if the network could
> sustain a higher bitrate. There are algorithms out there with varied
> results, but unfortunately, the only sure way to determine the capacity of
> the network is to test the network. That's why you either need to buffer
> data or run a speed test to saturate the network, if only for a short while.
>

If I were to be pedantic, I'd repeat what I said in the first email: packet
pair/train methods can be made to work without needing to buffer data or
saturate the network. But doing this with enough accuracy requires access
to the sockets, which is not easy to pull off in JS code.
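
For reference, a minimal sketch of the packet pair/train idea (the function
name and the use of a median gap are my assumptions):

```python
def packet_pair_estimate(arrivals_s, packet_bytes):
    """Classic packet pair/train: packets sent back-to-back get spaced
    out by the bottleneck link, so capacity ~ packet size / dispersion.
    Using the median gap of a short train rejects some jitter. This
    needs timestamps far finer than one gap, which is why it is hard
    from JS but plausible with direct socket access."""
    gaps = sorted(b - a for a, b in zip(arrivals_s, arrivals_s[1:]))
    median_gap = gaps[len(gaps) // 2]
    return packet_bytes * 8 / median_gap  # bits per second

# Five 1200-byte packets arriving ~1 ms apart -> ~9.6 Mbps bottleneck.
est = packet_pair_estimate([0.000, 0.001, 0.002, 0.0031, 0.004], 1200)
print(round(est / 1e6, 1))  # 9.6
```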


> QUIC makes it significantly easier to test network throughput. Just pad
> the connection with PING frames (or a low priority stream) to evaluate if
> it can sustain a given throughput. Just be aware that there are some
> gotchas with sending PADDING frames.
>

With QUIC, you obviously have access to the sockets and can pull a rabbit
out of the hat. And you have options beyond padding frames. This will be
more accurate than any other attempt in the JS plane, but my point was that
we made it work more or less OK without needing that much burst data in the
paper I cited below.
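
A rough sketch of the idle-time-discarding measurement style described in
the cited paper, as I understand it (the record shape and the idle
threshold are assumptions):

```python
def active_throughput(chunks, idle_threshold_s=0.05):
    """Estimate bandwidth from (arrival_time_s, nbytes) download
    records, discarding inter-chunk gaps longer than idle_threshold_s
    (source-limited pauses), so a frame-paced stream is not mistaken
    for a slow network."""
    total_bytes, active_s, last_t = 0, 0.0, None
    for t, nbytes in chunks:
        if last_t is not None and (t - last_t) <= idle_threshold_s:
            active_s += t - last_t            # count only busy time
        total_bytes += nbytes
        last_t = t
    if active_s == 0:
        return None                           # not enough signal
    return total_bytes * 8 / active_s         # bits per second

# Two bursts of two 5 KB chunks with ~1 s of idle between the bursts:
chunks = [(0.00, 5000), (0.01, 5000), (1.00, 5000), (1.01, 5000)]
print(active_throughput(chunks))  # ~8e6 bps; the idle second is ignored
```

A naive bytes/elapsed-time estimate over the same trace would report about
160 kbps, an order-of-magnitude underestimate of the link.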

-acbegen


>
> On Mon, Feb 14, 2022, 5:13 PM Ali C. Begen <ali.begen@networked.media>
> wrote:
>
>> Hi Luke,
>>
>> On Mon, Feb 14, 2022 at 11:46 PM Luke Curley <kixelated@gmail.com> wrote:
>>
>>> Hey Hang,
>>>
>>> ABR is the primary mechanism for HLS/DASH to deal with congestion. Warp
>>> adds the ability to skip video at the end of media segments until the ABR
>>> algorithm kicks in. So all things remaining the same, Warp would be a
>>> better user experience.
>>>
>>
>> Warp seems to require the frames to be transmitted in encode/decode
>> order, and to eliminate any additional delay due to frame reordering, I
>> suppose you forbid frames from referencing future frames during encoding?
>> I am not much of an encoding person, but this will certainly reduce your
>> visual quality at a target bitrate.
>>
>> While this allows dropping the tail of a segment when the sender realizes
>> they won't make it on time, depending on the number of frames the sender
>> drops/skips, the resulting jittery playback might be discomforting. The
>> jump in the audio will be even worse.
>>
>> You claim this is a better user experience. Any numbers or studies that
>> you can back this up with? Nobody likes stalls but I don't think anybody
>> likes skipped video/audio frames, either.
>>
>>
>>> That being said, we still want to reduce latency. The way to minimize
>>> latency is to transfer each frame to the player as it is encoded, provided
>>> any dependencies are transferred first (GoP structure).
>>>
>>
>> So, you do allow forward prediction (because it will improve the quality)?
>>
>>
>>> This poses a problem for client-side ABR.
>>> <https://blog.twitch.tv/en/2020/01/15/twitch-invites-you-to-take-on-our-acm-mmsys-2020-grand-challenge/> Measuring
>>> the arrival time of frames on the client side is not enough signal to
>>> determine the connection bandwidth, making switching up renditions quite
>>> difficult. There are three solutions to this problem that I've seen: 1. hold
>>> back enough media to burst the connection (LL-HLS)
>>>
>>
>> Measuring the bandwidth with just a few packets (packet pair/train) is
>> not very reliable, but that does not mean one needs to hold back several
>> frames to measure something reliably. That's why we opted for counting the
>> data received and the actual time it took to receive that data (discarding
>> the idle times) to measure the bandwidth.
>> the code:
>> http://reference.dashif.org/dash.js/nightly/samples/low-latency/lolp_index.html
>> the paper:
>> https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9429986
>>
>>
>>> or 2. run a speed test on the connection every so often (our LHLS
>>> solution)
>>>
>>
>> If you are already fighting with congestion, this will only make things
>> worse. Setting the right priority on the measurement stream is also
>> problematic. I would not go for this approach for Warp.
>>
>>
>>> or 3. hope that machine learning can save the day.
>>>
>>
>> It can only do so if it is trained with the right data. There is some
>> hope here, but then conditions vary vastly so this is not a quick win,
>> either.
>>
>>
>>> With Warp we used a fourth option: 4. have the sender perform ABR. The
>>> sender knows the send rate, knows the queue size, and is the entity
>>> actively limiting the amount of data that can be sent. This is not ideal
>>> for a CDN because they're traditionally designed to be stateless with a
>>> standardized API, but I certainly think it's a solvable problem.
>>>
>>
>> Right, the sender knows exactly what it is able to send. But it does not
>> know whether the receiver wants all that data. In principle, it is the
>> client who needs to decide on the selected representation, and I suppose
>> Warp will eventually include some messages sent by the client to that
>> effect.
>>
>> At the moment, does the Warp sender pick the segment at the highest
>> bitrate that it assumes it can send on time w/o any tail dropping? Or could
>> it pick an even higher bitrate segment and risk dropping some frames toward
>> the end?
>>
>> -acbegen
>>
>>
>>>
>>> On Mon, Feb 14, 2022 at 3:02 AM shihang (C) <shihang9@huawei.com> wrote:
>>>
>>>> @Luke, I wonder whether a timely bandwidth estimate is needed for
>>>> ABR, given that the client has a 2-5s buffer anyway (when facing
>>>> congestion, which is the primary scenario for Warp). Client-side ABR is
>>>> more scalable than sender-side ABR, right? Is the computation overhead of
>>>> sender-side ABR one of the obstacles when deploying to the CDN?
>>>>
>>>>
>>>>
>>>> Best Regards,
>>>>
>>>> Hang
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From:* Moq <moq-bounces@ietf.org> *On Behalf Of* Luke Curley
>>>> *Sent:* February 12, 2022 13:20
>>>> *To:* Law, Will <wilaw@akamai.com>
>>>> *Cc:* MOQ Mailing List <moq@ietf.org>
>>>> *Subject:* Re: [Moq] Warp
>>>>
>>>>
>>>>
>>>> ...and to clarify what I mean by "CDN support", I mean using HTTP/3
>>>> requests instead of QUIC streams. A client could request each HLS/DASH
>>>> segment in parallel providing the Warp priority as a header
>>>> <https://datatracker.ietf.org/doc/html/draft-ietf-httpbis-priority>.
>>>> You effectively get the same segment data and prioritization but
>>>> encapsulated in an HTTP response.
>>>>
>>>>
>>>>
>>>> However it does get quite a bit more complicated than that. The biggest
>>>> issue is that prioritization is not guaranteed, especially when multiple
>>>> connections are involved (ex. different hostnames). It's also very
>>>> difficult for the server to provide a timely bandwidth estimate for ABR. We
>>>> opted to take the simpler route and push via WebTransport instead of
>>>> pulling via HTTP/3.
>>>>
>>>> On Fri, Feb 11, 2022, 5:32 PM Luke Curley <kixelated@gmail.com> wrote:
>>>>
>>>> Hey Will,
>>>>
>>>>
>>>>
>>>> Unlike HLS, the media sender is responsible for ABR. Our server pulls
>>>> the estimated bitrate directly from the QUIC congestion controller (BBR,
>>>> Cubic, etc) and switches renditions at segment boundaries. This is a
>>>> dramatic improvement over client-side ABR because it's the actual rate at
>>>> which media can be sent. It's also the primary challenge with using Warp
>>>> over HTTP/3 with CDN support.
>>>>
>>>>
>>>>
>>>> Also I want to clarify that this draft is not complete. I wanted to
>>>> focus on what I felt were the core concepts that would shape a WG. That may
>>>> have been a mistake because it's come up a few times... and in fact, the
>>>> client can create streams. These are used to send messages like
>>>> load/play/pause/track but somehow I completely neglected to document it.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Feb 11, 2022 at 4:27 PM Law, Will <wilaw@akamai.com> wrote:
>>>>
>>>> @Luke – how does WARP handle throughput variation across the
>>>> connection (the equivalent of ABR with HAS)? The draft indicates that older
>>>> frames are dropped in the face of congestion. This implies that resolution
>>>> and encoded bitrate remain constant and that it’s the rendered frame rate
>>>> that drops on the client to compensate for any throughput degradation. If
>>>> that is correct, then at what point can the client decide I’m tired of
>>>> receiving the 4K feed at 8fps, I’d rather get 1080p at 30fps? Conceivably
>>>> it could request the server to begin sending a lower resolution/bitrate
>>>> stream of data, however the established streams are unidirectional and no
>>>> control back-channel is defined. It could also tune-in to a new QUIC stream
>>>> at the appropriate bitrate, if there was some standard metadata to define
>>>> what was available and how to access it.   Do you consider discovery and
>>>> service description to be out of scope of this core protocol definition? If
>>>> so, has any thought been given to extending WARP so that it includes service
>>>> discovery and description and perhaps a control back-channel?
>>>>
>>>>
>>>>
>>>> Cheers
>>>>
>>>> Will
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> *From: *Luke Curley <kixelated@gmail.com>
>>>> *Date: *Friday, February 11, 2022 at 1:11 PM
>>>> *To: *Sergio Garcia Murillo <sergio.garcia.murillo@gmail.com>
>>>> *Cc: *MOQ Mailing List <moq@ietf.org>
>>>> *Subject: *Re: [Moq] Warp
>>>>
>>>>
>>>>
>>>> Hey Sergio,
>>>>
>>>>
>>>>
>>>> Warp has flexible latency depending on the broadcaster and viewer(s).
>>>>
>>>>
>>>>
>>>> The broadcaster chooses their encoding settings, for example using
>>>> b-frames (higher latency/quality) or using a larger look-ahead buffer
>>>> (better compression and rate control). The viewer dynamically chooses their
>>>> buffer size, dictating how long to wait before skipping the end of a
>>>> segment.
>>>>
>>>>
>>>>
>>>> With a perfect network, Warp would transfer each video frame from the
>>>> encoder to decoder as they are generated. However, congestion makes that
>>>> impossible, which is why it's necessary to have a dynamic player buffer for
>>>> smooth playback. For example, a viewer with a reliable connection may have
>>>> a 500ms buffer, while a viewer with a cellular connection may have a 2s
>>>> buffer, while a viewer in a developing country may have a 5s buffer, while
>>>> a service that archives the stream may have a 30s buffer for maximum
>>>> reliability.
>>>>
>>>>
>>>>
>>>> The broadcaster and any intermediate proxies do not know or care about
>>>> each viewer's desired latency. They just create QUIC streams, transmit
>>>> packets based on stream priority, and eventually close any streams if they
>>>> reach some maximum upper bound. This makes it ideal for video distribution
>>>> especially when multiple caches and proxies are involved.
>>>>
>>>>
>>>>
>>>> On Fri, Feb 11, 2022 at 11:59 AM Sergio Garcia Murillo <
>>>> sergio.garcia.murillo@gmail.com> wrote:
>>>>
>>>> Hi luke,
>>>>
>>>>
>>>>
>>>> QUICK question, what is the target glass to glass latency for WARP?
>>>>
>>>>
>>>>
>>>> Best regards
>>>>
>>>> Sergio
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> El vie, 11 feb 2022 20:22, Luke Curley <kixelated@gmail.com> escribió:
>>>>
>>>> Hey MOQ, I just published a draft for Warp
>>>> <https://datatracker.ietf.org/doc/draft-lcurley-warp/>.
>>>> Here's a quick FAQ:
>>>>
>>>>
>>>>
>>>> *What is Warp?*
>>>>
>>>> Twitch has developed a new video distribution protocol to replace our
>>>> custom low-latency HLS stack. Warp uses QUIC streams to deliver media
>>>> segments, prioritizing streams based on content and age. This allows
>>>> viewers to skip old video content during congestion instead of buffering;
>>>> improving the user experience and reducing latency.
>>>>
>>>>
>>>>
>>>> *What about contribution?*
>>>>
>>>> Warp is very similar to Facebook's RUSH
>>>> <https://www.ietf.org/archive/id/draft-kpugin-rush-00.html> and
>>>> can be used as a contribution protocol. There's a few fundamental
>>>> differences, like the prioritization scheme and transferring media as
>>>> segments. This first version of the draft focuses on these core differences
>>>> and omits anything else that could be a distraction.
>>>>
>>>>
>>>>
>>>> *Why not WebRTC?*
>>>>
>>>> We initially used WebRTC (both media and data channels) for
>>>> last-mile delivery but the user experience was significantly worse than our
>>>> existing stack. There were so many minor issues, primarily caused by
>>>> WebRTC's focus on real-time latency and the inability to control the client
>>>> (browser) behavior. I personally had to scrap years of work on a custom
>>>> SFU. 😔
>>>>
>>>>
>>>>
>>>> *Why not use datagrams?*
>>>>
>>>> Warp uses QUIC streams because it dramatically simplifies the protocol.
>>>> We get the full benefit of QUIC's fragmentation, congestion control, flow
>>>> control, recovery, cancellation, multiplexing, etc. Using datagrams gives
>>>> you extra flexibility but it also means you have to reimplement everything
>>>> on every platform.
>>>>
>>>>
>>>>
>>>> *Why not use HTTP?*
>>>>
>>>> Good question! The key to warp is the prioritization mechanism, which
>>>> could work with HTTP/3 and possibly HTTP/2. Twitch has the benefit of
>>>> running our own network so it was just simpler to make a push-based
>>>> protocol using QUIC and WebTransport. I've got some ideas for a more
>>>> complicated HTTP solution that would enable CDN support.
>>>>
>>>>
>>>>
>>>> *How is media delivered?*
>>>>
>>>> Warp sends each segment (group of pictures) over a QUIC stream. Audio
>>>> and newer video segments are prioritized, causing older video segments to
>>>> starve during congestion. Either side can cancel the stream to effectively
>>>> drop the tail of a segment. Media is quite linear by nature and most frames
>>>> need to be processed in decode order.
>>>>
>>>>
>>>>
>>>> *Why not drop individual frames?*
>>>>
>>>> We decided that it wasn't worth dropping non-reference frames, given
>>>> their infrequency and relatively small size for high-quality media. Our
>>>> hardware encodes (QuickSync) contain only reference frames, and we've seen
>>>> software encodes with only 3% non-reference frames by file size. And of
>>>> course, dropping reference frames would cause artifacting or freezing, so
>>>> that wasn't an option.
>>>>
>>>>
>>>> *How could this be improved?*
>>>>
>>>> We want to experiment with layered coding (ex. SVC) at some point in
>>>> the future. This would involve transferring non-reference frames/slices on
>>>> a different QUIC stream so they can be deprioritized. Simulcast would work
>>>> the same way: transfer each rendition on a different QUIC stream
>>>> prioritized based on the resolution.
>>>>
>>>>
>>>>
>>>> *Why use fMP4?*
>>>>
>>>> HLS and DASH support CMAF: a standard for fragmenting MP4 files. Warp
>>>> uses this file format so we can deliver the same segment data regardless of
>>>> the delivery protocol. The Warp MP4 atom uses JSON because I was too lazy
>>>> to do things "properly" for this first draft. The wire format doesn't
>>>> matter!
>>>>
>>>> --
>>>> Moq mailing list
>>>> Moq@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/moq
>>>>
>>>>
>>>> --
>>> Moq mailing list
>>> Moq@ietf.org
>>> https://www.ietf.org/mailman/listinfo/moq
>>>
>>