Re: [Moq] Data model and MoQ streams

If the “catalog” is another media flow, then that could certainly work. That’d certainly not be O(n^2).

I don’t think patching is even needed so long as we have a ‘stream-of-messages’ thing (where one can ask for message X); it is just way easier that way!

-=R

I also like “catalog” FWIW

From: Law, Will <wilaw@akamai.com>
Date: Thursday, December 1, 2022 at 9:59 AM
To: Roberto Peon <fenix@meta.com>
Cc: Christian Huitema <huitema@huitema.net>, MOQ Mailing List <moq@ietf.org>, Luke Curley <kixelated@gmail.com>
Subject: Re: [Moq] Data model and MoQ streams
@Roberto - what if the “manifest” were just another media object flow that the client subscribed to? The client would receive it at the start of playback, then subscribe to updates, which it would receive automatically. Such a subscribable manifest could be sent as a monolithic object, or broken up into a subscription of its sub-components as you suggest in the thread. The problem with the latter approach is that you still need an overall descriptor to indicate which sub-components are available.

The “manifest” will likely be compact and only dispatched when there is a change in the publisher’s content mix (adaptation sets, representations, etc., to use DASH terms), not with every GOP as with HLS. Such changes are likely to happen on the order of minutes, so the manifest’s data contribution over the wire is de minimis compared to the other media flows.

We could conceive of a patch update mechanism, in which the initial receipt is the full manifest and then you subscribe to a flow of delta updates. Since clients can join at arbitrary times and we don’t want to produce custom updates per client, this introduces the complexity of signaling which base version the client should apply each patch to; the alternative is that every patch is a delta from the base rather than from the previous patch, which reduces patch efficiency.

I think the simplicity of a solution in which 1) the manifest is super compact, 2) it only updates when the publisher changes its overall content mix, and 3) updates carry the complete manifest is the most attractive and scalable. If n is the number of changes in the inventory, then the manifest is dispatched with order O(n), not O(n^2).
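A minimal sketch of the “manifest as just another subscribable flow” idea, where every update carries the complete catalog. `CatalogTrack`, `publish` and `subscribe` are hypothetical names standing in for a MoQ object flow; nothing here comes from a draft. It shows why no patch base version is needed: a late joiner simply receives the latest full object.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Deliver = Callable[[int, Dict[str, dict]], None]


@dataclass
class CatalogTrack:
    """Hypothetical subscribable flow whose objects are complete catalogs."""
    version: int = 0
    latest: Dict[str, dict] = field(default_factory=dict)
    subscribers: List[Deliver] = field(default_factory=list)

    def publish(self, catalog: Dict[str, dict]) -> None:
        # Called only when the publisher's content mix changes.
        # The whole catalog is sent every time, never a delta.
        self.version += 1
        self.latest = dict(catalog)
        for deliver in self.subscribers:
            deliver(self.version, self.latest)

    def subscribe(self, deliver: Deliver) -> None:
        # A subscriber joining at an arbitrary time first gets the current
        # catalog (if any), then every later update automatically.
        self.subscribers.append(deliver)
        if self.version:
            deliver(self.version, self.latest)
```

Usage: after `track.publish({"video-720p": {}})`, a new subscriber immediately receives version 1; a later `publish` of a two-entry catalog delivers version 2 with both entries, and the subscriber just replaces what it had.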

Cheers
Will

BTW - I use the term “manifest” to describe the inventory offered by a publisher. We have strong connotations around “playlist” and “manifest” with the HLS and DASH formats respectively. To avoid inheriting unintended assumptions, we should use a new term in the world of MoQ. I propose the term “catalog” to represent this inventory. It’s syntactically concise, not overloaded in the format world and understandable without explanation. You’ll see this term used in some upcoming drafts which Suhas and I will release shortly. But for now, think about “catalog” and whether it might be a preferred alternative to “manifest”.

From: Roberto Peon <fenix@meta.com>
Date: Thursday, December 1, 2022 at 9:02 AM
To: Luke Curley <kixelated@gmail.com>, "Law, Will" <wilaw@akamai.com>
Cc: Christian Huitema <huitema@huitema.net>, MOQ Mailing List <moq@ietf.org>
Subject: Re: [Moq] Data model and MoQ streams

Manifests that are not appendable are going to need to send O(n^2) data when new things are added to the manifest. This has been particularly bad with live streaming, where the lengths, sizes, and codec parameters change over the lifetime of the video/stream.

An (appendable) stream of messages would allow a manifest that was at most O(n) transfer.
Even better would be to take some of the structure from pre-existing manifests and represent it as streams of messages, i.e. a stream of periods, a stream of representations, etc.
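To make the O(n^2) versus O(n) claim concrete: if a non-appendable manifest is re-sent in full after each of n additions, the total transfer is 1 + 2 + ... + n = n(n+1)/2 entries, while an appendable stream of messages sends each entry once. The sketch below assumes a hypothetical `MessageStream` in which a receiver can ask for message X or for everything after the messages it already holds.

```python
from typing import List


class MessageStream:
    """Hypothetical append-only stream of manifest messages (periods, representations, ...)."""

    def __init__(self) -> None:
        self._messages: List[str] = []

    def append(self, message: str) -> int:
        # Returns the index assigned to the new message.
        self._messages.append(message)
        return len(self._messages) - 1

    def get(self, index: int) -> str:
        # "Ask for message X".
        return self._messages[index]

    def read_from(self, index: int) -> List[str]:
        # A receiver that already holds messages [0, index) fetches only the rest.
        return self._messages[index:]


def full_resend_cost(n: int) -> int:
    # Non-appendable: after the k-th addition the whole manifest (k entries) is re-sent.
    return sum(range(1, n + 1))  # equals n * (n + 1) // 2, i.e. O(n^2)


def appendable_cost(n: int) -> int:
    # Appendable: each entry crosses the wire exactly once, i.e. O(n).
    return n
```

For 100 additions, `full_resend_cost(100)` is 5050 entries versus `appendable_cost(100)` of 100.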

This can be (re)leveraged for both person-to-person and broadcast, so long as a player is able to receive something other than what it requested.

(e.g. if a manifest is a list of things that /could/ be sent, but there is some complication in generating or fetching the most desired representation, then some other representation should be sent)
-=R

From: Moq <moq-bounces@ietf.org> on behalf of Luke Curley <kixelated@gmail.com>
Date: Wednesday, November 30, 2022 at 4:25 PM
To: Law, Will <wilaw@akamai.com>
Cc: Christian Huitema <huitema@huitema.net>, MOQ Mailing List <moq@ietf.org>
Subject: Re: [Moq] Data model and MoQ streams
My current thought is that the sender advertises what tracks are available for subscription, including any custom metadata:

TRACK 1: resolution=480p,  bitrate=2000, codec=avc1.4d002a
TRACK 2: resolution=720p,  bitrate=4000, codec=avc1.4d002a
TRACK 3: resolution=1080p, bitrate=6000, codec=av01.0.15M.10
TRACK 4: bitrate=128, codec=mp4a.40.2, language=eng
TRACK 5: bitrate=128, codec=mp4a.40.2, language=jpn
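The advertisement lines above are illustrative, not a defined wire format. Assuming they were sent as text exactly in that shape, a receiver could turn each line into a track entry as follows; `parse_track` and the regex are hypothetical helpers, not part of Warp or any MoQ draft.

```python
import re
from typing import Dict, Tuple

# Hypothetical text form: "TRACK <id>: key=value, key=value, ..."
TRACK_LINE = re.compile(r"^TRACK\s+(\d+):\s*(.*)$")


def parse_track(line: str) -> Tuple[int, Dict[str, str]]:
    match = TRACK_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a TRACK line: {line!r}")
    track_id = int(match.group(1))
    attrs: Dict[str, str] = {}
    for pair in match.group(2).split(","):
        key, sep, value = pair.strip().partition("=")
        if sep:  # skip fragments without "="
            attrs[key] = value
    return track_id, attrs
```

Because the attributes stay as free-form key/value pairs, custom metadata (such as `language`) passes through without the parser needing to know about it.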

This is effectively a manifest. We could potentially leverage an existing manifest (ex. HLS/DASH), use an existing container (ex. MP4 moov), or make something custom. Personally, I like sending an init segment (ex. mp4 moov) since it already has a lot of this information and is required to configure the decoder, but I digress.

The receiver chooses which tracks to receive:

PLAY [3, 2, 1]
PLAY 4

This means "send me 1080p, otherwise 720p, otherwise 480p, based on my available bitrate" and "send me English". The relay does need to keep a mapping from track ID to bitrate, but I think that's fine. The order is also important: the relay can naively check whether each track is enabled and below the available bitrate, rather than needing to sort the tracks itself based on business logic.
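The relay-side check for `PLAY [3, 2, 1]` can be sketched as a single pass over the receiver's ordered list, using only a track ID to bitrate mapping plus an enabled flag. The function name is hypothetical, and falling back to the last enabled entry when nothing fits is an assumption; the thread does not say what happens in that case.

```python
from typing import Dict, List, Optional


def choose_track(preference: List[int], bitrates: Dict[int, int],
                 enabled: Dict[int, bool], available_kbps: int) -> Optional[int]:
    """Return the first enabled track, in receiver order, that fits the available bitrate."""
    candidates = [t for t in preference if enabled.get(t, True)]
    for track_id in candidates:
        if bitrates[track_id] <= available_kbps:
            return track_id
    # Assumption: if nothing fits, send the last (lowest) enabled choice.
    return candidates[-1] if candidates else None
```

With bitrates `{1: 2000, 2: 4000, 3: 6000}` and 5000 kbps available, `choose_track([3, 2, 1], ...)` returns track 2; no sorting is needed because the receiver already expressed its preference order.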

This could be very useful for seamless track switching:

TRACK 6: resolution=720p, bitrate=3000, codec=avc1.4d002a, advertisement=true, enabled=false
PLAY [6, 3, 2, 1]
...
TRACK_UPDATE 6: enabled=true

The relay would start sending track 6 as soon as it's enabled, with no round trip required. The sender can optionally disable tracks 3/2/1 to prevent non-advertisement content from being available during that time.
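A sketch of why no round trip is needed: the relay keeps the receiver's ordered `PLAY` list and re-evaluates it locally whenever a `TRACK_UPDATE` arrives from the sender. `RelaySubscription` and its methods are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional


class RelaySubscription:
    """Hypothetical relay state for one PLAY request (ordered track preference)."""

    def __init__(self, preference: List[int], bitrates: Dict[int, int],
                 enabled: Dict[int, bool]) -> None:
        self.preference = preference
        self.bitrates = bitrates
        self.enabled = enabled

    def current(self, available_kbps: int) -> Optional[int]:
        # First enabled track, in receiver order, that fits the estimate.
        for track_id in self.preference:
            if self.enabled.get(track_id, True) and self.bitrates[track_id] <= available_kbps:
                return track_id
        return None

    def track_update(self, track_id: int, enabled: bool) -> None:
        # TRACK_UPDATE from the sender: only relay state changes;
        # the receiver is not asked again.
        self.enabled[track_id] = enabled
```

With `PLAY [6, 3, 2, 1]` and track 6 initially disabled, the relay serves track 2 at 5000 kbps; after `track_update(6, True)` it switches to track 6 on its own.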

On Wed, Nov 30, 2022 at 1:54 PM Law, Will <wilaw@akamai.com> wrote:
I want to highlight one scalability issue as we begin to propose solutions such as: “... Or the receiver could just tell the sender to choose a track from a subset (ex. only these tracks, which are below 720p). The sender only needs to know the maximum bitrate for each track.” This scheme requires the sender to know which tracks are available for a given resource, along with their bitrates, resolutions and any other attributes on which a client may choose to filter (such as language, captions, accessibility, etc.). When that sender is an edge relay, it must maintain state about the content “package” in order to implement these server-side decisions. This requires memory, and requires the relay to parse some type of package description, such as a manifest. It also assumes that the relay delivering the subscription is the same one that previously delivered the manifest, which may not be true. There will be many types of manifests, their formats will change all the time, and whenever they do, the entire delivery surface must be updated to accommodate the evolution.

In a highly scaled system, letting the edges avoid maintaining content state over time, avoid needing knowledge of the media, and easily substitute one relay for another during delivery leads to a more robust architecture. At a higher level, we can think of knowledge of the internal offerings of a given live resource as a contract between the publisher and subscriber(s): those agents are the only ones that need to understand the composition, while the relays simply follow routing instructions and make no decisions beyond the state of the WebTransport connections they manage.

I can illustrate this with two different means to achieve what Luke hinted at in this thread.

Option A
Client: Hey server, send me the highest bitrate stream of 720p or below that you can for the resource ABC123
Server: I must previously have received the manifest describing which streams are available for ABC123, parsed it, and stored it. So I look up the streams, filter out those above 720p, and select the highest bitrate from the remainder.
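Option A can be sketched as a relay that must already hold a parsed manifest per resource before it can answer. `ManifestAwareRelay` and its stream schema (`id`, `height`, `bitrate_kbps`) are hypothetical; the point is the lookup that fails on a relay that never saw the manifest.

```python
from typing import Dict, List


class ManifestAwareRelay:
    """Option A sketch: the relay keeps parsed manifests, keyed by resource."""

    def __init__(self) -> None:
        # resource -> [{"id": ..., "height": ..., "bitrate_kbps": ...}] (hypothetical schema)
        self.manifests: Dict[str, List[dict]] = {}

    def ingest_manifest(self, resource: str, streams: List[dict]) -> None:
        # State the relay has to store (and keep parsing correctly) per resource.
        self.manifests[resource] = streams

    def pick(self, resource: str, max_height: int, available_kbps: int) -> str:
        if resource not in self.manifests:
            # A relay that never saw this resource's manifest cannot answer.
            raise LookupError(f"no manifest cached for {resource}")
        eligible = [s for s in self.manifests[resource]
                    if s["height"] <= max_height and s["bitrate_kbps"] <= available_kbps]
        if not eligible:
            raise LookupError(f"no stream at or below {max_height}p fits")
        return max(eligible, key=lambda s: s["bitrate_kbps"])["id"]
```

Swapping in a fresh relay mid-session means `pick` raises `LookupError` until that relay has also ingested the manifest for ABC123.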

It would be a more scalable design if the relay only had to maintain state about the WebTransport connections. This can be accomplished by the subscriber providing the list of qualifying subscription identifiers, along with their target bitrates, and then asking the sender (which is a relay) to pick one.

Option B
Client: Hey relay, from this list of stream IDs and bitrates, send me the highest bitrate appropriate for my connection: [“123”:1Mbps, “456”:2Mbps, “789”:5Mbps].
Relay: I see your throughput is 3Mbps so I’m sending you stream 456.

Option B is easier to scale. The relay doesn’t need to know that “456” is part of ABC123. I can ask any relay for this content, even if it has never seen resource ABC123 before. It allows a lot of flexibility in how the content is described/referenced by delegating the composition of the resource to the publishers and subscribers and having relays respond to a very simple and low level set of forwarding instructions.
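By contrast, Option B needs nothing beyond the client's offer and the connection's throughput estimate. A minimal sketch (hypothetical function name; the fallback to the lowest offered stream when nothing fits is an assumption):

```python
from typing import Dict


def pick_from_offer(offer_kbps: Dict[str, int], throughput_kbps: int) -> str:
    """Option B sketch: choose using only the client's list; no manifest or resource state."""
    fitting = {sid: rate for sid, rate in offer_kbps.items() if rate <= throughput_kbps}
    if fitting:
        # Highest bitrate that fits the measured throughput.
        return max(fitting, key=fitting.get)
    # Assumption: nothing fits, so send the lowest-bitrate stream offered.
    return min(offer_kbps, key=offer_kbps.get)
```

For `{"123": 1000, "456": 2000, "789": 5000}` at 3000 kbps this returns `"456"`, matching the exchange above, and any relay can run it without having seen ABC123.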

Cheers
Will

From: Luke Curley <kixelated@gmail.com>
Date: Wednesday, November 30, 2022 at 10:03 AM
To: Christian Huitema <huitema@huitema.net>
Cc: MOQ Mailing List <moq@ietf.org>
Subject: Re: [Moq] Data model and MoQ streams

I like the summary; no disagreements here.

I think any confusion has been caused by loose terminology and loose requirements. I'm going to take a stab at both but I don't really know what I'm doing or how to be most effective in IETF.

The media bitrate needs to be adjusted in response to congestion. For 1:1 the encoder can change the encoded bitrate, but for 1:N we need a bitrate ladder.

HLS/DASH works by letting the receiver choose the next rendition (audio+video track) to download based on decoder support and network conditions. Unfortunately, the receiver has very little information about congestion when media is delivered frame-by-frame (more info: https://github.com/kixelated/warp-draft/issues/44). This is a fundamental problem with LL-DASH and Twitch's LHLS.

The solution is to expose the congestion controller's estimated bitrate from the sender. This could be pushed periodically like an RTCP sender report, but that introduces a delay, especially during congestion.

I propose an alternative. In Warp, a session advertises subscribable tracks (aka media streams) that can have different content, encodings, bitrate, etc. There can be multiple active subscriptions and for each subscription, the receiver asks the sender for one track from a provided list. The sender uses the congestion controller's estimated bitrate, rounding down to choose the track. This sender-side ABR is extremely simple and has worked great in production.

This gives the receiver the ability to control the desired experience while allowing it to delegate ABR responsibility. The receiver could request specific tracks using subscriptions with a list of size 1, implementing receiver-side ABR if they would like. Or the receiver could just tell the sender to choose a track from a subset (ex. only these tracks, which are below 720p). The sender only needs to know the maximum bitrate for each track.
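The three receiver choices described above (delegate ABR with the full list, do receiver-side ABR with a list of size 1, or restrict to a subset) can all be expressed as the candidate list passed to one sender-side rule that rounds the congestion controller's estimate down. The track table and `sender_choose` are hypothetical, and picking the lowest candidate when nothing fits is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical advertised tracks: id -> (height, maximum bitrate in kbps)
TRACKS: Dict[int, Tuple[int, int]] = {1: (480, 2000), 2: (720, 4000), 3: (1080, 6000)}


def sender_choose(candidates: List[int], estimate_kbps: int) -> int:
    # Round the estimate down to the highest-bitrate candidate that fits;
    # the sender only needs each track's maximum bitrate.
    ordered = sorted(candidates, key=lambda t: TRACKS[t][1])
    chosen = ordered[0]  # assumption: lowest candidate if nothing fits
    for track_id in ordered:
        if TRACKS[track_id][1] <= estimate_kbps:
            chosen = track_id
    return chosen


delegate_all = [1, 2, 3]  # sender-side ABR over every track
receiver_abr = [2]        # list of size 1: the receiver already decided
capped_720p = [t for t, (h, _) in TRACKS.items() if h <= 720]  # subset at or below 720p
```

At a 9000 kbps estimate, `delegate_all` yields track 3, while both `receiver_abr` and `capped_720p` yield track 2, so the receiver keeps control of the experience without running ABR itself.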

On Mon, Nov 28, 2022, 6:44 PM Christian Huitema <huitema@huitema.net> wrote:
This email stems from an ongoing discussion of the "data model" used by
MoQ on Slack. Slack is a great tool for rapid exchanges, but not every
member of this list follows it. Also, it is not archived, which means
that the exchanges will disappear after a few weeks. So, email. Lots of
what follows is my personal take on the debate.

The questions started with exchanges between Luke and Suhas about the
names of variables used in protocol headers. These exchanges were made a
bit harder because we don't have good agreement on the data model behind
MoQ, including agreements on how to name what. Part of that is because
different teams are working on different scenarios, such as streaming
and real time, and also different network configurations, such as with
relay or not.

I think that we have some agreement about what MoQ shall do: enabling
the transport of media streams. The client opens a QUIC connection using
WebTransport and requests one or several media streams. The server
sends the corresponding data, until the client somehow closes the media
stream. That means we also have an agreement about what is out-of-scope:
some communication scenarios require the orchestration of several media
streams, such as multiple audio, video and other streams from multiple
participants in a conference. I would expect applications doing that to
open multiple MoQ streams, perhaps using multiple connections, and
organizing the orchestration themselves. The "MoQ stream" would be the
building block.

We have a bit of a discussion on what the "MoQ stream" is. There is
broad agreement on the general concept that the media is composed of a
series of "objects", organized as a series of groups (GOPs). But then there
are differences, because a given media stream (say, a video) can be
encoded in multiple ways, say high, medium and low definition. The Warp
draft calls these different "renditions" of the media stream.

The differences are largely due to the way different teams plan to
handle congestion. One way is to have the server decide. The media is
sent as a series of GOP, each on its own QUIC stream. At the beginning
of each GOP, the server looks at transmission conditions and decides
what rendition to use for the next GOP. This is a very convenient way to
manage congestion, but it imposes constraints: each rendition shall have
the same notion of GOP, which is not obvious if, for example, the low-def
and high-def codecs are operating in parallel. In that architecture,
relays have to acquire all renditions of a MoQ stream, so they can do
the real time adaptation. Real time clients also have to upload multiple
renditions so relays can get them and adapt.
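The server-decides approach amounts to one rendition decision per GOP, taken at the GOP boundary from the current transmission conditions. A minimal sketch with hypothetical names; it only works because all renditions are assumed to share GOP boundaries, which is exactly the constraint noted above.

```python
from typing import Dict, Iterable, List


def plan_gops(estimates_kbps: Iterable[int], renditions_kbps: Dict[str, int]) -> List[str]:
    """Choose a rendition at the start of each GOP (each GOP then goes on its own QUIC stream)."""
    ladder = sorted(renditions_kbps.items(), key=lambda kv: kv[1])
    plan: List[str] = []
    for estimate in estimates_kbps:  # one estimate per GOP boundary
        choice = ladder[0][0]  # assumption: lowest rendition if nothing fits
        for name, rate in ladder:
            if rate <= estimate:
                choice = name
        plan.append(choice)
    return plan
```

With renditions `{"low": 1000, "mid": 2000, "high": 4000}` and estimates 5000, 2500 and 800 kbps at three successive boundaries, the plan is `["high", "mid", "low"]`; a relay doing this must hold all three renditions.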

Another way is to let the client decide. The client asks for a specific
rendition, and the server provides exactly that. In case of congestion,
the server drops some data to fit into the available bandwidth. The
client notices that, closes the current stream, and opens a new MoQ
stream with a lower definition. Adaptation takes a bit longer than in
the previous scenario, but there is no requirement to synchronize GOP
boundaries across different algorithms. The relay management is also a
bit simpler.
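The client-decides approach can be sketched as a rule the client applies after observing how much of the requested rendition actually arrived. The loss measure and the 5% `max_loss` threshold are hypothetical; the thread only says the client "notices" the drops.

```python
from typing import List


def client_adapt(ladder: List[str], current: str, received: int, expected: int,
                 max_loss: float = 0.05) -> str:
    """Step down one rendition when the server drops too much; ladder is lowest first."""
    loss = 1 - received / expected if expected else 0.0
    index = ladder.index(current)
    if loss > max_loss and index > 0:
        # In MoQ terms: close the current stream, open a new one at the lower rendition.
        return ladder[index - 1]
    return current
```

With ladder `["low", "mid", "high"]`, receiving 90 of 100 expected objects at "high" moves the client to "mid"; 99 of 100 keeps it at "high". No GOP alignment across renditions is required because only one rendition is ever requested at a time.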

Then there are mixed scenarios. The client might ask for both low def
and high def, display high def as long as it receives it correctly,
switch to low def if high def stutters. The server would send GOP for
low def and high def in parallel, using a higher priority for low def.
Relays could use similar strategies, asking for all available renditions.
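The mixed scenario splits into a sender rule (send both renditions of a GOP in parallel, low def at higher priority) and a client display rule. A sketch with hypothetical names; the convention that a smaller number means higher priority is an assumption.

```python
from typing import Dict, List, Optional

# Assumed convention: a smaller number means higher priority.
PRIORITY = {"low": 0, "high": 1}


def send_order(renditions: List[str]) -> List[str]:
    # Both renditions of a GOP go out in parallel; under congestion the
    # higher-priority low-def stream gets bandwidth first.
    return sorted(renditions, key=PRIORITY.__getitem__)


def choose_display(arrived_complete: Dict[str, bool]) -> Optional[str]:
    # Show high def while it arrives intact; otherwise fall back to low def.
    if arrived_complete.get("high"):
        return "high"
    if arrived_complete.get("low"):
        return "low"
    return None  # nothing usable for this GOP
```

Since low def is protected by priority, it is the copy most likely to be complete when high def stutters, which is what makes the fallback cheap.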

I think these positions are not as far apart as it seems. They lead to
exposing the "rendition" property prominently in the protocol. (I hope
we can find a way to do that in a manner independent of media and
codec.) This would lead to something like:

* client requests a media (by name) and specifies the renditions that it
is willing to receive. Server responds with some kind of accept message.
* server transmits the media as a series of GOPs, with each GOP starting
with identification of which media stream and what rendition this is.
Servers may send several GOP renditions in parallel, each using its own
QUIC stream, with appropriate priorities. Servers choose what to send
based on client preferences and network conditions.
* relays act as clients vis-à-vis the origin or the upstream relay, act
as servers for the clients or the downstream relays, and typically request
multiple renditions in parallel.
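The request/accept step in the first bullet could look like the sketch below, which also assigns the short media ID discussed next. All message and field names are hypothetical, and scoping media IDs to one server instance (rather than, say, one connection) is an assumption made for brevity.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List


@dataclass
class MediaRequest:
    media_name: str        # typically a long URI
    renditions: List[str]  # renditions the client is willing to receive


@dataclass
class MediaAccept:
    media_name: str
    media_id: int          # short ID to carry in QUIC stream headers
    renditions: List[str]  # subset the server will actually send


class Server:
    def __init__(self, available: Dict[str, List[str]]) -> None:
        self.available = available           # media name -> renditions it can serve
        self.media_ids: Dict[str, int] = {}  # negotiated name -> short ID
        self._next_id = count(1)

    def accept(self, request: MediaRequest) -> MediaAccept:
        if request.media_name not in self.available:
            raise LookupError(request.media_name)
        if request.media_name not in self.media_ids:
            self.media_ids[request.media_name] = next(self._next_id)
        offered = [r for r in request.renditions
                   if r in self.available[request.media_name]]
        return MediaAccept(request.media_name, self.media_ids[request.media_name], offered)
```

A relay would play both roles: it sends its own `MediaRequest` upstream (often listing several renditions) and answers downstream requests with `MediaAccept`.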

There are things to iron out. Media names are typically long URIs. We
would want a short identifier or "media ID" in the QUIC stream headers.
The mapping from media name to media ID could be negotiated as part of
the initial request/accept exchange. The GOP headers would have to carry
a rendition ID -- or maybe we negotiate a unique ID for each valid
combination of media and rendition, adding complexity to save a few bits.
(I could defend both sides of that argument.)
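To put a number on "a few bits": assuming the GOP header fields reuse QUIC's variable-length integer encoding (RFC 9000, Section 16), which is an assumption rather than anything agreed here, small IDs cost one byte each. The two header layouts below are hypothetical illustrations of the two options.

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer encoding (RFC 9000, Section 16)."""
    if value < 0:
        raise ValueError("negative value")
    if value < (1 << 6):
        return value.to_bytes(1, "big")
    if value < (1 << 14):
        return (value | (0b01 << 14)).to_bytes(2, "big")
    if value < (1 << 30):
        return (value | (0b10 << 30)).to_bytes(4, "big")
    if value < (1 << 62):
        return (value | (0b11 << 62)).to_bytes(8, "big")
    raise ValueError("value too large for a QUIC varint")


def gop_header_separate(media_id: int, rendition_id: int) -> bytes:
    # Option 1: media ID and rendition ID carried as two fields.
    return encode_varint(media_id) + encode_varint(rendition_id)


def gop_header_combined(combined_id: int) -> bytes:
    # Option 2: one negotiated ID per valid (media, rendition) pair.
    return encode_varint(combined_id)
```

With IDs below 64, the separate layout costs 2 bytes per GOP and the combined one 1 byte, in exchange for negotiating and storing a table of valid pairs.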

OK. That message is already too long. But I hope it helps inform the
WG and make progress.

-- Christian Huitema

--
Moq mailing list
Moq@ietf.org
https://www.ietf.org/mailman/listinfo/moq