Re: [Moq] Data model and MoQ streams

On Fri, Dec 2, 2022 at 11:16 AM Luke Curley <kixelated@gmail.com> wrote:

> I would caution against aiming to make a media-agnostic protocol right out
> of the gate.
>

 Agreed.

>
> We have very clear media use-cases that need to be supported. There's
> basic features like the ability to seek backwards that either don't work
> with QUICR as designed, or no longer make it a generic transport. I'm not
> saying that it's not possible, and it might even be the best solution, just
> that we're getting ahead of ourselves.
>

QUICR does support the idea of catch-up and the semantics needed (pretty
much like HTTP cached object retrieval) for caches/relays to retrieve the
data. I do agree that this is not semantically equivalent to seeking
backwards, but as you pointed out, it can be made to work (it needs
time-based seeking).

QUICR was proposed with the idea of unifying streaming and interactive
use-cases, with native relay support and extensibility in mind (for
media-centric use-cases like AR/VR, for example), and that puts it very
much in scope of what MOQ is trying to solve, IIUC.

> In my opinion, we should build a simple and extensible media protocol. We
> should always be aware of new use-cases and amenable to splitting the
> protocol into generic and media components (ex. QUIC vs HTTP/3).
>

>
>
> On Thu, Dec 1, 2022, 6:11 PM Suhas Nandakumar <suhasietf@gmail.com> wrote:
>
>> Agree with Will and Roberto.
>>
>> Within QuicR, for example,  the manifest/catalog is just another type of
>> object that can be published and subscribed to. How it gets delivered is
>> via QUIC Streams and has knobs to control its priority and other things.
>> The same is true with media objects. Having the common idea of
>> resources/objects with names that you publish and subscribe to can keep
>> application/domain logic out of delivery/transport protocol.
>>
>>
>> On Thu, Dec 1, 2022 at 11:03 AM Roberto Peon <fenix=
>> 40meta.com@dmarc.ietf.org> wrote:
>>
>>> If the “catalog” is another media flow, then that could certainly work.
>>> That’d certainly not be O(n^2).
>>>
>>> I don’t think patching is even needed so long as we have a
>>> ‘stream-of-messages’ thing (where one can ask for message X)—it is just way
>>> easier that way!
>>>
>>> -=R
>>>
>>>
>>>
>>> I also like “catalog” FWIW
>>>
>>>
>>>
>>> *From: *Law, Will <wilaw@akamai.com>
>>> *Date: *Thursday, December 1, 2022 at 9:59 AM
>>> *To: *Roberto Peon <fenix@meta.com>
>>> *Cc: *Christian Huitema <huitema@huitema.net>, MOQ Mailing List <
>>> moq@ietf.org>, Luke Curley <kixelated@gmail.com>
>>> *Subject: *Re: [Moq] Data model and MoQ streams
>>>
>>> @Roberto - what if the “manifest” were just another media object flow
>>> that the client subscribed to? So it would receive it at the start of the
>>> playback, then subscribe to updates which it would receive
>>> automatically. Such a subscribe-able manifest can be sent as a monolithic
>>> object, or broken up into a subscription of its sub-components as you
>>> suggest in the thread. The problem with the latter approach is that you
>>> still need an overall descriptor to indicate what sub-components are
>>> available. The “manifest” will likely be compact and only dispatched when
>>> there is a change in the publisher’s content mix (adaptation sets,
>>> representations etc to use DASH terms, not with every GOP as with HLS).
>>> This is likely to be on the order of minutes, so its data contribution over
>>> the wire is de minimis compared to the other media flows. We could conceive
>>> of a patch update mechanism, in which the initial receipt is the full
>>> manifest and then you subscribe to a flow of delta updates. Since clients
>>> can join at arbitrary times and we don’t want to have to produce custom
>>> updates per client, this introduces the complexity of signaling to indicate
>>> to the client which base version it should apply the patch to, or else all
>>> patches are a delta from the base and not from each other, which reduces
>>> the patch efficiency. I think the simplicity of a solution in which 1) the
>>> manifest is super compact, 2) it only updates when the publisher changes
>>> its overall content mix, and 3) updates carry the complete manifest is
>>> most attractive and scalable. If n is the number of changes in the
>>> inventory, then the manifest is dispatched with order O(n) not O(n^2).
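
A minimal Python sketch of the full-snapshot approach described above. The names `CatalogFlow`, `publish` and `subscribe` are hypothetical and not taken from any MoQ draft. The point it tries to show is that a subscriber joining late only needs the latest complete catalog, with no base version to track.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class CatalogFlow:
    """Hypothetical subscribable flow that always carries the complete catalog."""
    latest: Optional[dict] = None
    subscribers: list = field(default_factory=list)

    def publish(self, catalog: dict) -> None:
        # Assumed to be called only when the publisher's content mix changes.
        self.latest = dict(catalog)
        for deliver in self.subscribers:
            deliver(self.latest)

    def subscribe(self, deliver: Callable[[dict], None]) -> None:
        # A late joiner immediately receives the newest full catalog,
        # so there is no patch chain or base version to reconstruct.
        self.subscribers.append(deliver)
        if self.latest is not None:
            deliver(self.latest)


flow = CatalogFlow()
early, late = [], []
flow.subscribe(early.append)                         # joins before any catalog exists
flow.publish({"video": ["480p", "720p"]})
flow.publish({"video": ["480p", "720p", "1080p"]})   # content mix changed
flow.subscribe(late.append)                          # joins after both updates
```

Here `early` ends up with two snapshots and `late` with one, and both finish with the same current catalog.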
>>>
>>>
>>>
>>> Cheers
>>>
>>> Will
>>>
>>>
>>>
>>> BTW - I use the term “manifest” to describe the inventory offered by a
>>> publisher. We have strong connotations around “playlist” and
>>> “manifest” from the HLS and DASH formats respectively. To avoid inheriting
>>> unintended assumptions, we should use a new term in the world of MoQ. I
>>> propose the term “*catalog*” to represent this inventory. It’s
>>> syntactically concise, not overloaded in the format world and
>>> understandable without explanation. You’ll see this term used in some
>>> upcoming drafts which Suhas and I will release shortly. But for now,
>>> think about “catalog” and whether it might be a preferred alternative to
>>> “manifest”.
>>>
>>>
>>>
>>>
>>>
>>> *From: *Roberto Peon <fenix@meta.com>
>>> *Date: *Thursday, December 1, 2022 at 9:02 AM
>>> *To: *Luke Curley <kixelated@gmail.com>, "Law, Will" <wilaw@akamai.com>
>>> *Cc: *Christian Huitema <huitema@huitema.net>, MOQ Mailing List <
>>> moq@ietf.org>
>>> *Subject: *Re: [Moq] Data model and MoQ streams
>>>
>>>
>>>
>>> Manifests that are not appendable are going to need to send O(n^2) data
>>> when new things are added to the manifest. This has been particularly bad
>>> with live streaming, where the lengths, sizes, and codec parameters change
>>> over the lifetime of the video/stream.
>>>
>>> An (appendable) stream of messages would allow a manifest that was at
>>> most O(n) transfer.
>>> Even better would be to take some of the structure from pre-existing
>>> manifests, and represent as stream-of-messages, i.e. a stream of periods, a
>>> stream of representations, etc.
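
The O(n^2) versus O(n) argument above can be checked with simple arithmetic. This is only an illustrative sketch: the function names and the fixed 100-byte entry size are assumptions. It compares resending the whole manifest after every addition against appending one message per addition.

```python
def total_bytes_full_resend(n_entries: int, entry_size: int) -> int:
    # After the k-th addition the whole manifest (k entries) is sent again,
    # so the total is entry_size * (1 + 2 + ... + n) = entry_size * n * (n + 1) / 2.
    return sum(k * entry_size for k in range(1, n_entries + 1))


def total_bytes_append_only(n_entries: int, entry_size: int) -> int:
    # Each addition is sent exactly once as a new message on the stream.
    return n_entries * entry_size


for n in (10, 100, 1000):
    print(n, total_bytes_full_resend(n, 100), total_bytes_append_only(n, 100))
```

With 1000 entries the full-resend total is 50,050,000 bytes against 100,000 bytes for the append-only stream.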
>>>
>>> This can be (re) leveraged for both person-to-person and broadcast so
>>> long as a player has the ability to receive something other than what it
>>> requested.
>>>
>>> (e.g. if a manifest is a list of things that /could/ be sent, but there
>>> is some complication in generating or fetching the most desired
>>> representation, then some other representation should be sent)
>>>
>>> -=R
>>>
>>>
>>>
>>> *From: *Moq <moq-bounces@ietf.org> on behalf of Luke Curley <
>>> kixelated@gmail.com>
>>> *Date: *Wednesday, November 30, 2022 at 4:25 PM
>>> *To: *Law, Will <wilaw@akamai.com>
>>> *Cc: *Christian Huitema <huitema@huitema.net>, MOQ Mailing List <
>>> moq@ietf.org>
>>> *Subject: *Re: [Moq] Data model and MoQ streams
>>>
>>> My current thought is that the sender advertises what tracks are
>>> available for subscription, including any custom metadata:
>>>
>>>
>>>
>>> TRACK 1: resolution=480p,  bitrate=2000, codec=avc1.4d002a
>>>
>>> TRACK 2: resolution=720p,  bitrate=4000, codec=avc1.4d002a
>>>
>>> TRACK 3: resolution=1080p, bitrate=6000, codec=av01.0.15M.10
>>>
>>> TRACK 4: bitrate=128, codec=mp4a.40.2, language=eng
>>>
>>> TRACK 5: bitrate=128, codec=mp4a.40.2, language=jpn
>>>
>>>
>>>
>>> This is effectively a manifest. We could potentially leverage an
>>> existing manifest (ex. HLS/DASH), use an existing container (ex. MP4 moov),
>>> or make something custom. Personally, I like sending an init segment (ex.
>>> mp4 moov) since it already has a lot of this information and is required to
>>> configure the decoder, but I digress.
>>>
>>>
>>>
>>>
>>>
>>> The receiver chooses which tracks to receive:
>>>
>>>
>>>
>>> PLAY [3, 2, 1]
>>>
>>> PLAY 4
>>>
>>>
>>>
>>> This means "send me 1080p, otherwise 720p, otherwise 480p based on my
>>> available bitrate" and "send me english". The relay does need to keep a
>>> mapping from track ID to bitrate but I think that's fine. The order is also
>>> important, so the relay can naively check if the track is enabled and below
>>> the available bitrate, rather than needing to sort itself based on business
>>> logic.
>>>
>>>
>>>
>>>
>>>
>>> This could be very useful for seamless track switching:
>>>
>>>
>>>
>>> TRACK 6: resolution=720p, bitrate=3000, codec=avc1.4d002a,
>>> advertisement=true, enabled=false
>>>
>>> PLAY [6, 3, 2, 1]
>>>
>>> ...
>>>
>>> TRACK_UPDATE 6: enabled=true
>>>
>>>
>>>
>>> The relay would start to send track 6 after it's been enabled with no
>>> round-trip required. The sender can optionally disable tracks 3/2/1 to
>>> keep non-advertisement content from being available during that time.
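
The PLAY ordering and the TRACK_UPDATE switch described above could be sketched roughly as follows. This is a hedged Python illustration, not the Warp wire format: `Track`, `pick_track` and the choice of kbps as the bitrate unit are assumptions, and the thread does not say what the relay should do when no track fits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Track:
    track_id: int
    bitrate_kbps: int      # taken from the TRACK advertisement (unit assumed)
    enabled: bool = True


def pick_track(play_order: list, tracks: dict, available_kbps: int) -> Optional[int]:
    """Return the first track in the receiver's order that is enabled and fits."""
    for track_id in play_order:
        track = tracks.get(track_id)
        if track and track.enabled and track.bitrate_kbps <= available_kbps:
            return track_id
    return None  # nothing fits; the fallback is not specified in the thread


tracks = {
    1: Track(1, 2000),
    2: Track(2, 4000),
    3: Track(3, 6000),
    6: Track(6, 3000, enabled=False),   # advertisement track, disabled for now
}
play = [6, 3, 2, 1]                     # PLAY [6, 3, 2, 1]

before = pick_track(play, tracks, available_kbps=5000)
tracks[6].enabled = True                # TRACK_UPDATE 6: enabled=true
after = pick_track(play, tracks, available_kbps=5000)
```

With 5000 kbps available, `before` is track 2 and `after` is track 6. The relay never sorts anything; it walks the receiver's list in order.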
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Nov 30, 2022 at 1:54 PM Law, Will <wilaw@akamai.com> wrote:
>>>
>>> I want to highlight one issue with scalability as we begin to propose
>>> solutions such as: “... *Or the receiver could just tell the sender to
>>> choose a track from a subset (ex. only these tracks, which are below 720p).
>>> The sender only needs to know the maximum bitrate for each track.*” This
>>> scheme requires the sender to have knowledge of what tracks are available
>>> for a given resource, along with their bitrates, resolutions and any other
>>> attributes on which a client may choose to filter (such as language,
>>> captions, accessibility etc). When that sender is an edge relay, we must
>>> maintain state of the content “package” in order to implement these
>>> server-side decisions. This requires memory, and it requires the relay to
>>> parse some type of package description, such as a manifest. We also assume
>>> that the relay delivering the subscription was the one that previously
>>> delivered some type of manifest, and this may not be true. There will be
>>> many types of manifests, their format will change all the time, and when
>>> it does the entire delivery surface must be updated to accommodate the
>>> evolution. In a highly scaled system, the ability for the edges to not
>>> have to maintain content state over time, to not need knowledge of the
>>> media, and to easily substitute one relay for another during delivery
>>> leads to a more robust architecture. At a higher level, we can think of
>>> knowledge of the internal offerings of a given live resource as a contract
>>> between the publisher and subscriber(s), where those agents are the only
>>> ones that need to understand the composition and the relays simply follow
>>> routing instructions and do not make decisions outside the state of the
>>> WebTransport connections which they manage.
>>>
>>>
>>>
>>> I can illustrate this with two different means to achieve what Luke
>>> hinted at in this thread.
>>>
>>>
>>>
>>> Option A
>>>
>>> Client: Hey server, send me the highest bitrate stream of 720p or below
>>> that you can for the resource ABC123
>>>
>>> Server: I must previously have received the manifest describing what
>>> streams are available for ABC123, parsed it, stored it. So I look up the
>>> streams, filter out those > 720p and select the highest bitrate from the
>>> remainder.
>>>
>>>
>>>
>>> It would be a more scalable design if the relay only had to maintain
>>> state about the WebTransport connections. This can be accomplished by the
>>> subscriber providing the list of qualifying subscription identifiers, along
>>> with their target bitrates, and then asking the sender (which is a relay)
>>> to pick one.
>>>
>>>
>>>
>>> Option B
>>>
>>> Client: Hey relay, from this list of stream IDs and bitrates, send me
>>> the highest bitrate appropriate for my connection
>>> [“123”:1Mbps, “456”:2Mbps, “789”:5Mbps].
>>>
>>> Relay: I see your throughput is 3Mbps so I’m sending you stream 456.
>>>
>>>
>>>
>>>
>>>
>>> Option B is easier to scale. The relay doesn’t need to know that “456”
>>> is part of ABC123. I can ask any relay for this content, even if it has
>>> never seen resource ABC123 before. It allows a lot of flexibility in how
>>> the content is described/referenced by delegating the composition of the
>>> resource to the publishers and subscribers and having relays respond to a
>>> very simple and low level set of forwarding instructions.
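
A small Python sketch of Option B, assuming the relay already has a throughput estimate for the connection. The function name `choose_stream` and the behaviour when nothing fits are assumptions; the stream IDs and bitrates are the ones from the example above.

```python
from typing import Optional


def choose_stream(offered_bps: dict, throughput_bps: float) -> Optional[str]:
    """Pick the highest-bitrate stream that fits the measured throughput.

    The relay needs no knowledge of the resource the streams belong to;
    it only sees the list the client supplied with the request.
    """
    fitting = {sid: bps for sid, bps in offered_bps.items() if bps <= throughput_bps}
    if not fitting:
        return None  # fallback behaviour is not specified in the thread
    return max(fitting, key=fitting.get)


# The client's list from Option B: stream ID -> target bitrate in bits per second.
offer = {"123": 1_000_000, "456": 2_000_000, "789": 5_000_000}
print(choose_stream(offer, throughput_bps=3_000_000))  # prints 456
```

Nothing in this function refers to resource ABC123, which is why any relay can serve the request.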
>>>
>>>
>>>
>>> Cheers
>>>
>>> Will
>>>
>>>
>>>
>>>
>>>
>>> *From: *Luke Curley <kixelated@gmail.com>
>>> *Date: *Wednesday, November 30, 2022 at 10:03 AM
>>> *To: *Christian Huitema <huitema@huitema.net>
>>> *Cc: *MOQ Mailing List <moq@ietf.org>
>>> *Subject: *Re: [Moq] Data model and MoQ streams
>>>
>>>
>>>
>>> I like the summary; no disagreements here.
>>>
>>>
>>>
>>> I think any confusion has been caused by loose terminology and loose
>>> requirements. I'm going to take a stab at both but I don't really know what
>>> I'm doing or how to be most effective in IETF.
>>>
>>>
>>>
>>> The media bitrate needs to be adjusted in response to congestion. For
>>> 1:1 the encoder can change the encoded bitrate, but for 1:N we need a
>>> bitrate ladder.
>>>
>>>
>>>
>>> HLS/DASH works by letting the receiver choose the next rendition
>>> (audio+video track) to download based on decoder support and network
>>> conditions. Unfortunately, the receiver has very little information about
>>> congestion when media is delivered frame-by-frame (more info
>>> <https://github.com/kixelated/warp-draft/issues/44>). This is a
>>> fundamental problem with LL-DASH and Twitch's LHLS.
>>>
>>>
>>>
>>> The solution is to expose the congestion controller's estimated bitrate
>>> from the sender. This could be pushed periodically like an RTCP sender
>>> report but that has a delay, especially during congestion.
>>>
>>>
>>>
>>> I propose an alternative. In Warp, a session advertises subscribable
>>> tracks (aka media streams) that can have different content, encodings,
>>> bitrate, etc. There can be multiple active subscriptions and for each
>>> subscription, the receiver asks the sender for *one* track from a
>>> provided list. The sender uses the congestion controller's estimated
>>> bitrate, rounding down to choose the track. This sender-side ABR is
>>> extremely simple and has worked great in production.
>>>
>>>
>>>
>>> This gives the receiver the ability to control the desired experience
>>> while allowing it to delegate ABR responsibility. The receiver could
>>> request specific tracks using subscriptions with a list of size 1,
>>> implementing receiver-side ABR if they would like. Or the receiver could
>>> just tell the sender to choose a track from a subset (ex. only these
>>> tracks, which are below 720p). The sender only needs to know the maximum
>>> bitrate for each track.
>>>
>>>
>>>
>>> On Mon, Nov 28, 2022, 6:44 PM Christian Huitema <huitema@huitema.net>
>>> wrote:
>>>
>>> This email stems from an ongoing discussion of the "data model" used by
>>> MoQ on Slack. Slack is a great tool for rapid exchanges, but not every
>>> member of this list follows it. Also, it is not archived, which means
>>> that the exchanges will disappear after a few weeks. So, email. Lots of
>>> what follows is my personal take on the debate.
>>>
>>> The questions started with exchanges between Luke and Suhas about the
>>> names of variables used in protocol headers. These exchanges were made a
>>> bit harder because we don't have good agreement on the data model behind
>>> MoQ, including agreements on how to name what. Part of that is because
>>> different teams are working on different scenarios, such as streaming
>>> and real time, and also different network configurations, such as with
>>> relay or not.
>>>
>>> I think that we have some agreement about what MoQ shall do: enabling
>>> the transport of media streams. The client opens a QUIC connection using
>>> Web Transport and requests one or several media streams. The server
>>> sends the corresponding data, until the client somehow closes the media
>>> stream. That means we also have an agreement about what is out-of-scope:
>>> some communication scenarios require the orchestration of several media
>>> streams, such as multiple audio, video and other streams from multiple
>>> participants in a conference. I would expect applications doing that to
>>> open multiple MoQ streams, perhaps using multiple connections, and
>>> organizing the orchestration themselves. The "MoQ stream" would be the
>>> building block.
>>>
>>> We have a bit of a discussion on what the "MoQ stream" is. There is
>>> broad agreement on the general concept that the media is composed of a
>>> series of "objects", organized as series of groups (GOP). But then there
>>> are differences, because a given media stream (say, a video) can be
>>> encoded in multiple ways, say high, medium and low definition. The Warp
>>> draft calls these different "renditions" of the media stream.
>>>
>>> The differences are largely due to the way different teams plan to
>>> handle congestion. One way is to have the server decide. The media is
>>> sent as a series of GOP, each on its own QUIC stream. At the beginning
>>> of each GOP, the server looks at transmission conditions and decides
>>> what rendition to use for the next GOP. This is a very convenient way to
>>> manage congestion, but it imposes constraints: each rendition shall have
>>> the same notion of GOP, which is not obvious if, for example, the low def
>>> and high def codecs are operating in parallel. In that architecture,
>>> relays have to acquire all renditions of a MoQ stream, so they can do
>>> the real time adaptation. Real time clients also have to upload multiple
>>> renditions so relays can get them and adapt.
>>>
>>> Another way is to let the client decide. The client asks for a specific
>>> rendition, and the server provides exactly that. In case of congestion,
>>> the server drops some data to fit into the available bandwidth. The
>>> client notices that, closes the current stream, and opens a new MoQ
>>> stream with a lower definition. Adaptation takes a bit longer than in
>>> the previous scenario, but there is no requirement to synchronize GOP
>>> boundaries across different algorithms. The relay management is also a
>>> bit simpler.
>>>
>>> Then there are mixed scenarios. The client might ask for both low def
>>> and high def, display high def as long as it receives it correctly,
>>> switch to low def if high def stutters. The server would send GOP for
>>> low def and high def in parallel, using a higher priority for low def.
>>> Relays could use similar strategies, asking for all available renditions.
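
One way to picture the mixed scenario above is a strict-priority send queue in which low def always drains first. The Python sketch below is an assumption-laden illustration: the `PRIORITY` values and the `send_order` helper are hypothetical, and a real sender would rely on QUIC stream prioritization instead of an explicit heap.

```python
import heapq

# Lower number = sent first. Giving low def the higher priority follows the
# text above; the concrete numbers are an assumption.
PRIORITY = {"low": 0, "high": 1}


def send_order(gops):
    """Yield (gop_index, rendition) in the order a strict-priority sender drains them."""
    heap = [(PRIORITY[rendition], index, rendition) for index, rendition in gops]
    heapq.heapify(heap)
    while heap:
        _, index, rendition = heapq.heappop(heap)
        yield index, rendition


# Two GOPs queued in both renditions at the same moment.
pending = [(0, "high"), (0, "low"), (1, "high"), (1, "low")]
order = list(send_order(pending))
```

Under congestion the high def entries are the ones left waiting, so the client keeps receiving low def and can switch to it when high def stutters.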
>>>
>>> I think these positions are not as far apart as it seems. They lead to
>>> exposing the "rendition" property prominently in the protocol. (I hope
>>> we can find a way to do that in a manner independent of media and
>>> codec.) This would lead to something like:
>>>
>>> * client requests a media (by name) and specifies the renditions that it
>>> is willing to receive. Server responds with some kind of accept message.
>>> * server transmits the media as a series of GOP, with each GOP starting
>>> with identification of which media stream and what rendition this is.
>>> Servers may send several GOP renditions in parallel, each using its own
>>> QUIC stream, with appropriate priorities. Servers choose what to send
>>> based on client preferences and network conditions.
>>> * relays act as client vis-a-vis the origin or the upstream relay, act
>>> as server for the clients or the downstream relays, and typically request
>>> multiple renditions in parallel.
>>>
>>> There are things to iron out. Media names are typically long URIs. We
>>> would want a short identifier or "media ID" in the QUIC stream headers.
>>> The mapping from media name to media ID could be negotiated as part of
>>> the initial request/accept exchange. The GOP headers would have to carry
>>> a rendition ID -- or maybe we negotiate a unique ID for each valid
>>> combination of media and rendition, add complexity and save a few bits.
>>> (I could defend both sides of that argument.)
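
The two ID options above can be compared with a short Python sketch. Everything here is hypothetical: the `IdTable` class, the example.com URI, and the tuple-shaped GOP headers are placeholders for illustration, not a proposed encoding.

```python
class IdTable:
    """Hypothetical per-connection table mapping long names to short integer IDs."""

    def __init__(self):
        self._ids = {}

    def assign(self, key) -> int:
        # Hand out the next small integer the first time a key is seen,
        # and return the same ID on every later lookup.
        return self._ids.setdefault(key, len(self._ids))


MEDIA_NAME = "https://example.com/live/channel-1"   # placeholder URI

# Variant 1: negotiate a media ID once during request/accept; each GOP
# header then carries (media_id, rendition_id).
media_ids = IdTable()
media_id = media_ids.assign(MEDIA_NAME)
gop_header_v1 = (media_id, 2)    # rendition ID 2 is an arbitrary example value

# Variant 2: negotiate one ID per valid (media, rendition) combination, so
# the GOP header carries a single value at the cost of a larger table.
combined_ids = IdTable()
low = combined_ids.assign((MEDIA_NAME, "low"))
high = combined_ids.assign((MEDIA_NAME, "high"))
gop_header_v2 = (high,)
```

Variant 1 keeps the table small, one entry per media name. Variant 2 needs one entry per combination but shortens every GOP header.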
>>>
>>> OK. That message is already too long. But I hope it helps inform the
>>> WG and make progress.
>>>
>>> -- Christian Huitema
>>>
>>>
>>>
>>>
>>> --
>>> Moq mailing list
>>> Moq@ietf.org
>>> https://www.ietf.org/mailman/listinfo/moq
>>>
>>