Re: [Sframe] Partial decodability and IDUs

This may be too simplistic, but there's also a non-codec-specific approach
here that occurs to me: Just have the SFrame header have a length field.

I'm imagining a setup where we have the SFrame layer between the encode and
packetization layers, and:
- The encoding/decoding layer produces/consumes each frame as a sequence of
IDUs
- The SFrame layer translates between IDUs and SFrame encrypted units
(SEUs? in any case, each SEU is an encrypted IDU)
- The packetization / depacketization layer packs SEUs into packets
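
To illustrate the layering above, here is a rough Python sketch.  It is only
an illustration under assumptions: the names (protect_frame, unprotect_frame,
toy_protect) are hypothetical, and the XOR transform is a stand-in for the
real SFrame encryption, not an implementation of it.  The point is just that
the SFrame layer maps each IDU to one SEU without looking inside it.

```python
from typing import Callable, List

IDU = bytes  # independently decodable unit, as handed over by the encoder
SEU = bytes  # "SFrame encrypted unit": exactly one protected IDU


def protect_frame(idus: List[IDU], protect: Callable[[IDU], SEU]) -> List[SEU]:
    """Send side of the SFrame layer: one SEU per IDU, order preserved."""
    return [protect(idu) for idu in idus]


def unprotect_frame(seus: List[SEU], unprotect: Callable[[SEU], IDU]) -> List[IDU]:
    """Receive side: each SEU is processed on its own, so a missing SEU
    does not prevent the remaining ones from being recovered."""
    return [unprotect(seu) for seu in seus]


def toy_protect(data: bytes) -> bytes:
    """Placeholder transform for illustration only. This is NOT SFrame
    encryption; a real implementation would apply the SFrame AEAD here."""
    return bytes(b ^ 0x5A for b in data)


toy_unprotect = toy_protect  # XOR with a constant is its own inverse

# The encoder hands over one frame as three IDUs (e.g., slices).
idus = [b"slice-0", b"slice-1", b"slice-2"]
seus = protect_frame(idus, toy_protect)

# Suppose the packet carrying the middle SEU is lost in transit.
received = [seus[0], seus[2]]
recovered = unprotect_frame(received, toy_unprotect)
```

Nothing in the SFrame layer here is codec-specific; the codec knowledge stays
in whatever produced the list of IDUs.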

The only thing you need to make that work is (1) a mechanism for the
receiver to understand which chunks of the SEU sequence it has (e.g., fixing
reordering), and (2) a way to unpack SEUs if there can be multiple in a
packet.  It seems like (1) could mostly be a transport assumption.  For
(2), you would just need something like a length field.
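
To make (2) concrete, here is a minimal Python sketch of length-prefixed
framing.  The 2-byte big-endian length field and the function names
(pack_seus, unpack_seus) are assumptions made for illustration; the actual
width and position of a length field in the SFrame header would be up to the
spec.

```python
import struct
from typing import List

# Hypothetical framing: each SEU is preceded by a 16-bit big-endian length.
LEN = struct.Struct("!H")


def pack_seus(seus: List[bytes]) -> bytes:
    """Concatenate SEUs into one packet payload, each with a length prefix."""
    out = bytearray()
    for seu in seus:
        if len(seu) > 0xFFFF:
            raise ValueError("SEU too large for a 16-bit length field")
        out += LEN.pack(len(seu))
        out += seu
    return bytes(out)


def unpack_seus(payload: bytes) -> List[bytes]:
    """Split a packet payload back into SEUs using the length prefixes."""
    seus = []
    offset = 0
    while offset < len(payload):
        if offset + LEN.size > len(payload):
            raise ValueError("truncated length field")
        (length,) = LEN.unpack_from(payload, offset)
        offset += LEN.size
        if offset + length > len(payload):
            raise ValueError("truncated SEU")
        seus.append(payload[offset:offset + length])
        offset += length
    return seus


payload = pack_seus([b"seu-a", b"seu-bb"])
units = unpack_seus(payload)
```

The depacketizer only needs the length prefixes to find SEU boundaries, so it
never has to parse the encrypted contents.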

As Sergio points out, there is a need for someone to know where the IDU
boundaries are, either at the SFrame layer (if the input is a whole frame),
or at the encode layer (if the encode-SFrame API can talk in terms of
IDUs).  But especially in the latter case, this framework keeps the
codec-specific stuff local to the encode layer, which is codec-specific in
any case.

There is some trade-off here, in that this framework doesn't expose any
encoded information to the SFU.  But ISTM that if you want that
functionality, then (a) you're going to have to have codec awareness
sprinkled all through the stack and (b) you're going to have to be really
careful designing the which-parts-to-encrypt scheme to avoid undermining
your security guarantees.  So I'm generally inclined toward the cleaner
abstraction here.

--Richard

On Thu, Nov 19, 2020 at 5:11 PM Sergio Garcia Murillo <
sergio.garcia.murillo@cosmosoftware.io> wrote:

> You are right that the SFrame layer does not need to know what
> is fed in for encryption, but in order to have a working end-to-end
> solution for WebRTC, someone will need to define what these IDUs are and
> how they are generated and reassembled for each codec if we want to have
> interoperable implementations on different devices.
>
> That process is codec-dependent and it would require quite a lot of effort
> (and also supporting it in the agnostic packetization), so I would prefer
> to have strong arguments in favor of doing it.
>
>
>
> On 19/11/2020 22:53, Justin Uberti wrote:
>
> The encoder needs to be aware of any mechanism to generate IDUs (e.g.,
> slices), and typically each of these IDUs will be handed up to the consumer
> individually. So the SFRAME layer doesn't need to do any splitting, it just
> knows that it should treat each IDU as something it needs to individually
> SFRAME and packetize.
>
> On Thu, Nov 19, 2020 at 1:40 PM Sergio Garcia Murillo <
> sergio.garcia.murillo@gmail.com> wrote:
>
>> Hi all,
>>
>>
>> As most of you already know, this morning I gave a presentation in
>> AVTCORE introducing the need to specify a codec-agnostic video
>> packetization format.
>>
>>
>> https://datatracker.ietf.org/meeting/109/materials/slides-109-avtcore-sframe-rtp-encapsulation-00
>>
>>
>> I got an AP for creating an initial draft so it could be reviewed and
>> accepted.
>>
>> However, there were two main concerns that we should address in this
>> group:
>>
>>    - Historically, avtcore has explicitly decided not to be payload
>>    agnostic and has declined to standardize codec-agnostic payload formats
>>    in a number of cases.  If that is to be changed, it needs to be done
>>    deliberately.
>>    - Need to define the "minimum decoding unit" or "independently
>>    decodable unit", that SFrame will work with.
>>
>>
>> Regarding the second one, the candidate units are:
>>
>>    - Full video frames (just use whatever is the encoder output)
>>    - Spatial layer frames
>>    - "independently decodable subframes" like h264 slices, vp8 partitions
>>    or av1 tiles, which allow partial decodability and are mainly aimed at
>>    enhancing packet loss resilience.
>>
>>
>> Spatial layer frames are the minimum we should target, as otherwise it
>> will just prevent SFUs from using SVC codecs. So the question is whether
>> we should go deeper and implement lower-level partitions of the frames.
>>
>>
>> AFAIK, currently, libwebrtc does not support partial decodability and I
>> personally haven't seen any practical usage of this in the RTC world (while
>> it makes a lot of sense in streaming/broadcasting world), but would like to
>> hear what is the view and experience of the other members of this group.
>> Also note that if we are going to support them in SFrame, this will require
>> a greater effort because we will need to explicitly define how the frames
>> must be split before being encrypted by SFrame for *each* possible video
>> codec (h264,h265,vp8,vp9,av1,...).
>>
>>
>> There was also the question about how/if we should support other codec
>> features like DON/interleaved mode for h264, which I also think we should
>> not support mainly because we are not currently using it on webrtc
>> implementations.
>>
>>
>> What do you think?
>>
>>
>> Best regards
>>
>> Sergio
>>
>>
>> --
>> Sframe mailing list
>> Sframe@ietf.org
>> https://www.ietf.org/mailman/listinfo/sframe
>>