Re: [Sframe] Partial decodability and IDUs

Sergio said:

"I am afraid that we will end up doing another per-codec packetizer, which
would be almost the same as just using the current packetization formats and
explaining how to use encrypted data with them. And all for a feature that we
are not currently using in WebRTC."

[BA] Some per-codec logic is required.  How else can the metadata be
obtained from the bitstream? But the question is whether it is *required*
to packetize IDUs separately, so they don't share fate.  I'd say this would
depend on the use case. If WebRTC didn't care enough about this to
implement it previously, why should SFrame make a difference?
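
To make the length-field idea from Richard's message (quoted below) concrete,
here is a minimal sketch of how several SEUs could share one payload and be
split apart again without any codec knowledge. The function names
(pack_seus, unpack_seus) and the 16-bit big-endian length prefix are
assumptions made for illustration; nothing in this thread fixes a wire format.

```python
import struct

# Hypothetical framing: each SEU is preceded by a 2-byte big-endian length.
# The field width is an assumption, not something specified in the thread.
LEN_FMT = "!H"
LEN_SIZE = struct.calcsize(LEN_FMT)


def pack_seus(seus):
    """Concatenate SEUs into one payload, each prefixed by its length."""
    out = bytearray()
    for seu in seus:
        if len(seu) > 0xFFFF:
            raise ValueError("SEU too large for a 16-bit length field")
        out += struct.pack(LEN_FMT, len(seu))
        out += seu
    return bytes(out)


def unpack_seus(payload):
    """Split a payload produced by pack_seus back into individual SEUs."""
    seus = []
    offset = 0
    while offset < len(payload):
        if offset + LEN_SIZE > len(payload):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from(LEN_FMT, payload, offset)
        offset += LEN_SIZE
        if offset + length > len(payload):
            raise ValueError("truncated SEU")
        seus.append(payload[offset:offset + length])
        offset += length
    return seus
```

The unpacking side only looks at the length prefixes, so it works the same
way for any codec; locating the IDU boundaries in the first place is still
the encoder's job.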

On Thu, Nov 19, 2020 at 3:15 PM Sergio Garcia Murillo <
sergio.garcia.murillo@cosmosoftware.io> wrote:

> How to actually packetize the IDUs is not a big deal: you only need
> start-of-IDU and end-of-IDU bits in the header and can rely on increasing
> RTP seq nums (i.e. no IDU interleaving), but at the end of the day someone
> will have to specify what an IDU in H.264 means and what its format is (a
> group of NAL units?).
>
> I am afraid that we will end up doing another per-codec packetizer, which
> would be almost the same as just using the current packetization formats and
> explaining how to use encrypted data with them. And all for a feature that we
> are not currently using in WebRTC.
>
> Best regards
>
> Sergio
>
>
> On 19/11/2020 23:54, Richard Barnes wrote:
>
> This may be too simplistic, but there's also a non-codec-specific approach
> here that occurs to me: Just have the SFrame header have a length field.
>
> I'm imagining a setup where we have the SFrame layer between the encode
> and packetization layers, and:
> - The encoding/decoding layer produces/consumes each frame as a sequence
> of IDUs
> - The SFrame layer translates between IDUs and SFrame encrypted units
> (SEUs? in any case, each SEU is an encrypted IDU)
> - The packetization / depacketization layer packs SEUs into packets
>
> The only thing you need to make that work is (1) a mechanism for the
> receiver to understand what chunks of the SEU sequence he has (e.g., fixing
> reordering), and (2) a way to unpack SEUs if there can be multiple in a
> packet.  It seems like (1) could mostly be a transport assumption.  For
> (2), you would just need something like a length field.
>
> As Sergio points out, there is a need for someone to know where the IDU
> boundaries are, either at the SFrame layer (if the input is a whole frame),
> or at the encode layer (if the encode-SFrame API can talk in terms of
> IDUs).  But especially in the latter case, this framework keeps the
> codec-specific stuff local to the encode layer, which is codec-specific in
> any case.
>
> There is some trade-off here, in that this framework doesn't expose any
> encoded information to the SFU.  But ISTM that if you want that
> functionality, then (a) you're going to have to have codec awareness
> sprinkled all through the stack and (b) you're going to have to be really
> careful designing the which-parts-to-encrypt scheme to avoid undermining
> your security guarantees.  So I'm generally inclined toward the cleaner
> abstraction here.
>
> --Richard
>
> On Thu, Nov 19, 2020 at 5:11 PM Sergio Garcia Murillo <
> sergio.garcia.murillo@cosmosoftware.io> wrote:
>
>> You are right that the SFrame layer does not need to know what is fed in
>> for encryption, but in order to have a working end-to-end solution for
>> WebRTC, someone will need to define what these IDUs are and how they are
>> generated and reassembled for each codec if we want to have
>> interoperable implementations on different devices.
>>
>> That process is codec-dependent and would require quite a lot of effort
>> (and also supporting it on the agnostic packetization), so I would prefer
>> to have strong arguments in favor of doing it.
>>
>>
>>
>> On 19/11/2020 22:53, Justin Uberti wrote:
>>
>> The encoder needs to be aware of any mechanism to generate IDUs (e.g.,
>> slices), and typically each of these IDUs will be handed up to the consumer
>> individually. So the SFRAME layer doesn't need to do any splitting, it just
>> knows that it should treat each IDU as something it needs to individually
>> SFRAME and packetize.
>>
>> On Thu, Nov 19, 2020 at 1:40 PM Sergio Garcia Murillo <
>> sergio.garcia.murillo@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>>
>>> As most of you already know, this morning I made a presentation in
>>> AVTCORE introducing the topic about the need to specify an agnostic video
>>> codec packetization format.
>>>
>>>
>>> https://datatracker.ietf.org/meeting/109/materials/slides-109-avtcore-sframe-rtp-encapsulation-00
>>>
>>>
>>> I got an AP for creating an initial draft so it could be reviewed and
>>> accepted.
>>>
>>> However, there were two main concerns that we should address in this
>>> group:
>>>
>>>    - Historically, AVTCORE has explicitly decided not to be payload
>>>    agnostic and has declined to standardize codec-agnostic payload formats
>>>    in a number of cases. If that is to be changed, it needs to be done
>>>    deliberately.
>>>    - Need to define the "minimum decoding unit" or "independently
>>>    decodable unit" that SFrame will work with.
>>>
>>>
>>> Regarding the second one, the options are:
>>>
>>>    - Full video frames (just use whatever is the encoder output)
>>>    - Spatial layer frames
>>>    - "independently decodable subframes" like H.264 slices, VP8 partitions
>>>    or AV1 tiles, which allow partial decodability and are mainly aimed at
>>>    enhancing packet loss resilience.
>>>
>>>
>>> Spatial layer frames are the minimum we should target; otherwise we would
>>> just prevent SFUs from using SVC codecs. So the question is whether we
>>> should go deeper and implement lower partitions of the frames or not.
>>>
>>>
>>> AFAIK, currently, libwebrtc does not support partial decodability and I
>>> personally haven't seen any practical usage of this in the RTC world (while
>>> it makes a lot of sense in streaming/broadcasting world), but would like to
>>> hear what is the view and experience of the other members of this group.
>>> Also note that if we are going to support them on SFrame this will require
>>> a greater effort because we will need to explicitly define how the frames
>>> must be split before being encrypted by SFrame for *each* possible video
>>> codec (h264,h265,vp8,vp9,av1,...).
>>>
>>>
>>> There was also the question about how/if we should support other codec
>>> features like DON/interleaved mode for h264, which I also think we should
>>> not support mainly because we are not currently using it on webrtc
>>> implementations.
>>>
>>>
>>> What do you think?
>>>
>>>
>>> Best regards
>>>
>>> Sergio
>>>
>>>
>>> --
>>> Sframe mailing list
>>> Sframe@ietf.org
>>> https://www.ietf.org/mailman/listinfo/sframe
>>>