Re: [Sframe] Partial decodability and IDUs

Richard Barnes <> Thu, 19 November 2020 22:55 UTC

From: Richard Barnes <>
Date: Thu, 19 Nov 2020 17:54:46 -0500
Message-ID: <>
To: Sergio Garcia Murillo <>

This may be too simplistic, but there's also a non-codec-specific approach
here that occurs to me: Just have the SFrame header have a length field.

I'm imagining a setup where we have the SFrame layer between the encode and
packetization layers, and:
- The encoding/decoding layer produces/consumes each frame as a sequence of
IDUs
- The SFrame layer translates between IDUs and SFrame encrypted units
(SEUs? in any case, each SEU is an encrypted IDU)
- The packetization / depacketization layer packs SEUs into packets

The only thing you need to make that work is (1) a mechanism for the
receiver to understand what chunks of the SEU sequence he has (e.g., fixing
reordering), and (2) a way to unpack SEUs if there can be multiple in a
packet.  It seems like (1) could mostly be a transport assumption.  For
(2), you would just need something like a length field.
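
A minimal sketch of point (2), assuming a 2-byte big-endian length prefix in
front of each SEU. The field width and the names pack_seus / unpack_seus are
hypothetical, chosen only for illustration; nothing here is taken from an
SFrame draft, and the SEUs are treated as opaque byte strings.

```python
import struct

# Assumption: each SEU is preceded by a 2-byte big-endian length field.
LEN_FMT = "!H"
LEN_SIZE = struct.calcsize(LEN_FMT)


def pack_seus(seus):
    """Concatenate SEUs into one packet payload, each with a length prefix."""
    out = bytearray()
    for seu in seus:
        if len(seu) > 0xFFFF:
            raise ValueError("SEU too large for a 2-byte length field")
        out += struct.pack(LEN_FMT, len(seu))
        out += seu
    return bytes(out)


def unpack_seus(payload):
    """Split a packet payload back into the SEUs it carries."""
    seus = []
    offset = 0
    while offset < len(payload):
        if offset + LEN_SIZE > len(payload):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from(LEN_FMT, payload, offset)
        offset += LEN_SIZE
        if offset + length > len(payload):
            raise ValueError("truncated SEU")
        seus.append(payload[offset:offset + length])
        offset += length
    return seus
```

The depacketizer needs no codec knowledge to split the payload: it only reads
lengths, and each recovered SEU can be handed to the SFrame layer on its own.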

As Sergio points out, there is a need for someone to know where the IDU
boundaries are, either at the SFrame layer (if the input is a whole frame),
or at the encode layer (if the encode-SFrame API can talk in terms of
IDUs).  But especially in the latter case, this framework keeps the
codec-specific stuff local to the encode layer, which is codec-specific in
any case.

There is some trade-off here, in that this framework doesn't expose any
encoded information to the SFU.  But ISTM that if you want that
functionality, then (a) you're going to have to have codec awareness
sprinkled all through the stack and (b) you're going to have to be really
careful designing the which-parts-to-encrypt scheme to avoid undermining
your security guarantees.  So I'm generally inclined toward the cleaner
abstraction here.


On Thu, Nov 19, 2020 at 5:11 PM Sergio Garcia Murillo <> wrote:

> You are right that the SFrame layer does not need to know what is fed in
> for encryption, but in order to have a working end-to-end solution for
> WebRTC, someone will need to define what these IDUs are and how they are
> generated and reassembled for each codec if we want to have interoperable
> implementations on different devices.
> That process is codec-dependent and would require quite a lot of effort
> (and also supporting it in the agnostic packetization), so I would prefer
> to have strong arguments in favor of doing it.
> On 19/11/2020 22:53, Justin Uberti wrote:
> The encoder needs to be aware of any mechanism to generate IDUs (e.g.,
> slices), and typically each of these IDUs will be handed up to the consumer
> individually. So the SFRAME layer doesn't need to do any splitting, it just
> knows that it should treat each IDU as something it needs to individually
> SFRAME and packetize.
> On Thu, Nov 19, 2020 at 1:40 PM Sergio Garcia Murillo <> wrote:
>> Hi all,
>> As most of you already know, this morning I made a presentation in
>> AVTCORE introducing the topic about the need to specify an agnostic video
>> codec packetization format.
>> I got an AP for creating an initial draft so it could be reviewed and
>> accepted.
>> However, there were two main concerns that we should address in this
>> group:
>>    - Historically, AVTCORE has explicitly chosen not to be payload
>>    agnostic and has declined to standardize codec-agnostic payload formats
>>    in a number of cases. If that is to be changed, it needs to be done
>>    deliberately.
>>    - We need to define the "minimum decoding unit" or "independently
>>    decodable unit" that SFrame will work with.
>> Regarding the second one, the options are:
>>    - Full video frames (just use whatever is the encoder output)
>>    - Spatial layer frames
>>    - "independently decodable subframes" like h264 slices, vp8 partitions
>>    or av1 tiles, which allow partial decodability, mainly aimed at
>>    enhancing packet loss resilience.
>> Spatial layer frames are the minimum we should target; otherwise we would
>> prevent SFUs from using SVC codecs. So the question is whether we should go
>> deeper and implement lower partitions of the frames or not.
>> AFAIK, currently, libwebrtc does not support partial decodability and I
>> personally haven't seen any practical usage of this in the RTC world (while
>> it makes a lot of sense in streaming/broadcasting world), but would like to
>> hear what is the view and experience of the other members of this group.
>> Also note that if we are going to support them on SFrame this will require
>> a greater effort because we will need to explicitly define how the frames
>> must be split before being encrypted by SFrame for *each* possible video
>> codec (h264,h265,vp8,vp9,av1,...).
>> There was also the question about how/if we should support other codec
>> features like DON/interleaved mode for h264, which I also think we should
>> not support mainly because we are not currently using it on webrtc
>> implementations.
>> What do you think?
>> Best regards
>> Sergio