Re: [Sframe] Partial decodability and IDUs

Sergio Garcia Murillo, Thu, 19 November 2020 23:15 UTC

To: Richard Barnes
From: Sergio Garcia Murillo
Date: Fri, 20 Nov 2020 00:15:28 +0100
Subject: Re: [Sframe] Partial decodability and IDUs

How to actually packetize the IDUs is not a big deal: you only need
start-of-IDU and end-of-IDU bits in the header and can rely on
increasing RTP sequence numbers (i.e., no IDU interleaving). But at the
end of the day, someone will have to specify what an IDU means in H.264
and what its format is (a group of NAL units?).
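
A minimal sketch of the receive side of that scheme, in Python. It assumes a hypothetical packet header with one start-of-IDU bit and one end-of-IDU bit, and that packets are fed in RTP sequence-number order (e.g., after a jitter buffer), with no IDU interleaving. `Packet` and `IduReassembler` are illustrative names, not taken from any specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    seq: int        # RTP sequence number (16-bit, wraps at 65536)
    start: bool     # hypothetical start-of-IDU bit
    end: bool       # hypothetical end-of-IDU bit
    payload: bytes


class IduReassembler:
    """Rebuilds IDUs from packets delivered in sequence-number order."""

    def __init__(self) -> None:
        self._parts: List[bytes] = []
        # Next sequence number that may continue the IDU in progress;
        # None means no IDU is currently being assembled.
        self._expected: Optional[int] = None

    def push(self, pkt: Packet) -> Optional[bytes]:
        """Feed one packet; return the complete IDU when its end bit arrives."""
        if pkt.start:
            # A start bit always begins a fresh IDU, discarding any partial one.
            self._parts = [pkt.payload]
        elif self._expected is not None and pkt.seq == self._expected:
            # Consecutive continuation packet of the IDU in progress.
            self._parts.append(pkt.payload)
        else:
            # Sequence gap, or a continuation with no start seen: drop it all.
            self._parts = []
            self._expected = None
            return None
        self._expected = (pkt.seq + 1) & 0xFFFF
        if pkt.end:
            idu = b"".join(self._parts)
            self._parts = []
            self._expected = None
            return idu
        return None
```

Because IDUs are never interleaved, a single in-progress buffer is enough; supporting interleaving would need some per-IDU identifier in the header instead.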

I am afraid that we will end up doing yet another per-codec packetizer,
which would be almost the same as just using the current packetization
formats and explaining how to use encrypted data with them. And all for
a feature that we are not currently using in WebRTC.

Best regards


On 19/11/2020 23:54, Richard Barnes wrote:
> This may be too simplistic, but there's also a non-codec-specific
> approach here that occurs to me: just have the SFrame header carry a
> length field.
> I'm imagining a setup where we have the SFrame layer between the 
> encode and packetization layers, and:
> - The encoding/decoding layer produces/consumes each frame as a
> sequence of IDUs
> - The SFrame layer translates between IDUs and SFrame encrypted units
> (SEUs? in any case, each SEU is an encrypted IDU)
> - The packetization / depacketization layer packs SEUs into packets
> The only thing you need to make that work is (1) a mechanism for the 
> receiver to understand what chunks of the SEU sequence he has (e.g., 
> fixing reordering), and (2) a way to unpack SEUs if there can be 
> multiple in a packet.  It seems like (1) could mostly be a transport 
> assumption.  For (2), you would just need something like a length field.
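
A minimal sketch of point (2) in Python, assuming each SEU in a packet is preceded by a 2-byte big-endian length. The 16-bit width and the standalone prefix are assumptions for illustration; this does not model the actual SFrame header layout, and `pack_seus`/`unpack_seus` are hypothetical names.

```python
import struct
from typing import List


def pack_seus(seus: List[bytes]) -> bytes:
    """Concatenate SEUs, each prefixed by an assumed 2-byte big-endian length."""
    out = bytearray()
    for seu in seus:
        if len(seu) > 0xFFFF:
            raise ValueError("SEU too large for a 16-bit length field")
        out += struct.pack("!H", len(seu))
        out += seu
    return bytes(out)


def unpack_seus(payload: bytes) -> List[bytes]:
    """Split a packet payload back into SEUs using the length prefixes."""
    seus: List[bytes] = []
    offset = 0
    while offset < len(payload):
        if offset + 2 > len(payload):
            raise ValueError("truncated length field")
        (length,) = struct.unpack_from("!H", payload, offset)
        offset += 2
        if offset + length > len(payload):
            raise ValueError("truncated SEU")
        seus.append(payload[offset:offset + length])
        offset += length
    return seus
```

A 16-bit length caps each SEU at 65535 bytes, which is ample for a single RTP packet; a variable-length encoding would save a byte for small SEUs at the cost of slightly more parsing.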
> As Sergio points out, there is a need for someone to know where the 
> IDU boundaries are, either at the SFrame layer (if the input is a 
> whole frame), or at the encode layer (if the encode-SFrame API can 
> talk in terms of IDUs).  But especially in the latter case, this 
> framework keeps the codec-specific stuff local to the encode layer, 
> which is codec-specific in any case.
> There is some trade-off here, in that this framework doesn't expose 
> any encoded information to the SFU.  But ISTM that if you want that 
> functionality, then (a) you're going to have to have codec awareness 
> sprinkled all through the stack and (b) you're going to have to be 
> really careful designing the which-parts-to-encrypt scheme to avoid 
> undermining your security guarantees.  So I'm generally inclined 
> toward the cleaner abstraction here.
> --Richard
> On Thu, Nov 19, 2020 at 5:11 PM Sergio Garcia Murillo wrote:
>     You are right that the SFrame layer does not need to
>     know what is fed in for encryption, but in order to have a
>     working end-to-end solution for WebRTC, someone will need to
>     define what these IDUs are and how they are generated and
>     reassembled for each codec if we want to have interoperable
>     implementations on different devices.
>     That process is codec-dependent and would require quite a lot of
>     effort (as would supporting it in the agnostic packetization), so
>     I would prefer to have strong arguments in favor of doing it.
>     On 19/11/2020 22:53, Justin Uberti wrote:
>>     The encoder needs to be aware of any mechanism to generate IDUs
>>     (e.g., slices), and typically each of these IDUs will be handed
>>     up to the consumer individually. So the SFRAME layer doesn't need
>>     to do any splitting; it just knows that it should treat each IDU
>>     as something it needs to individually SFRAME and packetize.
>>     On Thu, Nov 19, 2020 at 1:40 PM Sergio Garcia Murillo wrote:
>>         Hi all,
>>         As most of you already know, this morning I made a
>>         presentation in AVTCORE introducing the need to specify a
>>         codec-agnostic video packetization format.
>>         I got an action point to create an initial draft so it could
>>         be reviewed and accepted.
>>         However, there were two main concerns that we should address
>>         in this group:
>>           * Historically, AVTCORE has explicitly chosen not to be
>>             payload-agnostic and has declined to standardize
>>             codec-agnostic payload formats in a number of cases. If
>>             that is to change, it needs to be done deliberately.
>>           * We need to define the "minimum decoding unit" or
>>             "independently decodable unit" that SFrame will work with.
>>         Regarding the second one, the options are:
>>           * Full video frames (just use whatever the encoder outputs)
>>           * Spatial layer frames
>>           * "Independently decodable subframes" like H.264 slices, VP8
>>             partitions, or AV1 tiles, which allow partial decodability
>>             and are mainly aimed at enhancing packet loss resilience.
>>         Spatial layer frames are the minimum we should target, as
>>         otherwise SFUs would be prevented from using SVC codecs. So the
>>         question is whether we should go deeper and support finer
>>         partitions of the frames or not.
>>         AFAIK, libwebrtc does not currently support partial
>>         decodability, and I personally haven't seen any practical
>>         usage of it in the RTC world (while it makes a lot of sense
>>         in the streaming/broadcasting world), but I would like to
>>         hear the views and experience of the other members of this
>>         group. Also note that if we are going to support them in
>>         SFrame, this will require a greater effort, because we will
>>         need to explicitly define how the frames must be split before
>>         being encrypted by SFrame for *each* possible video codec
>>         (H.264, H.265, VP8, VP9, AV1, ...).
>>         There was also the question of whether/how we should support
>>         other codec features like DON/interleaved mode for H.264,
>>         which I also think we should not support, mainly because we
>>         are not currently using it in WebRTC implementations.
>>         What do you think?
>>         Best regards
>>         Sergio
>>         -- 
>>         Sframe mailing list
>> <>
>>         <>