Re: [AVTCORE] Framemarking in video packets

Harald Alvestrand <harald@alvestrand.no> Tue, 16 August 2022 06:27 UTC

Message-ID: <919014e0-5dc4-9296-8b15-644c54d53bb1@alvestrand.no>
Date: Tue, 16 Aug 2022 08:27:12 +0200
To: avt@ietf.org
References: <87pmhbg17n.fsf@hobgoblin.ariadne.com>
From: Harald Alvestrand <harald@alvestrand.no>
In-Reply-To: <87pmhbg17n.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/jxLxEtlAT47r0hkBy8oMfSlfKS0>

This only works if the header extension is unencrypted.
Most stuff on the Internet seems to be tending towards encrypted, 
including RTP headers (see "cryptex").

I'd think that the setting of markings on the packets needs to be done 
at the sender (by someone with access to the unencrypted packet) and 
carried in some envelope part of the packet.

(the ill-fated PLUS BOF shows just how controversial such proposals can 
be, unfortunately).


On 8/8/22 00:04, Dale R. Worley wrote:
> [Second attempt to send to AVT.]
> 
> I'm new at this stuff, so I thought that I would work through a design
> exercise about enabling routers to selectively drop packets in video
> streams in an "optimized" way.  I don't know whether this is a
> well-understood design space that I'm unaware of or not.  So if
> there's a better version of this out there, please point me to it.
> 
> The concept is to define a design framework that incorporates the
> draft-ietf-avtext-framemarking RTP header extension as the "canonical"
> representation of the relevant information about each RTP packet, and
> then see if I can map the processing of draft-dong-priority-rtp-packet
> into the framework.
> 
> A.  A video packet is contained in an RTP packet which is contained in
> a UDP packet (or other suitable transport protocol).
> 
> B.  The RTP packet contains the header extension of
> draft-ietf-avtext-framemarking, describing the video packet, added by
> the video stream source (and thus later stages of processing are
> independent of the video codec).
> 
> C.  There is a subset of the information in the header extension which
> is useful to "routers", that is, memory-less middleboxes.  This
> appears to be the D (discardable) bit and the TID and LID
> temporal/spatial layer identifiers.  The I (independent frame) bit is
> generally derivable from the information in video frames, but it's not
> clear that it can be usefully acted upon by stateless devices.
> 
> My reading of draft-dong-priority-rtp-packet is that all of the
> information it uses to determine packet handling for any particular
> video encoding is also carried in the framemarking header extension.
> Though I may not be understanding all of the text correctly as there
> are a lot of different layer ID numbers involved and the text assumes
> some understanding of their underlying semantics.
> 
> D.  This information subset is revealed to the routers through a
> mechanism that routers can easily act upon.
> 
> Alternative D1.  "draft-dong-priority-rtp-packet NRI alternative" The
> two-bit NRI value from the video packet is mapped into one of two sets
> of four DSCP AF values, depending on whether the video is
> "interactive" or "non-interactive".  However, the NRI value is not
> fully reflected into the framemarking extension; the D bit indicates
> whether NRI is 0 or non-0; other NRI values are not distinguished.
> 
> Alternative D2.  "draft-dong-priority-rtp-packet TID alternative" The
> three-bit TID value is mapped into one of two sets of eight DSCP AF
> values, depending on whether the video is "interactive" or
> "non-interactive".  TID is directly present in the framemarking
> extension.
> 
> Alternative D3.  Tag the packets with the same DSCP value, but add an
> additional RTP header extension to make it easy for routers to
> identify and find the framemarking header extension: define and use it
> in such a way as to present a fixed four-octet value on an aligned
> boundary at a predictable offset within the IP payload, immediately
> preceding the framemarking header extension.
> 
> There are difficulties with D1 and D2.  In order to avoid packet
> reordering within a stream when using the "default" interpretation of
> the DSCP values (RFC 4594), the stream's packets can only use one of
> the AF groups of three DSCP values.  This is nowhere near enough to
> encode the information that could be available about the video frames.
> Even if the entire DSCP code point space is used, it might be
> inadequate for the spatial/temporal hierarchies in a scalable video
> encoding.
> 
> D3 requires that routers can (1) determine whether a packet contains
> an RTP packet containing a framemarking header extension, and (2)
> locate the extension.  Unfortunately, RTP packet decoding is usually
> triggered by the UDP destination port, but the ports are dynamically
> assigned by the endpoints, so the routers can't search for a single
> value.  Similarly, the RTP header extensions are identified by an ID
> value, but the IDs are dynamically assigned by the signaling for each
> flow.  The framemarking extension itself doesn't contain any long
> fields of fixed contents.
> 
> The only way to locate the framemarking extension that comes to mind
> is to define an additional extension containing fixed contents that
> would allow the router to (with high probability) detect the presence
> of framemarking.  The overall packet format would be:
> 
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     | IP header                                                     |
>     | ...                                                           |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     | UDP header                                                    |
>     | ...                                                           |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     RTP header:
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     |V=2|P|X|  CC   |M|     PT      |       sequence number         |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     |                           timestamp                           |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     |           synchronization source (SSRC) identifier            |
>     +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
>     |            contributing source (CSRC) identifiers             |
>     |                             ....                              |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     For "one-byte" header extensions:
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     |       0xBE    |    0xDE       |           length              |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     |flag ID| L=2   |     flag data                                 |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     | ID    | L     | framemarking data ...                         |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     For "two-byte" header extensions:
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     |       0x100           |appbits|           length              |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     | flag ID       |     L=1       |     flag data                 |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>     | ID            | L             | framemarking data ...         |
>     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> 
> There would be a requirement that the signaling specifies that a fixed
> flag ID maps to the flag extension via mapping it to a fixed URI.
> Then a router could recognize framemarking via:
> 
> 1. At the beginning of the UDP payload, the V field is 2, the X field
> is 1.
> 
> 2. The putative beginning of the header extensions is at offset
> 4*(3+CC).
> 
> 3. Either (a) the first two bytes of the extensions are 0xBE and 0xDE,
> and bytes 4 through 7 are the flag ID, 2, and three bytes of fixed
> flag data, or (b) the first 12 bits of the extension are 0x100, and
> bytes 4 through 7 are the flag ID, 1, and two bytes of fixed flag
> data.
> 
> 4. If those tests are passed, the framemarking extension begins at
> offset 8 in the header extensions.
> 
> Unfortunately the CC field contains 4 bits and there are two styles of
> headers, so this requires 32 "fixed position, fixed bit pattern" tests
> to implement.  Since two-byte headers can express everything one-byte
> headers express, this could be reduced to 16 tests by fixing that
> two-byte headers must always be used.
> 
> This scheme adds 4 bytes to each RTP packet.
> 
> An alternative approach would be to replace the header extension magic
> numbers (0xBEDE and 0x100) with different magic numbers that indicate
> that the first header extension is framemarking, but that would have
> upward compatibility problems.
> 
> Dale
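
The recognition steps 1 to 4 in the quoted message could be sketched
roughly as follows. This is only an illustration under stated
assumptions, not part of either draft: the flag ID and the fixed flag
data are never assigned in the message, so FLAG_ID, FLAG_DATA_ONE_BYTE
and FLAG_DATA_TWO_BYTE are hypothetical placeholders, and
find_framemarking is a made-up name. The two-byte length value is kept
configurable because the message writes L=1, while RFC 8285 counts data
bytes in that field, which would suggest 2 for two bytes of flag data.

```python
from typing import Optional

# Hypothetical values: the quoted message does not assign any of these.
FLAG_ID = 1
FLAG_DATA_ONE_BYTE = b"FMK"  # three bytes of fixed flag data (one-byte form)
FLAG_DATA_TWO_BYTE = b"FM"   # two bytes of fixed flag data (two-byte form)
# The message says L=1 for the two-byte form; RFC 8285 would suggest 2.
TWO_BYTE_FLAG_LEN = 1


def find_framemarking(udp_payload: bytes) -> Optional[int]:
    """Return the offset of the framemarking element within the UDP
    payload, or None if the packet does not pass the tests."""
    if len(udp_payload) < 12:
        return None
    first = udp_payload[0]

    # Step 1: the V field is 2 and the X field is 1.
    if first >> 6 != 2 or (first >> 4) & 1 != 1:
        return None

    # Step 2: the header extensions start at offset 4*(3+CC).
    cc = first & 0x0F
    ext = 4 * (3 + cc)
    if len(udp_payload) < ext + 8:
        return None
    e = udp_payload[ext:ext + 8]

    # Step 3(a): 0xBE 0xDE, then flag ID with L=2 and three data bytes.
    one_byte = (
        e[0:2] == b"\xBE\xDE"
        and e[4] == ((FLAG_ID << 4) | 2)
        and e[5:8] == FLAG_DATA_ONE_BYTE
    )
    # Step 3(b): first 12 bits are 0x100, then flag ID, L, two data bytes.
    two_byte = (
        e[0] == 0x10
        and e[1] >> 4 == 0
        and e[4] == FLAG_ID
        and e[5] == TWO_BYTE_FLAG_LEN
        and e[6:8] == FLAG_DATA_TWO_BYTE
    )

    # Step 4: framemarking begins at offset 8 in the header extensions.
    if one_byte or two_byte:
        return ext + 8
    return None
```

Note that this only works on packets whose header extensions are sent
in the clear, and it is a probabilistic match: any UDP payload that
happens to carry the same bytes at the same offsets would also pass.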