Re: [AVTCORE] Framemarking in video packets

"shihang (C)" <> Mon, 08 August 2022 11:53 UTC

From: "shihang (C)" <>
To: "Dale R. Worley" <>, "" <>
Date: Mon, 8 Aug 2022 11:53:28 +0000
Subject: Re: [AVTCORE] Framemarking in video packets
List-Id: Audio/Video Transport Core Maintenance <>

Hi Dale,
Are you suggesting that we modify RFC 8285 to reserve a global ID for the framemarking extension? Using a fixed global ID would eliminate the need to signal the ID when the framemarking extension is used, right?

Best regards,
Hang Shi

-----Original Message-----
From: avt <> On Behalf Of Dale R. Worley
Sent: Monday, August 8, 2022 6:04 AM
Subject: [AVTCORE] Framemarking in video packets

[Second attempt to send to AVT.]

I'm new at this stuff, so I thought I would work through a design exercise: enabling routers to selectively drop packets in video streams in an "optimized" way.  I don't know whether this is a well-understood design space that I'm simply unaware of, so if there's a better version of this out there, please point me to it.

The concept is to define a design framework that uses the draft-ietf-avtext-framemarking RTP header extension as the "canonical" representation of the relevant information about each RTP packet, and then to see whether the processing in draft-dong-priority-rtp-packet can be mapped into that framework.

A.  A video packet is carried in an RTP packet, which is carried in a UDP packet (or another suitable transport protocol).

B.  The RTP packet contains the draft-ietf-avtext-framemarking header extension describing the video packet.  The extension is added by the video stream source, so later stages of processing are independent of the video codec.

C.  A subset of the information in the header extension is useful to "routers", that is, memory-less middleboxes.  This appears to be the D (discardable) bit and the TID (temporal) and LID (spatial) layer identifiers.  The I (independent frame) bit can generally be derived from the video frames themselves, but it's not clear that stateless devices can usefully act on it.
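To make the "router subset" concrete, here is a minimal sketch of pulling D, TID and LID out of the framemarking element's data bytes.  The bit layout is my reading of draft-ietf-avtext-framemarking and may differ between draft versions: a one-byte non-scalable form |S|E|I|D|0|0|0|0|, and a two- or three-byte scalable form |S|E|I|D|B| TID | LID | [TL0PICIDX].  The names RouterMarks and parse_framemarking are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouterMarks:
    """The subset of framemarking fields a memory-less router might act on."""
    discardable: bool
    tid: Optional[int]  # temporal layer ID; None in the non-scalable form
    lid: Optional[int]  # spatial layer ID; None in the non-scalable form


def parse_framemarking(data: bytes) -> RouterMarks:
    """Extract D, TID and LID from framemarking extension data.

    Assumed layout (my reading of the draft, not authoritative):
      1 byte   : |S|E|I|D|0|0|0|0|                  non-scalable form
      2-3 bytes: |S|E|I|D|B| TID | LID | [TL0PICIDX]  scalable form
    """
    if not 1 <= len(data) <= 3:
        raise ValueError("framemarking data is expected to be 1 to 3 bytes")
    first = data[0]
    discardable = bool(first & 0x10)  # D is the fourth bit from the top
    if len(data) == 1:
        return RouterMarks(discardable, None, None)
    tid = first & 0x07  # TID is the low three bits of the first byte
    lid = data[1]       # LID is the whole second byte
    return RouterMarks(discardable, tid, lid)
```

For example, the scalable-form bytes 0x12 0x03 0x7F decode as D=1, TID=2, LID=3; the optional TL0PICIDX byte is ignored because it is only useful to a device that keeps state.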

My reading of draft-dong-priority-rtp-packet is that all of the information it uses to determine packet handling for any particular video encoding is also carried in the framemarking header extension, although I may be misunderstanding some of the text: many different layer ID numbers are involved, and the text assumes some familiarity with their underlying semantics.

D.  This information subset is revealed to the routers through a mechanism that routers can easily act upon.

Alternative D1.  "draft-dong-priority-rtp-packet NRI alternative" The two-bit NRI value from the video packet is mapped into one of two sets of four DSCP AF values, depending on whether the video is "interactive" or "non-interactive".  However, the NRI value is not fully reflected in the framemarking extension: the D bit indicates only whether NRI is 0 or non-zero, and the other NRI values are not distinguished.

Alternative D2.  "draft-dong-priority-rtp-packet TID alternative" The three-bit TID value is mapped into one of two sets of eight DSCP AF values, depending on whether the video is "interactive" or "non-interactive".  TID is directly present in the framemarking extension.

Alternative D3.  Tag all of the stream's packets with the same DSCP value, but add a second RTP header extension that makes it easy for routers to identify and locate the framemarking header extension.  It would be defined and used so that it presents a fixed four-octet value, on an aligned boundary at a predictable offset within the IP payload, immediately preceding the framemarking header extension.

There are difficulties with D1 and D2.  To avoid reordering packets within a stream under the "default" interpretation of the DSCP values (RFC 4594), a stream's packets can use only the three DSCP values of a single AF class.  That is nowhere near enough to encode the information that could be available about the video frames.
Even if the entire DSCP code point space were used, it might be inadequate for the spatial/temporal hierarchies of a scalable video encoding.
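A small sketch of the pigeonhole problem with D2, assuming a stream confined to one AF class.  The AFxy code point formula (8x + 2y) is from RFC 2597; the choice of AF4x for interactive video follows RFC 4594's multimedia conferencing class (AF3x is its multimedia streaming class).  The particular TID-to-drop-precedence mapping in tid_to_dscp is hypothetical and is not taken from draft-dong-priority-rtp-packet.

```python
def af_dscp(af_class: int, drop_prec: int) -> int:
    """DSCP value of AFxy per RFC 2597: class x in 1..4, drop precedence y in 1..3."""
    if not (1 <= af_class <= 4 and 1 <= drop_prec <= 3):
        raise ValueError("AF class must be 1..4 and drop precedence 1..3")
    return 8 * af_class + 2 * drop_prec


def tid_to_dscp(tid: int, af_class: int = 4) -> int:
    """Hypothetical mapping of a 3-bit TID into a single AF class.

    af_class=4 stands for interactive video (AF4x), 3 for streaming (AF3x).
    Lower TIDs (base temporal layers) get lower drop precedence; every TID
    from 2 upward is forced to share the highest drop precedence.
    """
    if not 0 <= tid <= 7:
        raise ValueError("TID is a 3-bit field")
    return af_dscp(af_class, min(tid, 2) + 1)


codes = {tid: tid_to_dscp(tid) for tid in range(8)}
print(codes)
print(len(set(codes.values())), "distinct code points for 8 TID values")
```

Whatever mapping is chosen, eight TID values have to collapse onto three code points, and LID and the D bit get no code points at all.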

D3 requires that routers can (1) determine whether a packet contains an RTP packet carrying a framemarking header extension, and (2) locate that extension.  Unfortunately, RTP decoding is usually triggered by the UDP destination port, but the ports are assigned dynamically by the endpoints, so routers can't look for a single value.  Similarly, RTP header extensions are identified by an ID value, but the IDs are assigned dynamically by the signaling for each flow.  And the framemarking extension itself doesn't contain any long field with fixed contents that could serve as a signature.

The only way to locate the framemarking extension that comes to mind is to define an additional extension with fixed contents that would let a router detect (with high probability) the presence of framemarking.  The overall packet format would be:

   | IP header                                                     |
   | ...                                                           |
   | UDP header                                                    |
   | ...                                                           |
   RTP header:
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   |                           timestamp                           |
   |           synchronization source (SSRC) identifier            |
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   For "one-byte" header extensions:
   |       0xBE    |    0xDE       |           length              |
   |flag ID| L=2   |     flag data                                 |
   | ID    | L     | framemarking data ...                         |
   For "two-byte" header extensions:
   |       0x100           |appbits|           length              |
   | flag ID       |     L=2       |     flag data                 |
   | ID            | L             | framemarking data ...         |

The signaling would be required to map a fixed flag ID to the flag extension's fixed URI.
A router could then recognize framemarking as follows:

1. At the beginning of the UDP payload, the V field is 2 and the X field is 1.

2. The putative start of the header extensions is at offset 4*(3+CC), that is, just past the 12-byte fixed RTP header and the CC four-byte CSRC identifiers.

3. Either (a) the first two bytes of the extensions are 0xBE and 0xDE, byte 4 holds the flag ID and L=2, and bytes 5 through 7 hold three bytes of fixed flag data; or (b) the first 12 bits of the extensions are 0x100, byte 4 is the flag ID, byte 5 is L=2 (in the two-byte form L counts the data bytes themselves, whereas in the one-byte form it is the count minus one), and bytes 6 and 7 hold two bytes of fixed flag data.

4. If those tests are passed, the framemarking extension begins at offset 8 in the header extensions.
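Steps 1 through 4 can be sketched as a single function over the UDP payload.  This is only an illustration: the flag ID (14) and the flag data bytes ("FMK" for the one-byte form, "FM" for the two-byte form) are hypothetical placeholders for whatever fixed values would actually be registered.  Following RFC 8285, the one-byte form's L nibble is the data length minus one and the two-byte form's L byte is the data length itself, so both flag elements carry L=2.  The returned offset is relative to the start of the UDP payload (the header-extension offset plus 8).

```python
from typing import Optional

# Hypothetical fixed values; nothing like this has been registered.
FLAG_ID = 14            # fixed flag ID (valid in both one-byte and two-byte forms)
FLAG_DATA_1B = b"FMK"   # three bytes of fixed flag data, one-byte form (L = 2)
FLAG_DATA_2B = b"FM"    # two bytes of fixed flag data, two-byte form (L = 2)


def find_framemarking(udp_payload: bytes) -> Optional[int]:
    """Return the offset of the framemarking element within the UDP payload,
    or None if the fixed flag element is not where it is expected."""
    p = udp_payload
    if len(p) < 12:
        return None
    # Step 1: V (top two bits) must be 2 and X must be 1.
    if (p[0] >> 6) != 2 or not (p[0] & 0x10):
        return None
    # Step 2: extensions follow the 12-byte fixed header and CC 4-byte CSRCs.
    ext = 4 * (3 + (p[0] & 0x0F))
    # Need the 4-byte extension header, the 4-byte flag element and at least
    # one byte of the framemarking element.
    if len(p) < ext + 9:
        return None
    # Step 3a: one-byte form: 0xBEDE, ID/L nibbles in byte 4, data in bytes 5-7.
    if p[ext:ext + 2] == b"\xBE\xDE":
        if (p[ext + 4] == (FLAG_ID << 4) | (len(FLAG_DATA_1B) - 1)
                and p[ext + 5:ext + 8] == FLAG_DATA_1B):
            return ext + 8  # Step 4
        return None
    # Step 3b: two-byte form: top 12 bits are 0x100, then ID byte, L byte, data.
    if ((p[ext] << 4) | (p[ext + 1] >> 4)) == 0x100:
        if (p[ext + 4] == FLAG_ID and p[ext + 5] == len(FLAG_DATA_2B)
                and p[ext + 6:ext + 8] == FLAG_DATA_2B):
            return ext + 8  # Step 4
    return None
```

Note that this version computes the offset from CC at run time, which is easy in software but is exactly what a fixed-pattern matcher in a router cannot do.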

Unfortunately, the CC field is 4 bits wide and there are two styles of header, so this requires 32 "fixed position, fixed bit pattern" tests to implement.  Since two-byte headers can express everything one-byte headers can, this could be reduced to 16 tests by requiring that two-byte headers always be used.
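A sketch of where the count of 32 comes from: each (CC, header style) pair yields its own set of (offset, mask, value) byte checks, because CC moves every later offset.  As before, the flag ID and flag data are hypothetical placeholders, the P bit is masked out, and the variable length and appbits fields are not checked.

```python
from itertools import product

FLAG_ID = 14  # hypothetical fixed flag ID
FLAG_DATA = {"one-byte": b"FMK", "two-byte": b"FM"}  # hypothetical fixed flag data


def match_rules(styles=("one-byte", "two-byte")):
    """One rule per (CC, header style): a list of (offset, mask, value) byte
    checks plus the offset at which framemarking would start. Offsets are
    from the start of the UDP payload."""
    rules = []
    for cc, style in product(range(16), styles):
        ext = 4 * (3 + cc)
        checks = [(0, 0xDF, 0x90 | cc)]  # V=2, X=1, this CC; P bit ignored
        if style == "one-byte":
            checks += [(ext, 0xFF, 0xBE), (ext + 1, 0xFF, 0xDE),
                       (ext + 4, 0xFF, (FLAG_ID << 4) | 2)]  # ID nibble, L=2
            data_start = ext + 5
        else:
            checks += [(ext, 0xFF, 0x10), (ext + 1, 0xF0, 0x00),  # top 12 bits 0x100
                       (ext + 4, 0xFF, FLAG_ID), (ext + 5, 0xFF, 2)]  # ID byte, L=2
            data_start = ext + 6
        checks += [(data_start + i, 0xFF, b) for i, b in enumerate(FLAG_DATA[style])]
        rules.append((cc, style, checks, ext + 8))
    return rules


def rule_matches(checks, pkt: bytes) -> bool:
    """True if every (offset, mask, value) check holds for pkt."""
    return all(off < len(pkt) and (pkt[off] & mask) == value
               for off, mask, value in checks)


print(len(match_rules()))                      # 32
print(len(match_rules(styles=("two-byte",))))  # 16
```

A given packet can satisfy at most one rule, since the first-byte check pins down CC and the magic number pins down the header style.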

This scheme adds 4 bytes to each RTP packet.
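The 4-byte figure can be checked with a minimal packet builder for the one-byte form.  This is a sketch, not a reference implementation: build_rtp, the flag ID 14 and the flag data "FMK" are hypothetical, and the marker bit is always left at 0.  Because the one-byte flag element is exactly four bytes, adding it never changes the amount of padding, so the difference is always 4 bytes.

```python
import struct

# Hypothetical fixed values; nothing like this has been registered.
FLAG_ID = 14        # one-byte-header ID the signaling would be required to use
FLAG_DATA = b"FMK"  # three bytes of fixed flag data (one-byte form, so L = 2)


def build_rtp(seq: int, ts: int, ssrc: int, csrcs: list, fm_id: int,
              fm_data: bytes, payload: bytes, include_flag: bool = True,
              pt: int = 96) -> bytes:
    """Build an RTP packet carrying framemarking in a one-byte header
    extension (RFC 8285), optionally preceded by the fixed flag element."""
    if len(csrcs) > 15:
        raise ValueError("CC is a 4-bit field")
    if not 1 <= fm_id <= 14 or not 1 <= len(fm_data) <= 16:
        raise ValueError("one-byte elements use IDs 1..14 and carry 1..16 bytes")
    b0 = 0x80 | 0x10 | len(csrcs)  # V=2, P=0, X=1, CC
    hdr = struct.pack("!BBHII", b0, pt & 0x7F, seq, ts, ssrc)  # M bit left at 0
    hdr += b"".join(struct.pack("!I", c) for c in csrcs)

    elems = b""
    if include_flag:
        elems += bytes([(FLAG_ID << 4) | (len(FLAG_DATA) - 1)]) + FLAG_DATA
    elems += bytes([(fm_id << 4) | (len(fm_data) - 1)]) + fm_data
    elems += b"\x00" * (-len(elems) % 4)  # pad to a 32-bit boundary
    ext = struct.pack("!HH", 0xBEDE, len(elems) // 4) + elems
    return hdr + ext + payload


fm = bytes([0x12, 0x03, 0x7F])  # scalable form: D=1, TID=2, LID=3, TL0PICIDX=127
with_flag = build_rtp(1, 1000, 0x1234, [], 3, fm, b"payload")
without = build_rtp(1, 1000, 0x1234, [], 3, fm, b"payload", include_flag=False)
print(len(with_flag) - len(without))  # 4
```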

An alternative approach would be to replace the header-extension magic numbers (0xBEDE and 0x100) with different magic numbers indicating that the first header extension is framemarking, but that would cause compatibility problems with existing RTP implementations.

