[AVTCORE] Framemarking in video packets

worley@ariadne.com Sun, 07 August 2022 22:06 UTC

From: worley@ariadne.com (Dale R. Worley)
To: avt@ietf.org
Date: Sun, 07 Aug 2022 18:04:12 -0400
Message-ID: <87pmhbg17n.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/-sDXNbigBRKUEggSD08Zw6_pr1g>
Subject: [AVTCORE] Framemarking in video packets

[Second attempt to send to AVT.]

I'm new at this stuff, so I thought that I would work through a design
exercise about enabling routers to selectively drop packets in video
streams in an "optimized" way.  I don't know whether this is a
well-understood design space that I'm simply unaware of.  So if
there's a better version of this out there, please point me to it.

The concept is to define a design framework that incorporates the
draft-ietf-avtext-framemarking RTP header extension as the "canonical"
representation of the relevant information about each RTP packet, and
then see if I can map the processing of draft-dong-priority-rtp-packet
into the framework.

A.  A video packet is contained in an RTP packet which is contained in
a UDP packet (or other suitable transport protocol).

B.  The RTP packet contains the header extension of
draft-ietf-avtext-framemarking, describing the video packet, added by
the video stream source (and thus later stages of processing are
independent of the video codec).

C.  There is a subset of the information in the header extension which
is useful to "routers", that is, memory-less middleboxes.  This
appears to be the D (discardable) bit and the TID and LID
temporal/spatial layer identifiers.  The I (independent frame) bit is
generally derivable from the information in video frames, but it's not
clear that it can be usefully acted upon by stateless devices.
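
To make item C concrete, here is a minimal Python sketch of pulling the
router-relevant subset out of the framemarking extension data.  The bit
layout is my reading of draft-ietf-avtext-framemarking (a one-byte short
form S|E|I|D|0000 for non-scalable streams, and a long form S|E|I|D|B|TID
followed by an LID byte and an optional TL0PICIDX byte); the class and
function names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FrameMarking:
    start: bool                # S: first packet of a frame
    end: bool                  # E: last packet of a frame
    independent: bool          # I: frame decodable without earlier frames
    discardable: bool          # D: frame can be dropped without harming others
    base_sync: Optional[bool]  # B: long (scalable) form only
    tid: Optional[int]         # TID: 3-bit temporal layer ID, long form only
    lid: Optional[int]         # LID: 8-bit layer ID, long form only


def parse_framemarking(data: bytes) -> FrameMarking:
    """Decode framemarking extension data (the bytes after ID/length)."""
    if not data:
        raise ValueError("empty framemarking data")
    b0 = data[0]
    s, e, i, d = (bool(b0 & m) for m in (0x80, 0x40, 0x20, 0x10))
    if len(data) == 1:
        # Short (non-scalable) form: the low four bits are zero.
        return FrameMarking(s, e, i, d, None, None, None)
    # Long (scalable) form: B, TID, then LID; any TL0PICIDX byte is ignored.
    return FrameMarking(s, e, i, d, bool(b0 & 0x08), b0 & 0x07, data[1])


def router_view(fm: FrameMarking) -> Tuple[bool, Optional[int], Optional[int]]:
    """The memory-less subset from item C: (D, TID, LID)."""
    return fm.discardable, fm.tid, fm.lid


print(router_view(parse_framemarking(bytes([0x9A, 0x03]))))  # (True, 2, 3)
```

The I bit is decoded too, but router_view() leaves it out, matching the
doubt above about whether stateless devices can use it.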

My reading of draft-dong-priority-rtp-packet is that all of the
information it uses to determine packet handling for any particular
video encoding is also carried in the framemarking header extension.
However, I may not be understanding all of the text correctly, as
there are a lot of different layer ID numbers involved and the text
assumes some understanding of their underlying semantics.

D.  This information subset is revealed to the routers through a
mechanism that routers can easily act upon.

Alternative D1.  "draft-dong-priority-rtp-packet NRI alternative" The
two-bit NRI value from the video packet is mapped into one of two sets
of four DSCP AF values, depending on whether the video is
"interactive" or "non-interactive".  However, the NRI value is not
fully reflected in the framemarking extension: the D bit indicates
only whether NRI is 0 or non-0; other NRI values are not distinguished.

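A small Python sketch of the information loss in D1, assuming H.264
(where NRI is the nal_ref_idc field of the NAL unit header).  The
function names are hypothetical; the only rule taken from the text is
that D is set exactly when NRI is 0.

```python
def nri_from_nal_header(nal_header: int) -> int:
    """H.264 NAL unit header byte: F (1 bit) | NRI (2 bits) | type (5 bits)."""
    return (nal_header >> 5) & 0x03


def d_bit_from_nri(nri: int) -> bool:
    """D (discardable) is set only for NRI == 0."""
    return nri == 0


# NRI 1, 2 and 3 all map to D = 0, so a router reading only the
# framemarking extension cannot tell them apart.
for header in (0x01, 0x21, 0x41, 0x65):
    nri = nri_from_nal_header(header)
    print(hex(header), "NRI", nri, "D", int(d_bit_from_nri(nri)))
```
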
Alternative D2.  "draft-dong-priority-rtp-packet TID alternative" The
three-bit TID value is mapped into one of two sets of eight DSCP AF
values, depending on whether the video is "interactive" or
"non-interactive".  TID is directly present in the framemarking
header extension.

Alternative D3.  Tag the packets with the same DSCP value, but add an
additional RTP header extension to make it easy for routers to
identify and find the framemarking header extension: define and use it
in such a way as to present a fixed four-octet value on an aligned
boundary at a predictable offset within the IP payload, immediately
preceding the framemarking header extension.

There are difficulties with D1 and D2.  In order to avoid packet
reordering within a stream when using the "default" interpretation of
the DSCP values (RFC 4594), the stream's packets can only use one of
the AF groups of three DSCP values.  This is nowhere near enough to
encode the information that could be available about the video frames.
Even if the entire DSCP code point space is used, it might be
inadequate for the spatial/temporal hierarchies in a scalable video
stream.

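To put numbers on the squeeze, here is a Python sketch using the AF
codepoints of RFC 2597 (AFxy works out to 8x + 2y).  The TID-to-drop-
precedence mapping is my own hypothetical illustration of staying inside
one AF class, not the mapping in draft-dong-priority-rtp-packet, and the
choice of AF4x is arbitrary.

```python
def af_dscp(af_class: int, drop_precedence: int) -> int:
    """DSCP codepoint for AFxy (RFC 2597), which works out to 8*x + 2*y."""
    if not (1 <= af_class <= 4 and 1 <= drop_precedence <= 3):
        raise ValueError("AF class must be 1..4, drop precedence 1..3")
    return 8 * af_class + 2 * drop_precedence


def tid_to_dscp(tid: int, af_class: int = 4) -> int:
    """HYPOTHETICAL mapping of a 3-bit TID into a single AF class.

    Staying inside one class avoids reordering, but only three drop
    precedences exist: TID 0 -> AFx1, TID 1 -> AFx2, TIDs 2..7 -> AFx3.
    """
    return af_dscp(af_class, min(tid, 2) + 1)


print(sorted({tid_to_dscp(t) for t in range(8)}))  # [34, 36, 38]
print(8 * 256, "TID/LID combinations vs", 2 ** 6, "DSCP codepoints in total")
```

Eight temporal layers collapse onto three codepoints, and the full
TID x LID space (2048 combinations) dwarfs even all 64 DSCP values.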
D3 requires that routers can (1) determine whether a packet contains
an RTP packet containing a framemarking header extension, and (2)
locate the extension.  Unfortunately, RTP packet decoding is usually
triggered by the UDP destination port, but the ports are dynamically
assigned by the endpoints, so the routers can't search for a single
value.  Similarly, the RTP header extensions are identified by an ID
value, but the IDs are dynamically assigned by the signaling for each
flow.  The framemarking extension itself doesn't contain any long
fields of fixed contents.

The only way to locate the framemarking extension that comes to mind
is to define an additional extension containing fixed contents that
would allow the router to (with high probability) detect the presence
of framemarking.  The overall packet format would be:

   | IP header                                                     |
   | ...                                                           |
   | UDP header                                                    |
   | ...                                                           |
   RTP header:
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   |                           timestamp                           |
   |           synchronization source (SSRC) identifier            |
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   For "one-byte" header extensions:
   |       0xBE    |    0xDE       |           length              |
   |flag ID| L=2   |     flag data                                 |
   | ID    | L     | framemarking data ...                         |
   For "two-byte" header extensions:
   |       0x100           |appbits|           length              |
   | flag ID       |     L=2       |     flag data                 |
   | ID            | L             | framemarking data ...         |

There would be a requirement that the signaling map a fixed flag ID
to the flag extension, identified by a fixed URI.  Then a router
could recognize framemarking via:

1. At the beginning of the UDP payload, the V field is 2, the X field
is 1.

2. The putative beginning of the header extensions is at offset
12 + 4*CC in the UDP payload (the 12-byte fixed RTP header plus four
bytes per CSRC).

3. Either (a) the first two bytes of the extensions are 0xBE and 0xDE,
and bytes 4 through 7 are the flag ID and L=2 (sharing one byte)
followed by three bytes of fixed flag data, or (b) the first 12 bits
of the extension are 0x100, and bytes 4 through 7 are the flag ID,
L=2, and two bytes of fixed flag data.  (Unlike the one-byte form, the
two-byte form's length field is the data length itself, not the
length minus one.)

4. If those tests are passed, the framemarking extension begins at
offset 8 in the header extensions.
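
The packet layout and steps 1-4 can be sketched in Python.  This is a
minimal sketch, not a router implementation: the flag ID and flag data
bytes are hypothetical placeholders (no specification fixes them), the
payload type, sequence number and SSRC are arbitrary, and build_rtp /
find_framemarking are names made up for illustration.

```python
import struct
from typing import Optional, Tuple

# HYPOTHETICAL values: a real flag extension would fix these (and its URI).
FLAG_ID = 1
FLAG_DATA_1B = bytes([0xF7, 0x3A, 0x5C])  # one-byte form: 3 bytes, L field = 2
FLAG_DATA_2B = bytes([0xF7, 0x3A])        # two-byte form: 2 bytes, L field = 2


def build_rtp(cc: int, fm_id: int, fm_data: bytes, two_byte: bool = False) -> bytes:
    """Build an RTP packet (the UDP payload) laid out as in the diagram."""
    # V=2, P=0, X=1, CC; M=0, PT=96; arbitrary seq/timestamp/SSRC.
    hdr = struct.pack("!BBHII", 0x90 | cc, 96, 1, 0, 0x12345678)
    hdr += b"".join(struct.pack("!I", 0x1000 + i) for i in range(cc))  # CSRCs
    if two_byte:
        profile = 0x1000  # 0x100 in the top 12 bits, appbits = 0
        elems = bytes([FLAG_ID, len(FLAG_DATA_2B)]) + FLAG_DATA_2B
        elems += bytes([fm_id, len(fm_data)]) + fm_data
    else:
        profile = 0xBEDE  # one-byte L field is "length minus one"
        elems = bytes([(FLAG_ID << 4) | (len(FLAG_DATA_1B) - 1)]) + FLAG_DATA_1B
        elems += bytes([(fm_id << 4) | (len(fm_data) - 1)]) + fm_data
    elems += b"\x00" * (-len(elems) % 4)  # pad to a 32-bit boundary
    # Extension length field counts 32-bit words of element data.
    return hdr + struct.pack("!HH", profile, len(elems) // 4) + elems


def find_framemarking(pkt: bytes) -> Optional[Tuple[int, bytes]]:
    """Steps 1-4: return (offset, framemarking data) or None."""
    if len(pkt) < 12:
        return None
    # Step 1: V == 2 and X == 1.
    if (pkt[0] >> 6) != 2 or not (pkt[0] & 0x10):
        return None
    # Step 2: extensions start after the fixed header and CC CSRCs.
    ext = 12 + 4 * (pkt[0] & 0x0F)
    if len(pkt) < ext + 10:
        return None
    profile = struct.unpack_from("!H", pkt, ext)[0]
    flag = pkt[ext + 4:ext + 8]
    fm = ext + 8  # Step 4: framemarking element follows the flag element
    # Step 3(a): one-byte form.
    if profile == 0xBEDE and flag == bytes([(FLAG_ID << 4) | 2]) + FLAG_DATA_1B:
        n = (pkt[fm] & 0x0F) + 1
        return fm, pkt[fm + 1:fm + 1 + n]
    # Step 3(b): two-byte form (appbits ignored).
    if (profile >> 4) == 0x100 and flag == bytes([FLAG_ID, 2]) + FLAG_DATA_2B:
        n = pkt[fm + 1]
        return fm, pkt[fm + 2:fm + 2 + n]
    return None


print(find_framemarking(build_rtp(cc=0, fm_id=2, fm_data=bytes([0x9A, 0x03]))))
```

With no CSRCs the framemarking element lands at offset 20 of the UDP
payload (12-byte RTP header, 4-byte extension header, 4-byte flag).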

Unfortunately the CC field contains 4 bits (16 possible values) and
there are two styles of headers, so this requires 32 "fixed position,
fixed bit pattern" tests to implement.  Since two-byte headers can
express everything one-byte headers express, this could be reduced to
16 tests by requiring that two-byte headers always be used.

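The count of 32 (or 16) can be checked with a short Python sketch that
compiles one list of (offset, mask, value) byte tests per CC value and
header style.  The flag ID and flag data are again hypothetical
placeholders, and make_tests / matches are illustrative names, not an
existing router API.

```python
from typing import List, Tuple

# HYPOTHETICAL flag values (no specification fixes them today).
FLAG_ID = 1
FLAG_DATA_1B = bytes([0xF7, 0x3A, 0x5C])
FLAG_DATA_2B = bytes([0xF7, 0x3A])

Check = Tuple[int, int, int]  # (byte offset, mask, expected value)


def make_tests(two_byte_only: bool = False) -> List[Tuple[int, str, List[Check]]]:
    """One fixed-position, fixed-pattern test per (CC value, header style)."""
    styles = [("two-byte", bytes([0x10, 0x00]), bytes([0xFF, 0xF0]),
               bytes([FLAG_ID, 2]) + FLAG_DATA_2B)]
    if not two_byte_only:
        styles.append(("one-byte", bytes([0xBE, 0xDE]), bytes([0xFF, 0xFF]),
                       bytes([(FLAG_ID << 4) | 2]) + FLAG_DATA_1B))
    tests = []
    for cc in range(16):
        ext = 12 + 4 * cc
        for name, magic, magic_mask, flag in styles:
            checks = [(0, 0xDF, 0x90 | cc)]  # V=2, X=1, this CC; P bit ignored
            checks += [(ext + i, magic_mask[i], magic[i]) for i in range(2)]
            checks += [(ext + 4 + i, 0xFF, flag[i]) for i in range(4)]
            tests.append((cc, name, checks))
    return tests


def matches(pkt: bytes, checks: List[Check]) -> bool:
    """True if every masked byte at its fixed offset has the expected value."""
    return all(off < len(pkt) and (pkt[off] & mask) == value
               for off, mask, value in checks)


print(len(make_tests()), len(make_tests(two_byte_only=True)))  # 32 16
```

Each test skips the extension length bytes and, for the two-byte form,
the appbits, since those vary from packet to packet.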
This scheme adds 4 bytes to each RTP packet.

An alternative approach would be to replace the header extension magic
numbers (0xBEDE and 0x100) with different magic numbers that indicate
that the first header extension is framemarking, but that would have
upward compatibility problems.