[dispatch] Framemarking in video packets
worley@ariadne.com Tue, 02 August 2022 14:18 UTC
From: worley@ariadne.com
To: dispatch@ietf.org, awt@ietf.org
Sender: worley@ariadne.com
Date: Tue, 02 Aug 2022 10:15:38 -0400
Message-ID: <878ro6afzp.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/dispatch/iIIfGhFPAbtd4TdhfPftdyKSowM>
Subject: [dispatch] Framemarking in video packets
I'm new at this stuff, so I thought that I would work through a design exercise about enabling routers to selectively drop packets in video streams in an "optimized" way. I don't know whether this is a well-understood design space that I'm simply unaware of. So if there's a better version of this out there, please point me to it.

The concept is to define a design framework that incorporates the draft-ietf-avtext-framemarking RTP header extension as the "canonical" representation of the relevant information about each RTP packet, and then see if I can map the processing of draft-dong-priority-rtp-packet into the framework.

A. A video packet is contained in an RTP packet which is contained in a UDP packet (or other suitable transport protocol).

B. The RTP packet contains the header extension of draft-ietf-avtext-framemarking, describing the video packet, added by the video stream source (and thus later stages of processing are independent of the video codec).

C. There is a subset of the information in the header extension which is useful to "routers", that is, memory-less middleboxes. This appears to be the D (discardable) bit and the TID and LID spatial/temporal layer identifiers. The I (independent frame) bit is generally derivable from the information in video frames, but it's not clear that it can be usefully acted upon by stateless devices.

My reading of draft-dong-priority-rtp-packet is that all of the information it uses to determine packet handling for any particular video encoding is also carried in the framemarking header extension. Though I may not be understanding all of the text correctly, as there are a lot of different layer ID numbers involved and the text assumes some understanding of their underlying semantics.

D. This information subset is revealed to the routers through a mechanism that routers can easily act upon.

Alternative D1. "draft-dong-priority-rtp-packet NRI alternative"

The two-bit NRI value from the video packet is mapped into one of two sets of four DSCP AF values, depending on whether the video is "interactive" or "non-interactive". However, the NRI value is not fully reflected into the framemarking extension; the D bit indicates whether NRI is 0 or non-0; other NRI values are not distinguished.

Alternative D2. "draft-dong-priority-rtp-packet TID alternative"

The three-bit TID value is mapped into one of two sets of eight DSCP AF values, depending on whether the video is "interactive" or "non-interactive". TID is directly present in the framemarking extension.

Alternative D3.

Tag the packets with the same DSCP value, but add an additional RTP header extension to make it easy for routers to identify and find the framemarking header extension: define and use it in such a way as to present a fixed four-octet value on an aligned boundary at a predictable offset within the IP payload, immediately preceding the framemarking header extension.

There are difficulties with D1 and D2. In order to avoid packet reordering within a stream when using the "default" interpretation of the DSCP values (RFC 4594), the stream's packets can only use one of the AF groups of three DSCP values. This is nowhere near enough to encode the information that could be available about the video frames. Even if the entire DSCP code point space is used, it might be inadequate for the spatial/temporal hierarchies in a scalable video encoding.

D3 requires that routers can (1) determine whether a packet contains an RTP packet containing a framemarking header extension, and (2) locate the extension. Unfortunately, RTP packet decoding is usually triggered by the UDP destination port, but the ports are dynamically assigned by the endpoints, so the routers can't search for a single value.
Similarly, the RTP header extensions are identified by an ID value, but the IDs are dynamically assigned by the signaling for each flow. The framemarking extension itself doesn't contain any long fields of fixed contents. The only way to locate the framemarking extension that comes to mind is to define an additional extension containing fixed contents that would allow the router to (with high probability) detect the presence of framemarking. The overall packet format would be:

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           IP header                           |
    |                              ...                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          UDP header                           |
    |                              ...                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RTP header:

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |V=2|P|X|  CC   |M|     PT      |       sequence number         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           timestamp                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           synchronization source (SSRC) identifier            |
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
    |            contributing source (CSRC) identifiers             |
    |                             ....                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

For "one-byte" header extensions:

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     0xBE      |     0xDE      |            length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |flag ID|  L=2  |                   flag data                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  ID   |   L   |             framemarking data ...             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

For "two-byte" header extensions:

    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         0x100         |appbits|            length             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |    flag ID    |      L=2      |           flag data           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      ID       |       L       |     framemarking data ...     |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

There would be a requirement that the signaling maps a fixed flag ID to the flag extension, by mapping that ID to a fixed URI. Then a router could recognize framemarking via:

1. At the beginning of the UDP payload, the V field is 2 and the X field is 1.

2. The putative beginning of the header extensions is at offset 4*(3+CC) in the UDP payload.

3. Either

   (a) the first two bytes of the extensions are 0xBE and 0xDE, byte 4 holds the flag ID and the length value 2 (the one-byte form encodes the data length minus one), and bytes 5 through 7 are three bytes of fixed flag data, or

   (b) the first 12 bits of the extensions are 0x100, byte 4 is the flag ID, byte 5 is the length value 2 (the two-byte form of RFC 8285 encodes the data length directly), and bytes 6 and 7 are two bytes of fixed flag data.

4. If those tests are passed, the framemarking extension begins at offset 8 in the header extensions.

Unfortunately the CC field contains 4 bits and there are two styles of headers, so this requires 32 "fixed position, fixed bit pattern" tests to implement (16 CC values times 2 header styles). Since two-byte headers can express everything one-byte headers express, this could be reduced to 16 tests by requiring that two-byte headers always be used.

This scheme adds 4 bytes to each RTP packet.

An alternative approach would be to replace the header extension magic numbers (0xBEDE and 0x100) with different magic numbers that indicate that the first header extension is framemarking, but that would have upward compatibility problems.

Dale
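
For illustration, recognition steps 1 through 4 can be sketched as a small stateless check over the UDP payload. This is only a sketch under assumptions: the function name, the flag ID values, and the flag data bytes below are hypothetical placeholders (the proposal only requires that they be fixed, not what they are), and the two-byte form is taken to carry a length byte of 2 for two bytes of flag data, following RFC 8285.

```python
# Hypothetical constants: the proposal only says these must be fixed values.
FLAG_ID_ONE_BYTE = 0x1        # 4-bit element ID used with one-byte headers
FLAG_ID_TWO_BYTE = 0x01       # 8-bit element ID used with two-byte headers
FLAG_DATA_ONE_BYTE = b"FMK"   # three bytes of fixed flag data (placeholder)
FLAG_DATA_TWO_BYTE = b"FM"    # two bytes of fixed flag data (placeholder)


def find_framemarking(udp_payload: bytes):
    """Return the offset (within the UDP payload) at which the framemarking
    element would begin, or None if the fixed-pattern tests fail."""
    if len(udp_payload) < 12:             # shorter than a fixed RTP header
        return None

    # Step 1: V must be 2 and X must be 1.
    first = udp_payload[0]
    version = first >> 6
    x_bit = (first >> 4) & 1
    cc = first & 0x0F
    if version != 2 or x_bit != 1:
        return None

    # Step 2: header extensions start after the fixed header and CSRC list.
    ext = 4 * (3 + cc)
    if len(udp_payload) <= ext + 8:       # need room for flag element + more
        return None
    head = udp_payload[ext:ext + 8]

    # Step 3(a): one-byte form, 0xBEDE, then (flag ID, L=2) and 3 data bytes.
    if head[0] == 0xBE and head[1] == 0xDE:
        if head[4] == ((FLAG_ID_ONE_BYTE << 4) | 2) and head[5:8] == FLAG_DATA_ONE_BYTE:
            return ext + 8                # Step 4
        return None

    # Step 3(b): two-byte form, first 12 bits 0x100, then flag ID, length 2,
    # and 2 data bytes.
    if head[0] == 0x10 and (head[1] & 0xF0) == 0x00:
        if (head[4] == FLAG_ID_TWO_BYTE and head[5] == 2
                and head[6:8] == FLAG_DATA_TWO_BYTE):
            return ext + 8                # Step 4
    return None
```

A real router would presumably compile this into the 32 (or 16) fixed-offset pattern matches rather than branch on CC at run time; the function form is just easier to read. Note that a match is only probabilistic, since any UDP payload that happens to contain these bytes would also pass.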