Re: [AVTCORE] I-D Action: draft-ietf-avtext-framemarking-08.txt

worley@ariadne.com (Dale R. Worley) Wed, 18 December 2019 03:57 UTC

From: worley@ariadne.com
To: worley@ariadne.com
Cc: mzanaty@cisco.com, magnus.westerlund@ericsson.com, draft-ietf-avtext-framemarking@ietf.org, avt@ietf.org
In-Reply-To: <87tv6abnue.fsf@hobgoblin.ariadne.com> (worley@ariadne.com)
Sender: worley@ariadne.com
Date: Tue, 17 Dec 2019 22:57:09 -0500
Message-ID: <87zhfq1e22.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/wa1lMDZnHK7mHc_GPKWO7y6W1Ho>

In an attempt to clarify a particular point:

The whole point of draft-ietf-avtext-framemarking is to provide
information to packet-handling devices on how to manipulate streams of
RTP packets containing encoded video, even when the device cannot
understand the payload of the RTP packets, either because it is
encrypted or because it is in a video format that the device does not
understand.

Three typical operations are:
1) Routers dropping packets due to congestion, trying to determine the
least "costly" packets to drop.
2) Routers trying to "shape" the bandwidth demand of a video stream by
removing one or more highest-resolution layers from the video encoding.
3) RTP switches wanting to splice from one video stream to another,
looking for an "efficient" place to switch to the new stream.

In order for draft-ietf-avtext-framemarking to work well, the
significance of the extension data must be well-defined, so devices know
what the extension data tells about the RTP packets.  Inevitably, this
means that the extension data is interpreted within a model about the
packets and how they are related.

The fundamental structure seems to be the "frame-in-layer", the set of
packets that have the same SSRC, RTP timestamp, LID, and TID values.  A
frame-in-layer is assumed to encode a particular image at a particular
temporal and spatial resolution.
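
To make that definition concrete, here is a minimal sketch of grouping
packets into frames-in-layer.  The `Packet` record and its field names
are hypothetical, chosen only for illustration; they are not taken from
the draft.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical record: only the fields the grouping needs.
    ssrc: int
    timestamp: int
    lid: int
    tid: int
    seq: int


def group_frames_in_layer(packets):
    """Map each (SSRC, timestamp, LID, TID) key to its packets' sequence numbers."""
    groups = defaultdict(list)
    for p in packets:
        groups[(p.ssrc, p.timestamp, p.lid, p.tid)].append(p.seq)
    return dict(groups)


packets = [
    Packet(ssrc=1, timestamp=9000, lid=0, tid=0, seq=10),
    Packet(ssrc=1, timestamp=9000, lid=0, tid=0, seq=11),
    Packet(ssrc=1, timestamp=9000, lid=1, tid=0, seq=12),
    Packet(ssrc=1, timestamp=12000, lid=0, tid=1, seq=13),
]
frames = group_frames_in_layer(packets)
```

Here the first two packets fall into one frame-in-layer, and the other
two packets each form their own.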

The remaining extension data seems to always encode dependencies between
frames-in-layer, that is, if the receiver is to successfully decode one
particular frame-in-layer, it needs all or most of the packets in
certain other frames-in-layer.

E.g., the D bit says that this frame-in-layer is not depended upon by
any other frame-in-layer.  The implication is that dropping these
packets is "low cost" compared to dropping packets within a
frame-in-layer that is depended on by another frame-in-layer.
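
As a rough sketch of how a congested device might act on that, assuming
a hypothetical `MarkedPacket` record whose `discardable` field mirrors
the D bit (the names and the policy are mine, not the draft's):

```python
from typing import NamedTuple


class MarkedPacket(NamedTuple):
    seq: int
    discardable: bool  # assumed to mirror the D bit


def pick_drops(queue, count):
    """Return sequence numbers of up to `count` packets to drop, D-marked first."""
    # sorted() is stable, so packets of equal cost keep their queue order.
    by_cost = sorted(queue, key=lambda p: not p.discardable)
    return [p.seq for p in by_cost[:count]]


queue = [
    MarkedPacket(seq=1, discardable=False),
    MarkedPacket(seq=2, discardable=True),
    MarkedPacket(seq=3, discardable=False),
    MarkedPacket(seq=4, discardable=True),
]
dropped = pick_drops(queue, 3)
```

With this queue, both D-marked packets are chosen before any packet
that some other frame-in-layer may depend on.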

There is an implication that if one frame-in-layer has the same SSRC,
same timestamp, no higher LID, and no higher TID than another
frame-in-layer, then the latter frame-in-layer depends on the former
frame-in-layer.
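
Written out as a predicate, my reading of that implied rule looks
roughly like the sketch below.  The `FrameInLayer` record is
hypothetical, and excluding a frame-in-layer from depending on itself
is my assumption; the draft does not state the rule in this form.

```python
from typing import NamedTuple


class FrameInLayer(NamedTuple):
    ssrc: int
    timestamp: int
    lid: int
    tid: int


def implied_dependency(former, latter):
    """True if `latter` is implied to depend on `former` under the rule above."""
    return (
        former != latter  # assumption: no self-dependency
        and former.ssrc == latter.ssrc
        and former.timestamp == latter.timestamp
        and former.lid <= latter.lid
        and former.tid <= latter.tid
    )


base = FrameInLayer(ssrc=1, timestamp=9000, lid=0, tid=0)
enhanced = FrameInLayer(ssrc=1, timestamp=9000, lid=1, tid=0)
later = FrameInLayer(ssrc=1, timestamp=12000, lid=1, tid=0)
```

Under this reading, `enhanced` depends on `base` but not the reverse,
and the rule says nothing about `later`, which has a different
timestamp.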

Within this framework, the algorithm for applying the extension to any
particular video encoding attempts to capture the actual dependency
structure of the video packets within the model that the extension data
can express.  There can be two sorts of mismatch:  "false positive",
where the extensions express a dependency not present in the video
encoding, and "false negative", where the extensions do not express a
dependency which is present in the video encoding.  The extreme case is
"complex, irregular scalability structures that do not conform to
common, fixed patterns of inter-layer dependencies and referencing
structures."  In that case, using TID and LID is unlikely to be
beneficial, and the extension data will tend to express a lot of "false
positive" dependencies.
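
The two sorts of mismatch can be sketched as set differences between
the dependencies the extension data signals and the dependencies
actually present in the encoding.  The frame labels and the
(dependent, dependency) edge sets below are invented purely for
illustration.

```python
def classify_mismatches(signaled, actual):
    """Return (false_positive, false_negative) sets of dependency edges.

    Each edge is a (dependent, dependency) pair.
    """
    false_positive = signaled - actual  # signaled, but not in the encoding
    false_negative = actual - signaled  # in the encoding, but not signaled
    return false_positive, false_negative


# Hypothetical edges: ("B", "A") means frame-in-layer B depends on A.
signaled = {("B", "A"), ("C", "A"), ("C", "B")}
actual = {("B", "A"), ("C", "A"), ("D", "C")}
fp, fn = classify_mismatches(signaled, actual)
```

A false positive makes a device treat a packet as more costly to drop
than it really is; a false negative risks dropping something a later
frame-in-layer needs.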

What I'm pushing for is that all of this machinery be clearly stated,
especially exactly what dependencies are signaled by the extension data.
If those are left to the common intuitive understanding, we're likely to
have a lot of edge cases implemented differently by different devices,
leading to poor user experience (although probably not outright
non-interoperability).

Dale