Re: [AVTCORE] I-D Action: draft-ietf-avtext-framemarking-08.txt

"Mo Zanaty (mzanaty)" <mzanaty@cisco.com> Thu, 21 November 2019 23:06 UTC

From: "Mo Zanaty (mzanaty)" <mzanaty@cisco.com>
To: "Dale R. Worley" <worley@ariadne.com>
CC: "draft-ietf-avtext-framemarking@ietf.org" <draft-ietf-avtext-framemarking@ietf.org>, "avt@ietf.org" <avt@ietf.org>, Magnus Westerlund <magnus.westerlund@ericsson.com>
Date: Thu, 21 Nov 2019 23:06:29 +0000
Message-ID: <D9FC4088.91915%mzanaty@cisco.com>
References: <HE1PR0701MB2522C3B9D045627496CDE91A955A0@HE1PR0701MB2522.eurprd07.prod.outlook.com> <877ecbiatc.fsf@hobgoblin.ariadne.com>
In-Reply-To: <877ecbiatc.fsf@hobgoblin.ariadne.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/MF3J7Ydt2RxUMgNgoml_61_QDZw>
Subject: Re: [AVTCORE] I-D Action: draft-ietf-avtext-framemarking-08.txt

Dale,

Thank you very much for the detailed review and comments.
Version -10 has updates for all your feedback below.

https://tools.ietf.org/html/draft-ietf-avtext-framemarking-10


See the "Mo:" comments inline below for details of each update.
Please confirm that the updates in -10 adequately resolve all your issues.
Finally, sincere apologies for not getting this update out sooner.

Thanks,
Mo

On 4/3/19, 7:32 AM, "Dale R. Worley" <worley@ariadne.com> wrote:

>I'm no expert on this field, but as the frame marking extension is
>intended to be used broadly over many different video encodings, I think
>it can be usefully critiqued relative to its ambition to be a
>*generalized* frame marking mechanism.  In particular, a number of its
>features seem to reference ideas which generally apply to multiple
>encodings.  But this leaves a great deal of room for lack of alignment
>as to the exact semantics of the features, which could easily lead to a
>lot of subtle interoperation problems.  So I am here pushing for a
>clearer definition of what is and is not meant by the features.
>
>There is some oddity in how the sections are structured.  The short form
>is defined in 3.1, and the long form is defined in 3.2.  The four
>mappings for specific codecs are listed in 3.2.1.1, 3.2.1.2, 3.2.1.3,
>and 3.2.1.4.  It would be better to group the two definitional sections
>together and group the four example sections together.

Mo: This structure was intentional to show that Layer ID mappings only
apply to Scalable Streams.

Section 3.1 defines the Short Extension for Non-Scalable Streams, where no
Layer IDs apply.

Section 3.2 defines the Long Extension for Scalable Streams, which
contains Layer IDs.

Section 3.2.1 defines the Layer ID Mappings for Scalable Streams, with
subsections for specific codecs. Note these subsections are not merely
examples; they are normative / "definitional".

I added the following text to section 3.2.1 to help clarify this:

"This section maps the specific Layer ID information contained in specific
scalable codecs to the generic LID and TID fields. Note that non-scalable
streams have no Layer ID information and thus no mappings."

Does this grouping make sense now, given the clarifying text, or do you
still think it is odd?

>
>Also, 3.2.1.3 (H264 (AVC) LID Mapping) and 3.2.1.4 (VP8 LID Mapping)
>don't specify how the S, E, I, D, and B bits are determined from the
>codec's output packets.

Mo: This is specified in section 3.2.
>
>Regarding the multiple (four, actually) formats of the extension, it
>helps specifying them if they can all be mapped into the same semantic
>data structure.  For example,
>
>    TID is the temporal layer index.  It is implicitly 0 if the short
>    format is used.
>
>    LID is the (spatial) layer index.  It is implicitly 0 if the short
>    format is used or the L=1 form of the long format is used.
>
>    TL0PICIDX:  When TID is 0:  If present, it is a cyclic counter
>    labeling the frames.  If not present, the frames have no such labels.
>    When TID is not 0, it indicates that this frame in this layer
>    depends on the frame with this label in the layer with TID 0.

Mo: Correct. I added this wording to help clarify things in section 3.2.

   o  TID: Temporal ID (3 bits) - The base temporal layer starts with 0,
      and increases with 1 for each higher temporal layer/sub-layer.  If
      no scalability is used, this MUST be 0.  It is implicitly 0 in the
      short extension format.

   o  LID: Layer ID (8 bits) - Identifies the spatial and quality layer
      encoded, starting with 0 and increasing with higher fidelity.  If
      no scalability is used, this MUST be 0 or omitted to reduce
      length.  When omitted, TL0PICIDX MUST also be omitted.  It is
      implicitly 0 in the short extension format or when omitted in the
      long extension format.

   o  TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - When TID is 0
      and LID is 0, this is a cyclic counter labeling base layer frames.
      When TID is not 0 or LID is not 0, this indicates a dependency on
      the given index, such that this frame in this layer depends on the
      frame with this label in the layer with TID 0 and LID 0.  If no
      scalability is used, or the cyclic counter is unknown, this MUST
      be omitted to reduce length.  Note that 0 is a valid index value
      for TL0PICIDX.
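
The field definitions above can be read as one semantic structure shared by
every length of the extension. The following is a minimal sketch only, not
text from the draft. It assumes the first octet is laid out as
S|E|I|D|B|TID from the most significant bit down, following the draft's
diagram (not reproduced in this message), and that the input is the
extension data after the ID/length octet. The names FrameMarking and
parse_frame_marking are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameMarking:
    start: bool                # S bit
    end: bool                  # E bit
    independent: bool          # I bit
    discardable: bool          # D bit
    base_sync: bool            # B bit (reads as 0 in the short format)
    tid: int                   # temporal ID, implicitly 0 in the short format
    lid: int                   # layer ID, implicitly 0 when omitted
    tl0picidx: Optional[int]   # None when omitted; 0 is a valid index


def parse_frame_marking(data: bytes) -> FrameMarking:
    """Parse 1 to 3 octets of extension data (assumed layout, see lead-in)."""
    if not 1 <= len(data) <= 3:
        raise ValueError("expected 1 to 3 octets of extension data")
    first = data[0]
    return FrameMarking(
        start=bool(first & 0x80),
        end=bool(first & 0x40),
        independent=bool(first & 0x20),
        discardable=bool(first & 0x10),
        base_sync=bool(first & 0x08),
        tid=first & 0x07,
        # LID is present only when there are at least 2 octets.
        lid=data[1] if len(data) >= 2 else 0,
        # TL0PICIDX is present only in the 3-octet form.
        tl0picidx=data[2] if len(data) == 3 else None,
    )
```

Because the short format fixes the low four bits of its single octet to 0,
the same parser yields B=0 and TID=0 for it, which matches the point that
the short form and the one-octet long form carry the same semantics.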
>
>
>
>Notice that a missing TL0PICIDX has different semantics than a missing
>TID or LID.
>
>Given the similarity of "temporal layer index" and "layer ID", it seems
>like you want a more distinctive phrase for the latter.  Could it be
>changed to "spatial layer" or "resolution layer"?

Mo: Earlier codecs called this a "dependency layer", which could be a
spatial or quality layer. Later codecs called this simply a layer! (And
demoted temporal layers to "sublayers".) We should probably stay aligned
with the later codecs.
>
>There seems to be no way to signal whether the short form is used
>vs. the L=0 version of the long form (if B=0 and TID=0) -- The ID value
>for both is signaled in SDP by
>
>      a=extmap:3 urn:ietf:params:rtp-hdrext:framemarking

Mo: Correct.
>
>This doesn't cause a problem, as the semantics of the two alternatives
>are the same, but it prevents the 4 reserved bits in the short form from
>being defined in the future for any purpose other than B and LID. --
>Alternatively, is the short form simply what the extension is reduced to
>when using non-scalable streams, as those must necessarily have B=0 and
>LID=0?  (Also see my query below regarding the "default" value of B.)

Mo: Correct, except I think you meant TID=0 (not LID=0) here. I have
clarified that the 4 reserved bits are fixed to 0 and not for future
use/extension.

>It seems that the intention is that the video stream can be divided into
>substreams of RTP packets, called "layers", each of which is identified
>by a particular TID and LID, that is, TID/LID defines a
>*two-dimensional* hierarchy.  "They convey a layer hierarchy with [the
>layer with] TID=0 and LID=0 identifying the base layer."
>
>My guess is that the special case for interpreting the TL0PICIDX value
>is actually when both TID and LID = 0, that is, the base layer, not just
>TID = 0 as stated in the text.  (If I'm wrong, the structure here is

Mo: Correct. I fixed the text to explicitly state this (TID=LID=0).

>more complicated than I'm describing, with the TL0PICIDX labels of an
>upper layer referring to the label with the *same* LID but TID = 0.)
>
>The idea seems to be that one can "efficiently" discard layers from the
>RTP stream, as long as: if one keeps a layer with a particular TID and
>LID, one keeps all layers with lesser or equal TID and LID.  I can't
>quite see how best to define "efficiently" here, but it seems to be the
>central reason for labeling the layers -- that a receiver can
>successfully decode all of the data in all of the layers that remain
>present.

Mo: Correct.
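
As an illustration of the layer-discarding rule confirmed above (keeping a
layer means keeping every layer with lesser or equal TID and LID), a switch
could filter on two thresholds. This is a sketch under that assumption
only. The packet tuples and the function name keep_layers are hypothetical
and say nothing about how any real switch is implemented.

```python
from typing import Iterable, List, Tuple

# Hypothetical packet model: (tid, lid, label).
Packet = Tuple[int, int, str]


def keep_layers(packets: Iterable[Packet], max_tid: int, max_lid: int) -> List[Packet]:
    """Keep packets whose TID and LID are both within the chosen thresholds."""
    return [p for p in packets if p[0] <= max_tid and p[1] <= max_lid]


stream = [(0, 0, "base"), (1, 0, "t1"), (0, 1, "s1"), (2, 1, "t2s1")]
kept = keep_layers(stream, max_tid=1, max_lid=0)
# kept contains only the "base" and "t1" packets.
```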
>
>Things are more interesting in regard to what packets can be discarded
>from a layer "efficiently".  The S, E, I, D, and B bits seem to be
>intended to guide a device that needs to discard packets.  The use of D
>bits is specified:
>
>   When an RTP switch needs to discard a received video frame due to
>   congestion control considerations, it is RECOMMENDED that it
>   preferably drop frames marked with the D (Discardable) bit set [...]
>
>And I suspect that it is implied that if packets are dropped from one
>frame, further packets from the same frame are preferred to be dropped.
>The S and E bits are intended to help with this process.
>
>But dropping whole frames to some degree conflicts with the fact that
>small losses from video layers can often be recovered from, either due
>to redundancy in the layer, or by loss-reconstruction strategies in the
>receiver.  However, if one drops a *lot* of packets from one frame, one
>might as well discard the remainder of them.

Mo: Correct. We decided against being normative or even prescriptive about
any aspect of this as it is highly implementation dependent and subjective.
>
>The I bit suggests that there are provisions for dependency between the
>frames in a single layer, and dependency between frames is not the same
>for all frames.  It appears that if one frame of a layer is dropped, the
>following frames are preferred to be dropped until a frame with I = 1 is
>seen.

Mo: Again, that is highly subjective and implementation dependent, so we
don't attempt guidance or requirements here.

>I am less clear on what the B bit means -- presumably all layers with
>lower TID and LID than the layer containing the frame in question are
>retained, B doesn't seem to carry useful information.

Mo: B is used to break intra-layer dependencies in upper layers that have
loss, and to fall back to the base layer, which is often better protected
(FEC/RTX) and thus more reliable. Again, this is more of a dark art that we
can't prescribe or even describe accurately for most implementations (as
they vary widely).
>
>And there seems to be a problem with "defaulting" the value of B to 0
>when there is no scalability.
>
>As stated:
>
>   o  B: Base Layer Sync (1 bit) - MUST be 1 if the sender knows this
>      frame only depends on the base temporal layer; otherwise MUST be 0.
>
>This can be stated equivalently:
>
>      MUST be 1 if the sender knows this frame does not depend on any
>      frames that do not have TID=0.
>
>Now if the frame itself has TID=0, then it cannot (by the ordering of
>the layers) depend on any frame that does not also have TID=0.  The
>consequence is that the "natural" value of B in TID=0 layers is 1.  And
>when there is no scalability, the only layer has TID=0.

Mo: I fixed this unfortunate circumstance by forcing B=0 when TID=0. B is
useless when TID=0, so we can force it to 0 or 1 arbitrarily. The
"natural" value for something to ignore is 0, so I just enforced this in
the following text.

   o  B: Base Layer Sync (1 bit) - When TID is not 0, this MUST be 1 if
      the sender knows this frame only depends on the base temporal
      layer; otherwise MUST be 0.  When TID is 0 or if no scalability is
      used, this MUST be 0.
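
The revised rule for B can be sketched from the sender's side as follows.
This is only an illustration of the wording above. The function name and
the boolean parameter are hypothetical, and "the sender knows" is modelled
as a flag the caller sets only when it is certain.

```python
def base_layer_sync_bit(tid: int, known_base_only: bool) -> int:
    """Return the B bit for a frame under the revised text.

    tid: temporal ID of the frame (0 when no scalability is used).
    known_base_only: True only if the sender knows the frame depends
        solely on the base temporal layer.
    """
    if tid == 0:
        # B carries no information here, so it is forced to 0.
        return 0
    return 1 if known_base_only else 0
```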


>I think what is going on is that there's an implicit structure of
>dependencies between the frames of a layer (a frame depends on earlier
>frames), and between the frames of different layers (a frame can depend
>on frames with lower TID/LID and no later in time), and the various bits
>are used to signal the *lack* of certain possible dependencies, but how
>the bits do this needs to be clarified.  (The meaning of TL0PICIDX
>particularly needs to be specified.)  But the implicit dependency
>structure isn't spelled out.  That makes things harder in two ways:  (1)
>it is not clear what sorts of dependencies future codecs are *not*
>allowed to introduce, and (2) it is difficult to state exactly what
>dependencies are *removed* by particular signaling.

Mo: It is impossible to capture and signal all the dependencies in a
modern video stream efficiently. Far too much context is required, and far
too much variation is allowed within video specs, although most
implementations settle on simpler dependency structures. We decided to
limit the scope in 3.4.2 to "nested hierarchical temporal layering
structures" since those are the most widely deployed, and more complex
structures would require more complex and inefficient signaling to
describe all dependencies. This was a WG decision that was revisited and
upheld several times.
>
>3.2.1.  Layer ID Mappings for Scalable Streams
>
>All of the descriptions for specific codecs contain "ID=2", whereas the
>generic descriptions of the extension formats show "ID=?".  The latter
>is correct, since the ID value is negotiated for every RTP stream.

Mo: Good catch. It doesn't even match the example extmap:3. I replaced 2
with ?.

>3.4.  Usage Considerations
>
>The switching of video streams is recommended to be done this way:
>
>   When an RTP switch wants to forward a new video stream to a receiver,
>   it is RECOMMENDED to select the new video stream from the first
>   switching point with the I (Independent) bit set in all spatial
>   layers and forward the same.  An RTP switch can request a media
>   source to generate a switching point by sending Full Intra Request
>   (RTCP FIR) as defined in [RFC5104], for example.
>
>This is difficult to implement in general, as it requires the switch to
>keep track of all the layer IDs that have been seen, then look ahead in
>the stream to see if, over a narrow range of time, all of the layers
>that have been seen have packets with I set.  If the fundamental purpose
>of I is to signal the best points to switch streams, it would be better
>to define its semantics to be that.  E.g., "If a switch intends to start
>forwarding a video stream, and within that stream, transmitting all
>frames with TID and LID less than or equal to certain values, it should
>start forwarding the stream beginning with a packet within that layer
>that has I set."  That is, I signals that at this point, the coming
>frames of this layer and all layers with lesser TID/LID can be decoded
>without dependency on any previous frames.

Mo: Media switches already do this, but by deep inspection of the payload
rather than simple inspection of a header extension. The I bit does not
signal anything about lower layers, only this layer. That is what media
switches want and expect.
>
>3.4.2.  Scalability Structures
>
>It would be more effective to state that for "complex or irregular
>scalability structures", subdivision by TID and LID is not effective and
>so such structures should mark all packets with TID=0 and LID=0.  The
>current text suggests that the switch is required to know whether such a
>structure is in use, and if so, ignore the TID and LID fields, which
>suggests that the sender can put various values in those fields.  This
>would lead to requiring the switch to know what encoding is in use, and
>avoiding that is the point of this document.

Mo: Complex structures also use TIDs and LIDs but not necessarily in a
clean nested hierarchy. The most complex structures are total anarchy
(dynamic, unpredictable) but could still use TIDs and LIDs (within the
codec payloads, but not in this header extension). We don't see any good
way to support such complexity, so the extension is not useful for such
streams. This decision must be made at the source, not at any later node
such as the switch. Only the stream source truly knows the full structure
of its streams, so it decides whether or not to use this extension. Later
nodes such as the switch can't reliably detect or infer this.
>
>Dale