Re: [AVTCORE] I-D Action: draft-ietf-avtext-framemarking-08.txt
Jonathan Lennox <jonathan@vidyo.com> Thu, 04 April 2019 15:31 UTC
From: Jonathan Lennox <jonathan@vidyo.com>
Date: Thu, 04 Apr 2019 11:31:40 -0400
References: <23718.8872.229678.4132@paris.clic.cs.columbia.edu>
To: IETF AVTCore WG <avt@ietf.org>
In-Reply-To: <23718.8872.229678.4132@paris.clic.cs.columbia.edu>
Message-Id: <F845B05F-7761-47C2-AF21-22536D0882EA@vidyo.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/JFKwrOxv0etcLOg-qwijx_L50hI>
Subject: Re: [AVTCORE] I-D Action: draft-ietf-avtext-framemarking-08.txt

Authors, even though we’re past the nominal WGLC, please respond to this as a last call comment. I’ll hold off on writing up the publication request until this is resolved.

> From: worley@ariadne.com (Dale R. Worley)
> Subject: Re: [AVTCORE] I-D Action: draft-ietf-avtext-framemarking-08.txt
> Date: April 3, 2019 at 7:32:15 AM EDT
> To: Magnus Westerlund <magnus.westerlund@ericsson.com>
> Cc: draft-ietf-avtext-framemarking@ietf.org, avt@ietf.org
>
> I'm no expert on this field, but as the frame marking extension is
> intended to be used broadly over many different video encodings, I think
> it can be usefully critiqued relative to its ambition to be a
> *generalized* frame marking mechanism. In particular, a number of its
> features seem to reference ideas which generally apply to multiple
> encodings. But this leaves a great deal of room for lack of alignment
> as to the exact semantics of the features, which could easily lead to a
> lot of subtle interoperation problems. So I am here pushing for a
> clearer definition of what is and is not meant by the features.
>
> There is some oddity in how the sections are structured. The short form
> is defined in 3.1, and the long form is defined in 3.2. The four
> mappings for specific codecs are listed in 3.2.1.1, 3.2.1.2, 3.2.1.3,
> and 3.2.1.4. It would be better to group the two definitional sections
> together and group the four example sections together.
>
> Also, 3.2.1.3 (H264 (AVC) LID Mapping) and 3.2.1.4 (VP8 LID Mapping)
> don't specify how the S, E, I, D, and B bits are determined from the
> codec's output packets.
>
> Regarding the multiple (four, actually) formats of the extension, it
> helps to specify them if they can all be mapped into the same semantic
> data structure. For example,
>
>     TID is the temporal layer index. It is implicitly 0 if the short
>     format is used.
>
>     LID is the (spatial) layer index. It is implicitly 0 if the short
>     format is used or the L=1 form of the long format is used.
>
>     TL0PICIDX: When TID is 0: If present, it is a cyclic counter
>     labeling the frames. If not present, the frames have no such labels.
>     When TID is not 0, it indicates that this frame in this layer
>     depends on the frame with this label in the layer with TID 0.
>
> Notice that a missing TL0PICIDX has different semantics than a missing
> TID or LID.
>
> Given the similarity of "temporal layer index" and "layer ID", it seems
> like you want a more distinctive phrase for the latter. Could it be
> changed to "spatial layer" or "resolution layer"?
>
> There seems to be no way to signal whether the short form is used
> vs. the L=0 version of the long form (if B=0 and TID=0) -- The ID value
> for both is signaled in SDP by
>
>     a=extmap:3 urn:ietf:params:rtp-hdrext:framemarking
>
> This doesn't cause a problem, as the semantics of the two alternatives
> are the same, but it prevents the 4 reserved bits in the short form from
> being defined in the future for any purpose other than B and LID. --
> Alternatively, is the short form simply what the extension is reduced to
> when using non-scalable streams, as those must necessarily have B=0 and
> LID=0? (Also see my query below regarding the "default" value of B.)
>
> It seems that the intention is that the video stream can be divided into
> substreams of RTP packets, called "layers", each of which is identified
> by a particular TID and LID, that is, TID/LID defines a
> *two-dimensional* hierarchy. "They convey a layer hierarchy with [the
> layer with] TID=0 and LID=0 identifying the base layer."
>
> My guess is that the special case for interpreting the TL0PICIDX value
> is actually when both TID and LID = 0, that is, the base layer, not just
> TID = 0 as stated in the text. (If I'm wrong, the structure here is
> more complicated than I'm describing, with the TL0PICIDX labels of an
> upper layer referring to the label with the *same* LID but TID = 0.)
>
> The idea seems to be that one can "efficiently" discard layers from the
> RTP stream, as long as: if one keeps a layer with a particular TID and
> LID, one keeps all layers with lesser or equal TID and LID. I can't
> quite see how best to define "efficiently" here, but it seems to be the
> central reason for labeling the layers -- that a receiver can
> successfully decode all of the data in all of the layers that remain
> present.
>
> Things are more interesting in regard to what packets can be discarded
> from a layer "efficiently". The S, E, I, D, and B bits seem to be
> intended to guide a device that needs to discard packets. The use of D
> bits is specified:
>
>     When an RTP switch needs to discard a received video frame due to
>     congestion control considerations, it is RECOMMENDED that it
>     preferably drop frames marked with the D (Discardable) bit set [...]
>
> And I suspect that it is implied that if packets are dropped from one
> frame, further packets from the same frame are preferred to be dropped.
> The S and E bits are intended to help with this process.
>
> But dropping whole frames to some degree conflicts with the fact that
> small losses from video layers can often be recovered from, either due
> to redundancy in the layer, or by loss-reconstruction strategies in the
> receiver. However, if one drops a *lot* of packets from one frame, one
> might as well discard the remainder of them.
>
> The I bit suggests that there are provisions for dependency between the
> frames in a single layer, and dependency between frames is not the same
> for all frames. It appears that if one frame of a layer is dropped, the
> following frames are preferred to be dropped until a frame with I = 1 is
> seen.
>
> I am less clear on what the B bit means -- since presumably all layers
> with lower TID and LID than the layer containing the frame in question
> are retained, B doesn't seem to carry useful information.
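
To make the discard rules in the comments above concrete, here is a minimal Python sketch. It is only one reading of the quoted text, not something taken from the draft: the `Frame` type, the function names, and the `dropped` set of frame numbers are hypothetical, and it assumes that any dropped frame forces the rest of that layer to be skipped until a frame with I = 1 arrives.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Frame:
    num: int            # hypothetical frame number, used only for this example
    tid: int            # temporal layer index
    lid: int            # spatial layer index
    independent: bool   # I bit
    discardable: bool   # D bit


def choose_frame_to_drop(candidates: List[Frame]) -> Optional[Frame]:
    """Prefer a frame with D set; otherwise fall back to the first candidate."""
    for frame in candidates:
        if frame.discardable:
            return frame
    return candidates[0] if candidates else None


def forward(frames: Iterable[Frame], max_tid: int, max_lid: int,
            dropped: Set[int]) -> List[Frame]:
    """Keep layers with TID <= max_tid and LID <= max_lid.

    Frames whose number is in `dropped` are discarded. Assumption: after a
    drop, later frames of the same layer are skipped until one has I = 1.
    """
    waiting_for_i = set()   # layers (tid, lid) that lost a frame
    kept = []
    for frame in frames:
        if frame.tid > max_tid or frame.lid > max_lid:
            continue        # whole layer is not forwarded
        layer = (frame.tid, frame.lid)
        if frame.num in dropped:
            waiting_for_i.add(layer)
            continue
        if layer in waiting_for_i:
            if not frame.independent:
                continue    # still depends on the dropped frame
            waiting_for_i.discard(layer)
        kept.append(frame)
    return kept
```

Keeping the layer test as two independent comparisons mirrors the "lesser or equal TID and LID" wording, so retaining a layer always retains every layer below it in both dimensions.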
>
> And there seems to be a problem with "defaulting" the value of B to 0
> when there is no scalability.
>
> As stated:
>
>     o B: Base Layer Sync (1 bit) - MUST be 1 if the sender knows this
>       frame only depends on the base temporal layer; otherwise MUST be 0.
>
> This can be stated equivalently:
>
>     MUST be 1 if the sender knows this frame does not depend on any
>     frames that do not have TID=0.
>
> Now if the frame itself has TID=0, then it cannot (by the ordering of
> the layers) depend on any frame that does not also have TID=0. The
> consequence is that the "natural" value of B in TID=0 layers is 1. And
> when there is no scalability, the only layer has TID=0.
>
> I think what is going on is that there's an implicit structure of
> dependencies between the frames of a layer (a frame depends on earlier
> frames), and between the frames of different layers (a frame can depend
> on frames with lower TID/LID and no later in time), and the various bits
> are used to signal the *lack* of certain possible dependencies, but how
> the bits do this needs to be clarified. (The meaning of TL0PICIDX
> particularly needs to be specified.) But the implicit dependency
> structure isn't spelled out. That makes things harder in two ways: (1)
> it is not clear what sorts of dependencies future codecs are *not*
> allowed to introduce, and (2) it is difficult to state exactly what
> dependencies are *removed* by particular signaling.
>
> 3.2.1. Layer ID Mappings for Scalable Streams
>
> All of the descriptions for specific codecs contain "ID=2", whereas the
> generic descriptions of the extension formats show "ID=?". The latter
> is correct, since the ID value is negotiated for every RTP stream.
>
> 3.4. Usage Considerations
>
> The switching of video streams is recommended to be done this way:
>
>     When an RTP switch wants to forward a new video stream to a receiver,
>     it is RECOMMENDED to select the new video stream from the first
>     switching point with the I (Independent) bit set in all spatial
>     layers and forward the same. An RTP switch can request a media
>     source to generate a switching point by sending Full Intra Request
>     (RTCP FIR) as defined in [RFC5104], for example.
>
> This is difficult to implement in general, as it requires the switch to
> keep track of all the layer IDs that have been seen, then look ahead in
> the stream to see if, over a narrow range of time, all of the layers
> that have been seen have packets with I set. If the fundamental purpose
> of I is to signal the best points to switch streams, it would be better
> to define its semantics to be that. E.g., "If a switch intends to start
> forwarding a video stream, and within that stream, transmitting all
> frames with TID and LID less than or equal to certain values, it should
> start forwarding the stream beginning with a packet within that layer
> that has I set." That is, I signals that at this point, the coming
> frames of this layer and all layers with lesser TID/LID can be decoded
> without dependency on any previous frames.
>
> 3.4.2. Scalability Structures
>
> It would be more effective to state that for "complex or irregular
> scalability structures", subdivision by TID and LID is not effective and
> so such structures should mark all packets with TID=0 and LID=0. The
> current text suggests that the switch is required to know whether such a
> structure is in use, and if so, ignore the TID and LID fields, which
> suggests that the sender can put various values in those fields. This
> would lead to requiring the switch to know what encoding is in use, and
> avoiding that is the point of this document.
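
The alternative semantics for the I bit suggested in the 3.4 comment above could be sketched roughly as follows. This is a hedged illustration rather than text from the draft: `Packet`, `find_switch_point`, and `start_forwarding` are hypothetical names, and the sketch assumes that "a packet within that layer" means any packet whose TID and LID fall inside the set of layers being forwarded.

```python
from typing import List, NamedTuple, Optional, Sequence


class Packet(NamedTuple):
    seq: int       # RTP sequence number (illustrative values only)
    tid: int       # temporal layer index
    lid: int       # spatial layer index
    i_bit: bool    # I (Independent) bit


def in_target(p: Packet, max_tid: int, max_lid: int) -> bool:
    """True if the packet belongs to a layer the switch intends to forward."""
    return p.tid <= max_tid and p.lid <= max_lid


def find_switch_point(packets: Sequence[Packet],
                      max_tid: int, max_lid: int) -> Optional[int]:
    """Index of the first forwarded-layer packet with I set, or None.

    Assumption: one I-marked packet is enough, so the switch never has to
    look ahead or track which layer IDs it has already seen.
    """
    for idx, p in enumerate(packets):
        if in_target(p, max_tid, max_lid) and p.i_bit:
            return idx
    return None


def start_forwarding(packets: Sequence[Packet],
                     max_tid: int, max_lid: int) -> List[Packet]:
    """Packets to forward once a switch point has been found."""
    start = find_switch_point(packets, max_tid, max_lid)
    if start is None:
        return []   # no switch point yet; a switch might send RTCP FIR here
    return [p for p in packets[start:] if in_target(p, max_tid, max_lid)]
```

Under this reading the decision is made per packet, which is what removes the need to wait for I to appear in every spatial layer.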
>
> Dale
>
> _______________________________________________
> Audio/Video Transport Core Maintenance
> avt@ietf.org
> https://www.ietf.org/mailman/listinfo/avt