Re: [AVTCORE] WGLC for draft-ietf-avtext-framemarking

Jonathan Lennox <jonathan@vidyo.com> Fri, 05 January 2018 21:01 UTC

From: Jonathan Lennox <jonathan@vidyo.com>
Date: Fri, 05 Jan 2018 16:01:39 -0500
References: <9207B4E8-6531-4E7E-8F81-06CD0326CF56@vidyo.com>
To: IETF AVTCore WG <avt@ietf.org>
In-Reply-To: <9207B4E8-6531-4E7E-8F81-06CD0326CF56@vidyo.com>
Message-Id: <9DEE1EBD-6B42-4E5C-8188-F6E11FC25BC9@vidyo.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/avt/rGbHrt87jwdmjmSCPy9k9fLCLOQ>
Subject: Re: [AVTCORE] WGLC for draft-ietf-avtext-framemarking

Here are my comments (as an individual) on draft-ietf-avtext-framemarking. (I’m not bothering to repeat any issues that Magnus raised; his are all good points.)

Section 3.1:

o The remaining (4 bits) - MUST be 0 for non-scalable streams.

I think this should say MUST be set to 0 by senders and MUST be ignored by receivers, for the sake of future extensibility (such as Roni’s proposal).
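A minimal sketch of the "set to 0 on send, ignore on receive" rule suggested above. It assumes the one-byte non-scalable form carries four flags in the high bits, in the order S, E, I, D, with the "remaining (4 bits)" in the low nibble; that bit order and the function names are illustrative assumptions, not text from the draft.

```python
# Assumed layout of the one-byte form: S E I D r r r r (r = remaining bits).
RESERVED_MASK = 0x0F  # assumption: the "remaining (4 bits)" are the low nibble


def build_short_marking(s: bool, e: bool, i: bool, d: bool) -> int:
    """Sender side: the remaining four bits are always written as 0."""
    return (int(s) << 7) | (int(e) << 6) | (int(i) << 5) | (int(d) << 4)


def parse_short_marking(byte: int) -> dict:
    """Receiver side: read only the flag bits and ignore the remaining four."""
    return {
        "S": bool(byte & 0x80),
        "E": bool(byte & 0x40),
        "I": bool(byte & 0x20),
        "D": bool(byte & 0x10),
    }
```

With this behavior, a receiver parses 0xA0 and 0xAF identically, so a later extension could assign meaning to the low bits without breaking deployed receivers.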

Section 3.2:

o I: Independent Frame (1 bit) - MUST be 1 for frames that can be
decoded independent of temporally prior frames, e.g. intra-frame,
VPX keyframe, H.264 IDR [RFC6184], H.265 IDR/CRA/BLA/RAP
[RFC7798]; otherwise MUST be 0. Note that this bit only signals
temporal independence, so it can be 1 in spatial or quality
enhancement layers that depend on temporally co-located layers but
not temporally prior frames.

One issue that might arise — in a base spatial layer refresh, i.e. this coding structure (Figure 2 from draft-ietf-avtext-lrr):

... <--  S1  <--  S1  <--  S1  <--  S1  <-- ...
         |        |        |        |
        \/       \/       \/       \/
... <--  S0  <--  S0       S0  <--  S0  <-- ...

         1        2        3        4

the S0 packets of Frame 3 will have the I bit set, even though this is not a true IDR / keyframe. In order for a receiver to detect a true IDR, it must look for an I bit set on every spatial layer of a frame. Is this definitely what we want? If so, Section 3.4 is wrong, and this procedure should be spelled out there.

Alternatively, if I=1 and LID=0 means true IDR, this should be stated explicitly; it simplifies IDR detection but would mean that the Figure 2 coding structure cannot be described by frame marking.
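To make the two options concrete, here is a rough sketch of both detection rules applied to Frame 3 of the structure above. The `LayerMarking` type and the function names are hypothetical, and a frame is modeled simply as one (LID, I bit) pair per spatial layer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerMarking:
    lid: int     # spatial layer ID carried in the frame marking
    i_bit: bool  # I bit carried in the frame marking


def is_idr_all_layers(frame: List[LayerMarking]) -> bool:
    """Option 1: a true IDR requires I=1 on every spatial layer of the frame."""
    return bool(frame) and all(m.i_bit for m in frame)


def is_idr_base_only(frame: List[LayerMarking]) -> bool:
    """Option 2: I=1 together with LID=0 is taken to mean a true IDR."""
    return any(m.lid == 0 and m.i_bit for m in frame)


# Frame 3 of the figure: S0 is refreshed (I=1), S1 still depends on the prior S1.
frame3 = [LayerMarking(lid=0, i_bit=True), LayerMarking(lid=1, i_bit=False)]
# A true IDR: both layers are independent of temporally prior frames.
true_idr = [LayerMarking(lid=0, i_bit=True), LayerMarking(lid=1, i_bit=True)]
```

Under option 1, `frame3` is correctly not treated as an IDR, but the receiver has to inspect every layer. Under option 2, `frame3` would be reported as an IDR, which is why that reading would rule out describing this coding structure.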

o D: Discardable Frame (1 bit) - MUST be 1 for frames that can be
discarded, and still provide a decodable media stream; otherwise
MUST be 0.

I feel “MUST” for the first half of this definition is too strong here: there are scenarios where this information might not be fully available to the entity creating the frame marking, so it might want to set the value to 0 when it’s uncertain. I think this should be permitted.

o B: Base Layer Sync (1 bit) - MUST be 1 if this frame only depends
on the base layer; otherwise MUST be 0. If no scalability is
used, this MUST be 0.

It should be made clear whether “on the base layer” refers to the base temporal layer, the base spatial layer, or both. I think the intention is that it mean the base temporal layer, but I’m not sure.

This is another case where the entity creating the frame marking might not have full information, and so might want to set the value to 0 if it’s uncertain.

Additionally, it’s weird that “If no scalability is used, this MUST be 0”, since conceptually a stream that consists entirely of base layer frames would set this to 1. In general, I think all the “If no scalability is used” clauses in Section 3.2 should be removed, since they’re redundant with Section 3.1.

o TL0PICIDX: Temporal Layer 0 Picture Index (8 bits) - Running index
of base temporal layer 0 frames when TID is 0. When TID is not 0,
this indicates a dependency on the given index. If no scalability
is used, this MUST be 0 or omitted. When omitted, LID MUST also
be omitted.

How do we want TL0PICIDX values to work across IDR frames? Some codecs (H.264 SVC) reset their TL0PICIDX on IDR, others (VP8) insist on continuous values across them.

The former requires an IDRPICID (which I don’t think we want to add) for full correctness, but the latter can make splicing annoying (requiring a splicer to rewrite TL0PICIDX values forever).
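As an illustration of the splicing cost in the continuous-value case, here is a sketch of the rewriting a splicer would have to carry indefinitely. The class and method names are hypothetical, and it assumes the first packet after a splice belongs to a TID=0 frame (e.g. an IDR) so that the new source's index can be mapped to the next outgoing value.

```python
class Tl0PicIdxRewriter:
    """Keeps outgoing TL0PICIDX values continuous across source switches."""

    def __init__(self) -> None:
        self.offset = 0             # added (mod 256) to every incoming index
        self.last_out = None        # last TL0PICIDX value sent downstream
        self.pending_splice = False

    def splice(self) -> None:
        """Call when switching to a new source stream."""
        self.pending_splice = True

    def rewrite(self, tl0picidx: int) -> int:
        if self.pending_splice and self.last_out is not None:
            # Map the new source's first index to last_out + 1 (8-bit wrap).
            self.offset = (self.last_out + 1 - tl0picidx) % 256
        self.pending_splice = False
        out = (tl0picidx + self.offset) % 256
        self.last_out = out
        return out
```

Note that the offset never goes away: every packet after the first splice, for every temporal layer, has to pass through `rewrite()`.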

Section 3.2.1.5:

I think this section needs to call out the requirement in section 3.2 that LID=0 always indicates the base layer; i.e., it’s not valid for a future LID mapping to be defined in a way that breaks that invariant. Otherwise, receivers have no way to tell whether they’ve received the first spatial layer of a frame.

Also, I think it’s necessary — either here, or somewhere else — to require that frame marking only be used with RTP payload formats that follow the usual marker bit rule for video, that a marker bit indicates the last packet of the picture/access unit/temporal unit/whatever. Otherwise, there’s no way for a receiver to tell whether it’s received the *last* spatial layer of a frame.
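A small sketch of the receiver-side check these two requirements would enable, assuming the packets passed in all belong to one frame (for example, they share an RTP timestamp). The `Packet` type and function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    lid: int      # spatial layer ID from the frame marking extension
    marker: bool  # RTP marker bit


def frame_bounds_seen(packets: List[Packet]) -> Tuple[bool, bool]:
    """Return (saw first spatial layer, saw last spatial layer) for one frame.

    Relies on LID=0 always meaning the base layer, and on the marker bit
    being set only on the last packet of the picture/access unit.
    """
    saw_first = any(p.lid == 0 for p in packets)
    saw_last = any(p.marker for p in packets)
    return saw_first, saw_last
```

If either invariant does not hold for a given LID mapping or payload format, one of the two results becomes meaningless, which is the concern raised above.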

Section 3.4.1:

This should mention that because frame marking can only be used with temporally-nested streams, temporal-layer LRR refreshes are unnecessary for frame-marked streams.

Other refreshes can be detected based on the I bit being set for the spatial layers in question (modulo the decision on my point about the I bit, above).
