Re: [Cellar] AV1 mapping update

Andreas Rheinhardt <andreas.rheinhardt@googlemail.com> Sat, 14 July 2018 21:56 UTC

Hello,

what I have to say about the "seeking with delayed random access points"
topic will be said in a separate email as a reply to your other email.
Here is the rest:

1. "The [timing_info_present_flag] of the Sequence Header OBU SHOULD be
0. Even when it is 1 the presentation time of the Frame Header OBUs in
Blocks should be discarded. In other words, only the timestamps given by
the Matroska container MUST be used."

a) This clause should not be in the section about the `CodecPrivate` as
it pertains to all `Sequence Header OBUs`, not just the one in the
`CodecPrivate`.

b) I don't see a reason why the bitstream should be cleaned of these
values. This isn't done for H.264 or other codecs either.

c) If you want to keep this recommendation in, then we should merge it
with the recommendation to discard [temporal_point_info] (which carries
[frame_presentation_time]). After all, setting [timing_info_present_flag]
to 0 automatically sets [decoder_model_info_present_flag] to 0 as well,
and then [temporal_point_info] mustn't be present anyway. It's probably
best to adapt the wording of the MP4 mapping:
"The presentation times of AV1 samples are given by the Matroska
container. The [timing_info_present_flag] in the `Sequence Header OBU`
(in the `CodecPrivate` or in the bitstream) SHOULD be set to 0. If set
to 1, the [timing_info] structure of the `Sequence Header OBU`, the
[frame_presentation_time] and [buffer_removal_time] fields of the `Frame
Header OBUs`, if present, SHALL be ignored for the purpose of timed
processing of the Matroska file."
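
A muxer that wants to warn about (or rewrite) `Sequence Header OBUs`
with [timing_info_present_flag] equal to 1 only needs the first byte of
the OBU payload. Here is a rough Python sketch; the function name is
made up, and the input is assumed to be the payload only (OBU header and
obu_size already stripped). The bit layout follows sequence_header_obu()
in the AV1 specification:

```python
def timing_info_present(seq_header_payload: bytes) -> bool:
    """Return timing_info_present_flag of an AV1 Sequence Header OBU payload.

    Assumption: the OBU header and any obu_size field are already stripped.
    """
    if not seq_header_payload:
        raise ValueError("empty Sequence Header OBU payload")
    first = seq_header_payload[0]
    # First payload byte, MSB first: seq_profile (3 bits), still_picture (1),
    # reduced_still_picture_header (1), then timing_info_present_flag (1)
    # unless the reduced still picture header is used.
    reduced_still_picture_header = (first >> 3) & 1
    if reduced_still_picture_header:
        return False  # the flag is not coded and is inferred to be 0
    return bool((first >> 2) & 1)
```

Whatever this returns, the presentation times used for playback are the
Matroska timestamps.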

2.

Steve Lhomme:
>> The current version is:
>> "The OBUs in the Block follow the [Low Overhead Bitstream Format
>> syntax]. They SHOULD have the [obu_has_size_field] set to 1 except for
>> the last OBU in the sample, for which [obu_has_size_field] MAY be set to
>> 0, in which case it is assumed to fill the remaining of the sample."
>>
>> If one interprets the first sentence as meaning "The OBUs in the Block
>> MUST follow the [Low Overhead Bitstream Format syntax]", then given that
>> this syntax mandates [obu_has_size_field] to be equal to 1, the first
>> part of the second sentence is redundant (given that MUST is stronger
>> than SHOULD) and the second part contradicts (or is voided by) the
>> first sentence, because the first sentence doesn't allow
>> [obu_has_size_field] to be set to zero at all.
>> If one interprets the first sentence as not conveying a MUST, then it is
>> allowed (albeit strongly discouraged) to use [obu_has_size_field] equal
>> to 0 for an OBU that is not the last OBU in the sample. This is not what
>> we want, is it? How about:
>> "The OBUs in the `Block` MUST follow the [Low Overhead Bitstream Format
>> syntax] with the possible exception of the last OBU of a `Block` for
>> which [obu_has_size_field] MAY be set to 0, in which case it is assumed
>> to fill the remainder of the `Block`."
> 
> MUST is wrong IMO if you add an exception. Then it should be a SHOULD
> and explain the cases where it shouldn't. I left it out of the first
> sentence on purpose because the "normative" SHOULD is on the next
> sentence and the one that should apply.

My interpretation of my proposal is: "If the exception doesn't apply,
then it is a MUST (a real MUST), otherwise the exception (with its MAY)
applies." I thought this is what is really wanted: After all, there used
to be the sentence "The OBUs in the `Block` MUST follow the __[Low
Overhead Bitstream Format syntax]__." And the commit message of the
commit that changed this part to the way it currently is reads: "av1:
the low overhead format must be used in Blocks except for the last OBU"

Anyway, given that my earlier proposal can apparently be misunderstood
in ways I didn't imagine, let me rephrase it:
"If an OBU is not the last OBU in a `Block`, it MUST follow the [Low
Overhead Bitstream Format syntax] (i.e. it MUST have
[obu_has_size_field] set to 1); the last OBU in a `Block` MAY have
[obu_has_size_field] set to 0, in which case it is assumed to fill the
remainder of the `Block`."

I can't think of any ambiguity in the above wording.
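
To make the rule concrete, here is a rough Python sketch of how a
demuxer could split a `Block` into OBUs under this wording. The function
names are made up; the OBU header bit layout and leb128() follow the AV1
specification. Note that the rule is self-enforcing in a parser: an OBU
without obu_size necessarily runs to the end of the `Block`, so it can
only ever be the last one.

```python
from __future__ import annotations


def read_leb128(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an AV1 leb128() value starting at buf[pos]; return (value, new_pos)."""
    value = 0
    for i in range(8):  # AV1 limits leb128() to 8 bytes
        if pos >= len(buf):
            raise ValueError("truncated leb128")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos
    raise ValueError("leb128 longer than 8 bytes")


def split_block_obus(block: bytes) -> list[tuple[int, bytes]]:
    """Split a Block payload into (obu_type, obu_payload) pairs."""
    obus = []
    pos = 0
    while pos < len(block):
        header = block[pos]
        if header & 0x80:
            raise ValueError("obu_forbidden_bit is set")
        obu_type = (header >> 3) & 0x0F
        has_extension = (header >> 2) & 1
        has_size_field = (header >> 1) & 1
        pos += 1 + has_extension  # skip the optional extension byte
        if pos > len(block):
            raise ValueError("truncated OBU header")
        if has_size_field:
            size, pos = read_leb128(block, pos)
        else:
            # obu_has_size_field == 0: the OBU fills the remainder of the Block.
            size = len(block) - pos
        end = pos + size
        if end > len(block):
            raise ValueError("OBU extends past the end of the Block")
        obus.append((obu_type, block[pos:end]))
        pos = end
    return obus
```

For example, a `Block` holding a sized `Sequence Header OBU` (type 1)
followed by an unsized `Frame OBU` (type 6) splits into two entries, the
second one taking all remaining bytes.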

3.

>>>> 4. "ReferenceBlocks inside a BlockGroup MUST reference frames according
>>>> to the [ref_frame_idx] values of frame that is neither a KEYFRAME nor an
>>>> INTRA_ONLY_FRAME.": The problem with this sentence is that
>>>> [ref_frame_idx] needn't be present. It depends upon
>>>> [frame_refs_short_signaling] and [show_existing_frame]. If one uses a
>>>> Block inside a BlockGroup and [show_existing_frame] equals 1, one
>>>> should reference the block that contained the showable frame that is now
>>>> output (and this should be the only `ReferenceBlock` written). In
>>>> case of [frame_refs_short_signaling] == 1 the obvious candidates for
>>>> `ReferenceBlocks` are the blocks containing the `last_frame_idx` and
>>>> `gold_frame_idx` that are explicitly signalled. If I am not mistaken,
>>>> then there are also other reference frames that are not explicitly
>>>> signalled, but computed. I don't know if we should really write a
>>>> `ReferenceBlock` entry for every reference as the current proposal seems
>>>> to imply. This would be quite a bit of overhead for no gain (and
>>>> furthermore, it would complicate muxers that would have to compute the
>>>> references that are not explicitly signalled in case that
>>>
>>> This is how `ReferenceBlock` is supposed to be used. So a muxer that
>>> has no idea of any codec can cut a file and keep the relevant
>>> references. So they all have to be there. It's one of the reasons
>>> SimpleBlock was added, to simplify things a little (and reduce
>>> overhead).
>>>
>> Actually a muxer can cut a file if it just knows the keyframes, the
>> decoding order and the display order. It doesn't need to have complete
>> information about reference frames. After all, one can cut files that
>> exclusively use `SimpleBlocks`.
> 
> I'm not sure it's true for modern codecs where the referenced frame may
> be older than the previous keyframes. Also the ReferenceBlock allows
> picking only the frames necessary to render that particular frame,
> regardless of the internals of the codec.

If by "cutting a file" one means specifying an interval of the (output)
movie that should be preserved, then I have to correct myself: the
criteria I mentioned only work if the keyframes have the property that
every block with a timestamp greater than the keyframe's timestamp can
be decoded correctly when decoding starts at the keyframe. If not, one
also needs to know from which frame on the output is correct. One
doesn't need every reference of every frame for that; this information
could also be conveyed in one of the ways I proposed.
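
A rough Python sketch of such a cut under exactly that assumption (clean
keyframes). The `BlockInfo` type and the function name are hypothetical,
and the sketch additionally assumes that no `Block` preceding the chosen
keyframe in decoding order carries a timestamp inside the interval:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BlockInfo:
    timestamp: int     # presentation timestamp (hypothetical units: Matroska ticks)
    is_keyframe: bool


def blocks_to_keep(blocks: list[BlockInfo], start: int, end: int) -> range:
    """Indices (in decoding order) of the Blocks needed to output every
    frame with start <= timestamp <= end, assuming clean keyframes."""
    first = None
    for i, block in enumerate(blocks):
        if block.is_keyframe and block.timestamp <= start:
            first = i  # latest suitable keyframe in decoding order
    if first is None:
        raise ValueError("no keyframe at or before the start of the interval")
    last = first
    for i in range(first, len(blocks)):
        if blocks[i].timestamp <= end:
            last = i  # Blocks decoded after this one are not needed
    return range(first, last + 1)
```

Blocks between the first and last index are kept even if their
timestamps lie beyond the interval, because they may still be
referenced by frames inside it.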

4.

>>>> 5. AV1 may use spatial scalability and/or temporal scalability. What do
>>>> we make of these? They are currently not forbidden if I am not mistaken,
>>>> but if e.g. the spatial dimensions of different layers disagree, the
>>>> `PixelWidth` and `PixelHeight` values can't be true for all layers.
>>>> Matroska seems to be missing some features here.
>>>
>>> Our spec says that the Sequence Header OBU should be valid for all
>>> frames. That can't be used for spatial scalability. We don't support
>>> that mode for now.
>>>
>> Then this should be explicitly stated in the codec mapping. And I also
>> fail to see why the fact that the Sequence Header OBU should be valid
>> for all frames should be incompatible with spatial scalability (after
>> all, in my reading of the spec the various layers share the same
>> Sequence Header OBU).
> 
> I didn't fully understand how the spatial scalability works. If the
> same Sequence Header OBU supports it then we can support it. After all
> the PixelWidth/Height use the "maximum" width/height. But internally
> it may be less.
> 
It uses the same `Sequence Header OBU`. But the effect is not confined
to the internals of the codec: different spatial layers can have
different output resolutions; in fact, even without scalability the
frame dimensions may change from frame to frame. And if I am not
mistaken, the output pictures (which have to be scaled by the renderer)
can then have varying dimensions, too. Currently the segment
restrictions contain the following clause:

"Matroska doesn't allow dynamic changes within a codec for the whole
Segment. The parameters that should not change for a video Track are the
dimensions and the CodecPrivate."

I take this to mean that all output frames must have the same
dimensions, and I think (although this point is not clearly stated)
that these output dimensions are meant to coincide with what must be
put into `DisplayWidth` and `DisplayHeight`. So AV1 in Matroska actually
carries the added restriction that every output frame has a width of
[max_frame_width_minus_1] + 1 (and similarly for the height). What I'd
like to know is how much of AV1 will probably be excluded from Matroska
by these requirements. Could we contact some of the codec designers and
ask for their opinions on this? (Honestly, "we" probably means "you", as
you are the chairman of Matroska.) Maybe there are other things we
would like to ask them, too. (We should collect the questions first.)
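
Under that reading, checking a track boils down to comparing every
output frame's size with the maximum size signalled in the `Sequence
Header OBU`. A rough Python sketch; the function name is made up, and
the assumption is that the decoder reports each output frame's size as a
(width, height) pair:

```python
from __future__ import annotations


def violates_fixed_dimensions(max_frame_width_minus_1: int,
                              max_frame_height_minus_1: int,
                              output_sizes: list[tuple[int, int]]) -> list[int]:
    """Indices of output frames whose (width, height) differs from the
    maximum frame size signalled in the Sequence Header OBU."""
    expected = (max_frame_width_minus_1 + 1, max_frame_height_minus_1 + 1)
    return [i for i, size in enumerate(output_sizes) if size != expected]
```

A spatial layer decoded at half resolution, for instance, would show up
in the returned list.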

5.

>>> My wording would be:
>>> "Upon seeking to a keyframe the player/demuxer MUST prepend the `Block`
>>> with the `Sequence Header OBU` contained in the `CodecPrivate`.
>>> A muxer MUST make sure that the correct `Sequence Header OBU` is in
>>> force both during linear access and also after seeking to a keyframe
>>> `Block`. So in particular a keyframe `Block` where a `Sequence Header
>>> OBU` that is not bit-identical to the one in the `CodecPrivate` is in
>>> force for the decoding of the first frame contained in said `Block` MUST
>>> contain the `Sequence Header OBU` that is in force for the decoding of
>>> the first frame contained in said `Block` in front of the first frame
>>> contained in said `Block`."
>>>
>>> One could also relax this a bit and only make the correct linear access
>>> a MUST and the rest a SHOULD. This might be useful for applications
>>> where seeking isn't desired (although it really should be included even
>>> for those scenarios to support resuming playback after a transmission
>>> error).
>>
>> OK, I'll try to add something for seeking.
>
> Actually we already have it in the Segment Restrictions:
>
> Given a `Sequence Header OBU` can be omitted from a `Block` if
> __[decoder_model_info_present_flag]__ is 0 and it is bit identical to
> the one found in `CodecPrivate`, when seeking to a keyframe, that
> omitted `Sequence Header OBU` MUST be added back to the bitstream for
> compliance with the Random Access Decoding section of the [AV1
> Specifiations](#av1-specifications).

No, we don't. But before I come to this, here are two more points that
I would like to mention first:

a) [decoder_model_info_present_flag] controls more than just whether
[operating_parameters_info] is present: it also influences the presence
of [temporal_point_info], which is AV1's way of storing the (possibly
VFR) timestamps of the video. The actual condition for whether the
`Sequence Header OBUs` of a CVS can change is whether
[decoder_model_present_for_this_op[i]] is 1 for some operating point.
But in my proposal this doesn't matter anymore anyway, as the way of
specifying when a `Sequence Header OBU` has to be present is entirely
independent of the one track = one CVS restriction (it would work just
as well if this restriction were dropped).

b) The current proposal mandates that one track can only consist of one
(subset of a) CVS in a suboptimal manner (if it mandates it at all): It
says:

"Matroska doesn't allow dynamic changes within a codec for the whole
Segment. The parameters that should not change for a video Track are the
dimensions and the CodecPrivate."

The `CodecPrivate` can't change in a valid Matroska track anyway,
because a track can only have one. Therefore this requirement is
vacuous, and the one track = one CVS requirement is only stated in the
next sentence:

"The first Sequence Header OBU of a CVS is stored in the CodecPrivate of
a Track. So this AV1 Track has the same requirements as the CVS."

Here the "so" is currently a non-sequitur.
Better wording:

"Matroska doesn't allow dynamic changes within a `Track` for the whole
`Segment`. Therefore the `Sequence Header OBUs` of the `Track` (both the
one in the `CodecPrivate` as well as the in-band `Sequence Header OBUs`)
MUST adhere to the restriction that the `Sequence Header OBUs` of a
`CVS` must fulfill: Their content MUST be bit-identical each time a
`Sequence Header OBU` appears except for the contents of
[operating_parameters_info]. Furthermore the dimensions of all output
frames MUST be equal."



c) Here is what the proposal currently has to say about `Sequence Header
OBUs`:
"Sequence Header OBUs SHOULD be omitted when they are bit-identical to
the one found in CodecPrivate and [decoder_model_info_present_flag] is 0
and the previous Sequence Header OBUs in the bistream was also
bit-identical to the one found in CodecPrivate. They can be kept when
encryption constraints require it."

"The first Sequence Header OBU of a CVS is stored in the CodecPrivate of
a Track. So this AV1 Track has the same requirements as the CVS."

"If the [decoder_model_info_present_flag] of this Sequence Header OBU is
set to 1 then each keyframe Block MUST contain a Sequence Header OBU
before the Frame Header OBUs."

"Given a Sequence Header OBU can be omitted from a Block if
[decoder_model_info_present_flag] is 0 and it is bit identical to the
one found in CodecPrivate, when seeking to a keyframe, that omitted
Sequence Header OBU MUST be added back to the bitstream for compliance
with the Random Access Decoding section of the AV1 Specifiations."

"A SimpleBlock MUST be marked as a Keyframe only if the first Frame OBU
in the Block has a [frame_type] of KEY_FRAME and the SimpleBlock
contains a Sequence Header OBU or if the Sequence Header OBU is
correctly omitted (see above)."

"A Block inside a BlockGroup MUST use ReferenceBlock elements if the
first Frame OBU in the Block has a [frame_type] other than KEY_FRAME or
the Block doesn't contain a Sequence Header OBU when it should not be
omitted."

Clauses 1 and 4 only apply when [decoder_model_info_present_flag] is 0.
In this case (given our requirement that each track contains only a
single CVS) all `Sequence Header OBUs` must be bit-identical; in
particular, the part of clause 1 dealing with the previous `Sequence
Header OBU` is unnecessary because it is automatically fulfilled.

Clause 3 directly contradicts your claim that the current proposal
already allows stripping away some of the `Sequence Header OBUs` when
they are bit-identical to the one in the `CodecPrivate` and redundant
for linear access.

Clause 2 is actually an unnecessary restriction. It might be that the
variant of the `Sequence Header OBU` used at the very beginning is rare
and that one could save more bytes by putting a different one into the
`CodecPrivate`. In any case, I fail to see why the first `Sequence
Header OBU` should be treated specially in the standard. Since it allows
for simpler muxers, most muxers will of course put the first `Sequence
Header OBU` into the `CodecPrivate`, no doubt about that. But if the
standard requires this, then splitting AV1 tracks in Matroska becomes
more complicated: one has to change the `CodecPrivate` to the `Sequence
Header OBU` used by the first frame of each resulting part.

Even ignoring this, I find the requirements as a whole confusing. They
should not be stated in terms of [decoder_model_info_present_flag] (or
[decoder_model_present_for_this_op[i]]); instead they should be stated
as a corollary of the general principle that the necessary codec
initialisation data/extradata must be available to the decoder during
linear access as well as after random access (to a keyframe), rather
than hiding this principle behind conditions involving the
aforementioned flags. It's much more intelligible this way. It also
encourages muxers not to hard-code separate treatment for
[decoder_model_info_present_flag] being 0 or 1 (or
[decoder_model_present_for_this_op[i]] being 0 for all applicable i or
not), but to use a system that would keep working if multiple CVSs were
allowed in the future.

d) Here is a proposal for said section in case we make it a MUST that
keyframes have the correct `Sequence Header OBU` (whether because it is
there or because it is prepended from the `CodecPrivate`):

"Upon seeking to a random access point (RAP) or starting playback from
the beginning, the player/demuxer MUST prepend the `Block` where decoding
starts with the `Sequence Header OBU` contained in the `CodecPrivate`.
Afterwards playback proceeds normally including consuming any `Sequence
Header OBUs` that are found in the bitstream.
Seeking to a non-RAP is undefined and not recommended.

A muxer MUST make sure that, with a conformant demuxer/player, the
correct `Sequence Header OBU` is active both during linear access and
after seeking to a random access `Block`. In particular, if a
`Sequence Header OBU` that differs from the `Sequence Header OBU` in the
`CodecPrivate` is active when the first `Frame Header OBU` of a random
access point sample is consumed, then said sample MUST contain said
`Sequence Header OBU` in front of the first `Frame Header OBU`.

The muxer MAY omit (strip away and discard) `Sequence Header OBUs`
provided the above criteria are still fulfilled afterwards.

Note: A `Sequence Header OBU` that fulfills any of the following
criteria can be omitted without affecting compliance with the
requirements above:

If, after potentially stripping away all `Temporal Delimiter OBUs`,
there is more than one `Sequence Header OBU` in immediate succession,
then all but the last of these `Sequence Header OBUs` can be omitted.

A `Sequence Header OBU` that differs from the `Sequence Header OBU` in
the `CodecPrivate` can be omitted if there was a preceding `Sequence
Header OBU` in the bitstream, if the preceding `Sequence Header OBU` was
bit-identical to the current `Sequence Header OBU` and if the current
`Sequence Header OBU` is not the first OBU after a `Temporal Delimiter
OBU` that starts a temporal unit whose corresponding `Block` is a random
access point.

A `Sequence Header OBU` that is bit-identical to the `Sequence Header
OBU` in the `CodecPrivate` can be omitted if there was no preceding
`Sequence Header OBU` or if there was a preceding `Sequence Header OBU`
and the previous `Sequence Header OBU` was bit-identical to the current
`Sequence Header OBU`."
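
A rough Python sketch of a muxer applying the three omission criteria.
The `Block` model (is_rap, [(obu_type, payload), ...]) and the function
name are hypothetical; `codec_private` stands for the payload of the
`Sequence Header OBU` in the `CodecPrivate`. One interpretation choice:
the criteria are evaluated one OBU at a time against the stream as
already stripped, so "preceding `Sequence Header OBU`" means the
preceding one that is kept. Evaluated against the original stream all at
once, criteria 1 and 2 could together remove both of two identical
consecutive `Sequence Header OBUs`. The sketch also assumes that a
`Sequence Header OBU` in a random access `Block` comes right after the
`Temporal Delimiter OBU`, if any.

```python
from __future__ import annotations

# OBU type values from the AV1 specification.
OBU_SEQUENCE_HEADER = 1
OBU_TEMPORAL_DELIMITER = 2


def strip_sequence_headers(blocks: list[tuple[bool, list[tuple[int, bytes]]]],
                           codec_private: bytes) -> list[tuple[bool, list[tuple[int, bytes]]]]:
    """Drop Sequence Header OBUs that the three omission criteria allow."""
    # Stream positions (block index, OBU index) of every non-TD OBU.
    positions = [(b, i) for b, (_, obus) in enumerate(blocks)
                 for i, (obu_type, _) in enumerate(obus)
                 if obu_type != OBU_TEMPORAL_DELIMITER]
    drop = set()
    previous = None  # payload of the last Sequence Header OBU kept so far
    started = set()  # Blocks in which some non-TD OBU has already been kept
    for n, (b, i) in enumerate(positions):
        obu_type, payload = blocks[b][1][i]
        if obu_type == OBU_SEQUENCE_HEADER:
            if n + 1 < len(positions):
                next_b, next_i = positions[n + 1]
                if blocks[next_b][1][next_i][0] == OBU_SEQUENCE_HEADER:
                    drop.add((b, i))  # criterion 1: keep only the last of a run
                    continue
            if payload == codec_private:
                # criterion 3: no preceding one, or the preceding one is identical
                omit = previous is None or previous == payload
            else:
                # criterion 2: identical to the preceding one and not the
                # first kept OBU of a random access Block
                first_in_rap = blocks[b][0] and b not in started
                omit = previous == payload and not first_in_rap
            if omit:
                drop.add((b, i))
                continue
            previous = payload
        started.add(b)
    return [(is_rap, [obu for i, obu in enumerate(obus) if (b, i) not in drop])
            for b, (is_rap, obus) in enumerate(blocks)]
```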

If this is adopted, clauses 2-4 from c) above should be omitted and
clauses 5 and 6 can be adapted, too.
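
To check a muxer's output against the MUST in the proposed section, one
can simulate both cases: linear playback from the start (with the
`CodecPrivate` prepended) and a seek to each random access `Block`
(again with the `CodecPrivate` prepended), and compare which `Sequence
Header OBU` is active at that `Block`'s first frame header. A rough
Python sketch; the `Block` model (is_rap, [(obu_type, payload), ...])
and the function name are hypothetical, and "bit-identical" is taken to
mean byte-equal payloads:

```python
from __future__ import annotations

# OBU type values from the AV1 specification.
OBU_SEQUENCE_HEADER = 1
OBU_FRAME_HEADER = 3
OBU_FRAME = 6


def rap_violations(blocks: list[tuple[bool, list[tuple[int, bytes]]]],
                   codec_private: bytes) -> list[int]:
    """Indices of random access Blocks at which a seek would activate a
    different Sequence Header OBU than linear playback does."""
    violations = []
    active = codec_private  # linear playback: CodecPrivate is prepended at the start
    for index, (is_rap, obus) in enumerate(blocks):
        after_seek = codec_private  # active if playback started at this Block
        checked = False
        for obu_type, payload in obus:
            if obu_type == OBU_SEQUENCE_HEADER:
                active = payload
                after_seek = payload
            elif obu_type in (OBU_FRAME_HEADER, OBU_FRAME) and not checked:
                checked = True  # only the Block's first frame header matters
                if is_rap and active != after_seek:
                    violations.append(index)
    return violations
```

A random access `Block` that relies on an in-band `Sequence Header OBU`
from an earlier `Block` (differing from the `CodecPrivate`) is reported,
which is exactly the case the proposed wording forbids.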

That's it from me today.

- Andreas Rheinhardt