Re: [Cellar] AV1 mapping update
Andreas Rheinhardt <andreas.rheinhardt@googlemail.com> Sat, 14 July 2018 21:56 UTC
To: Codec Encoding for LossLess Archiving and Realtime transmission <cellar@ietf.org>
References: <CAOXsMFKHo6RS+q8KCXKoKCiBBS9pVqs92wsLgSfXZO+DT3dStQ@mail.gmail.com> <ca0f009e-a245-fcd6-95f8-f051736c9161@googlemail.com> <CAOXsMFL5-MaHQaAOyh7jSFUpCNbSEvAWKmAHcepaF+QsQuYbHw@mail.gmail.com> <fee747da-77ca-9282-a4c3-c112fd746507@googlemail.com> <CAOXsMFJtc9pq+PphRb5kF9Mp4jyS5j3LQi6vQQmHRyTDYWyQ-A@mail.gmail.com>
From: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
Message-ID: <b8486fa4-132b-f814-7046-91efb0a48ec6@googlemail.com>
Date: Sat, 14 Jul 2018 21:55:00 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/3ff1BOzetyPrOz02eL2lcbwxAwk>
Subject: Re: [Cellar] AV1 mapping update
Hello,

what I have to say about the "seeking with delayed random access points"
topic will be said in a separate email as a reply to your other email.
Here is the rest:

1. "The [timing_info_present_flag] of the Sequence Header OBU SHOULD be
0. Even when it is 1 the presentation time of the Frame Header OBUs in
Blocks should be discarded. In other words, only the timestamps given by
the Matroska container MUST be used."

a) This clause should not be in the section about the `CodecPrivate` as
it pertains to all `Sequence Header OBUs`, not just the one in the
`CodecPrivate`.

b) I don't see a reason why the bitstream should be cleaned of these
values. This isn't done for H.264 or other codecs either.

c) If you want to keep this recommendation in, then we should merge it
with the recommendation to discard [temporal_point_info] (after all,
setting [timing_info_present_flag] to 0 automatically does the same with
[decoder_model_info_present_flag] and then [temporal_point_info] mustn't
be present anyway) that uses [frame_presentation_time]. It's probably
good to adapt the mp4 version:
"The presentation times of AV1 samples are given by the Matroska
container. The [timing_info_present_flag] in the `Sequence Header OBU`
(in the `CodecPrivate` or in the bitstream) SHOULD be set to 0. If set
to 1, the [timing_info] structure of the `Sequence Header OBU`, the
[frame_presentation_time] and [buffer_removal_time] fields of the `Frame
Header OBUs`, if present, SHALL be ignored for the purpose of timed
processing of the Matroska file."

2.

Steve Lhomme:
>> The current version is:
>> "The OBUs in the Block follow the [Low Overhead Bitstream Format
>> syntax]. They SHOULD have the [obu_has_size_field] set to 1 except for
>> the last OBU in the sample, for which [obu_has_size_field] MAY be set to
>> 0, in which case it is assumed to fill the remaining of the sample."
>>
>> If one interprets the first sentence as meaning "The OBUs in the Block
>> MUST follow the [Low Overhead Bitstream Format syntax]", then given that
>> this syntax mandates [obu_has_size_field] to be equal to 1 the first
>> part of the second sentence is redundant (given that MUST is stronger
>> than SHOULD) and the second part is again in contradiction to/voided by
>> the first sentence because the first sentence doesn't allow
>> "[obu_has_size_field]" set to zero at all.
>> If one interprets the first sentence as not conveying a MUST, then it is
>> allowed (albeit strongly discouraged) to use [obu_has_size_field] equal
>> to 0 for an OBU that is not the last OBU in the sample. This is not what
>> we want, is it? How about:
>> "The OBUs in the `Block` MUST follow the [Low Overhead Bitstream Format
>> syntax] with the possible exception of the last OBU of a `Block` for
>> which [obu_has_size_field] MAY be set to 0, in which case it is assumed
>> to fill the remainder of the `Block`."
>
> MUST is wrong IMO if you add an exception. Then it should be a SHOULD
> and explain the cases where it shouldn't. I left it out of the first
> sentence on purpose because the "normative" SHOULD is on the next
> sentence and the one that should apply.

My interpretation of my proposal is: "If the exception doesn't apply,
then it is a MUST (a real MUST), otherwise the exception (with its MAY)
applies." I thought this is what is really wanted: After all, there used
to be the sentence
"The OBUs in the `Block` MUST follow the __[Low Overhead Bitstream
Format syntax]__."
And the commit message of the commit that changed this part to the way
it currently is reads:
"av1: the low overhead format must be used in Blocks except for the last
OBU"

Anyway, given that my earlier proposal can apparently be misunderstood
in ways I didn't imagine, let me rephrase it:
"If an OBU is not the last OBU in a `Block`, it MUST follow the [Low
Overhead Bitstream Format syntax] (i.e.
it MUST have [obu_has_size_field] set to 1); the last OBU in a `Block`
MAY have [obu_has_size_field] set to 0, in which case it is assumed to
fill the remainder of the `Block`."
I can't think of any ambiguity in the above wording.

3.

>>>> 4. "ReferenceBlocks inside a BlockGroup MUST reference frames according
>>>> to the [ref_frame_idx] values of frame that is neither a KEYFRAME nor an
>>>> INTRA_ONLY_FRAME.": The problem with this sentence is that
>>>> [ref_frame_idx] needn't be present. It depends upon
>>>> [frame_refs_short_signaling] and [show_existing_frame]. If one uses a
>>>> Block inside a BlockGroup and if [show_existing_frame] equals one, one
>>>> should reference the block that contained the showable frame that is now
>>>> output (and that this should be the only `ReferenceBlock` written). In
>>>> case of [frame_refs_short_signaling] == 1 the obvious candidates for
>>>> `ReferenceBlocks` are the blocks containing the `last_frame_idx` and
>>>> `gold_frame_idx` that are explicitly signalled. If I am not mistaken,
>>>> then there are also other reference frames that are not explicitly
>>>> signalled, but computed. I don't know if we should really write a
>>>> `ReferenceBlock` entry for every reference as the current proposal seems
>>>> to imply. This would be quite a bit of overhead for no gain (and
>>>> furthermore, it would complicate muxers that would have to compute the
>>>> references that are not explicitly signalled in case that
>>>
>>> This is how `ReferenceBlock` is supposed to be used. So a muxer that
>>> has no idea of any codec can cut a file and keep the relevant
>>> references. So they all have to be there. It's one of the reasons
>>> SimpleBlock was added, to simplify things a little (and reduce
>>> overhead).
>>>
>> Actually a muxer can cut a file if it just knows the keyframes, the
>> decoding order and the display order. It doesn't need to have complete
>> information about reference frames.
>> After all, one can cut files that exclusively use `SimpleBlocks`.
>
> I'm not sure it's true for modern codec where the referenced frame may
> be older than the previous keyframes. Also the ReferenceBlock allows
> picking only the frames necessary to render that particular frame,
> regardless of the internals of the codec.
>

If by cutting a file one means specifying an interval of the (output)
movie that should be preserved, then I have to correct myself: The
criteria I mentioned only work if the keyframes have the property that
any block with a timestamp bigger than the timestamp of the keyframe can
be correctly decoded if one starts decoding from the keyframe on. If
not, one also needs the information from which frame on the output is
ok. One doesn't need every reference for every frame for that; one could
also convey this information with one of the ways I proposed.

4.

>>>> 5. AV1 may use spatial scalability and/or temporal scalability. What do
>>>> we make of these? They are currently not forbidden if I am not mistaken,
>>>> but if e.g. the spatial dimensions of different layers disagree, the
>>>> `PixelWidth` and `PixelHeight` values can't be true for all layers.
>>>> Matroska seems to be missing some features here.
>>>
>>> Our spec says that the Sequence Header OBU should be valid for all
>>> frames. That can't be used for spatial scalability. We don't support
>>> that mode for now.
>>>
>> Then this should be explicitly stated in the codec mapping. And I also
>> fail to see why the fact that the Sequence Header OBU should be valid
>> for all frames should be incompatible with spatial scalability (after
>> all, in my reading of the spec the various layers share the same
>> Sequence Header OBU).
>
> I didn't fully understand how the spatial scalability works. If the
> same Sequence Header OBU supports it then we can support it. After all
> the PixelWidth/Height use the "maximum" width/height. But internally
> it may be less.
>

It uses the same `Sequence Header OBU`.
But it is not confined to something internal to the codec. Different
spatial layers can have different output resolutions; in fact even
without this the frame dimensions may change from frame to frame. And if
I am not mistaken, then this is not something internal to the codec: The
output pictures (that have to be scaled by the renderer) can have
varying dimensions, too.

Currently the segment restrictions contain the following clause:
"Matroska doesn't allow dynamic changes within a codec for the whole
Segment. The parameters that should not change for a video Track are the
dimensions and the CodecPrivate."
I take this as meaning that all output frames must have the same
dimensions. I also think (this point is not clearly stated) that said
output dimensions are meant to coincide with what must be put into
`DisplayWidth` and `DisplayHeight`, so that one actually has the added
restriction for AV1 in Matroska that every output frame has the same
width as [max_frame_width_minus_1]+1 (similarly for height).

But what I'd like to know is how much of AV1 will probably be excluded
from Matroska by these requirements. Could we contact some of the codec
designers and ask them about their opinions on this? (Honestly, "we"
probably means "you" as you are the chairman of Matroska.) And maybe
about other things where we might have questions for them. (We should
collect the questions first.)

5.

>>> My wording would be:
>>> "Upon seeking to a keyframe the player/demuxer MUST prepend the `Block`
>>> with the `Sequence Header OBU` contained in the `CodecPrivate`.
>>> A muxer MUST make sure that the correct `Sequence Header OBU` is in
>>> force both during linear access and also after seeking to a keyframe
>>> `Block`.
>>> So in particular a keyframe `Block` where a `Sequence Header
>>> OBU` that is not bit-identical to the one in the `CodecPrivate` is in
>>> force for the decoding of the first frame contained in said `Block` MUST
>>> contain the `Sequence Header OBU` that is in force for the decoding of
>>> the first frame contained in said `Block` in front of the first frame
>>> contained in said `Block`."
>>>
>>> One could also relax this a bit and only make the correct linear access
>>> a MUST and the rest a SHOULD. This might be useful for applications
>>> where seeking isn't desired (although it really should be included even
>>> for those scenarios to support resuming playback after a transmission
>>> error).
>>
>> OK, I'll try to add something for seeking.
>
> Actually we already have it in the Segment Restrictions:
>
> Given a `Sequence Header OBU` can be omitted from a `Block` if
> __[decoder_model_info_present_flag]__ is 0 and it is bit identical to
> the one found in `CodecPrivate`, when seeking to a keyframe, that
> omitted `Sequence Header OBU` MUST be added back to the bitstream for
> compliance with the Random Access Decoding section of the [AV1
> Specifications](#av1-specifications).

No, we haven't. But before I come to this, here are two more points that
I would like to mention first:

a) [decoder_model_info_present_flag] controls more than just whether
[operating_parameters_info] is present: It also influences the presence
of [temporal_point_info], which is AV1's way of storing the timestamps
(possibly VFR) of the video. The real condition for whether the
`Sequence Header OBUs` of a CVS can change is whether
[decoder_model_present_for_this_op[i]] is 1 for an operating point. But
in my proposal it doesn't matter anymore anyway, as the way of
specifying when a `Sequence Header OBU` has to be present is entirely
independent of the one track = one CVS restriction (it would work just
as well if this restriction were dropped).

b) The current proposal mandates that one track can only consist of one
(subset of a) CVS in a suboptimal manner (if it mandates it at all). It
says:
"Matroska doesn't allow dynamic changes within a codec for the whole
Segment. The parameters that should not change for a video Track are the
dimensions and the CodecPrivate."
The `CodecPrivate` can't change in a valid Matroska track anyway,
because a track can only have one. Therefore this requirement is empty
and the only requirement of one track = one CVS is in the next sentence:
"The first Sequence Header OBU of a CVS is stored in the CodecPrivate of
a Track. So this AV1 Track has the same requirements as the CVS."
Here the "so" is currently a non sequitur. Better wording:
"Matroska doesn't allow dynamic changes within a `Track` for the whole
`Segment`. Therefore the `Sequence Header OBUs` of the `Track` (both the
one in the `CodecPrivate` as well as the in-band `Sequence Header OBUs`)
MUST adhere to the restriction that the `Sequence Header OBUs` of a
`CVS` must fulfill: Their content MUST be bit-identical each time a
`Sequence Header OBU` appears except for the contents of
[operating_parameters_info]. Furthermore the dimensions of all output
frames MUST be equal."

c) Here is what the proposal currently has to say about `Sequence Header
OBUs`:

"Sequence Header OBUs SHOULD be omitted when they are bit-identical to
the one found in CodecPrivate and [decoder_model_info_present_flag] is 0
and the previous Sequence Header OBUs in the bitstream was also
bit-identical to the one found in CodecPrivate. They can be kept when
encryption constraints require it."

"The first Sequence Header OBU of a CVS is stored in the CodecPrivate of
a Track. So this AV1 Track has the same requirements as the CVS."

"If the [decoder_model_info_present_flag] of this Sequence Header OBU is
set to 1 then each keyframe Block MUST contain a Sequence Header OBU
before the Frame Header OBUs."

"Given a Sequence Header OBU can be omitted from a Block if
[decoder_model_info_present_flag] is 0 and it is bit identical to the
one found in CodecPrivate, when seeking to a keyframe, that omitted
Sequence Header OBU MUST be added back to the bitstream for compliance
with the Random Access Decoding section of the AV1 Specifications."

"A SimpleBlock MUST be marked as a Keyframe only if the first Frame OBU
in the Block has a [frame_type] of KEY_FRAME and the SimpleBlock
contains a Sequence Header OBU or if the Sequence Header OBU is
correctly omitted (see above)."

"A Block inside a BlockGroup MUST use ReferenceBlock elements if the
first Frame OBU in the Block has a [frame_type] other than KEY_FRAME or
the Block doesn't contain a Sequence Header OBU when it should not be
omitted."

Clause 1 and clause 4 only apply when [decoder_model_info_present_flag]
is 0. In this case (by our requirement that each track contains only a
single CVS) all `Sequence Header OBUs` must be bit-identical; in
particular, the part of clause 1 dealing with the previous `Sequence
Header OBU` is unnecessary because it is automatically fulfilled.

Clause 3 directly contradicts your claim that the current proposal
already allows stripping away some of the `Sequence Header OBUs` when
they are bit-identical to the one in the `CodecPrivate` and redundant
for linear access.

Clause 2 is actually an unnecessary restriction. It might be that the
very first `Sequence Header OBU` is rare and that one could save more
bytes by putting a different one in the `CodecPrivate`. In any case I
fail to see why the first `Sequence Header OBU` should be treated
specially in the standard; that it allows for simpler muxers will of
course make most muxers put the first `Sequence Header OBU` in the
`CodecPrivate`, no doubt about that. But if one requires this, then
splitting AV1 tracks in Matroska will be more complicated: One has to
change the `CodecPrivate` to the `Sequence Header OBU` used by the first
frame.

Even ignoring this I regard the requirements as a whole as confusing.
They should not be stated in terms of [decoder_model_info_present_flag]
(or [decoder_model_present_for_this_op[i]]); instead one should state
them as a corollary of the general principle that the necessary codec
initialisation data/extradata must be available to the decoder during
linear access as well as after random access (to a keyframe), and not
hide this fact behind some conditions involving the aforementioned
flags. It's much more intelligible this way. And it encourages muxers
not to hard-code separate treatment of the cases of
[decoder_model_info_present_flag] being 0 or 1 (or
[decoder_model_present_for_this_op[i]] being 0 for all applicable i or
not), but to use a system that works even if multiple CVS were allowed
in the future.

d) Here is a proposal for said section in case we make it a MUST that
keyframes must have the correct `Sequence Header OBU` (whether because
it is there or because it is prepended from the `CodecPrivate`):

"Upon seeking to a random access point (RAP) or starting playback from
the beginning the player/demuxer MUST prepend the `Block` where decoding
starts with the `Sequence Header OBU` contained in the `CodecPrivate`.
Afterwards playback proceeds normally including consuming any `Sequence
Header OBUs` that are found in the bitstream. Seeking to a non-RAP is
undefined and not recommended.

A muxer MUST make sure that when using a conformant demuxer/player the
correct `Sequence Header OBU` is active both during linear access and
also after seeking to a random access `Block`. In particular, if a
`Sequence Header OBU` that differs from the `Sequence Header OBU` in the
`CodecPrivate` is active during consuming of the first `Frame Header
OBU` of a random access point sample, then said sample MUST contain said
`Sequence Header OBU` in front of the first `Frame Header OBU`.

The muxer MAY omit (strip away and discard) `Sequence Header OBUs`
provided the above criteria are still fulfilled afterwards.

Note: A `Sequence Header OBU` that fulfills any of these criteria can be
omitted without changing compliance to the above criteria:

- If, after potentially stripping away all `Temporal Delimiter OBUs`,
  there is more than one `Sequence Header OBU` immediately after each
  other, then all but the last of these `Sequence Header OBUs` can be
  omitted.
- A `Sequence Header OBU` that differs from the `Sequence Header OBU` in
  the `CodecPrivate` can be omitted if there was a preceding `Sequence
  Header OBU` in the bitstream, if the preceding `Sequence Header OBU`
  was bit-identical to the current `Sequence Header OBU` and if the
  current `Sequence Header OBU` is not the first OBU after a `Temporal
  Delimiter OBU` that starts a temporal unit whose corresponding `Block`
  is a random access point.
- A `Sequence Header OBU` that is bit-identical to the `Sequence Header
  OBU` in the `CodecPrivate` can be omitted if there was no preceding
  `Sequence Header OBU` or if there was a preceding `Sequence Header
  OBU` and the previous `Sequence Header OBU` was bit-identical to the
  current `Sequence Header OBU`."

If this is adopted, clauses 2-4 from c) above should be omitted and
clauses 5 and 6 can be adapted, too.

That's it from me today.

- Andreas Rheinhardt
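
The muxer requirement proposed in d) can be sketched as a small check. This is only an illustration under simplifying assumptions, not part of the proposed wording: OBUs are modelled as `(kind, payload)` pairs, and `Block`, `active_headers` and `stripping_is_safe` are hypothetical names that appear in no specification or library. The check simulates a conformant demuxer that prepends the `CodecPrivate` header wherever decoding starts, and compares the `Sequence Header OBU` active at each `Frame Header OBU` against the unstripped stream.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical model: an OBU is (kind, payload). kind is "seq" for a
# Sequence Header OBU and "frame" for a Frame Header OBU; other kinds
# are ignored by this sketch.
Obu = Tuple[str, bytes]


@dataclass
class Block:
    is_rap: bool      # True if the Block is a random access point
    obus: List[Obu]


def active_headers(blocks, codec_private, start=0):
    """For each Block from `start` on, list the Sequence Header that is
    active at every Frame Header OBU, assuming the demuxer prepends the
    CodecPrivate header to the Block where decoding starts."""
    active = codec_private
    per_block = []
    for block in blocks[start:]:
        seen = []
        for kind, payload in block.obus:
            if kind == "seq":
                active = payload
            elif kind == "frame":
                seen.append(active)
        per_block.append(seen)
    return per_block


def stripping_is_safe(original, stripped, codec_private):
    """True if `stripped` yields the same active Sequence Headers as
    `original`, both for linear access and after seeking to any RAP."""
    if len(original) != len(stripped):
        return False
    expected = active_headers(original, codec_private)
    if active_headers(stripped, codec_private) != expected:
        return False
    return all(
        active_headers(stripped, codec_private, start=i) == expected[i:]
        for i, block in enumerate(stripped)
        if block.is_rap
    )


cp = b"A"  # stands in for the Sequence Header OBU in the CodecPrivate
original = [
    Block(True, [("seq", b"A"), ("frame", b"")]),
    Block(False, [("frame", b"")]),
    Block(True, [("seq", b"B"), ("frame", b"")]),
    Block(True, [("seq", b"B"), ("frame", b"")]),
]
# Omitting the header identical to the CodecPrivate is fine.
good = [Block(True, [("frame", b"")])] + original[1:]
# Omitting a differing header at a RAP breaks seeking to that RAP.
bad = original[:3] + [Block(True, [("frame", b"")])]

print(stripping_is_safe(original, good, cp))  # True
print(stripping_is_safe(original, bad, cp))   # False
```

The `bad` case passes linear playback but fails after seeking to the last `Block`, which is the situation the "In particular" sentence of the proposal rules out.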