Re: [Cellar] AV1 mapping update
Andreas Rheinhardt <> Wed, 11 July 2018 13:48 UTC
Steve Lhomme:
> I updated the AV1 mapping to clean a few sentences.
>
> and the list of changes can be found here
>

1. Whether `DisplayWidth` and `DisplayHeight` need to be written actually depends on the value of `DisplayUnit`.

2. You forgot `OBU_PADDING` in the list of OBU types that mustn't be in the `CodecPrivate`. (Either that, or your sentence saying that only `OBU_SEQUENCE_HEADER` and `OBU_METADATA` are currently allowed in the `CodecPrivate` should be changed.)

3. "They SHOULD have the [obu_has_size_field] set to 1 except for the last OBU in the sample, for which [obu_has_size_field] MAY be set to 0, in which case it is assumed to fill the remaining of the sample."
"The OBUs in the Block MUST follow the [Low Overhead Bitstream Format syntax]."
The first sentence leaves open the possibility that [obu_has_size_field] is 0 for OBUs other than the last OBU of a block (it is only a SHOULD). The requirement in the second sentence actually turns the SHOULD of the first sentence into a MUST (making that part of the first sentence redundant) and contradicts/voids its MAY part. In other words, the two sentences should be merged into something like: "The `OBUs` in the block must follow the `Low Overhead Bitstream Format` (in which [obu_has_size_field] MUST be equal to one for every OBU), with the possible exception of the very last `OBU`, for which [obu_has_size_field] MAY be set to 0, in which case the `OBU` is assumed to consist of the remainder of the block."

4. "ReferenceBlocks inside a BlockGroup MUST reference frames according to the [ref_frame_idx] values of frame that is neither a KEYFRAME nor an INTRA_ONLY_FRAME."
The problem with this sentence is that [ref_frame_idx] needn't be present: it depends on [frame_refs_short_signaling] and [show_existing_frame].

If one uses a Block inside a BlockGroup and [show_existing_frame] equals one, one should reference the block that contained the showable frame that is now output (and this should be the only `ReferenceBlock` written). In case of [frame_refs_short_signaling] == 1, the obvious candidates for `ReferenceBlocks` are the blocks containing the `last_frame_idx` and `gold_frame_idx` frames that are explicitly signalled. If I am not mistaken, there are also other reference frames that are not explicitly signalled but computed. I don't know whether we should really write a `ReferenceBlock` entry for every reference, as the current proposal seems to imply. This would be quite a bit of overhead for no gain (and furthermore, it would complicate muxers, which would have to compute the references that are not explicitly signalled when [frame_refs_short_signaling] is 1). One `ReferenceBlock` would be enough to distinguish keyframes from non-keyframes. By the way: if a temporal unit contains multiple frames with references, whose references should end up as `ReferenceBlocks`? Or may the muxer choose?

5. AV1 may use spatial and/or temporal scalability. What do we make of these? If I am not mistaken, they are currently not forbidden, but if e.g. the spatial dimensions of different layers disagree, the `PixelWidth` and `PixelHeight` values can't be true for all layers. Matroska seems to be missing some features here.

6. Depending on [frame_size_override_flag], there is even the possibility that the size of the frames differs without any scalability (if I am not mistaken). Should this be allowed?

7. Then there is another issue with keyframes and cues. (For this point, it is always presumed that the relevant sequence header OBUs are available, regardless of whether this is done in-band or via the CodecPrivate.)

a) The proposal currently does not take into account that key frames reset the decoder when they are output, not when they are decoded.

A key frame needn't be output immediately. If it is (i.e. [show_frame] equals 1), it is called a "key frame random access point" in section 7.6 of the standard and is the equivalent of an IDR frame in H.264; everything is fine here. But a key frame can also be declared a showable frame (only if [show_frame] equals 0) and be output later via the show_existing_frame mechanism. This is similar to an open GOP in other codecs (but in contrast to them, the block that contains the coded key frame doesn't have the same timestamp (pts) as the first frame that can be output after a seek). The coded key frame with [show_frame] equal to 0 is called a "delayed random access point", and a "key frame dependent recovery point" is a frame where a key frame with [showable_frame] equal to 1 is output via the show_existing_frame mechanism.

If one starts decoding at the delayed random access point, all the output frames up to but not including the key frame dependent recovery point can depend both on the delayed random access point frame and on other, earlier frames, so these frames can't be correctly decoded in general. But all the frames from the key frame dependent recovery point onwards can be correctly decoded if one starts decoding at the delayed random access point (because the decoder is reset after displaying the key frame). If one starts decoding at the key frame dependent recovery point, one doesn't have the key frame that should be shown via the show_existing_frame mechanism at all, so this frame is simply not a real key frame.

b) But although a key frame dependent recovery point is not a "real" key frame, it has the same [frame_type] as the frame that is output, i.e. its [frame_type] is KEY_FRAME. According to our current proposal, this would mean that it should be treated as a keyframe in Matroska, which is obviously wrong.

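The distinction drawn in a) and b) fits into a small classifier. This is only a sketch: the `FrameHeader` type and the function names are hypothetical, the `frame_type` constants follow the AV1 specification, and (as presumed at the start of point 7) any further conditions the specification attaches to random access points, such as the presence of sequence headers, are ignored. For a frame with [show_existing_frame] equal to 1, `frame_type` stands for the type of the frame being shown, which is what AV1 loads into frame_type in that case.

```python
from dataclasses import dataclass
from enum import Enum

# frame_type values as defined in the AV1 specification
KEY_FRAME = 0
INTER_FRAME = 1
INTRA_ONLY_FRAME = 2
SWITCH_FRAME = 3


class RapKind(Enum):
    KEY_FRAME_RAP = "key frame random access point"
    DELAYED_RAP = "delayed random access point"
    DEPENDENT_RECOVERY_POINT = "key frame dependent recovery point"
    NONE = "no random access point"


@dataclass
class FrameHeader:
    """Hypothetical holder for the few header fields this sketch needs."""
    frame_type: int                    # for show_existing_frame: type of the shown frame
    show_existing_frame: bool = False
    show_frame: bool = True


def classify(h: FrameHeader) -> RapKind:
    if h.frame_type != KEY_FRAME:
        return RapKind.NONE
    if h.show_existing_frame:
        # An earlier, undisplayed key frame is output (and the decoder reset) here,
        # but the coded key frame itself lives in an earlier block.
        return RapKind.DEPENDENT_RECOVERY_POINT
    if h.show_frame:
        return RapKind.KEY_FRAME_RAP
    return RapKind.DELAYED_RAP


def naive_keyframe_flag(h: FrameHeader) -> bool:
    # Looking at frame_type alone, which b) shows to be wrong.
    return h.frame_type == KEY_FRAME


def proposed_keyframe_flag(h: FrameHeader) -> bool:
    # Keyframe in the sense argued for in c): either kind of random access
    # point, but never a dependent recovery point.
    return classify(h) in (RapKind.KEY_FRAME_RAP, RapKind.DELAYED_RAP)


if __name__ == "__main__":
    recovery = FrameHeader(KEY_FRAME, show_existing_frame=True)
    print(classify(recovery).value)          # key frame dependent recovery point
    print(naive_keyframe_flag(recovery))     # True
    print(proposed_keyframe_flag(recovery))  # False
```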
c) Marking a delayed random access point as a keyframe deviates from the way that flag has traditionally been understood: if one starts decoding at this point, one doesn't get the frame that should be output for the temporal unit containing the delayed random access point. Nevertheless, I think these are the right keyframes, because they are the points at which random access has to begin when no key frame random access points are available. This also means that one can split the stream at such a point and the second part will still play, so a muxer like mkvmerge needn't be rewritten too much.

d) A consequence of this is that a `BlockGroup` containing a delayed random access point mustn't contain a `ReferenceBlock` (although the frame that is actually output for that temporal unit very likely uses reference frames other than the key frame contained in the same temporal unit).

e) Yes, this proposal means that it is impossible to tell from Matroska alone (at least from the block structure; see f) for a way to put this information into the Cues) whether a keyframe is a key frame random access point or a delayed random access point. One will have to decode it (or parse deeper) to know.

f) This also leads to problems with seeking. Suppose one simply added a CuePoint for the keyframe (i.e. for the delayed random access point), the user wants to seek to a point between the delayed random access point (inclusive) and the dependent recovery point (exclusive), and the player uses the Cues to seek to the nearest keyframe in front of the desired point. Then decoding at the point referenced in the Cues would not yield the desired frame (it would either be corrupted or not output at all). Therefore I think it is best to add a CuePoint for every key frame random access point and every key frame dependent recovery point. The CuePoint for a key frame random access point would be an ordinary CuePoint as usual.

But the CuePoint for a key frame dependent recovery point wouldn't be an ordinary one. There are several possibilities (my favourite is iv; if I were allowed to play God, it would be i):

i) The comprehensive way: the CueTime would be the timestamp of the block containing the dependent recovery point. It would include a CueTrackPositions for the video track in question, containing the right CueTrack, a CueClusterPosition with the position of the dependent recovery point block, and a CueReference with CueRefTime and CueRefCluster, both corresponding to the values of the delayed random access point. This proposal has several downsides: it uses Cue elements that are deprecated in Matroska and not part of WebM, so it would require a quite nontrivial change in both projects. (Btw: if one does this, one should add a default value for `CueRefCluster`: it should be the same as `CueClusterPosition`, as both blocks we are talking about will probably end up in the same cluster anyway.)

ii) One uses the CueTime of the dependent recovery point, but the position of the Cluster of the delayed random access point (and `CueRelativePosition`, if used, should also point to that block). Pro: it only uses elements that are supported by both Matroska and WebM. Furthermore, the specs only say that `CueClusterPosition` should point to the cluster containing the "required block"; they don't explicitly say that said block needs to have the same timestamp as `CueTime`. Contra: how does a demuxer know from which block onwards it should feed the data to the decoder? It might use `CueRelativePosition`, but probably a lot of demuxers would simply read the cluster until they reach the block with timestamp `CueTime` (i.e. they interpret the specs so that the "required block" is the block with timestamp `CueTime`) and would then either deliver that block to the decoder or conclude that the file is damaged (because the block they found is not a keyframe).

iii) The same as i), except that `CueRefCluster` is omitted. It is also incompatible with current WebM, but at least it doesn't use any currently deprecated elements of Matroska. One could add a requirement that the delayed random access point and the dependent recovery point be in the same cluster; then omitting `CueRefCluster` is no longer a problem.

iv) Create a normal CuePoint for the dependent recovery point, write the dependent recovery point as a Block in a BlockGroup with exactly one ReferenceBlock pointing to the delayed random access point block, and let the demuxer seek backwards from the dependent recovery point to the delayed random access point. Pro: it only uses things that are already supported by Matroska and WebM. It is also not AV1-specific: the demuxer doesn't need to know anything about AV1, because everything is signalled at the container level. Contra: demuxers would have to be adapted to no longer expect that only keyframes are referenced in the Cues. They would also have to be adapted to actually make use of the value of `ReferenceBlock` and seek backwards. This also implies more seeks, but their cost should be quite limited when both the delayed random access point and the dependent recovery point are in the same cluster; hopefully the data is still cached. (Maybe one should add a SHOULD clause saying that both blocks should be in the same cluster.)

g) Of course, there are two easy alternative solutions:
i) Restrict the kind of AV1 allowed in Matroska even further, so that all key frames are key frame random access points. (This could exclude quite a lot of AV1, so I recommend against it.)
ii) Create Cues as usual, i.e. reference every delayed random access point, and accept that seeking will be partially broken in this case.

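To make option iv) more concrete, here is a rough sketch of the backward seek a demuxer would have to perform. The `Block` type and the `seek` function are hypothetical, block timestamps are assumed to be unique within the track, and the Cues are reduced to a sorted list of CueTime values for this one track. The only container fact relied upon is that a `ReferenceBlock` value is a timestamp relative to the timestamp of the block in the same BlockGroup.

```python
import bisect
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Block:
    timestamp: int                                       # absolute block timestamp
    keyframe: bool                                       # Matroska keyframe flag
    references: List[int] = field(default_factory=list)  # ReferenceBlock values (relative)


def seek(blocks: List[Block], cue_times: List[int], target: int) -> Tuple[int, int]:
    """Return (index of the first block to feed to the decoder,
    timestamp from which decoded output is valid)."""
    i = bisect.bisect_right(cue_times, target) - 1  # last CuePoint at or before target
    if i < 0:
        raise ValueError("no CuePoint at or before the target")
    cue_time = cue_times[i]
    index_of = {b.timestamp: n for n, b in enumerate(blocks)}
    cued = index_of[cue_time]
    if blocks[cued].keyframe:
        return cued, cue_time  # ordinary CuePoint
    # Dependent recovery point: follow its single ReferenceBlock backwards
    # to the delayed random access point and start decoding there.
    refs = blocks[cued].references
    if len(refs) != 1:
        raise ValueError("a cued non-keyframe needs exactly one ReferenceBlock")
    start = index_of[blocks[cued].timestamp + refs[0]]
    if not blocks[start].keyframe:
        raise ValueError("ReferenceBlock does not point to a keyframe")
    # Frames output before cue_time may be corrupt and should be discarded.
    return start, cue_time


if __name__ == "__main__":
    blocks = [
        Block(0, keyframe=True),                        # key frame random access point
        Block(40, keyframe=False),
        Block(80, keyframe=False),
        Block(120, keyframe=True),                      # delayed random access point, no CuePoint
        Block(160, keyframe=False),
        Block(200, keyframe=False, references=[-80]),   # key frame dependent recovery point
        Block(240, keyframe=False),
    ]
    cues = [0, 200]
    print(seek(blocks, cues, 220))  # (3, 200)
    print(seek(blocks, cues, 160))  # (0, 0)
```

Seeking to 160 falls back to the key frame random access point at 0 instead of the delayed random access point at 120, because frames between the delayed random access point and the recovery point can't be decoded reliably; that is why f) proposes no CuePoint for the delayed random access point itself.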
h) It should be noted that exactly the same situation exists with periodic intra refresh in general. There was a short discussion on the Matroska developer mailing list in April 2011, but nothing came of it. Every solution I outlined here for AV1 is also applicable to that case.

Steve Lhomme:
> Since we allow stripping the Sequence Header OBU from the stream when
> it's equal to the CodecPrivate one, we need to add it back to the
> bitstream for compliance. At least when seeking on keyframes. So I
> added a section to explain that.
>
> IMO that's an extra feature of the CodecPrivate that it's meant to be
> added to the bistream as-is. And in this case on startup and when
> seeking. I wonder if we should add an element next to the CodecPrivate
> to describe that. Because in this case it's not entirely opaque to the
> demuxer. Or maybe it's implied by the CodecID and is up to the decoder
> to use it how it's supposed to be (in this case detecting keyframes
> and possibly adding back the Sequence Header OBU).

8. I think we can relax the requirements on the existence of in-band sequence header OBUs a bit: if a keyframe (i.e. a key frame random access point or a delayed random access point, not a dependent recovery point) uses the same sequence header OBU as the one in the CodecPrivate (including the same operating_parameters_info), then the sequence header OBU needn't be prepended to the block with the keyframe, because seeking already works without it, provided one always adds the sequence header OBU from the CodecPrivate back into the bitstream when seeking. For example, consider the following scenario: an elementary stream uses two different sequence header OBUs, A and B, that differ only in the operating_parameters_info. The first three keyframes use A; B is contained in a temporal unit between the third and the fourth keyframe. Between the sixth and the seventh keyframe is a temporal unit containing sequence header A again.

Then a muxer that wants to put this elementary stream into Matroska may put A in the CodecPrivate and can strip the very first occurrence of A. It must leave B inside the temporal unit it was in (so that a player that plays the file linearly is notified about the change) and has to make sure that keyframes #4 to #6 contain sequence header B (so that one has the correct sequence header when seeking to these keyframes). It mustn't strip the A between the sixth and the seventh keyframe (so that a player that plays the file linearly notices the change of sequence header), but it needn't prepend A to keyframes #7 and following. That way one can save a few bytes. This is consistent with an interpretation of the CodecPrivate as the default extradata/header (but it is not necessarily a truly global header). Before `CueCodecState` was deprecated, it had a default value of 0, which mandated that one should look in the CodecPrivate for the CodecState upon seeking. So the only thing specific to AV1 in this case is the "as-is" part; resetting the decoder to whatever initialization information is contained in the CodecPrivate upon seeking is nothing new.

9. Imagine that future AV1 encoders found out that changing the sequence header (with a change of the CVS) enables better compression. What would we do in that case? Simply lift the restriction of one track = one CVS, even though this means that some players that can't cope with changing coded video sequences won't know in advance whether they can play the files at all? I'm asking because we don't include a version field in the CodecPrivate (in contrast to how it is done in MP4 with avcC).

Steve Lhomme:
> Let me know what you think so we can settle this spec for good.
>

I think we are not even close to settling this for good.

- Andreas Rheinhardt
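The relaxed rule proposed in point 8 can be sketched as a per-temporal-unit decision for the muxer. The `TemporalUnit` type, the `plan_sequence_headers` function and the action names are hypothetical; "keyframe" means a key frame random access point or a delayed random access point as in point 7, and sequence header OBUs are compared byte for byte, which assumes identical serialisation (including operating_parameters_info). Following the scenario in point 8, the sketch strips only the very first in-band copy of the CodecPrivate header, keeps every other in-band sequence header where it is, and prepends the active header to keyframes whose active header differs from the CodecPrivate.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TemporalUnit:
    is_keyframe: bool                   # key frame RAP or delayed RAP (point 7)
    seq_header: Optional[bytes] = None  # in-band sequence header OBU, if any


def plan_sequence_headers(tus: List[TemporalUnit], codec_private: bytes) -> List[str]:
    """One action per temporal unit: 'strip', 'keep', 'prepend' or 'none'."""
    actions = []
    active = codec_private  # sequence header currently in effect
    for n, tu in enumerate(tus):
        if tu.seq_header is not None:
            active = tu.seq_header
            if n == 0 and tu.seq_header == codec_private:
                actions.append("strip")    # recoverable from the CodecPrivate
            else:
                actions.append("keep")     # e.g. B, or A again after B: linear playback must see it
        elif tu.is_keyframe and active != codec_private:
            actions.append("prepend")      # seeking here must not fall back to the CodecPrivate
        else:
            actions.append("none")         # non-keyframe, or keyframe using the CodecPrivate header
    return actions


if __name__ == "__main__":
    A, B = b"seq-A", b"seq-B"  # placeholder bytes, not real OBUs
    K, N = True, False
    stream = [
        TemporalUnit(K, A),                                  # keyframe #1
        TemporalUnit(K), TemporalUnit(K),                    # keyframes #2, #3
        TemporalUnit(N, B),                                  # switch to B
        TemporalUnit(K), TemporalUnit(K), TemporalUnit(K),   # keyframes #4 to #6
        TemporalUnit(N, A),                                  # switch back to A
        TemporalUnit(K),                                     # keyframe #7
    ]
    print(plan_sequence_headers(stream, codec_private=A))
    # ['strip', 'none', 'none', 'keep', 'prepend', 'prepend', 'prepend', 'keep', 'none']
```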