Re: [Cellar] shepherd review of Matroska: part 1

Steve Lhomme <> Sun, 12 June 2022 08:01 UTC

Return-Path: <>
Received: from localhost (localhost []) by (Postfix) with ESMTP id 8209BC15AADA for <>; Sun, 12 Jun 2022 01:01:47 -0700 (PDT)
X-Virus-Scanned: amavisd-new at
X-Spam-Flag: NO
X-Spam-Score: -8.283
X-Spam-Status: No, score=-8.283 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, NICE_REPLY_A=-1.876, RCVD_IN_DNSWL_HI=-5, SPF_HELO_NONE=0.001, SPF_NONE=0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_BLOCKED=0.001, URI_NOVOWEL=0.5] autolearn=ham autolearn_force=no
Authentication-Results: (amavisd-new); dkim=pass (2048-bit key)
Received: from ([]) by localhost ( []) (amavisd-new, port 10024) with ESMTP id LfiZfnD7nXpj for <>; Sun, 12 Jun 2022 01:01:43 -0700 (PDT)
Received: from ( [IPv6:2a00:1450:4864:20::32c]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by (Postfix) with ESMTPS id E5512C15AAD9 for <>; Sun, 12 Jun 2022 01:01:42 -0700 (PDT)
Received: by with SMTP id r123-20020a1c2b81000000b0039c1439c33cso1565116wmr.5 for <>; Sun, 12 Jun 2022 01:01:42 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;; s=20210112; h=message-id:date:mime-version:user-agent:subject:content-language:to :references:from:in-reply-to:content-transfer-encoding; bh=gqkc3dWt/4Xxqw2Nhnx2Ng9sKj1zATAhM6OX0ciqD/k=; b=g76+NC4QSUjr0nOS3bNtZFEVXey1yrfqk+VAyQfE70p1gZvwxCguQZkUqXg+eGgEIM sOtIug3bCF7fvsQqE/ZgQcZG6W+JgefrAX5Mi5P11dSATU8vWf8aCuyz/cfizzAX10Jq CouweeOTJE8h3DhFHlM+TqJoSOMzrgjFi7qH8Fmuoyr4sNLatItHLOQ8EF7ic7e8T4Vm 3OUOTKp8xDrF3gQ0X6/qS5JNMrjcWn6MdRagAvrhqluYhb9BlUew6WGKMVrhW//tTam8 TIvAH+84+9lhgBnOj/fIILhE+LrMSieMEw77r4h0PWeCysUPYrLNjvScmOT7ObDv+uXk Emkg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;; s=20210112; h=x-gm-message-state:message-id:date:mime-version:user-agent:subject :content-language:to:references:from:in-reply-to :content-transfer-encoding; bh=gqkc3dWt/4Xxqw2Nhnx2Ng9sKj1zATAhM6OX0ciqD/k=; b=72UPn1oEzxXw0UMd8MrJeZlMUyzA10fc/NPOFzmRsIQhk1rWuiGbIV3t0VTQeFvuI/ 5ri4NxKjLvI9K/0lmkBdYi9YBGthFqbGnx8rTtSHz+Rwz+8ZFguUyCYLmx636IoYv7F4 /nOC2AWgoMjbhN+kBrJ/Ts2mwb8L7uc3ssSxzK1xVBoOL7CCxYuH8dQ6XvZ0T1iDMnmr md4xhaIoJj0w4b6P7M1MF/xCk7E8YPV1FhwXsuwiF2FECzhxhAaMQQlYPj3w82QYLkSc TTPm6pmJDdMpxLb7Y8BXULP+mOENl5RrjvbrgX2jWS+phk/ZkTZya4gZdKNJVuCWx2N8 VE2w==
X-Gm-Message-State: AOAM533mZHiIN8vhLD4ny8aBqFUWUaPWndjsOtoKLd3zGOz1/KOqteyv KIm6SuRB2SJLXKVBgdL0Dnrj9dja8Sgi3A==
X-Google-Smtp-Source: ABdhPJywiOuCBFgCnD1KknLD3//YqYsbEyLndcQ5PSHwXkVLMxNL6PwyRzZgkCRQ2Xy6zno4oFATzQ==
X-Received: by 2002:a05:600c:3493:b0:39c:8731:84c3 with SMTP id a19-20020a05600c349300b0039c873184c3mr5480515wmq.45.1655020900827; Sun, 12 Jun 2022 01:01:40 -0700 (PDT)
Received: from ?IPV6:2a01:cb0c:20:e900:b431:3971:67c5:8ba6? ( [2a01:cb0c:20:e900:b431:3971:67c5:8ba6]) by with ESMTPSA id m7-20020a5d4a07000000b00213abce60e4sm4631130wrq.111.2022. for <> (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 12 Jun 2022 01:01:40 -0700 (PDT)
Message-ID: <>
Date: Sun, 12 Jun 2022 10:01:39 +0200
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1
Content-Language: en-US
References: <1216608.1654521891@dooku>
From: Steve Lhomme <>
In-Reply-To: <1216608.1654521891@dooku>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <>
Subject: Re: [Cellar] shepherd review of Matroska: part 1
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: Codec Encoding for LossLess Archiving and Realtime transmission <>
List-Unsubscribe: <>, <>
List-Archive: <>
List-Post: <>
List-Help: <>
List-Subscribe: <>, <>
X-List-Received-Date: Sun, 12 Jun 2022 08:01:47 -0000

Hi Michael,

Thanks a lot for your lenghty review. My comments below:

On 2022-06-06 15:24, Michael Richardson wrote:
> This email is based upon my top-to-bottom read of the document as part of the
> Shepherd review.  I'm concerned mostly about things that will attract
> IESG comments.
> I shall come back to this email and reply with issue numbers, but I'm working
> mostly offline as I read this, but my preference would be to discuss most
> issues on the list.
> I got as far as Section 6 in this email, and I continued in more emails.
> section 1:
> I'm concerned about the claim "THE" format.

That's a leftover from when it was on the website. I propose to replace by:
"Matroska is a multimedia container format."

> I suspect that it may have already achieved this, but I wonder about how to
> support that statement.

That was just the original goal. IMO it has no place in a technical 

> "Menus (like DVDs have)"        <- is an informative reference useful?

There are plenty of references to the DVD system/format. It shows at 
what period the format was defined. Things have changed a bit since then...
The Wikipedia references 
( of the DVD 
books are not even available online anymore.

Some of our references are non technical. Maybe we don't need a 
reference document for that. For the technical ones (menu) we could use 
a reference to the "DVD-Video Book". This PDF shows the version numbers. 
I'm not sure we can get more than that.

> the section:
> "Matroska is an open standards project. This means for personal use it is
> absolutely free to use and that the technical specifications describing the
> bitstream are open to everybody, even to companies that would like to support
> it in their products. "
> I'm not sure why the relevance of "personal use". In fact, it's a bit of a concern!
> I suggest we write:
>    "Matroska is an open standards project published as an IETF Standard Track
>    RFC.  As per the terms of BCP78 [RFC5378], the technical specifications
>    describing the bitstream are open to everybody.  This specification falls
>    under the terms of BCP79 with respect to IPR claims.  While there are
>    patent claims associated with some CODECs that can be contained within
>    a Matroska container, the container itself is free of all such claims"

Sounds good.

> (Assuming that this is true)

As far as I know yes. A lot of things that we deprecated were also put 
in there to avoid patents (or deter them) on some ideas we had, creating 
a prior art for it.

> section 2:
> "This document covers Matroska versions 1, 2, 3 and 4. Matroska v4 is the
> current version. Matroska 1 to 3 are no longer maintained. No new elements
> are expected in files with theses version numbers. There MAY be further
> additions to Matroska v4."
> Should we references places/points when version 1,2,3 were
> published/stabilized?   Something like:
>    } This document covers Matroska versions 1, 2, 3 and 4.
>    } Matroska v4 is the current version.
>    } Matroska v1 first appeared around YEAR in PRODUCT [reference].
>    } Matroska v2 first appeared around YEAR in PRODUCT [reference].
>    } Matroska v3 first appeared around YEAR in PRODUCT [reference].

I don't think this versioning is as clear cut as you put it. Nor can we 
trace back products in which some Matroska features appeared.

>    } Matroska 1 to 3 are no longer maintained.
>    } No new elements are expected in files with version numbers 1,2, or 3.
>    } New additions are expected to be added to Matroska v4 via the Extensions
>    } mechanisms described in Section 25. IANA Considerations.

I'm not sure what you refer to here. Section 25 is the IANA 
considerations. To extend Matroska v4, one just creates an element with 
an ID. But it should either be done through the IETF to update the v4 
RFC, or it goes in v5. Or it's not part of Matroska.

We currently don't support defining extra elements out of the spec that 
could be mentioned in the EBML header.

> section 4.4:
>     "The Matroska EBML Schema defines eight Top Level Elements: SeekHead,
>     Info, Tracks, Chapters, Cluster, Cues, Attachments, and Tags."
> maybe insert forward references?  Maybe arrange it as a bullet list?
>     The Matroska EBML Schema defines eight Top Level Elements:
>          * SeekHead []
>          * Info [],
>          * Tracks [],
>          * Chapters [],
>          * Cluster [],
>          * Cues [],
>          * Attachments [],
>          * Tags []
>     which may appear in any order, and may be repeated.


> that would flow better into the next paragraph that explains that there is an
> index.  The next paragraph says that the MetaSeek is RECOMMENDED (which is
> aka SHOULD).
> When we write SHOULD there is usually some exception conditions by which the
> writer might omit that.  I think that it is a MUST that the reader/player be
> able to cope without the MetaSeek?

Isn't RECOMMENDED slightly less enticing than SHOULD ? In the case of 
MetaSeek it cannot be MUST because it's not usable in a "live" (size 
unknown) stream. I think in all other cases it's beneficial to add it. 
But I think very simple writers (maybe the one found in some codec 
libraries) shouldn't have to go through the trouble of supporting it 
just for making parsing slightly faster.

Maybe that could be another expection for the SHOULD. But that means we 
have to define what is an intermediate or work file. And since it's 
normative we have to be very careful in the explanation.

So I would prefer RECOMMENDED and explain that it helps seeking in 
files. This is more or less what we have already:

"Without a SeekHead Element, a Matroska parser would have to search the 
entire file to find all of the other Top Level Elements. "

> Figure 4 suggests that only Audio/Video components are supported. I think
> that's not the case.  Could the diagram imply something else, or some text
> say so?

There are some subtitle tracks. But they don't have sub sections like 
Audio or Video Elements because there was never a need for that.

> Please review all the SHOULD, I suspect many of them are MUSTs.
> (think: is there a reasonable exception for the writer?)
> Perhaps, instead of specifying what the file SHOULD contain (with a vague
> exception), can we say what a reader MUST tolerate?
> For instance:
>     There SHOULD be one or more BlockGroup or SimpleBlock Element in each
>     Cluster Element.
> Is there a case where a Cluster Element might contain ZERO BlockGroup's or
> SimpleBlocks?    Maybe just saying:
>     Cluster Element contain one or more BlockGroup or SimpleBlock Elements.
> My guess:
>     In some situations (live recordings which are never unpaused???) a Cluster
>     Element MAY be empty if no data was collected.

So far a Cluster exists to put BlockGroup or SimpleBlock in it. But 
maybe in the future there could be others (in the past we had 
EncryptedBlock too). Restricting the list to only those means future 
files with other elements wouldn't be valid according to these v4 spec ?

Maybe something slightly more vague like

"Cluster Elements contain one or more block Elements, like BlockGroup or 
SimpleBlock Elements."

There could be a MUST or a SHOULD in there, otherwise it's not really 
normative. Or maybe we don't want to make it normative to allow more 
variants of block in the future ? In that case your proposal might be 

> Another:
>     The Timestamp Element SHOULD be the first Element in the Cluster.

In it says

"This element SHOULD be the first child element of the Cluster it 
belongs to, or the second if that Cluster contains a CRC-32 element"

Maybe we can copy that text in the more generic text ? Or move it there?

> Stupid question: can a file contain audio and video in different codecs?
> For instance, can the English audio be in CODEC A, while the German is in
> CODEC B?   Can more than one audio or video track be played at the same time?

The player could decide to play everything at the same time. But it is 
assumed that the player/user only selects a single track.

We do have TrackOverlay ( in which case you can have 2 tracks 
used at the same time.

> section 5.
> I really hate the layout of these sections.
> The empty paragraph line between items makes for far too much vertical space
> in the HTML, and the same in the TXT.  I wonder if we can turn these into
> some kind of table.
> Could the definition go just after the name?

This is generated from the XML file. So changing to format is (mostly) 
easy. We used that format for EBML:

> Like, I'd want to read it like this in text:
> name:  Segment         id:       0x18538067
> path:  \Segment
> definition:
>     The Root Element that contains all other Top-Level Elements (Elements
>     defined only at Level 1). A Matroska file is composed of 1 Segment.
> minOccurs: 1     maxOccurs: 1  type:  master
> unknownsizeallowed: 1

We can compress those vertically. I would put the type earlier as it's a 
major information you want to have quickly.

We can also omit default values, like minOccurs=0, unknownsizeallowed=0, 
recursive=0, etc

> Usage notes:
> I think that "unknownsizeallowed" is really a true/false?

unknownsizeallowed, recursive, recurring are boolean attributes. We can 
write true/false for more clarity/less confusion.

> What is the difference between Usage Notes and Rationale?
rationale 	An explanation about the reason or catalyst for the 
definition of the Element.

usage notes 	Recommended practices or guidelines for both reading, 
writing, or interpreting the Element.

> (Does it have an "e" in English?)

Yup :)

> About:
>    definition:
>    The Segment Position of the Element.
> maybe it could say relative to what?
> I think the units are bytes?
> Should the units be part of the type?

There are actually a few elements that use that terminology. I'll have 
to review them to clarify each.

> SegmentUID Element
> name:  SegmentUID
> I wonder if it's really a *uuid*?

It's a "A randomly generated unique ID to identify the Segment amongst 
many others (128 bits)"

A UUID is "A universally unique identifier (UUID) is a 128-bit label 
used for information in computer systems. The term globally unique 
identifier (GUID) is also used."

So yes indeed, it is exactly a UUID, stored in binary.

We should reference RFC 4122 for that. We could also rename the element.

> Should SimpleBlock be listed after Block?

I think it's better to have it before otherwise it would be lost after 
all the sub elements of BlockGroup. SimpleBlock is at the level of 
BlockGroup, not Block.

> } BlockAddID Element
> }name: BlockAddID
> }definition:
> }An ID to identify the BlockAdditional level. If BlockAddIDType of the
> }corresponding block is 0, this value is also the value of BlockAddIDType for
> }the meaning of the content of BlockAdditional.
> Something tricky here, needs more explanation, or a forward reference to other
> discussion...

A paragraph explaining how to use the BlockAddition feature is probably 
needed. And then we reference to that section.

> } TrackTimestampScale Element
> }name: TrackTimestampScale
> }definition: DEPRECATED, DO NOT USE.
> should it go into the Appendix?

No, that text should be removed. It's used for Track ticks and Block 
Timestamps, although the value is usually 1. Unlike Deprecated elements 
that we assume noone used, this one is used. Maybe not in files but also 
in code to process the data.

We could use a "usage notes" to explain that people should not use it.

> } LanguageIETF Element
> }name: LanguageIETF
> I suggest you name it LanguageBCP47, which would be a better reference.
> (or LanguageIETFBCP47 )


> } TrackTranslate Element
> }rationale:
> }Chapter Codec may need to address content in specific track, but they may not know of the way to
> s/Chapter Codec/The Chapter Codec/
> s/in specific track/in a specific track/
> s/they/it/


> } StereoMode Element
> has an enumerated value.  It may need to have some IANA Considerations.

Should that be the case for all the enums we define ? (others are just 
values found in ISO specs)

> I'm also concerned that section 18.10 has some v2-compatibility stuff kinda
> hidden in it.

How come ? StereoMode is v3, so relying on v2 elements is fine.

> Should 0x53B9 be mentioned in this section too?

In 18.10 yes. And the DEPRECATED DO NOT USE should be a "usage notes".

>    "The value of this Element SHOULD be kept the same when making a direct stream copy to another file."
> I wonder if the elements that should be kept identical when making a direct
> stream copy need to be marked more clearly.
> Or maybe I should ask: when would the element not be copied exactly?
> Maybe we can save some "ink" here and make this the default?

This is off by default. 85 out of 256 elements are using this feature. 
To save some ink we can just say Stream Copy: True and link to a section 
explaning what happens during Stream Copy (missing right now).

> } UncompressedFourCC Element
> }usage notes:
> }    This Element MUST NOT be used if the CodecID Element of the TrackEntry is
> }    set to "V_UNCOMPRESSED".
> }
> }Table 12: UncompressedFourCC implementation notes
> }attribute	note
> }minOccurs	UncompressedFourCC MUST be set (minOccurs=1) in TrackEntry,
> }                when the CodecID Element of the TrackEntry is set to "V_UNCOMPRESSED".
> If I read correctly, the two statements are contradictory?

Correct, it MUST be set.

> } ChromaSitingHorz Element
>    thru ....31.27
>    and: ProjectionType Element
> TrackPlaneType Element
> ContentEncodingScope Element
> ContentEncodingType Element
> ContentCompAlgo Element
> ContentEncAlgo Element
> ChapProcessTime Element
> TargetTypeValue Element  - looks like gaps matter?

The gaps between the values are arbitrary, leaving space in case we need 
something in between.

> do these need any allocation rules?

Same issue as with every enum we define. Do we need a IANA section for 
them ?

> } Green X chromaticity coordinate, as defined by CIE 1931.
> "CIE 1931" should be a reference.


> } ContentEncodingOrder Element
> }name: ContentEncodingOrder
> }definition:
> }Tell in which order to apply each ContentEncoding of the
> }ContentEncodings. The decoder/demuxer MUST start with the ContentEncoding
> }with the highest ContentEncodingOrder and work its way down to the
> }ContentEncoding with the lowest ContentEncodingOrder. This value MUST be
> }unique over for each ContentEncoding found in the ContentEncodings of this
> }TrackEntry.
> The english needs some work.
> Maybe "Tells in which..."


> } ContentEncAlgo Element
> questions will be asked about 1DES, and 3DES.
> 1) We need a note about how legacy files might be encrypted, and we need
>     those definitions for history.
> 2) Questions will be asked about how the keys were created.
>     I haven't read the Security Considerations, but it needs to say something.
> 3) We should probably have a suggestion on how encryption keys are
>     represented in text.
>     Consider one system that takes things in hex, and another that
>     uses strings, and that all hex values are valid strings.

I'll try to come up with something. Although I'm not too familiar with 
this part. It is however used in WebM.

> } ContentEncKeyID Element
> Will need a reference to public key formats used.


> } ContentEncAESSettings Element
> is this number of bits?

It's a master element, it only contains AESSettingsCipherMode.

> } AESSettingsCipherMode Element
> will need an IANA registry.


> } TargetType Element
> might need IANA Considerations, but table looks broken.


> value	label
> ...
> === got to section 6.

:clap: ;)

> --
> Michael Richardson <>, Sandelman Software Works
>   -= IPv6 IoT consulting =-
> _______________________________________________
> Cellar mailing list