Re: [Cellar] John Scudder's Discuss on draft-ietf-cellar-matroska-16: (with DISCUSS and COMMENT)

Steve Lhomme <slhomme@matroska.org> Sun, 02 July 2023 08:19 UTC

Return-Path: <slhomme@matroska.org>
X-Original-To: cellar@ietfa.amsl.com
Delivered-To: cellar@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id D40D0C1524AE for <cellar@ietfa.amsl.com>; Sun, 2 Jul 2023 01:19:15 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 3.107
X-Spam-Level: ***
X-Spam-Status: No, score=3.107 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, GB_SUMOF=5, HTML_MESSAGE=0.001, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=no autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=matroska-org.20221208.gappssmtp.com
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id sB9nUJci9w5A for <cellar@ietfa.amsl.com>; Sun, 2 Jul 2023 01:19:11 -0700 (PDT)
Received: from mail-ed1-x52b.google.com (mail-ed1-x52b.google.com [IPv6:2a00:1450:4864:20::52b]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 57862C1524A3 for <cellar@ietf.org>; Sun, 2 Jul 2023 01:19:10 -0700 (PDT)
Received: by mail-ed1-x52b.google.com with SMTP id 4fb4d7f45d1cf-51a52a7d859so5979605a12.0 for <cellar@ietf.org>; Sun, 02 Jul 2023 01:19:10 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=matroska-org.20221208.gappssmtp.com; s=20221208; t=1688285949; x=1690877949; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=e8r3dYFTQncjnhznqMXtgGBGQT8lxvt1gZaTIeDsHug=; b=r+S6LGzPTFSdTJk1b6U//fabiGzyy2XWYSDfG2RGPWYZju22SV0e9dPqXu+eoAXuPl YQ2nEPnpJo63pS6RK5xwjsLzONBGQBS1brcdTeewozRPBsYU7AFCgtehqK6dl+dmEZ9b f2Yfne1VfDovhgAAAeVRjGfA6kF9AGklMBPuZiaWs+hEyVIEoEO4MopdtRsU52fnCP7t VEyGd8bSCBpNKewR3hwSI3R1+byujnMpWbCv/e7Ag9XlGwfPxZAGbwHxIViHl+ywXKaW PIXuqw8HC8cQSMRgswmH1bhpvozbL/CVbVjEJHFnqupXXVQyIY1S3paSWYAUA5M2FRcW x1QA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1688285949; x=1690877949; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=e8r3dYFTQncjnhznqMXtgGBGQT8lxvt1gZaTIeDsHug=; b=jEvIGQqMNMJJ1CEC2pMSKsbHaTMSXoiEXe3Y6Q9i0zaS96sEnBg7pgYnWxaTS8wPhH 9r2iQY0d5qg84We3LCtIf+DO89k+DOEELq9kkUUSFkDkux2tqeVhVec7CngOiwxRGfzL QkQ0JPf6K5+TF7Iyn1OUdc+Vn8OpnZE/9/qmTXAvAArRSobRX4H3rvP5r/J8Hei0tDYG 5TWi6DMSUM/TMDY3auWjqPO25Mw+Ub7aGSxLDoM0O+nUXeZrjbHPJ0ZJ5TuVssLkjO3t AJVHMzp9ttA+Y17TGQT5waeBF7d0fdWqvenO6GWwjHEMuj3fXwzdVzjyUozYU3bBFest mWAg==
X-Gm-Message-State: AC+VfDz7+O1BYhewyg9zM+u7nXUHYizK5EIQAsjGHdiA46V1euwTYJWu 8nfrI20DEntdtrxJ+2OWeODGrJpyxeioxpklN8Q=
X-Google-Smtp-Source: ACHHUZ65wM0zoYln6EEnCur4q0rp6amfGIvFYOnyImrpNlANF2WWXrqaQhTRs01mHp9sZ4YWzhGDgw==
X-Received: by 2002:a17:907:d2a:b0:974:fb94:8067 with SMTP id gn42-20020a1709070d2a00b00974fb948067mr9585863ejc.23.1688285948765; Sun, 02 Jul 2023 01:19:08 -0700 (PDT)
Received: from smtpclient.apple (2a01cb0c0020e900e8385abc16d4722b.ipv6.abo.wanadoo.fr. [2a01:cb0c:20:e900:e838:5abc:16d4:722b]) by smtp.gmail.com with ESMTPSA id l15-20020aa7d94f000000b0051de1a0af8fsm3579956eds.35.2023.07.02.01.19.07 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 02 Jul 2023 01:19:08 -0700 (PDT)
From: Steve Lhomme <slhomme@matroska.org>
Message-Id: <0144EAD7-C462-4872-B0E5-0BBEA37705E5@matroska.org>
Content-Type: multipart/alternative; boundary="Apple-Mail=_51E4C11D-5EC3-4192-80DD-FB60057F4770"
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.600.7\))
Date: Sun, 02 Jul 2023 10:18:57 +0200
In-Reply-To: <2C02FEBE-F10A-48DA-8939-5F7F25FFAD1B@juniper.net>
Cc: The IESG <iesg@ietf.org>, "draft-ietf-cellar-matroska@ietf.org" <draft-ietf-cellar-matroska@ietf.org>, "cellar-chairs@ietf.org" <cellar-chairs@ietf.org>, "cellar@ietf.org" <cellar@ietf.org>, "mcr+ietf@sandelman.ca" <mcr+ietf@sandelman.ca>
To: John Scudder <jgs@juniper.net>
References: <168477230376.6870.18325949540737076935@ietfa.amsl.com> <7B9EA3BB-272F-450C-AFB6-1AAFBD1AE84C@matroska.org> <2C02FEBE-F10A-48DA-8939-5F7F25FFAD1B@juniper.net>
X-Mailer: Apple Mail (2.3731.600.7)
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/jmhDiQdHwZgPh7nURC_FLoeQVy4>
Subject: Re: [Cellar] John Scudder's Discuss on draft-ietf-cellar-matroska-16: (with DISCUSS and COMMENT)
X-BeenThere: cellar@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: Codec Encoding for LossLess Archiving and Realtime transmission <cellar.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cellar>, <mailto:cellar-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cellar/>
List-Post: <mailto:cellar@ietf.org>
List-Help: <mailto:cellar-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cellar>, <mailto:cellar-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 02 Jul 2023 08:19:15 -0000

Hi John,

Comments below

> On 29 Jun 2023, at 23:53, John Scudder <jgs@juniper.net> wrote:
> 
> Hi Steve,
> 
> I apologize for not responding sooner; I thought work was still underway until I saw Spencer’s email a couple of days ago.
> 
> I’ve looked at the new version although not comprehensively. Some high-order bits:
> 
> - The rewrite of the “Block Header” section into “Block Structure” is a huge improvement from my point of view. Thanks!
> 
> - Now that I’ve had some time to consider it, I regret having so firmly characterized the "Conflict with RFC 8794, potentially resolved by errata” discussion point as being for “AD/IESG”. Here are my new thoughts on the subject:
> 
> 	- The Matroska draft allocates EMBL ID 0x80
> 	- RFC 8794 designated EMBL ID 0x80 as “Reserved”
> 	- Errata 7189 [1] and 7191 [2] try to retroactively change RFC 8794 to rescind the reservation of 0x80.
> 
> But this is not a normal way to do things in our process. On the one hand, an erratum isn’t supposed to be used to change technical content of an RFC unless it’s completely clear that it was an error and not just something the authors regret. On the other hand, it’s actually easy to do the fix — just not with an erratum. The registry was established by RFC 8794, a Standards Track RFC. It can be revised by another Standards Track RFC. One option would be to progress yet another RFC to update RFC 8794, but I think it would be more pragmatic, and OK, to do it within the Matroska spec itself.

Looking at the current EBML IANA registry [1] it seems that it has not been updated based on our errata [2]
[1] https://www.iana.org/assignments/ebml/ebml.xhtml
[2] https://www.rfc-editor.org/errata/eid7189

I think this should be more visible. A new RFC to update this is probably the way to go. However it may take a while (we may want to add more things to EBML while doing a new/updated document). So your proposal is probably the way to go for the current Matroska document.

> Basically, what you’d want to do is take your errata that are critical to the Matroska spec, roll them into the Matroska spec as a section, and add Updates:8794 to the metadata. You’d end up with something like
> 
> 	X. Updates to RFC 8794
> 
> 	Matroska makes use of EBML, defined in RFC 8794. Because of an oversight, RFC 8794 
> 	reserved EBML ID 0x80, which is used by deployed Matroska implementations. For this
> 	reason, this specification updates RFC 8794 to make 0x80 a legal EBML ID. Specifically,
> 	the following are changed in RFC 8794:
> 
> 	Section 5, 
> 
> 	OLD:
> 
>      +=========================+================+=================+
>      | Element ID Octet Length | Range of Valid | Number of Valid |
>      |                         |  Element IDs   |     Element IDs |
>      +=========================+================+=================+
>      |            1            |  0x81 - 0xFE   |             126 |
>      +-------------------------+----------------+-----------------+
> 
> 	NEW:
> 
>      +=========================+================+=================+
>      | Element ID Octet Length | Range of Valid | Number of Valid |
>      |                         |  Element IDs   |     Element IDs |
>      +=========================+================+=================+
>      |            1            |  0x80 - 0xFE   |             127 |
>      +-------------------------+----------------+————————+
> 
> 	Section 17.1,
> 
> 	OLD:
> 
>   The following one-octet Element IDs are RESERVED: 0xFF and 0x80.
> 
> 	NEW:
> 
>   The following one-octet Element ID is RESERVED: 0xFF.
> 
> And in your IANA section, something like —
> 
> 	RFC 8794 created the EBML Element IDs registry. IANA is requested to revise the registration procedures for that registry as:
> 
> 	OLD:
> 	0x81-0xFE	RFC Required 
> 
> 	NEW:
> 	0x80-0xFE	RFC Required
> 
> In case it’s not obvious, the above is just an outline to illustrate the idea, I don’t guarantee that the supplied text is either correct or complete!

OK, I did this Pull Request which more or less takes your suggestions and added the 2 errata that also include these text changes [3].

[3] https://github.com/ietf-wg-cellar/matroska-specification/pull/795

> 
> And, as idnits would tell you if you missed it, if you use the Updates: header you’re supposed to mention it in the abstract, as in “This document updates RFC 8794 to permit the use of a previously reserved EBML Element ID."
> 
> - By the way, now that I’ve spent some quality time staring at the EBML and Matroska IANA sections, I am confused about why we need the "Matroska Element IDs” registry. Shouldn’t all of these just be registered in the EBML Element IDs registry? Are the Matroska Element ID and EBML Element ID namespaces disjoint in some way that isn’t obvious? (But if they were disjoint, why would we need the updates to RFC 8794?) If they aren’t disjoint, it’s very hard to see how a new registry (a) helps and (b) isn’t harmful.

To make an analogy with XML, the EBML registry would be adding attributes to the <?xml?> part, while the Matroska registry is about the content of the XML document. In Matroska parlance, that’s the EBML header versus the EBML content which is designated by the DocType. There can be more DocType than Matroska. Each DocType would have their own set of elements with their own registry.

Technically it would be possible to have the same EBML ID used in the EBML header and in Matroska, for different purposes. We just don’t do it because we can do otherwise.

> To state more clearly: I think there should not be a Matroska Element IDs registry, I think you should just register your IDs in the EBML Element ID registry. If that’s wrong, please help me understand why.
> 
> - Finally, some of the less-major points seem to have been missed, which is what led me to imagine there was still work underway. For example, in an earlier exchange, we had
> 
>> On May 21, 2023, at 8:25 AM, Steve Lhomme <slhomme@matroska.org> wrote:
>> 
>>> ### Throughout, "bits" representation
>>> 
>>> In various places, you have language like "The bits 5-6 of the Block Header
>>> flags are set to 11" and similar. This language seems like it needs to be
>>> fixed, because the simplest plain English reading of the sentence (IMO anyway)
>>> is "bit 5 is set to decimal 11, and so is bit 6", which doesn't make sense. The
>>> most straightforward fix would be to write "... is set to 0b11" to indicate
>>> that you're writing the value in binary.
>>> 
>> 
>> You’re right.
> 
> The “you’re right" led me to think you were on board with using the “0b” representation, but that wasn’t done (search for “the bits 5-6” in the current version). I don’t mean to say this particular nit is worthy of holding the document up for — it isn’t. But like the famous Van Halen M&Ms [3] it leads me to wonder if other stuff has been missed, so I’m flagging it for your/the shepherd’s attention, rather than make you wait while I do an in-depth re-review and re-flag open issues right now — especially since the "Conflict with RFC 8794, potentially resolved by errata” point still needs to be resolved regardless.

I did change the text in this PR: https://github.com/ietf-wg-cellar/matroska-specification/pull/766/files
However I postfixed the values with ‘b’. Did you mean it should be prefixed with ‘b’ to mark it’s a binary representation.

> Thanks,
> 
> —John
> 
> [1] https://www.rfc-editor.org/errata/eid7189
> [2] https://www.rfc-editor.org/errata/eid7191
> [3] https://www.vulture.com/2020/10/eddie-van-halen-brown-m-and-ms.html
> 
>> On May 29, 2023, at 4:10 AM, Steve Lhomme <slhomme@matroska.org> wrote:
>> 
>> 
>> Hi John,
>> 
>> Thanks again for the time to look into this in detail :)
>> 
>>> On 22 May 2023, at 18:18, John Scudder via Datatracker <noreply@ietf.org> wrote:
>>> 
>>> John Scudder has entered the following ballot position for
>>> draft-ietf-cellar-matroska-16: Discuss
>>> 
>>> When responding, please keep the subject line intact and reply to all
>>> email addresses included in the To and CC lines. (Feel free to cut this
>>> introductory paragraph, however.)
>>> 
>>> 
>>> Please refer to https://www.ietf.org/about/groups/iesg/statements/handling-ballot-positions/ 
>>> for more information about how to handle DISCUSS and COMMENT positions.
>>> 
>>> 
>>> The document, along with other ballot positions, can be found here:
>>> https://datatracker.ietf.org/doc/draft-ietf-cellar-matroska/
>>> 
>>> 
>>> 
>>> ----------------------------------------------------------------------
>>> DISCUSS:
>>> ----------------------------------------------------------------------
>>> 
>>> # John Scudder, RTG AD, comments for draft-ietf-cellar-matroska-16
>>> CC @jgscudder
>>> 
>>> ## DISCUSS
>>> 
>>> (This is another, and I hope the final, update to my ballot. I add one more
>>> DISCUSS point, directly following, and I add COMMENTs and NITs, starting with
>>> Section 15.1. I apologize for the episodic nature of this review, and hope it
>>> was helpful to get the early part.)
>>> 
>>> ### MatroskaTags reference
>>> 
>>> I notice that the [MatroskaTags] document is referenced in several places. At
>>> first glance, it seems like this is a document that "must be read to understand
>>> or implement the technology in the new RFC, or whose technology must be present
>>> for the technology in the new RFC to work" [1], in which case it should be
>>> Normative.
>>> 
>>> I can see why the WG would be hesitant to make the reference Normative, since
>>> the document seems to be less mature and might hold up RFC publication for some
>>> time while it also progresses. But please be sure that it really does fit the
>>> criteria of "only [providing] additional information", and that the current
>>> spec can stand alone without it.
>> 
>> It can be both ways. Tags are optional and all the Normative parts are actually found in the main Matroska document (this draft). The other document is more a guideline or common value people “SHOULD” use if they want consistency. There are only four MUST in this document. Three of which are about what particular values must contain.
>> 
>> Two of the MatroskaTags link are found in the Tag elements description, for complementary information. The last reference is in the Tags precedence explanation. It is referring to a 6.4 section that doesn’t actually exist. So this needs some fixing. It might be better to carry over the actual normative MUST in this draft and not need this reference at all.
>> 
>> Done here: https://github.com/ietf-wg-cellar/matroska-specification/pull/770
>> 
>>> 
>>> Because I don't have the subject area expertise to make this call with
>>> confidence, I don't intend to hold a DISCUSS on this point past the telechat,
>>> this is primarily here to ensure that the question is considered by others.
>>> 
>>> [1]
>>> https://www.ietf.org/about/groups/iesg/statements/normative-informative-references/
>>> 
>>> --
>>> 
>>> (This is an update to my previous ballot, to add one more question. The
>>> original ballot (below) was entirely questions for the authors (though of
>>> course, anyone is welcome to engage). The new question is primarily
>>> process-related and I'm adding it mostly to flag the issue for the sponsoring
>>> AD, and to make sure we cover it during IESG discussion, although again, anyone
>>> is welcome to engage.)
>>> 
>>> ### Conflict with RFC 8794, potentially resolved by errata
>>> 
>>> Primarily for the AD/IESG: I happened to notice that there are 9 errata open
>>> against RFC 8794, the EMBL spec, which is a normative reference for this doc.
>>> All of them are in "reported" state so have no normative force right now. Two
>>> of them (ID 7189 and ID 7191) are added specifically to bring Matroska's usage
>>> into compliance (or rather, to redefine "compliance" to include what Matroska
>>> does).
>>> 
>>> It's not clear to me that these are proper uses of the errata process, but it's
>>> not my WG or area. In any case, it seems like this is a hint that there's a
>>> potential problem. Whether the problem should be fixed by verifying the errata,
>>> or some other way, I don't know.
>>> 
>>> Primarily for the authors:
>>> 
>>> (Update: thanks for your replies to this part, your proposed solutions sound
>>> good to me.)
>>> 
>>> Thanks for this document. It's well outside my comfort zone (I'm a routing
>>> protocols guy) and I appreciate the effort that's gone into making what is
>>> clearly a major piece of work accessible to someone like me.
>>> 
>>> Although I haven't completed my review, as I've gone through it I found two
>>> sections that seemed worth having a DISCUSSion about. Since the Section 10
>>> DISCUSSion might take a while, I figured I should raise it now rather than wait
>>> to finish my entire review. So, I may add some additional comments later.
>>> 
>>> ### Section 6.8
>>> 
>>> To quote §6.8 in its entirety:
>>> 
>>>  The Tags Element is most subject to changes after the file was
>>>  originally created.  For easier editing, the Tags Element SHOULD be
>>>  placed at the end of the Segment Element, even after the Attachments
>>>  Element.  On the other hand, it is inconvenient to have to seek in
>>>  the Segment for tags, especially for network streams.  So it's better
>>>  if the Tags Element is found early in the stream.  When editing the
>>>  Tags Element, the original Tags Element at the beginning can be
>>>  overwritten with a Void Element and a new Tags Element written at the
>>>  end of the Segment Element.  The file size will only marginally
>>>  change.
>>> 
>>> I've read this several times and I just can't figure out how to reconcile "the
>>> Tags Element SHOULD be placed at the end of the Segment Element" with "it's
>>> better if the Tags Element is found early in the stream". Have I missed some
>>> subtlety in the distinction between "the Segment Element" and "the stream"? Or
>>> are you setting up a dichotomy on purpose? Or something else?
>>> 
>>> My guess is you're purposefully setting up a dichotomy (the Tags Element both
>>> should go early, and late) and then proposing a solution for it (put it early,
>>> but if it gets revised, Void it out and put the replacement late). Is that it?
>>> If so, I'd recommend rewriting to be clearer about what you're doing and not
>>> presenting the reader with a contradiction. Absent a significant rewrite, at a
>>> minimum the SHOULD has to go.
>> 
>> Addressed here (not normative anymore): https://github.com/ietf-wg-cellar/matroska-specification/pull/762
>> 
>>> 
>>> ### Section 10
>>> 
>>> It seems as though there are significant unstated assumptions in the
>>> descriptions throughout this section and its subsections. I'll go into further
>>> detail on specifics below. Since the Block is such a fundamental structure for
>>> this document, it seems worth putting in some extra effort to make sure its
>>> description is clear.
>>> 
>>> In routing protocol specifications we tend to use quaint ASCII protocol packet
>>> diagrams with bit rulers along the top. Your use of variable-length fields
>>> makes this approach difficult, but I regretted not having some kind of
>>> graphical representation to reference, which might otherwise have helped. $0.02.
>>> 
>>> ### Throughout, "bits" representation
>>> 
>>> In various places, you have language like "The bits 5-6 of the Block Header
>>> flags are set to 11" and similar. This language seems like it needs to be
>>> fixed, because the simplest plain English reading of the sentence (IMO anyway)
>>> is "bit 5 is set to decimal 11, and so is bit 6", which doesn't make sense. The
>>> most straightforward fix would be to write "... is set to 0b11" to indicate
>>> that you're writing the value in binary.
>>> 
>>> ### Tables 36 and following, semantics
>>> 
>>> The semantics of the tables are not clear to me, and I don't see that they're
>>> defined anywhere:
>>> 
>>> Offset: What is this? All the examples are a small hex value and a plus sign.
>>> The plus sign generally means "or more" colloquially, although if this were a
>>> regex it would mean something about repetition. In any case, I don't know WHAT
>>> it means here. By the way, why are you showing these in hex and not decimal
>>> representation?
>>> 
>>> Player: This says "MUST" sometimes and is a hyphen other times. What does this
>>> mean? That a compliant Matroska Player MUST implement support for certain
>>> things (the MUSTs) and... something something I can't guess, for other things
>>> (the hyphens)?
>>> 
>>> ### Section 10.1, Track Number
>>> 
>>> The first line of Table 36 description is "Track Number (Track Entry). It is
>>> coded in EBML like form (1 octet if the value is < 0x80, 2 if < 0x4000, etc.)
>>> (most significant bits set to increase the range)."
>>> 
>>> First, a suggestion: if you're saying "encoded as an EBML VINT" (which I think
>>> is what you mean) then say so, with xref, since that's a well-defined format
>>> with a well-defined reference. Second, a concern: since you're using a
>>> variable-length encoding here, I don't get how you can provide a specific
>>> offset for the later fields; Timestamp's location is presumably "after Track
>>> Number". Maybe that's what the "0x01+" notation means? It's telling me that
>>> Timestamp won't be any earlier than byte 1, but it might be later, depending on
>>> how Track Number is coded? But if the + means "or later" I don't see how Track
>>> Number gets 0x00+ -- what would make it start later than byte 0?
>>> 
>>> ### Tables 40 and following, semantics
>>> 
>>> The various lacings have tables with a column headed "block octets" or "block
>>> octet". Given that the header is variable-length (because of the Track Number
>>> encoding) I don't see how these block octets can be known a priori. I guess
>>> perhaps the answer is "this is just an example, in a case where the Track
>>> Number happens to be encoded in a single byte, so the header occupies octets
>>> 0-3". If that is the case, I suggest stating it explicitly, perhaps in the
>>> introductory text of §10.4.
>>> 
>>> ### 10.4.2 Xiph lacing, description of encoding
>>> 
>>> After looking at the example (thank you for providing this) and checking RFC
>>> 3533, I see what the coding is, but I think the description needs some work.
>>> Here's what I understand your intent to be, with some comments:
>>> 
>>> "Lacing Head on 1 Octet". I'm not able to make sense of these words, on their
>>> own. I infer from the examples that what you mean is, the first octet of the
>>> block data is a single unsigned integer that provides the number of frames in
>>> the lace minus one, meaning a lace can contain between 1 and 256 frames. Is
>>> that right?
>>> 
>>> Then that's followed by a series of integers, described as "Lacing size of each
>>> frame except the last one". These indicate the sizes of the respective lace
>>> frames other than the final one (which is inferred). If the number of frames in
>>> the lace is N, there will be N-1 integers in the list. The integers are
>>> variable-sized, and the encoding of an integer is a repetition of zero or more
>>> octets with value 255, followed by a single octet with a value less than 255.
>>> The frame size value is the sum of the octet values.
>>> 
>>> If my paragraph above is a correct description, I think your paragraph that
>>> begins "The lacing size is split into 255 values" needs work. To begin with,
>>> the plain English reading of "is split into 255 values" is "it's split into 255
>>> distinct things", which is not what you mean.
>> 
>> Addressed here (rewritten with bit tables): https://github.com/ietf-wg-cellar/matroska-specification/pull/766
>> 
>>> 
>>> (Update: thanks for explaining why this point was mistaken.  By the way, what
>>> happens if N=1? I appreciate that it would be silly to use anything other than
>>> the "no lacing" option in this case, but is it forbidden to encode a single
>>> frame in one of the other lacings? Given there's no way to properly do the
>>> encoding, it seems like it might be worth saying explicitly that this is a MUST
>>> NOT.)
>> 
>> If you only have one frame, the number of frames minus 1 is just 0. It still fits in the lacing algorithm. It might be useful if you have some code that one write one kind of Block and don’t want to make a special for the last frame if it’s alone.
>> 
>>> 
>>> ### 10.4.3 EBML lacing, encodings
>>> 
>>> The previous point about "on 1 Octet" applies.
>>> 
>>> In this case, I have no problem with the integer coding definition, it's clear.
>>> But the example gave me difficulty in several ways.
>>> 
>>> In the second line of Table 43, you have
>>> 
>>>       Block Octets    Value           Description
>>> 
>>>       5-6             0x43 0x20       Size of the first frame
>>>                                       (800 = 0x320 + 0x4000)
>>> 
>>> My first attempt at understanding how eight hundred can be equal to 0x320 +
>>> 0x4000 was a fail, because of course using the normal understanding of the "="
>>> and "+" symbols, it can't be. My second attempt was to interpret this as saying
>>> that you're using the VINT coding, and the "=" is just (confusingly) being used
>>> to mean the colloquial English, i.e. you're saying "800, encoded in VINT as 0x4
>>> for the VINT_WIDTH and VINT_MARKER, meaning it's two octets wide, followed by
>>> 0x0320 for the next 14 bits... and 0x320 is decimal 800."
>>> 
>>> Looking at the third line,
>>> 
>>>       Block Octets    Value           Description
>>> 
>>>       7-8             0x5e 0xd3       Size of the second frame
>>>                                       (500 - 800 = -300 = -0x12c + 0x1fff +
>>>                                       0x4000)
>>> 
>>> Once again I think the descriptive text is confusing: the size of the second
>>> frame isn't -300, though I do get that you store -300 to let the size of the
>>> second frame be computed, as 800 - 300, to give 500, which is indeed the size.
>>> 
>>> Second, "-0x12c" isn't any standard notation I'm aware of. I get that you're
>>> using it to mean "negative 300". Then 0x1fff is the hex value for
>>> (2^((7*2)-1)-1), and the addition of 0x4000 is the value for VINT_WIDTH and
>>> VINT_MARKER for a 2-octet encoding. At a minimum, I would rewrite this as
>>> "0x1fff + 0x4000 - 0x012c" to eliminate the negation sign and produce an
>>> expression that would parse, but read on.
>>> 
>>> It seems to me that readability would benefit from dividing the examples up --
>>> a subsection on VINT/S_VINT, and a subsection on the actual Block encoding. For
>>> example, the VINT part could go something like this, inserted directly after
>>> Table 42.
>>> 
>>>  In our example, we will need to be able to represent 800, the size of
>>>  the first frame, as a VINT. This is done using a 2 octet wide VINT as
>>>  0x4000 + 0x0320 -- the 0x4000 is the VINT_WIDTH and VINT_MARKER, and
>>>  the 0x0320 is the hexadecimal representation of decimal 800. The
>>>  resulting VINT representation is 0x4320. We will also need to be able
>>>  to represent -300, the offset required for the size of the second
>>>  frame, as a signed VINT. This is done as 0x4000 + 0x1fff - 0x012c. In
>>>  this expression, 0x4000 is again the VINT_WIDTH and VINT_MARKER,
>>>  0x1fff is the hexadecimal representation of (2^((7*n)-1) - 1) for n
>>>  of 2, and 0x012c is the hexadecimal representation of 300. The
>>>  resulting signed VINT representation is 0x5ed3.
>>> 
>>> Or instead of talking about 300, you could also talk about "500 - 800", making
>>> the expression "0x4000 + 0x1fff + (0x01f4 - 0x0320)", and/or you could
>>> represent each value in the most convenient radix for clarity, as in "0x4000 +
>>> 0x1fff + (500 - 800)".
>>> 
>>> Then maybe you'd rewrite the descriptions in Table 43 something like
>>> 
>>>       Block Octets    Value           Description
>>> 
>>>       5-6             0x43 0x20       Size of the first frame, 800,
>>>                                       encoded as a VINT
>>> 
>>>       7-8             0x5e 0xd3       Offset between size of first
>>>                                       frame and size of second frame,
>>>                                       encoded as a signed VINT. To
>>>                                       recover the size of the second
>>>                                       frame, we sum this value with
>>>                                       the preceding frame, so,
>>>                                       (-300) + 800 = 500.
>> 
>> Also addressed by the big rewrite in https://github.com/ietf-wg-cellar/matroska-specification/pull/766
>> 
>>> 
>>> ### 10.5, misuse of RFC 2119 keywords
>>> 
>>>  When a frame in a BlockGroup is not a RAP, all references SHOULD be
>>>  listed as a ReferenceBlock, at least some of them,
>>> 
>>> Which is it? All? Or some of them? The RFC 2119 SHOULD is confused about what
>>> it needs to do here. Even if you remove the keyword (for example rewriting as
>>> "should"), if I were implementing this I'd have a hard time knowing what you're
>>> asking me to do.
>>> 
>>>                                                     even if not
>>>  accurate, or one ReferenceBlock with the value "0" corresponding to a
>>>  self or unknown reference.  The lack of ReferenceBlock would mean
>>>  such a frame is a RAP and seeking on that frame that actually depends
>>>  on other frames MAY create bogus output or even crash.
>>> 
>>> The RFC 2119 MAY is wrong in this context, you seem to be using it to express
>>> "might". As written, a strict reading would mean you're saying "it is
>>> explicitly OK, though optional, if implementations decide to create bogus
>>> output or crash in this case". You can fix this by replacing the MAY with
>>> "might" or "could".
>>> 
>> 
>> Addressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/763
>> 
>>> 
>>> ----------------------------------------------------------------------
>>> COMMENT:
>>> ----------------------------------------------------------------------
>>> 
>>> ## COMMENT
>>> 
>>> ### Section 7
>>> 
>>> "A reading application supporting Matroska version V MUST NOT refuse to read an
>>> application with DocReadTypeVersion equal to or lower than V even if
>>> DocTypeVersion is greater than V."
>>> 
>>> Should the second "application" be "file", making the text "... refuse to read
>>> a file with..."?
>> 
>> Addressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/764
>> 
>>> 
>>> ### Section 9
>>> 
>>> "For video sequences signaled as progressive, it is twice the value of
>>> DefaultDecodedFieldDuration."
>>> 
>>> What is "it"?
>> 
>> Clarified here (no more “it”): https://github.com/ietf-wg-cellar/matroska-specification/pull/765
>> 
>>> 
>>> ### Section 15.1, RFC 2119 keywords in an example
>>> 
>>> The first paragraph ends with an example, that contains MAY and SHOULD.
>>> Generally, examples are non-normative, so probably it would be better for these
>>> to be regular lower-case words and not RFC 2119 keywords. I have flagged some
>>> other cases where you do this with examples, but I probably haven't gotten all
>>> of them.
>> 
>> Addressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/771
>> 
>>> 
>>> By the way, I very much appreciated all the examples used in the document, they
>>> helped a lot with my understanding.
>>> 
>>> ### Section 17.1, pathological examples
>>> 
>>> The way hard linking is defined, the obvious usage is like a classic
>>> doubly-linked list... but you also allow for inference of forward and backward
>>> links if absent. I guess this is for historical reasons and to be robust in the
>>> face of weird pathologies, not because these patterns are desirable. Table 45
>>> shows the classic doubly-linked list usage, and Tables 46-48 show what I would
>>> describe as pathological (but still permitted) cases.
>>> 
>>> Maybe I am mistaken and all of the four examples shown are equally desirable
>>> and acceptable, in which case please disregard. But if indeed one is the
>>> preferred encoding style, then this might be a case for a SHOULD (for that
>>> style) and MAY (for the others, already done in your text).
>> 
>> There is no preferred way. You may have 20 segments that each have the same Segment PrevUUID. It means that Segment must be played before playing these files. But that Segment can’t reference each of them as the NextUUID, actually, none of them. 
>> 
>>> 
>>> Also, this paragraph doesn't seem quite right:
>>> 
>>>  For each node of the chain of Segments of a Linked Segment at least
>>>  one Segment MUST reference the other Segment of the node.
>>> 
>>> Specifically, I think "the other Segment of the node" should be something like
>>> "another Segment within the chain"?
>> 
>> Sounds reasonable, yes.
>> Addressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/772
>> 
>>> 
>>> ### Section 18.8, "sum" or "union"?
>>> 
>>>                           The Cues Elements corresponding to such a
>>>  virtual track SHOULD be the sum of the Cues Elements for each of the
>>>  tracks it's composed of (when the Cues are defined per track).
>>> 
>>> Should "sum" be "union"? I didn't see an obvious way to (arithmetically) sum
>>> Cues Elements.
>> 
>> You’re right. It seems more logical to use the union of the Cues of Track A and Track B.
>> Addressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/773
>> 
>>> ### Section 18.10, MAY instead of may?
>>> 
>>> I don't often suggest *adding* a 2119 keyword, but in the last paragraph,
>>> "Matroska Readers may" might be a good place to use the RFC 2119 MAY.
>> 
>> It does seem better.
>> Addressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/774
>> 
>>> 
>>> ### Section 19, examples don't need 2119 keywords
>>> 
>>> In the first paragraph, again we have 2119 keywords with examples. IMHO these
>>> aren't needed.
>> 
>> I think they may sense here. They are not part of the example text but what players SHOULD and MAY do. It’s still not mandatory, which is not enforceable in any way. But we do want people to use the proper track selection behavior.
>> 
>>> 
>>> ### Section 20.2.2, wrong xref?
>>> 
>>> Is the xref to 5.1.7.1.4.14 correct? Maybe it should be to 5.1.7.4.3 or
>>> 5.1.7.4.4?
>> 
>> The xref is correct, although a bit cryptic. It refers to the ChapterProcess element. This line is about the “chapter commands” that could be added in an “empty" ordered chapter.
>> 
>>> 
>>> ### Section 20.4
>>> 
>>> I would have benefited from a pointer to the [matroskatags] reference here:
>>> 
>>>  Each level can have different meanings for audio and video.  The
>>>  ORIGINAL_MEDIA_TYPE tag can be used to specify a string for
>>>  ChapterPhysicalEquiv = 60.
>> 
>> Indeed.
>> Addressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/775
>> 
>>> 
>>> In Table 57, is the list of values (10..70) the only and immutable set of
>>> possible values? Might this be a candidate for having an IANA registry?
>> 
>> These are the only known values, yes. Although I’m not sure anyone ever used them.
>> It is more equivalent to enum values found in the format that need an update of the spec, rather than a freeform value that anyone can play with. I’m not sure a IANA registry would work here.
>> 
>>> 
>>> ### Section 21.1 refs for JPEG/PNG
>>> 
>>> "Only JPEG and PNG image formats SHOULD be used for cover art pictures." It
>>> seems as though these should have references?
>> 
>> Indeed.
>> Addressed here (non normative): https://github.com/ietf-wg-cellar/matroska-specification/pull/776
>> 
>>> 
>>> ### Section 21.1, use of MAY; bullet list
>>> 
>> (This is section 21.2)
>> 
>>> Just checking: "If a selected subtitle track has some AttachmentLink elements,
>>> the player MAY use only these fonts." As I unpack this using the RFC 2119
>>> definition, this means that it's OK for a player to restrict its font rendering
>>> of a subtitle track to only those found in the AttachmentLink element. That
>>> seems reasonable, but I'm checking because if this were a plain English "may"
>>> it would instead come out meaning that a player is not allowed to use any fonts
>>> other than those in the AttachmentLink elements. It might be well to reword the
>>> sentence to make the intended meaning a little more explicit, for example, if
>>> the first reading is the intended one it could be something like "... the
>>> player MAY restrict its font rendering to use only these fonts".
>> 
>> Indeed, it’s clearer.
>> Addressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/777
>> 
>>> Also, this looks like a broken bullet list:
>>> 
>>>  A Matroska player SHOULD handle the official font media types from
>>>  [RFC8081] when the system can handle the type: * font/sfnt: Generic
>>>  SFNT Font Type, * font/ttf: TTF Font Type, * font/otf: OpenType
>>>  Layout (OTF) Font Type, * font/collection: Collection Font Type, *
>>>  font/woff: WOFF 1.0, * font/woff2: WOFF 2.0.
>>> 
>>> and so does this:
>>> 
>>>  There may also be some font attachments with the application/octet-
>>>  stream media type.  In that case the Matroska player MAY try to guess
>>>  the font type by checking the file extension of the
>>>  AttachedFile\FileName string.  Common file extensions for fonts are:
>>>  * .ttf for Truetype fonts, equivalent to font/ttf, * .otf for
>>>  OpenType Layout fonts, equivalent to font/otf, * .ttc for Collection
>>>  fonts, equivalent to font/collection The file extension check MUST be
>>>  case insensitive.
>> 
>> Indeed, I noticed this while reading:
>> Adressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/769
>> 
>>> ### Section 23.1, "the following guidelines"
>>> 
>>> What are the guidelines being referred to by "The following guidelines, when
>>> followed, help reduce the number of seeking operations"? I don't see any in
>>> §23.1 itself. Maybe this is supposed to be an xref to Section 25?
>> 
>> Indeed. I think this text was carried over from our old spec on one website page.
>> Addressed here: https://github.com/ietf-wg-cellar/matroska-specification/pull/778
>> 
>>> ### Section 23.2, TCP
>>> 
>>> In "Livestreaming of Matroska over HTTP (or any other plain protocol based on
>>> TCP) is possible", I suppose it's not actually necessary to restrict
>>> applicability to only TCP. QUIC is the elephant in the room, but presumably,
>>> Matroska doesn't really care what it's being streamed over as long as it gets
>>> reliable in-order delivery. Also, "plain protocol" isn't a well-defined term
>>> AFAIK.
>> 
>> Not sure how to formulate this. I would say file-based access protocols but that doesn’t sound right either. In the end you never know what’s serving you that URL you requested. And there might not  be a proper URL (\\server\path\file.mkv is not a URL or URI: https://www.rfc-editor.org/info/rfc3986).
>> 
>> I used the term “file-like protocol” in https://github.com/ietf-wg-cellar/matroska-specification/pull/779
>> 
>>> 
>>> ### Section 25.1, lower of?
>>> 
>>> Perhaps add "the lower of" in "store no more than 5 seconds or 5 megabytes" to
>>> make it "store no more than the lower of 5 seconds or 5 megabytes"? Or "the
>>> greater of" if that's what you mean, but as written it's underspecified.
>>> 
>> I used the first suggestion in https://github.com/ietf-wg-cellar/matroska-specification/pull/780
>> 
>>> ### Section 27.1, advice for DE
>>> 
>>> Since you've used the "Specification Required" policy, RFC 8126 §4.5 says
>>> 
>>>  The required documentation and review criteria, giving clear guidance
>>>  to the designated expert, should be provided when defining the
>>>  registry.
>>> 
>>> Also, I notice you're using the word "reclaimed" instead of "deprecated" in
>>> this section. I think your intention is clear, but why not use "deprecated"
>>> instead? AFAIK it's the usual term for this, and indeed you use it in the title
>>> of Annex A.
>> 
>> First used the term “returned”, and it was changed to “reclaimed” in https://github.com/ietf-wg-cellar/matroska-specification/pull/719 based on Michael’s suggestion: https://github.com/ietf-wg-cellar/matroska-specification/pull/719#discussion_r1089906011
>> 
>> I have no preference.
>> 
>>> 
>>> ## NITS
>>> 
>>> ### Section 4.4
>>> 
>>> "In some situation, a Cluster Element MAY" -> "In some situations, a Cluster
>>> Element MAY" (make "situations" plural).
>> 
>> Fixed in https://github.com/ietf-wg-cellar/matroska-specification/pull/761
>> 
>>> ### Section 5.1.5
>>> 
>>> "This Element SHOULD be set when the Segment is not transmitted as a live
>>> stream (see #livestreaming)." Looks like #livestreaming is a broken xref
>> 
>> Also fixed in https://github.com/ietf-wg-cellar/matroska-specification/pull/761
>> 
>>> ### Section 17.2
>>> 
>>> The last paragraph ends in what looks like a broken bullet list.
>>> 
>>>  There are 2 ways to use a chapter link: * Linked-Duration linking, *
>>>  Linked-Edition linking
>> 
>> Addressed in https://github.com/ietf-wg-cellar/matroska-specification/pull/769
>> 
>>> ## Notes
>>> 
>>> This review is in the ["IETF Comments" Markdown format][ICMF], You can use the
>>> [`ietf-comments` tool][ICT] to automatically convert this review into
>>> individual GitHub issues.
>>> 
>>> [ICMF]: https://github.com/mnot/ietf-comments/blob/main/format.md
>>> [ICT]: https://github.com/mnot/ietf-comments
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Cellar mailing list
>>> Cellar@ietf.org
>>> https://www.ietf.org/mailman/listinfo/cellar
>> 
>