Re: [Cellar] Bug in Matroska spec?

Steve Lhomme <slhomme@matroska.org> Sun, 09 May 2021 06:46 UTC

To: Paul Foley <paul@foley.gen.nz>, Codec Encoding for LossLess Archiving and Realtime transmission <cellar@ietf.org>
References: <e18ea8f5-3ae6-a709-a343-fc56e07c5e33@foley.gen.nz>
From: Steve Lhomme <slhomme@matroska.org>
Message-ID: <2bcf60e1-293c-8b51-d3ce-6ff8fe491336@matroska.org>
Date: Sun, 09 May 2021 08:46:48 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1
MIME-Version: 1.0
In-Reply-To: <e18ea8f5-3ae6-a709-a343-fc56e07c5e33@foley.gen.nz>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/z_INjU-Vc9FOTDZs9AhdRRjrIig>
Subject: Re: [Cellar] Bug in Matroska spec?

Hi Paul,

For discussions about the format, please email the IETF CELLAR mailing
list [1], which is still very active for Matroska, FFV1, etc.
You can also raise issues on GitHub [2] [3].

I'm posting this reply to the CELLAR mailing list as we will need to 
discuss your points further in the community.

On 2021-05-02 14:56, Paul Foley wrote:
> EBML RFC 8794, Section 5 states that
> 
>     The bits of the VINT_DATA component of the Element ID MUST NOT be all
>     "0" values
> 
> however, Matroska defines the ChapterDisplay element with id 0x80,
> which, unless I'm very confused, has all "0" values in the VINT_DATA
> component(!)

You are correct. That's a very good catch!
I'm not sure yet how we can get around this issue. ChapterDisplay is a
widely used element. We should probably file an erratum against the IETF
spec. Even the IANA registry table is bogus in that regard...
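
To make the conflict concrete, here is a minimal C sketch of the check
Paul describes. It only covers Element IDs of 1 to 4 octets written with
their VINT_MARKER bit, as in 0x80 or 0x4484. The function names are
illustrative assumptions and do not come from libebml, libmatroska or
any other library.

```c
#include <stdint.h>

/* Length in octets (1 to 4) of an Element ID that is written with its
 * VINT_MARKER bit included, or 0 if the value does not fit any of the
 * 1 to 4 octet forms. Hypothetical helper for illustration only. */
static int ebml_id_length(uint32_t id)
{
    if (id >= 0x80u && id <= 0xFFu) return 1;
    if (id >= 0x4000u && id <= 0x7FFFu) return 2;
    if (id >= 0x200000u && id <= 0x3FFFFFu) return 3;
    if (id >= 0x10000000u && id <= 0x1FFFFFFFu) return 4;
    return 0;
}

/* Returns 1 when every VINT_DATA bit of the Element ID is 0, which is
 * the case the "MUST NOT be all 0 values" rule quoted above forbids.
 * Returns 0 otherwise, including for values that are not a 1 to 4
 * octet Element ID. */
static int ebml_id_data_is_all_zero(uint32_t id)
{
    int len = ebml_id_length(id);
    if (len == 0)
        return 0;
    /* The marker is the highest set bit: bit 7 for 1 octet, bit 14
     * for 2 octets, and so on. Everything below it is VINT_DATA. */
    uint32_t marker = UINT32_C(1) << (7 * len);
    return (id & (marker - 1u)) == 0;
}
```

With this sketch, ebml_id_data_is_all_zero(0x80) returns 1 for
ChapterDisplay, while 0x8F (ChapterTrack) and 0x4484 (TagDefault) return
0. A parser that enforces the rule strictly would therefore reject
ChapterDisplay.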

> [I wrote code to read and write EBML/Matroska based on the RFC and
> parsing ebml_matroska.xml from
> {https://github.com/ietf-wg-cellar/matroska-specification}, but had to
> ignore the "not zero" rule, and add stuff to the xml file because it's
> incomplete. Specifically, EditionFlagHidden 0x45DB, ChapterFlagEnabled
> 0x4598, ChapterTrack 0x8F, ChapterTrackUID 0x89 are missing, and

These elements have been moved out because they are specific to control
tracks, which are unused right now. See [4].
This is not ideal for code generators, though. The two XML files should
be concatenated before being processed.

> TagDefault's element id was obviously changed from 0x44B4 to 0x4484
> (typo?) at some point; I have many files written before the change ...
> should there be there with maxver set?]

0x44B4 is a bug in old versions of libavcodec. Or maybe that's not where
it originates, but it's referenced there:
#define MATROSKA_ID_TAGDEFAULT          0x4484
#define MATROSKA_ID_TAGDEFAULT_BUG      0x44B4

Since we are putting in the RFC what is currently used, we should
probably add it to the XML as you suggest, probably with maxver="0",
making the element invalid in any case (it should never have been used).
But as with other such elements, it will be listed in the RFC with some
explanation of how to interpret it if it's found. [5]

This value is not present in libmatroska, so VLC and mkvmerge won't even
read the element when found.
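
For readers that want to tolerate old files, one possible approach is to
map the buggy ID onto the correct one while parsing and never write it
back. The sketch below assumes this policy. normalize_tag_id() is a
hypothetical helper, not something libavcodec or libmatroska provides.
Only the two #define values come from the lines quoted above.

```c
#include <stdint.h>

#define MATROSKA_ID_TAGDEFAULT          0x4484
#define MATROSKA_ID_TAGDEFAULT_BUG      0x44B4

/* Hypothetical reader-side helper: treat the legacy 0x44B4 ID as
 * TagDefault so old files still parse. Every other ID is returned
 * unchanged. A writer should only ever emit 0x4484. */
static uint32_t normalize_tag_id(uint32_t id)
{
    if (id == MATROSKA_ID_TAGDEFAULT_BUG)
        return MATROSKA_ID_TAGDEFAULT;
    return id;
}
```

A parser would call normalize_tag_id() on each Element ID read inside a
tag before looking it up in its element table, which keeps the legacy
value out of the rest of the code.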

> I also note that
> {https://www.iana.org/assignments/ebml/ebml.xhtml#ebml-doctype} has no
> registrations: surely "matroska" and "webm" ought to be registered.

They are mentioned in the RFC text [6] but not in the IANA registry.
This issue was raised during the RFC Editor phase, but I can't find any
answer to it. My understanding was that "matroska" would be added to the
registry once the RFC for Matroska is ready, and that "webm" should be
registered by the WebM people.
I think the word RESERVED would mean something else, but maybe I'm
wrong. Ultimately we will add the "matroska" one for sure.

Thanks a lot for your very useful feedback!

[1] https://www.ietf.org/mailman/listinfo/cellar
[2] https://github.com/ietf-wg-cellar/ebml-specification/issues
[3] https://github.com/ietf-wg-cellar/matroska-specification/issues
[4] https://github.com/ietf-wg-cellar/matroska-specification/pull/461
[5] https://github.com/ietf-wg-cellar/matroska-specification/pull/487
[6] https://datatracker.ietf.org/doc/html/rfc8794#section-17.2