Re: [Cellar] EBML Extensions
Jerome Martinez <jerome@mediaarea.net> Sun, 15 April 2018 15:44 UTC
Return-Path: <jerome@mediaarea.net>
X-Original-To: cellar@ietfa.amsl.com
Delivered-To: cellar@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C55F1127867 for <cellar@ietfa.amsl.com>; Sun, 15 Apr 2018 08:44:11 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level:
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_PASS=-0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id aeZxgRs8YHNE for <cellar@ietfa.amsl.com>; Sun, 15 Apr 2018 08:44:08 -0700 (PDT)
Received: from 9.mo5.mail-out.ovh.net (9.mo5.mail-out.ovh.net [178.32.96.204]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 02E6B1270A3 for <cellar@ietf.org>; Sun, 15 Apr 2018 08:44:08 -0700 (PDT)
Received: from player730.ha.ovh.net (unknown [10.109.105.46]) by mo5.mail-out.ovh.net (Postfix) with ESMTP id EA7621A07B0 for <cellar@ietf.org>; Sun, 15 Apr 2018 17:44:05 +0200 (CEST)
Received: from [192.168.2.120] (p5DDB6BF5.dip0.t-ipconnect.de [93.219.107.245]) (Authenticated sender: jerome@mediaarea.net) by player730.ha.ovh.net (Postfix) with ESMTPSA id 9EDCC4400A3 for <cellar@ietf.org>; Sun, 15 Apr 2018 17:44:02 +0200 (CEST)
To: cellar@ietf.org
References: <CAOXsMFK-sp+=CRUmYunHRptudZgLGpMLoTF+UVA-oW94+gzb=w@mail.gmail.com>
From: Jerome Martinez <jerome@mediaarea.net>
Message-ID: <ffb9246b-2138-7441-61b6-4250ceb29a78@mediaarea.net>
Date: Sun, 15 Apr 2018 17:44:03 +0200
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.7.0
MIME-Version: 1.0
In-Reply-To: <CAOXsMFK-sp+=CRUmYunHRptudZgLGpMLoTF+UVA-oW94+gzb=w@mail.gmail.com>
Content-Type: multipart/alternative; boundary="------------94DC54CA789F33255933808A"
Content-Language: en-GB
X-Ovh-Tracer-Id: 8305482137877483665
X-VR-SPAMSTATE: OK
X-VR-SPAMSCORE: 0
X-VR-SPAMCAUSE: gggruggvucftvghtrhhoucdtuddrgedtgedrieeigdekhecutefuodetggdotefrodftvfcurfhrohhfihhlvgemucfqggfjpdevjffgvefmvefgnecuuegrihhlohhuthemuceftddtnecu
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/6G6ibfg7msekJ802t0LvdKYuSBc>
Subject: Re: [Cellar] EBML Extensions
X-BeenThere: cellar@ietf.org
X-Mailman-Version: 2.1.22
Precedence: list
List-Id: Codec Encoding for LossLess Archiving and Realtime transmission <cellar.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cellar>, <mailto:cellar-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cellar/>
List-Post: <mailto:cellar@ietf.org>
List-Help: <mailto:cellar-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cellar>, <mailto:cellar-request@ietf.org?subject=subscribe>
X-List-Received-Date: Sun, 15 Apr 2018 15:44:12 -0000
On 15/04/2018 15:30, Steve Lhomme wrote:
> Following the discussion on RAWcooked elements we need a way to define
> how to create extensions to a DocType that are not meant to be merged
> in the main DocType specifications.
>
> The XML Schema we use is already quite good for external files as we
> can define the exact path an element can be found, independently of
> the main specs. We even have fields in the header to tell which
> DocType and which version of the DocType it applies to.
>
> What we don't have is these information when reading an EBML file. We
> know the DocType and the version but we don't know of any extensions
> that are also valid in this file.
>
> If we had a DTD/Schema embedded in every EBML file that would be where
> we put these data. But we don't. And there may be simpler way to do
> it, in the same philosophy as EBML/Matroska: only write information
> that's not obvious.
>
> We can take the RAWcooked elements as an example for what we need:
> https://github.com/Matroska-Org/matroska-specification/pull/223/files
>
> This should be a separate XML file that has the same DocType and
> version as the current Matroska one.
> https://github.com/Matroska-Org/matroska-specification/blob/master/ebml_matroska.xml
> We may also include a human readable name for the extension like
> extension="RAWcooked".
>
> The 3 added elements and that's it. That would be enough for the
> extension definition.
>
> In the file/stream we should tell if this extension is used, when it's
> used. The logical way to do that is to put this information in the
> EBML header. We will also need the IDs, the path where they belong and
> at least if it's a master element or not (to allow children elements).
> The path actually contains a lot of information like the maximum of
> occurrences or if it's a recursive element and even how the recursion
> works.
>
> The goal is not to interpret the values but only to tell what is valid
> and not valid we may not need to know the exact type of the value and
> the allowed ranges. But that could be added as well.
>
> The path would probably be in a binary format. Especially as the
> elements name are not known.
>
> So in the case of RAWcooked we would have something like that stored
> in the EBML header:
> EBML
>   EBMLExtensions
>     EBMLExtensions
>       ExtensionName: RAWcooked
>       ExtensionElement
>         ExtensionElementPath: 0*1(\0x18538067\0x1F43B675\0xA0\0x207262)
>         ExtensionElementHasChildren: 0 (may be the type otherwise)
>       ExtensionElement
>         ExtensionElementPath: 0*(\0x18538067\0x207273)
>         ExtensionElementHasChildren: 0
>       ExtensionElement
>         ExtensionElementPath: 0*1(\0x18538067\0x1654AE6B\0xAE\0x207274)
>         ExtensionElementHasChildren: 0
>
>
> We could add extra elements like a URL to know more about that
> extension, a human readable name for each element, etc. The URL may
> also replace the ExtensionName.
>
> The separators between IDs may be kept as we need a way to wrap things
> in this way: 1*(\Segment\Chapters\EditionEntry(1*(\ChapterAtom)))
>
> Opinions ?

While I like the idea of defining some "schema" in the EBML header, I
am wondering what the actual use is: for a conformance checker, it
helps to treat unknown elements as "expected", but is that useful?
Shouldn't a conformance checker just skip unknown elements, assuming
that they were defined after the conformance checker was built?

Additionally, I understand you are talking about third-party
extensions, but don't we have the same issue with official new Matroska
elements? When we freeze Matroska v4, how should new elements be
considered if the DocTypeVersion is still 4? In other words, does a
DocTypeVersion of 4 mandate a precise list of elements, or is it
"open"? IMO, having thought more and more about it, it should be
"open", and a conformance checker should just ignore unknown elements
or list them in an information area.
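
To make the "open" reading concrete, here is a minimal sketch of a checker that reports unknown elements as information rather than as errors. The element IDs are the ones appearing in the paths quoted above; `KNOWN_IDS` and `check_elements` are hypothetical names used only for illustration, and a real checker would parse the element tree instead of taking a flat list of IDs.

```python
# Standard Matroska IDs that appear in the paths quoted above.
KNOWN_IDS = {
    0x18538067: "Segment",
    0x1F43B675: "Cluster",
    0xA0: "BlockGroup",
    0x1654AE6B: "Tracks",
    0xAE: "TrackEntry",
}


def check_elements(element_ids):
    """Hypothetical helper: split element IDs into known names and info notes.

    Unknown IDs never produce an error; they are only listed in `info`,
    which plays the role of the "information area".
    """
    known, info = [], []
    for element_id in element_ids:
        name = KNOWN_IDS.get(element_id)
        if name is None:
            info.append(f"unknown element 0x{element_id:X} (ignored)")
        else:
            known.append(name)
    return known, info


# 0x207274 is one of the RAWcooked IDs from the quoted proposal.
known, info = check_elements([0x18538067, 0x1654AE6B, 0xAE, 0x207274])
print(known)
print(info)
```

With this behaviour the checker does not need an extension list in the EBML header to stay usable; the list would only change how an unknown element is labelled.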

For third-party extensions, I see that the proposal does not resolve
the collision issue. Even if collisions may be rare, people may want to
use shorter IDs, so collisions would not be so rare. And it does not
indicate which IDs to use: if we (specification writers) don't know
which elements are used by third parties, how will we avoid picking an
element ID already used by someone who did not tell us about it?

Worse, let's imagine this at the EBML level: we want EBML to be used in
more places, but what would happen if spec author X defines element Y
in their spec (nothing forbids them from doing so), and the EBML
authors decide to add a global element with value Y (the same value)
because they were not aware of author X and that spec?

So I think we have a more global issue: we have reserved values nowhere
(not in EBML, not in Matroska), and it may be important to treat that
as an issue to resolve before flagging EBML as "final version 1".

Checking e.g. 1-byte elements, I see that we already have 78 (76
Matroska + 2 global EBML) 1-byte elements defined, out of 128
possibilities. Not a lot remain for future global EBML 1-byte
elements... I suggest that we first reserve (for all classes?) some
EBML IDs for future global elements, before we can no longer add any
new global element because Matroska uses all of them.

Independently (this is subject to debate, as some people don't want
private elements), we could reserve a range of element IDs for private
content. Even though I talked about "uuid" stuff, I don't like that
idea so much because it increases the size (16 bytes per UUID, in
addition to the EBML ID size). If I take your EBMLExtension idea, I
suggest that we:
- Reserve some global elements (3-byte minimum?) for private content
- In the EBML header, map each such global element to something well
  defined (I don't like UUIDs because they are cryptic when someone
  faces a new value; reverse DNS or something similar to a tag name
  may be better)

The idea is based on the MXF "Primer": all 2-byte element IDs are "user
defined" and the "Primer" part of the MXF file maps them to their
16-byte "Universal Label", so the private content uses 2 bytes for the
element ID.

Based on your proposal, it would mean something like:
- We reserve 0x234500 to 0x2345FF (random values) in EBML for private
  content.
- In EBMLHeader:
EBML
  EBMLExtensions
    ExtensionElement
      ExtensionValue: 0x234562
      ExtensionMeaning: "RAWcooked/RawcookedBlockGroup"
    ExtensionElement
      ExtensionValue: 0x234572
      ExtensionMeaning: "RAWcooked/RawcookedTrackEntry"
    ExtensionElement
      ExtensionValue: 0x234573
      ExtensionMeaning: "RAWcooked/RawcookedSegment"

Just an idea as a potential solution for the issues I see with your
idea and with EBML in general, but I have no strong opinions about how
to handle them. Does someone have other ideas?
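
As a rough illustration of the Primer-like mapping above, the sketch below builds a lookup table from the ExtensionValue/ExtensionMeaning pairs and resolves private IDs through it. The reserved range 0x234500 to 0x2345FF is the placeholder from this message, not an allocated range, and `build_primer`, `resolve` and `is_three_byte_id` are hypothetical names, not part of any EBML library.

```python
# Placeholder private range from the message above (0x234500..0x2345FF).
PRIVATE_RANGE = range(0x234500, 0x234600)


def is_three_byte_id(element_id: int) -> bool:
    """A 3-octet EBML ID starts with the bits 001, so its first octet is 0x20..0x3F."""
    return 0x20 <= (element_id >> 16) <= 0x3F


def build_primer(extension_elements):
    """Build the ID -> meaning table from (ExtensionValue, ExtensionMeaning) pairs."""
    primer = {}
    for value, meaning in extension_elements:
        if value not in PRIVATE_RANGE:
            raise ValueError(f"0x{value:X} is outside the private range")
        if value in primer:
            raise ValueError(f"0x{value:X} is mapped twice")
        primer[value] = meaning
    return primer


def resolve(primer, element_id):
    """Return the meaning of a private ID, or None if the ID is not private."""
    if element_id not in PRIVATE_RANGE:
        return None
    return primer.get(element_id, "unmapped private element")


primer = build_primer([
    (0x234562, "RAWcooked/RawcookedBlockGroup"),
    (0x234572, "RAWcooked/RawcookedTrackEntry"),
    (0x234573, "RAWcooked/RawcookedSegment"),
])
print(resolve(primer, 0x234562))
```

Because the meaning strings are namespaced, two third parties can reuse the same private ID in different files without a collision: each file carries its own table.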
- [Cellar] EBML Extensions Steve Lhomme
- Re: [Cellar] EBML Extensions Jerome Martinez