[Cellar] EBML Extensions

Steve Lhomme <slhomme@matroska.org> Sun, 15 April 2018 13:30 UTC

From: Steve Lhomme <slhomme@matroska.org>
Date: Sun, 15 Apr 2018 15:30:10 +0200
Message-ID: <CAOXsMFK-sp+=CRUmYunHRptudZgLGpMLoTF+UVA-oW94+gzb=w@mail.gmail.com>
To: Codec Encoding for LossLess Archiving and Realtime transmission <cellar@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/JsGRrWX8bAHujyzlm1w1uOX2neo>
Subject: [Cellar] EBML Extensions

Following the discussion on RAWcooked elements, we need a way to define
how to create extensions to a DocType that are not meant to be merged
into the main DocType specifications.

The XML Schema we use is already quite good for external files, as we
can define the exact path where an element can be found, independently
of the main specs. We even have fields in the header to tell which
DocType and which version of the DocType it applies to.

What we don't have is this information when reading an EBML file. We
know the DocType and the version, but we don't know of any extensions
that are also valid in this file.

If we had a DTD/Schema embedded in every EBML file, that would be where
we put this data. But we don't. And there may be a simpler way to do
it, in the same philosophy as EBML/Matroska: only write information
that's not obvious.

We can take the RAWcooked elements as an example for what we need:
https://github.com/Matroska-Org/matroska-specification/pull/223/files

This should be a separate XML file that has the same DocType and
version as the current Matroska one.
https://github.com/Matroska-Org/matroska-specification/blob/master/ebml_matroska.xml
We may also include a human readable name for the extension like
extension="RAWcooked".

It would contain the 3 added elements and that's it. That would be
enough for the extension definition.

In the file/stream we should signal this extension when it's used. The
logical way to do that is to put this information in the EBML header.
We will also need the IDs, the path where they belong and at least
whether each one is a master element or not (to allow child elements).
The path actually contains a lot of information, like the maximum
number of occurrences, whether it's a recursive element and even how
the recursion works.

The goal is not to interpret the values but only to tell what is valid
and what is not, so we may not need to know the exact type of the value
and the allowed ranges. But that could be added as well.
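
As a rough illustration of that "valid or not valid" check, a reader
could keep the set of paths declared in the header and test unknown
elements against it. This is only a sketch under assumptions: the names
`declared` and `is_declared` are hypothetical, and the IDs are the ones
from the RAWcooked paths quoted further down in this mail.

```python
# Hypothetical sketch: decide whether an unknown element is covered by
# an extension. Each declared path is a tuple of EBML IDs, from the
# root element down to the new element itself.
declared = {
    (0x18538067, 0x1F43B675, 0xA0, 0x207262),
    (0x18538067, 0x207273),
    (0x18538067, 0x1654AE6B, 0xAE, 0x207274),
}

def is_declared(parent_path, element_id):
    """Return True if element_id is declared as valid under parent_path."""
    return tuple(parent_path) + (element_id,) in declared
```

Nothing here looks at the element's value, only at where it sits.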

The path would probably be in a binary format, especially as the
element names are not known.
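
One possible binary form, purely as an assumption, is to concatenate
the raw EBML IDs: an EBML ID encodes its own length in the leading bits
of its first byte, so the IDs can be split again without separators.
The function name `split_ids` is made up for this sketch, and it does
not cover occurrence counts or nested groups.

```python
def split_ids(data):
    """Split concatenated EBML IDs (a hypothetical binary path) into integers."""
    ids = []
    pos = 0
    while pos < len(data):
        first = data[pos]
        if first == 0:
            raise ValueError("IDs longer than 8 bytes are not handled here")
        # Leading zero bits in the first byte, plus one, give the ID length.
        length = 9 - first.bit_length()
        if pos + length > len(data):
            raise ValueError("truncated ID")
        # The length marker bit stays part of the ID value.
        ids.append(int.from_bytes(data[pos:pos + length], "big"))
        pos += length
    return ids

# The first RAWcooked path from the pull request, IDs only.
path = bytes.fromhex("18538067" "1F43B675" "A0" "207262")
print([hex(i) for i in split_ids(path)])
```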

So in the case of RAWcooked we would have something like this stored
in the EBML header:
EBML
  EBMLExtensions
      EBMLExtensions
        ExtensionName: RAWcooked
        ExtensionElement
          ExtensionElementPath: 0*1(\0x18538067\0x1F43B675\0xA0\0x207262)
          ExtensionElementHasChildren: 0 (may be the type otherwise)
        ExtensionElement
          ExtensionElementPath: 0*(\0x18538067\0x207273)
          ExtensionElementHasChildren: 0
        ExtensionElement
          ExtensionElementPath: 0*1(\0x18538067\0x1654AE6B\0xAE\0x207274)
          ExtensionElementHasChildren: 0
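
To make the notation concrete, here is a minimal sketch of how the
textual form of ExtensionElementPath shown above could be parsed into
an occurrence range and a list of IDs. The function name `parse_path`
is hypothetical, a missing minimum is assumed to mean 0, and the sketch
only handles the flat form used in these three entries, not nested
groups.

```python
import re

# Matches e.g. 0*1(\0x18538067\0xA0): optional min, '*', optional max,
# then a parenthesised list of backslash-separated hexadecimal IDs.
_PATH_RE = re.compile(r"^(\d*)\*(\d*)\(((?:\\0x[0-9A-Fa-f]+)+)\)$")

def parse_path(text):
    """Return (min_occurs, max_occurs, ids); max_occurs is None if unbounded."""
    m = _PATH_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unsupported path: {text!r}")
    min_occurs = int(m.group(1)) if m.group(1) else 0
    max_occurs = int(m.group(2)) if m.group(2) else None
    ids = [int(part, 16) for part in m.group(3).split("\\") if part]
    return min_occurs, max_occurs, ids

print(parse_path(r"0*(\0x18538067\0x207273)"))
```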


We could add extra elements like a URL to know more about that
extension, a human readable name for each element, etc. The URL may
also replace the ExtensionName.

The separators between IDs may be kept as we need a way to wrap things
in this way: 1*(\Segment\Chapters\EditionEntry(1*(\ChapterAtom)))

Opinions?

Given the state of EBML, this may be better left for a subsequent
version of EBML so we don't delay the current one further.

-- 
Steve Lhomme
Matroska association Chairman