[Cellar] on restricting Unknown-Sized Elements

Michael Richardson <mcr+ietf@sandelman.ca> Thu, 14 May 2020 14:34 UTC

From: Michael Richardson <mcr+ietf@sandelman.ca>
To: cellar@ietf.org
In-Reply-To: <cellar-wg/ebml-specification/issues/338@github.com>
References: <cellar-wg/ebml-specification/issues/338@github.com>
Date: Thu, 14 May 2020 10:34:05 -0400
Message-ID: <8104.1589466845@localhost>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/3Kw9nBdXM1F0PnO8Mzv9m93NTec>
Subject: [Cellar] on restricting Unknown-Sized Elements

My reply:
1) EBML is already past WGLC and has been approved by the IESG, so this issue
   is not in order!

2) Your complaint is well taken, and it is reasonable for specific DocTypes
   (e.g., 'Matroska' or another) to state that Unknown-Sized Elements are not
   allowed.

3) Alternatively, you could author a new top-level EBML element which states
   "there are no Unknown-Sized Elements in this document". Creating a limited
   subset of EBML for specific purposes would make a lot of sense, the same way
   that PDF/A was created. I would encourage you to pursue this direction.

https://github.com/cellar-wg/ebml-specification/issues/338

wm4 wrote:
    > This issue is about:
    > https://github.com/cellar-wg/ebml-specification/blob/master/specification.markdown#unknown-data-size

    > Normally, EBML is very simple to parse: since every element has a size,
    > elements can just be recursively parsed, where each recursive call would
    > have the start and end byte position of the sub-element as
    > argument. Unknown elements can always be skipped.
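
For illustration, the known-size case described above can be sketched in a few
lines of Python. This is a minimal sketch, not any real parser: the SCHEMA
table, the function names, and the sample bytes are assumptions made up for
this example (only the EBML header and DocType IDs are taken from the
specification), and sizes marked as unknown are deliberately not handled.

```python
# Toy schema (an assumption for this sketch): Element ID -> (name, is_master).
SCHEMA = {
    0x1A45DFA3: ("EBML", True),
    0x4282: ("DocType", False),
}


def read_vint(buf, pos, keep_marker):
    """Read one variable-size integer at pos; return (value, length in octets)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("VINTs longer than 8 octets are not handled here")
    length = 9 - first.bit_length()  # leading zero bits + 1
    if pos + length > len(buf):
        raise ValueError("truncated VINT")
    value = int.from_bytes(buf[pos:pos + length], "big")
    if not keep_marker:
        value &= (1 << (7 * length)) - 1  # strip the length marker bit
    return value, length


def parse(buf, start, end, schema):
    """Parse the elements in buf[start:end]; every size is assumed known."""
    out = []
    pos = start
    while pos < end:
        elem_id, n = read_vint(buf, pos, keep_marker=True)
        pos += n
        size, n = read_vint(buf, pos, keep_marker=False)
        pos += n
        data_end = pos + size
        if data_end > end:
            raise ValueError("element overruns its parent")
        entry = schema.get(elem_id)
        if entry is None:
            pass  # unknown element: the size alone is enough to skip it
        elif entry[1]:
            # Master element: the recursive call only needs start and end.
            out.append((entry[0], parse(buf, pos, data_end, schema)))
        else:
            out.append((entry[0], bytes(buf[pos:data_end])))
        pos = data_end
    return out


# EBML header holding an element with a made-up ID (0x5ABC) and a DocType.
sample = bytes.fromhex("1A45DFA3" "90" "5ABC" "82" "0000" "4282" "88") + b"matroska"
print(parse(sample, 0, len(sample), SCHEMA))
# [('EBML', [('DocType', b'matroska')])]
```

Note that the recursive call needs nothing but its byte range and the schema
entry for the current element; the 0x5ABC element is skipped without knowing
anything about it.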

    > Unknown-Sized Elements change this. Suddenly a sub-parser has to be
    > aware of all parent elements, and which elements they can contain. This
    > is a significant complication. When a parser encounters an
    > unknown-sized element of unknown type, it has to assume it is a master
    > element,
    > and has to skip through its sub-elements. Then for each element in this
    > mode, it has to determine whether this element is allowed in any of the
    > top levels, and if so "return" (assuming it's a recursive parser).
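
A rough Python sketch of the extra bookkeeping described above follows. It is
an illustration under stated assumptions, not the termination rule of the EBML
specification: the CHILDREN and NAMES tables are a toy schema invented for this
example (using the Matroska IDs for Segment, Cluster and Timestamp), and an
unknown-sized element is ended by the simplified test "the next Element ID is
not one of my known children". Global elements and unknown IDs are not handled
specially.

```python
# Toy schema (an assumption for this sketch): master ID -> allowed child IDs.
CHILDREN = {
    0x18538067: {0x1F43B675},  # Segment -> Cluster
    0x1F43B675: {0xE7},        # Cluster -> Timestamp
}
NAMES = {0x18538067: "Segment", 0x1F43B675: "Cluster", 0xE7: "Timestamp"}


def read_vint(buf, pos):
    """Return (raw value with marker bit, length in octets)."""
    first = buf[pos]
    if first == 0:
        raise ValueError("VINTs longer than 8 octets are not handled here")
    length = 9 - first.bit_length()
    if pos + length > len(buf):
        raise ValueError("truncated VINT")
    return int.from_bytes(buf[pos:pos + length], "big"), length


def read_size(buf, pos):
    """Return (size, length); size is None when all data bits are set to 1."""
    raw, length = read_vint(buf, pos)
    mask = (1 << (7 * length)) - 1
    value = raw & mask
    return (None if value == mask else value), length


def parse_children(buf, pos, limit, parent_id, known_size):
    """Parse children of parent_id from pos; return (children, next position)."""
    out = []
    while pos < limit:
        elem_id, id_len = read_vint(buf, pos)
        if not known_size and elem_id not in CHILDREN.get(parent_id, set()):
            # Not a child of this unknown-sized parent: hand it back to the
            # caller. An ID missing from the schema also lands here, which is
            # exactly why such elements cannot simply be skipped.
            break
        size, size_len = read_size(buf, pos + id_len)
        data = pos + id_len + size_len
        if elem_id in CHILDREN:  # master element
            child_limit = limit if size is None else data + size
            kids, pos = parse_children(buf, data, child_limit, elem_id,
                                       size is not None)
            out.append((NAMES[elem_id], kids))
            if size is not None:
                pos = data + size
        elif size is None:
            raise ValueError("unknown size on a non-master element")
        else:
            if elem_id in NAMES:
                out.append((NAMES[elem_id], bytes(buf[data:data + size])))
            pos = data + size
    return out, pos


# Unknown-sized Segment holding two unknown-sized Clusters.
stream = bytes.fromhex("18538067 FF 1F43B675 FF E78100 1F43B675 FF E78105")
tree, end_pos = parse_children(stream, 0, len(stream), None, True)
print(tree)
```

Compared with the known-size case, every level now has to consult the schema
for its parent before it can even decide whether an element belongs to it.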

    > In addition, it becomes impossible to parse files that contain unknown
    > elements. Being able to skip unknown elements is very important for
    > extensibility, but it's not generally possible if Unknown-Sized
    > Elements are allowed. At least global elements cannot be associated
    > with the correct level anymore if multiple unknown-sized elements are
    > nested, and at least for CRC-32 elements this should be a real
    > problem. (Maybe I'm misunderstanding this. And I don't understand the
    > last example in the table in the paragraph in the specification link
    > above.)

    > The specification restricts use of Unknown-Sized Elements a little. For
    > one, the "EBML Schema" must set "unknownsizeallowed". And also it says
    > this:

    >> Unknown-Sized Elements MUST only be used if the Element Data Size is
    >> not known before the Element Data is written, such as in some cases of
    >> data streaming.

    > This is extremely unclear and vague. I interpret this as: "a writer can
    > use unknown-sized elements whenever it seems convenient". For example,
    > does that mean Clusters in Matroska can be unknown sized or not? Who
    > knows...

    > I find it surprising that the format has such an obscure
    > complication. Just look how EBML fared in the past. mkvtoolnix keeps
    > adding weird hacks to avoid any new elements (such as putting new
    > elements into specially named Matroska tags, instead of adding new
    > proper EBML elements) because some Matroska parsers were bad enough to
    > not skip unknown elements. And unknown elements are a pretty obvious
    > concept, and there were test files for this. How would the average
    > Matroska parser fare on files that use tricky combinations of
    > unknown-sized elements? I think this concept is too complicated.

    > I claim that unknown-sized elements are useful only in very specific
    > cases, such as specific Matroska elements when livestreaming. I suggest
    > that the EBML Schema is required to list elements which can have
    > unknown size, that no new unknown sized elements get added unless on
    > incompatible new versions of the schema, and that the use of unknown
    > sized elements is very conservative. Specifications based on EBML
    > should clearly define when the use of unknown-sized elements is allowed
    > at all. For example, Matroska could define that Segment and Cluster
    > elements can be unknown sized (and nothing else) if it's a streaming
    > case, and using known Segment or Cluster sizes would negatively affect
    > media latency. I suggest the vague sentence quoted above is removed
    > from the EBML specification.


--
Michael Richardson <mcr+IETF@sandelman.ca>, Sandelman Software Works
 -= IPv6 IoT consulting =-