Re: [Cellar] Benjamin Kaduk's Discuss on draft-ietf-cellar-ebml-15: (with DISCUSS and COMMENT)

Steve Lhomme <slhomme@matroska.org> Tue, 24 December 2019 15:20 UTC

Return-Path: <slhomme@matroska.org>
X-Original-To: cellar@ietfa.amsl.com
Delivered-To: cellar@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id F08F2120119 for <cellar@ietfa.amsl.com>; Tue, 24 Dec 2019 07:20:40 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.897
X-Spam-Level:
X-Spam-Status: No, score=-1.897 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, SPF_HELO_NONE=0.001, SPF_NONE=0.001, URIBL_BLOCKED=0.001] autolearn=unavailable autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=matroska-org.20150623.gappssmtp.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id S3zio6CEO9iG for <cellar@ietfa.amsl.com>; Tue, 24 Dec 2019 07:20:38 -0800 (PST)
Received: from mail-pj1-x1044.google.com (mail-pj1-x1044.google.com [IPv6:2607:f8b0:4864:20::1044]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 0EF01120111 for <cellar@ietf.org>; Tue, 24 Dec 2019 07:20:38 -0800 (PST)
Received: by mail-pj1-x1044.google.com with SMTP id s7so1347554pjc.0 for <cellar@ietf.org>; Tue, 24 Dec 2019 07:20:38 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=matroska-org.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=RNucmmKeli9IAq1dEzy6/52xD0+cqv7PePI09o8tuis=; b=xFrV2TSAd9pM6AlxFQGDDL3vER3EdtYjmKikRrMy/rJEKl3EFBvBsiW4ebB4tZV25N MovAxpYgkgy/HqF+zQ8Zbl8CkIBAHM5qK/uh2lcZdfbPPKNV0UmbWUULScMFr9RsqMW0 KrRrpOmUPxCwkb72dXmB9mbG9ybKFC9AUBDToYG3sRI0vL1cWMza8pLxpsRK5A7fkBen vdKo59gfAeD97Tp96Obq8pOBgulf1AYUKVYcsbDk/LxerLl6SD61reOrmKGyNuzAngqT 2+gz0OhtzCTeeOg3Z1TMx0tmu6an/pObrydOIIOjPvE1qB5DX0IqjB5tQxlTCDdH0mGC AzhA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=RNucmmKeli9IAq1dEzy6/52xD0+cqv7PePI09o8tuis=; b=eZUkJkls31D0PURKk93U8NQudY37eU23WTvsQ8Iu07jGW/jVMRW2/zk3P1NndrvsUK hi//wi6BMoZ9Enyx3Toog83AfWlP5Hkern3PC+vCyDJ9mBgQi/2sNFwTY/fF9NqqucLm O0LF3CMROX882xC6f2/ETF2+IfG3jcYPptmxefHC2sAI1WzgahCRB2wZbJQJegu+GBEV 5OSVWNjzgWmcQufYISTheZsDnaTE/f4yyuzC1bZPCt70i8qSLRoUksO9taXdMDppGfeB Mfrn8GWLyZeItXtJcDe+CGoIxcYR5mnyDP7OB6sbkcaQm/A/z0jJ9R86IdBlKJ88BYOM aeIg==
X-Gm-Message-State: APjAAAXrQkCQxRO7qbxDDKvf5nSqNEWJPxvVa9Y19C1xuabLWw+eOeuQ Hhff/y5I1pUwK84jr4B/WTDXkc82thDp0D3jqQnksRVSdG8ZWg==
X-Google-Smtp-Source: APXvYqyltYa4z6uDiW870xG7/aQjCDEreEZotRWVvFplTLdlWmrSbsOda9Hz3eHMOP4JsRFiOsyKlW8sE2mPTypIfPk=
X-Received: by 2002:a17:90a:3ae3:: with SMTP id b90mr6699695pjc.62.1577200837199; Tue, 24 Dec 2019 07:20:37 -0800 (PST)
MIME-Version: 1.0
References: <157676970970.27491.11040479061607849531.idtracker@ietfa.amsl.com>
In-Reply-To: <157676970970.27491.11040479061607849531.idtracker@ietfa.amsl.com>
From: Steve Lhomme <slhomme@matroska.org>
Date: Tue, 24 Dec 2019 16:20:25 +0100
Message-ID: <CAOXsMFJKk3HTEjoAJ9URhGt97SA++kNDp3HCVMscj+qED5+VgA@mail.gmail.com>
To: Benjamin Kaduk <kaduk@mit.edu>
Cc: The IESG <iesg@ietf.org>, Steven Villereal <villereal@gmail.com>, draft-ietf-cellar-ebml@ietf.org, cellar-chairs@ietf.org, Codec Encoding for LossLess Archiving and Realtime transmission <cellar@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/zjqYyYP-PI6O1RwoC05B8-KZS5Q>
Subject: Re: [Cellar] Benjamin Kaduk's Discuss on draft-ietf-cellar-ebml-15: (with DISCUSS and COMMENT)
X-BeenThere: cellar@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Codec Encoding for LossLess Archiving and Realtime transmission <cellar.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/cellar>, <mailto:cellar-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cellar/>
List-Post: <mailto:cellar@ietf.org>
List-Help: <mailto:cellar-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/cellar>, <mailto:cellar-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 24 Dec 2019 15:20:41 -0000

Hi Benjamin,

Thanks a lot for your detailed review. Below are my comments and fixes:

Le jeu. 19 déc. 2019 à 16:35, Benjamin Kaduk via Datatracker
<noreply@ietf.org> a écrit :
>
> Benjamin Kaduk has entered the following ballot position for
> draft-ietf-cellar-ebml-15: Discuss
>
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
>
>
> Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
>
>
> The document, along with other ballot positions, can be found here:
> https://datatracker.ietf.org/doc/draft-ietf-cellar-ebml/
>
>
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
>
> Section 7.7 says:
>
>    A Master Element MUST declare a length in octets from zero to
>    VINTMAX.  The Master Element MAY also use an unknown length.  See
>    Section 6 for rules that apply to elements of unknown length.
>
> but the second sentence contradicts the immediately prior MUST.  We need
> to resolve the internal inconsistency.

I think this has been address with this change
https://github.com/cellar-wg/ebml-specification/commit/a76fcb5a42ff543698d7292ee8a7752fe44ea1a0

"A Master Element MUST declare a length in octets from zero to VINTMAX
or be of unknown length."

> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
> I support Adam's Discuss regarding the Abstract.
>
> Section 2
>
>    "Parent Element": A relative term to describe the "Master Element"
>    which contains a specified element.  For any specified "EBML Element"
>    that is not at "Root Level", the "Parent Element" refers to the
>    "Master Element" in which that "EBML Element" is contained.
>
> It sounds like this is intended to be "directly" or "immediately"
> contained (in order to be unique), right?  If not, then it sould be
> ''refers to a "Master Element" in which [...]''

Added the word "directly"here
https://github.com/cellar-wg/ebml-specification/pull/318

> Section 4.1
>
>    Each Variable Size Integer begins with a VINT_WIDTH which consists of
>    zero or many zero-value bits.  The count of consecutive zero-values
>    of the VINT_WIDTH plus one equals the length in octets of the
>    Variable Size Integer.  [...]
>
> Does the following attempted rewording change the meaning?

Nope, maybe the meaning wasn't clear before. Hopefully it's better now.

> %  Each Variable Size Integer begins with a VINT_WIDTH which consists of
> %  zero or more bits set to zero.  The length in octets of the entire
> %  Variable Size Integer is determined as one plus the number of
> %  consecutive bits set to zero.
>
> (I find the current formulation rather hard to parse.)
>
> Section 6.2
>
>     | "\root\level1\level2\<global>"     | Global Element cannot be   |
>     |                                    | assumed to have this path, |
>     |                                    | while parsing "elt" it can |
>     |                                    | only be a child of "elt"   |
>
> Cannot be assumed by who/what?  My brain is trying to parse this as just
> "cannot assume this path".

Rephrased to
Global Element cannot be interpreted with this path, while parsing
`elt` a Global Element can only be a child of `elt`

https://github.com/cellar-wg/ebml-specification/pull/319

> Section 7.5
>
> Should we say anything about termination of a UTF-8 string needing to
> still result in valid UTF-8 (i.e., not insert NULs in the middle of a
> codepoint)?

We already say:

A UTF-8 Element contains only a valid Unicode string as defined in
[RFC3629], with an exception made for termination

We also describe in Section 13 how NUL can be added at the end of
strings. That's probably where we should mention it should still be a
valid string. See this Pull Request
https://github.com/cellar-wg/ebml-specification/pull/320

> Section 7.7
>
>    stored within Master Elements SHOULD only consist of EBML Elements
>    and SHOULD NOT contain any data that is not part of an EBML Element.
>
> When might this SHOULD (NOT) be violated?

I think the idea is that a reader should not assume data are always
correct/clean. Files can get damaged, in which case some parts of them
may not look valid anymore.
I'm OK with changing it to MUST NOT, since error recovery is described
elsewhere. It would not be an error if junk data was considered OK.

> Section 8.2
>
>    part of an EBML Element.  This document defines precisely which EBML
>    Elements are to be used within the EBML Header, but does not name or
>
> (for EBMLVersion 1 only, right?)

Yes, we kind of mention it here:
The EBML Header of an EBML Document that uses an EBMLVersion of 1 MUST
only contain EBML Elements that are defined as part of this document.

> Section 11.1
>
>    Element; for example matroska or webm (see Section 11.2.6).  The
>    DocType value for an EBML Document Type MUST be unique and
>    persistent.
>
> It might be appropriate to refer to Section 17.2 and/or the IANA
> registry for DocType values, here.

Added in https://github.com/cellar-wg/ebml-specification/pull/321

>    EBMLVersion to only support a value of "1".  If an EBML Schema adopts
>    the EBML Header Element as-is, then it is not required to document
>    that Element within the EBML Schema.  If an EBML Schema constrains
>
> Does "as-is" imply some level of future-compatibility/extensibility for
> when EBMLVersions other than "1" are defined?

That's currently a grey area. If there was an EBML Header version 2 we
can't assume the Matroska Schema to use it because we don't know if we
would want modifications (as we already do) on top of it or not.

So the EBML Schema should probably tell which EBML Header version it's
based on. Probably a ebml="1" attribute in the EBMLSchema element.
See Pull Request https://github.com/cellar-wg/ebml-specification/pull/322

> Section 11.1.1
>
> It's a little amusing that we bother to provide "default" attributes
> when the "range" attribute uniquely determines the allowed value.
>
> Section 11.1.4
>
>    Each "<element>" defines one EBML Element through the use of several
>    attributes that are defined in Section 11.1.3.  EBML Schemas MAY
>
> I think this makes more sense as "Section 11.1.5".

You're right. Fixed in https://github.com/cellar-wg/ebml-specification/pull/323

> Section 11.1.5.2
>
> This ABNF seems to only allow "direct" recursion where element <x>
> appears directly inside element <x>, without any intermediate elements.
> I assume that's the intent, though it would be surprising in a
> general-purpose markup language.

Are you referring to the EBMLPathAtomRecursive ? It still has a parent
so it can be in another element as well.

>    In some cases the EBMLLastParent part of the path is an
>    EBMLGlobalParent.  A path with a EBMLGlobalParent defines a
>    Section 11.3.  Any path that starts with the EBMLFixedParent of the
>
> That second sentence doesn't parse.

Fixed in https://github.com/cellar-wg/ebml-specification/commit/55876fc0c306a54b08c79963a85d6aa3d4f9cfd6

>    As an example, a "path" of "1*(\Segment\Info)" means the element Info
>    is found inside the Segment elements at least once and with no
>    maximum iteration.  An element SeekHead with path
>    "0*2(\Segment\SeekHead)" may not be found at all in its Segment
>    parent, once or twice but no more than that.
>
> The way this text is written makes me want to interpret the path
> occurence counts more like the (regular) minOccurs/maxOccurs element
> attributes, as opposed to applying to the path components to get to the
> specific element in question.

In fact it's a remaining part of text from when the path did include
minOccurs/maxOccurs . This is not the case anymore and this text will
be removed:
https://github.com/cellar-wg/ebml-specification/pull/324

> Section 11.1.9.2
>
>    <element name="Item" path="1*1(\Items)" id="0x4025" type="master"
>      minOccurs="1" maxOccurs="1">
>      <documentation lang="en" purpose="definition">
>        A set of items.
>
> Is this "name" supposed to be "Item" or "Items"?

It's a typo. Also the path should not contain the occurrences. Fixed
in the #324 PR.

> Section 11.1.10-11.1.12
>
> I'm not sure I have a full understanding of how <restriction>/<enum> are
> used; perhaps a reference to the corresponding XML behavior is in order?

Ping Dave Rice on that one.

> Section 11.1.13-11.1.14
>
> The <extention type="..."> usage seems underspecified.
>
> Section 11.1.15
>
>        <xs:attribute name="path" use="required">
>          <!-- <xs:simpleType>
>            <xs:restriction base="xs:integer">
>              <xs:pattern value="[0-9]*\*[0-9]*()"/>
>            </xs:restriction>
>          </xs:simpleType> -->
>        </xs:attribute>
>
> Why do we include this commented-out snippet?

It's an include of a XSD file that is actually usable to validate an
EBML Schema. This commented out code is there because we should fix it
to parse properly the path values. Should it be removed ?

>        <xs:attribute name="unknownsizeallowed" type="xs:boolean"/>
>        <xs:attribute name="recurring" type="xs:boolean"/>
>
> Don't we effectively set default values for these two in the prose
> description?

"If the unknownsizeallowed attribute is not used then that EBML
Element is not allowed to use an unknown Element Data Size."
I guess we could use false as the default value.

"If the recurring attribute is not present then the EBML Element is
not an Identically Recurring Element."
Same here

See https://github.com/cellar-wg/ebml-specification/pull/325

> Section 11.1.16
>
>    Identically Recurring Elements SHOULD include a CRC-32 Element as a
>    Child Element; this is especially recommended when EBML is used for
>    long-term storage or transmission.  If a Parent Element contains more
>
> I'm not sure if the "long-term" is intended to also bind as "long-term
> transmission" (though I'm not sure what it would mean in that case).
> It's also not entirely clear what kinds of transmission would benefit
> from this, as reliable media presumably don't need redundancy for
> reliability, but unreliable media can't really be used to carry EBML
> without some framing requirements to know when elements start.

Actually, as long as you have the "packets" in the right order you can
use the EBML stream. You can also use it if you're missing some
packets. The Checksum can help determine if the data are valid or not,
even if the underlying transport loses some bits. It could technically
be used as-is as protocol on top of IP, like TCP or UDP.

> Section 11.1.18
>
>    If a Mandatory EBML Element has no default value declared by an EBML
>    Schema and its Parent Element is present then the EBML Element MUST
>    be present as well.  If a Mandatory EBML Element has a default value
>    declared by an EBML Schema and its Parent Element is present and the
>    value of the EBML Element is NOT equal to the declared default value
>    then the EBML Element MUST be present.
>
> This seems almost tautological, in that how would an EBML Element have a
> value if it was not present?  (The following paragraph that talks about
> when to write such elements, does make more sense.)

The difference here is whether the element has a value and is present.
If you read the negative of that sentence, an EBML Element MAY NOT be
present if it has the default value even if it's mandatory.

> Section 11.3.1
>
>    path: "*1((1*\)\CRC-32)"
>
> Using backslash as both an escape character and a path separator makes
> my head hurt, and I did not have enough caffeine yet this morning to
> figure it out.

All the pathes actually have the remaining minOccurs/maxOccurs that
were removed recently. I'll remove them via #324

After that the two global elements (which used to be tricky to read
even for a trained eye) do not match the ABNF. Which also adds extra
parenthesis a little too much. I created an issue to fix once #324 is
merge https://github.com/cellar-wg/ebml-specification/issues/326

>    8.1.1.6.2 of [ITU.V42.1994], with initial value of 0xFFFFFFFF.  The
>    CRC value MUST be computed on a little endian bitstream and MUST use
>    little endian storage.
>
> bitstream or bytestream?

Should be bytestream.
https://github.com/cellar-wg/ebml-specification/pull/327

> Section 12
>
>    If a Master Element contains a CRC-32 Element that doesn't validate,
>    then the EBML Reader MAY ignore all contained data except for
>    Descendant Elements that contain their own valid CRC-32 Element.
>
> Ignoring only part of the known questionable content could have
> significant security considerations, if (e.g.) security-relevant
> restrictions are in the garbled part of the document but the sensitive
> content has a (valid) redundant CRC.

That's why it's a MAY. If a Matroska Segment has a CRC and each frame
in it has a CRC. If the top CRC is invalid, we can still use some of
the frames that have a valid CRC. It's not a requirement but a
possibility.

> [review terminated early]
>
>
> _______________________________________________
> Cellar mailing list
> Cellar@ietf.org
> https://www.ietf.org/mailman/listinfo/cellar



-- 
Steve Lhomme
Matroska association Chairman