Re: [Cellar] Second AD review of draft-ietf-cellar-ebml-10

Dave Rice <dave@dericed.com> Thu, 24 October 2019 14:03 UTC

From: Dave Rice <dave@dericed.com>
Message-Id: <C101AC7B-16A8-4700-947C-40347C7D8DFB@dericed.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_C90463EA-8D7E-4C89-AA95-9F4C237031A1"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\))
Date: Thu, 24 Oct 2019 10:02:48 -0400
In-Reply-To: <8cdc2a28-787b-419e-84b5-9b998a18390d@www.fastmail.com>
Cc: cellar@ietf.org
To: Alexey Melnikov <aamelnikov@fastmail.fm>
References: <3835cda8-7bfb-4178-bec7-b0acff9327ba@www.fastmail.com> <F50D112A-91E8-482B-A78F-8557480331BC@dericed.com> <52b3f63f-fddb-4438-be5f-f61359307f98@www.fastmail.com> <89CEEC4A-78D9-40BF-8A4D-732C7A199F30@dericed.com> <8cdc2a28-787b-419e-84b5-9b998a18390d@www.fastmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/hbagGphEBe9F8ypl86xXukk5VYM>
Subject: Re: [Cellar] Second AD review of draft-ietf-cellar-ebml-10

Hi Alexey,

> On Oct 24, 2019, at 8:33 AM, Alexey Melnikov <aamelnikov@fastmail.fm> wrote:
> 
> Hi Dave,
> 
> On Thu, Jul 11, 2019, at 2:53 PM, Dave Rice wrote:
>>> On Jul 11, 2019, at 9:36 AM, Alexey Melnikov <aamelnikov@fastmail.fm> wrote:
>>> 
>>> Hi Dave,
>>> I removed comments where we are in agreement. A few followups:
>>> 
>>> On Sat, Jul 6, 2019, at 6:23 PM, Dave Rice wrote:
>>>> Hello Alexey,
>>>> 
>>>> Thanks much for providing this thorough review. I’ll try to reply point by point below with references to either pull requests or issues or offer followup questions. In a few places I ping Steve Lhomme and Michael Richardson as the comments relate to text they originated in their work.
>>>> 
>>>>> EBMLElementOccurrence    = [EBMLMinOccurrence] "*" [EBMLMaxOccurrence]
>>>>> EBMLMinOccurrence        = 1*DIGIT
>>>>> EBMLMaxOccurrence        = 1*DIGIT
>>>>> 
>>>>> Are there any upper limits on allowed values for these fields? Even if you don't encode them using ABNF, it would be good to mention them in an ABNF comment.
>>>>> 
>>>>> VariableParentOccurrence = [PathMinOccurrence] "*" [PathMaxOccurrence]
>>>>> PathMinOccurrence        = 1*DIGIT
>>>>> PathMaxOccurrence        = 1*DIGIT
>>>>> 
>>>>> Same comment as above.
>>>> 
>>>> I looked for examples of that sort of commenting but didn’t find much guidance. Eventually I simply appended " ; no upper limit" to each of the four referenced lines and added that to https://github.com/cellar-wg/ebml-specification/pull/265.
>>> I think this is the wrong fix. Is it sufficient for an implementation to use a 32-bit value to represent any of these? A 64-bit value?
>>> "no upper limit" is not going to be interoperable.
>> 
>> The smallest possible EBML Element is 2 bytes (1 byte of Element ID, 1 byte of Element Data Size, and 0 bytes of Element Data). The upper limit on the number of occurrences is therefore determined by the limit of the Element Data Size of the Parent Element. If the EBML Document has EBMLMaxSizeLength at the default of 8, then the upper limit of an Element Data Size is 72,057,594,037,927,934 bytes. If the smallest possible Element is 2 bytes, then the upper limit of that Element’s occurrence would be 72,057,594,037,927,934 divided by two.
>> 
>> So if EBMLMaxSizeLength = 8, then this is possible
>> 
>> <RootElement>
>> <TwoByteElement/>  # 36,028,797,018,963,967 occurrences
>> </RootElement>
>> 
>> but this would overflow the Element Data Size of the Root Element
>> <RootElement>
>> <TwoByteElement/>  # 36,028,797,018,963,968 occurrences
>> </RootElement>
>> 
>> However, EBML allows the EBML Document Type to set an EBMLMaxSizeLength value higher than 8, and as that is incremented up the upper limit would expand exponentially.
> I will try one more time: if I write an implementation that has a hardcoded limit of 8, would it be considered compliant with this specification? I think asking to support arbitrary length values is a big ask.

Matroska, as a document type of EBML, does this. The draft for Matroska has a section "Added Constraints on EBML" that limits EBMLMaxSizeLength to a range of 1-8 at https://tools.ietf.org/html/draft-ietf-cellar-matroska-03#section-6.1 and that is reiterated in the EBML Schema for Matroska at https://tools.ietf.org/html/draft-ietf-cellar-matroska-03#section-9.2.

Though EBML does not limit the EBMLMaxSizeLength, the default value is 8 (see https://www.ietf.org/id/draft-ietf-cellar-ebml-13.html#name-ebmlmaxsizelength-element), so a document type definition would have to opt into a value greater than 8 if its authors felt it was needed.

The fifth paragraph of https://www.ietf.org/id/draft-ietf-cellar-ebml-13.html#name-ebml-schema, starting with "An EBML Schema MAY constrain the use of EBML Header Elements", discusses the scenario of using a non-default value for EBMLMaxSizeLength in an EBML Schema that defines an EBML document type.
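
As a rough illustration of the arithmetic quoted above, the limits can be computed for any EBMLMaxSizeLength. This is only a sketch: the function names are made up for this example and are not defined by the EBML draft. It assumes that each octet of an Element Data Size carries 7 usable bits and that the all-ones value is reserved for an unknown size, which is what yields the 72,057,594,037,927,934 figure for a length of 8.

```python
def max_element_data_size(max_size_length: int) -> int:
    """Largest known Element Data Size, in bytes, for a size field of
    max_size_length octets (hypothetical helper, not from the draft)."""
    # 7 usable bits per octet; the all-ones value is reserved ("unknown size"),
    # so the largest usable value is one less than all ones.
    return 2 ** (7 * max_size_length) - 2


def max_two_byte_occurrences(max_size_length: int) -> int:
    """Upper bound on how many 2-byte child Elements fit in one Parent Element."""
    return max_element_data_size(max_size_length) // 2


def fits_in_uint64(value: int) -> bool:
    """True if value can be held in an unsigned 64-bit integer."""
    return 0 <= value < 2 ** 64


for length in (4, 8, 9, 10):
    size = max_element_data_size(length)
    print(length, size, max_two_byte_occurrences(length), fits_in_uint64(size))
```

Under these assumptions an implementation that hardcodes a limit of 8 can hold every occurrence count in a 64-bit integer, while a document type that opts into a length of 10 or more cannot rely on that.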

>> In my draft I would consider "no upper limit" and 36,028,797,018,963,967 to be effectively the same, or that the definition does not define the limit but in practice it would be limited by the capacity of the Element Data Size of the parent Element.
> I will let the document start IETF LC without this issue being fully resolved, but I suspect it will come up again during IESG review.
> 
>>>>> 11.1.10.1.  label
>>>>> 
>>>>>   The label provides a concise expression for human consumption that
>>>>>   describes what the value of the "<enum>" represents.
>>>>> 
>>>>> Is it worth adding a cross reference to the "lang" attribute here?
>>>> 
>>>> Do you mean to express the language of the term used within the label? Currently the language of the label is undefined and, since it is an attribute, the label is not repeatable.
>>> 
>>> To be honest I am not yet sure how I feel about "undefined language" here. Need to think about that.
>>> But either way, I think adding some text that "lang" attribute doesn't apply would be helpful.
>> 
>> In the case of Matroska, the labels are in English. In the EBML definition we could say that the labels are in English unless the definition of that associated EBML Document Type claims otherwise.
> Ok.
> 
>>>>> 12.  Considerations for Reading EBML Data
>>>>> 
>>>>>   If a Master Element contains a CRC-32 Element that doesn't validate,
>>>>>   then the EBML Reader MAY ignore all contained data except for
>>>>>   Descendant Elements that contain their own valid CRC-32 Element.
>>>>> 
>>>>> I don't fully understand your use of "MAY ... except ..." here.
>>>>> Can you elaborate on why an implementation would ignore data contained in a Master Element and not ignore Descendant Elements, even if their own CRC-32 is valid?
>>>> 
>>>> For instance, suppose a Matroska file has three metadata tags, each with a CRC value, and the parent Tags element has one as well, like this:
>>>> 
>>>> <Tags crc=invalid>
>>>>   <Tag crc=valid>
>>>>   <Tag crc=valid>
>>>>   <Tag crc=invalid>
>>>> </Tags>
>>>> 
>>>> We’re trying to say that even though the contents of the <Tags> element are invalid, the valid child elements may still be used.
>>> So to me this means that after discarding all invalid elements you end up with something like this:
>>> 
>>> <Tags >
>>>   <Tag crc=valid>
>>> </Tags>
>>> 
>>> As this is an incomplete document, I am struggling to understand what it can be used for?
>> 
>> In that case, the valid Tags are still useful even if they sit within a parent element whose contents are invalid. Perhaps another example would be in the Attachments Element:
>> 
>> <Attachments crc=invalid>
>> <Attachment crc=invalid>Poster Art</Attachment>
>> <Attachment crc=valid>Subtitle Font</Attachment>
>> </Attachments>
>> 
>> Here some bit damage occurs to the Poster Art, so the Attachment and its parent Attachments now have invalid CRCs. However, the subtitle font is still OK and could be used in the presentation without an issue.
> I suggest some examples along the lines of your reply should be added to the document. This is not a property commonly found in IETF formats, so talking more about it would be useful.

I drafted an example at https://github.com/cellar-wg/ebml-specification/pull/296.
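
Separately from the text in that pull request, the reader behaviour discussed in this thread could be sketched roughly as below. This is an illustration only, not an EBML parser: the `Element` class, its simplified `body()` serialization, and the `seal` and `salvage` helpers are all hypothetical names invented for this example. The only library call relied on is Python's `zlib.crc32`.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Toy stand-in for an EBML Element (hypothetical, not a real parser)."""
    name: str
    payload: bytes = b""
    children: List["Element"] = field(default_factory=list)
    stored_crc: Optional[int] = None  # None means no CRC-32 child is present

    def body(self) -> bytes:
        # Simplified serialization: own payload, then each child's name and body.
        return self.payload + b"".join(
            c.name.encode() + c.body() for c in self.children
        )

    def crc_ok(self) -> bool:
        return self.stored_crc is not None and zlib.crc32(self.body()) == self.stored_crc


def seal(element: Element) -> None:
    """Record a CRC over the current contents of element and its descendants."""
    for child in element.children:
        seal(child)
    element.stored_crc = zlib.crc32(element.body())


def salvage(element: Element) -> List[Element]:
    """For an Element whose CRC failed, return the Descendant Elements
    that still carry their own valid CRC; everything else is ignored."""
    if element.crc_ok():
        return [element]
    usable: List[Element] = []
    for child in element.children:
        usable.extend(salvage(child))
    return usable


poster = Element("Attachment", b"Poster Art")
font = Element("Attachment", b"Subtitle Font")
attachments = Element("Attachments", children=[poster, font])
seal(attachments)

poster.payload = b"Poster Ar?"  # simulate bit damage in one attachment

print([e.payload for e in salvage(attachments)])
```

With the damage applied, both the Poster Art attachment and the parent Attachments fail their checks, and only the Subtitle Font attachment is returned as usable.
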
Thanks much,
Dave Rice