Re: [Cellar] Second AD review of draft-ietf-cellar-ebml-10

Dave Rice <dave@dericed.com> Thu, 11 July 2019 13:54 UTC

From: Dave Rice <dave@dericed.com>
Message-Id: <89CEEC4A-78D9-40BF-8A4D-732C7A199F30@dericed.com>
Content-Type: multipart/alternative; boundary="Apple-Mail=_AF05F1A0-7488-43C1-ADF5-5D1DACE7BE3E"
Mime-Version: 1.0 (Mac OS X Mail 12.4 \(3445.104.8\))
Date: Thu, 11 Jul 2019 09:53:52 -0400
In-Reply-To: <52b3f63f-fddb-4438-be5f-f61359307f98@www.fastmail.com>
Cc: cellar@ietf.org
To: Alexey Melnikov <aamelnikov@fastmail.fm>
References: <3835cda8-7bfb-4178-bec7-b0acff9327ba@www.fastmail.com> <F50D112A-91E8-482B-A78F-8557480331BC@dericed.com> <52b3f63f-fddb-4438-be5f-f61359307f98@www.fastmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cellar/heFKpd1ioPWnvuJcibuTj9xdkIQ>
Subject: Re: [Cellar] Second AD review of draft-ietf-cellar-ebml-10


> On Jul 11, 2019, at 9:36 AM, Alexey Melnikov <aamelnikov@fastmail.fm> wrote:
> 
> Hi Dave,
> I removed comments where we are in agreement. A few followups:
> 
> On Sat, Jul 6, 2019, at 6:23 PM, Dave Rice wrote:
>> Hello Alexey,
>> 
>> Thanks much for providing this thorough review. I’ll try to reply point by point below with references to either pull requests or issues or offer followup questions. In a few places I ping Steve Lhomme and Michael Richardson as the comments relate to text they originated in their work.
>> 
>>> EBMLElementOccurrence    = [EBMLMinOccurrence] "*" [EBMLMaxOccurrence]
>>> EBMLMinOccurrence        = 1*DIGIT
>>> EBMLMaxOccurrence        = 1*DIGIT
>>> 
>>> Are there any upper limits on allowed values for these fields? Even if you don't encode them using ABNF, it would be good to mention them in an ABNF comment.
>>> 
>>> VariableParentOccurrence = [PathMinOccurrence] "*" [PathMaxOccurrence]
>>> PathMinOccurrence        = 1*DIGIT
>>> PathMaxOccurrence        = 1*DIGIT
>>> 
>>> Same comment as above.
>> 
>> I looked for examples of that sort of commenting but didn’t find much guidance. Eventually I simply appended " ; no upper limit" to each of the four referenced lines and added that to https://github.com/cellar-wg/ebml-specification/pull/265.
> I think this is the wrong fix. Is it sufficient for an implementation to use 32 bit value to represent any of these? 64 bit value?
> "no upper limit" is not going to be interoperable.

The smallest possible EBML Element is 2 bytes (1 byte of Element ID, 1 byte of Element Data Size, and 0 bytes of Element Data). The upper limit on the number of occurrences is therefore determined by the limit of the Element Data Size of the Parent Element. If the EBML Document has EBMLMaxSizeLength at its default of 8, then the upper limit of an Element Data Size is 72,057,594,037,927,934 bytes. If the smallest possible Element is 2 bytes, then the upper limit of that Element’s occurrence would be 72,057,594,037,927,934 divided by two.

So if EBMLMaxSizeLength = 8, then this is possible:

<RootElement>
	<TwoByteElement/>  # 36,028,797,018,963,967 occurrences
</RootElement>

but this would overflow the Element Data Size of the Root Element:

<RootElement>
	<TwoByteElement/>  # 36,028,797,018,963,968 occurrences
</RootElement>

However, EBML allows the EBML Document Type to set an EBMLMaxSizeLength value higher than 8, and as that value increases the upper limit grows exponentially.

In my draft I would consider "no upper limit" and 36,028,797,018,963,967 to be effectively the same. That is, the definition does not define the limit, but in practice it is limited by the capacity of the Element Data Size of the parent Element.
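
The arithmetic above can be sketched as follows. This is only an illustration, not text from the draft: the function names are hypothetical, and it assumes that an Element Data Size of N bytes carries 7 bits of data per byte with the all-ones value reserved, which is consistent with the 72,057,594,037,927,934 figure given for EBMLMaxSizeLength = 8.

```python
# Hypothetical helper names; a sketch of the occurrence-limit arithmetic.

SMALLEST_ELEMENT_BYTES = 2  # 1 byte Element ID + 1 byte Element Data Size


def max_element_data_size(max_size_length: int) -> int:
    """Largest Element Data Size expressible in max_size_length bytes.

    Assumes 7 usable data bits per byte and that the all-ones value
    is reserved, hence the "- 2".
    """
    data_bits = 7 * max_size_length
    return (1 << data_bits) - 2


def max_occurrences(max_size_length: int) -> int:
    """Most 2-byte child Elements that fit inside one parent Element."""
    return max_element_data_size(max_size_length) // SMALLEST_ELEMENT_BYTES


if __name__ == "__main__":
    # For 8 this prints 36028797018963967, the figure quoted above.
    for length in (4, 8, 9):
        print(length, max_occurrences(length))
```

Under these assumptions the limit for EBMLMaxSizeLength = 8 is 2^55 - 1, which fits in a 64-bit integer but not in a 32-bit one. A larger EBMLMaxSizeLength can push the limit past 64 bits.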

>>> 11.1.10.1.  label
>>> 
>>>   The label provides a concise expression for human consumption that
>>>   describes what the value of the "<enum>" represents.
>>> 
>>> Is it worth adding a cross reference to the "lang" attribute here?
>> 
>> Do you mean to express the language of the term used within the label? Currently the language of the label is undefined and since it is an attribute that label is not repeatable.
> 
> To be honest I am not yet sure how I feel about "undefined language" here. Need to think about that.
> But either way, I think adding some text that "lang" attribute doesn't apply would be helpful.

In the case of Matroska, the labels are in English. In the EBML definition we could say that the labels are in English unless the definition of the associated EBML Document Type states otherwise.

>>> 12.  Considerations for Reading EBML Data
>>> 
>>>   If a Master Element contains a CRC-32 Element that doesn't validate,
>>>   then the EBML Reader MAY ignore all contained data except for
>>>   Descendant Elements that contain their own valid CRC-32 Element.
>>> 
>>> I don't fully understand your use of "MAY ... except ..." here.
>>> Can you elaborate on why an implementation would ignore data contained in a Master Element and not ignore Descendant Elements, even if their own CRC-32 is valid?
>> 
>> For instance, suppose a Matroska file has three metadata tags, each with a CRC value, and the parent Tags element has one as well, like this:
>> 
>> <Tags crc=invalid>
>>   <Tag crc=valid>
>>   <Tag crc=valid>
>>   <Tag crc=invalid>
>> </Tags>
>> 
>> We’re trying to say that even though the contents of the <Tags> element are invalid, the valid child elements may still be used.
> So to me this means that after discarding all invalid elements you end up with something like this:
> 
> <Tags >
>   <Tag crc=valid>
> </Tags>
> 
> As this is an incomplete document, I am struggling to understand what it can be used for?

In that case, the valid Tags are still useful even if they sit within a parent element whose contents are invalid. Perhaps another example would be in the Attachments Element:

<Attachments crc=invalid>
	<Attachment crc=invalid>Poster Art</Attachment>
	<Attachment crc=valid>Subtitle Font</Attachment>
</Attachments>

Here some bit damage occurs to the Poster Art, so the Attachment and its parent Attachments now have invalid CRCs. However, the subtitle font is still OK and could be used in the presentation without an issue.
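
One way a lenient EBML Reader might apply this is sketched below. It is a minimal illustration, not the behavior the draft mandates: the Element class, its crc_ok field, and the salvage function are all hypothetical names. The assumption is that an Element with its own CRC-32 is judged by that CRC, and an Element without one inherits the verdict of its nearest ancestor that has one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Element:
    """Hypothetical in-memory view of a parsed EBML Element."""
    name: str
    # True/False: result of checking this Element's own CRC-32.
    # None: the Element carries no CRC-32 of its own.
    crc_ok: Optional[bool] = None
    children: List["Element"] = field(default_factory=list)


def salvage(element: Element, trusted: bool = True) -> List[str]:
    """Return the names of Elements whose data may still be used."""
    if element.crc_ok is not None:
        # An Element's own CRC-32 overrides the verdict inherited
        # from its ancestors, in either direction.
        trusted = element.crc_ok
    usable = [element.name] if trusted else []
    for child in element.children:
        usable.extend(salvage(child, trusted))
    return usable


attachments = Element("Attachments", crc_ok=False, children=[
    Element("Poster Art", crc_ok=False),
    Element("Subtitle Font", crc_ok=True),
])

print(salvage(attachments))  # ['Subtitle Font']
```

The damaged parent and the damaged Poster Art are dropped, while the sibling with a valid CRC-32 survives.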

>> An invalid descendant element would make all ancestor elements invalid, but should not necessarily make all sibling elements unusable.
>> 
> 
> Best Regards,
> Alexey