Re: [Cbor] Validation of maps

Henk Birkholz <henk.birkholz@sit.fraunhofer.de> Fri, 06 October 2017 10:30 UTC

Date: Thu, 05 Oct 2017 18:24:10 +0200
User-Agent: K-9 Mail for Android
In-Reply-To: <026b01d33df5$27919310$76b4b930$@augustcellars.com>
References: <e16da575-bbed-1f52-c754-9938237aa6bc@obj-sys.com> <3FFCD42B-C1DE-43E3-A06D-608CACD55D86@tzi.org> <CANh-dXnucjNP=eZfrEcrVC6HN0XHk0dcw-C+J56rksWxMbX8=A@mail.gmail.com> <HE1PR0701MB253924A5FBD83848583C053898700@HE1PR0701MB2539.eurprd07.prod.outlook.com> <HE1PR0701MB2539BFF4100C57A6C70CC94E98700@HE1PR0701MB2539.eurprd07.prod.outlook.com> <026b01d33df5$27919310$76b4b930$@augustcellars.com>
To: cbor@ietf.org, Jim Schaad <ietf@augustcellars.com>, 'Francesca Palombini' <francesca.palombini@ericsson.com>
From: Henk Birkholz <henk.birkholz@sit.fraunhofer.de>
Message-ID: <D90BEE91-32D1-4483-9D1A-7AEA97005E9B@sit.fraunhofer.de>
X-Originating-IP: [88.67.27.241]
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/FMiOQIh6YfEsVX2todizitDHQXk>
Subject: Re: [Cbor] Validation of maps

I tend to agree (POLS, the principle of least surprise). Other thoughts?

Henk

On October 5, 2017 6:15:23 PM GMT+02:00, Jim Schaad <ietf@augustcellars.com> wrote:
>My opinion on this issue is that the open choice only matches if there
>is not an explicit choice with that key in the map.  The intention is
>that this is what is there, and if you have a fast parser that checks
>types, then it should fail.  The code is going to presume that the
>value with key 4 is an int and should not get surprised in the middle
>of processing; it should fail during validation if the parser is that
>smart.
>
>Thus { ? 4 => int, * any => any } should fail for { 4: "text" }.
>
>Jim
>
>From: CBOR [mailto:cbor-bounces@ietf.org] On Behalf Of Francesca
>Palombini
>Sent: Thursday, October 5, 2017 3:53 AM
>To: Carsten Bormann <cabo@tzi.org>; cbor@ietf.org
>Subject: Re: [Cbor] Validation of maps
>
>Sorry, sent too early. The opinion I’d like to hear from the working group:
>are cuts something we want to consider adding at this point?
>
>
>Francesca
>
>From: Francesca Palombini
>Sent: 5 October 2017 12:51
>To: 'Jeffrey Yasskin' <jyasskin@chromium.org>; Carsten Bormann <cabo@tzi.org>
>Cc: Kevin Braun <kbraun@obj-sys.com>; cbor@ietf.org
>Subject: RE: [Cbor] Validation of maps
>
>Agreed with Jeffrey.
>
>Reviving this thread to ask for the working group’s opinion:
>
>From: CBOR [mailto:cbor-bounces@ietf.org] On Behalf Of Jeffrey Yasskin
>Sent: 20 July 2017 03:16
>To: Carsten Bormann <cabo@tzi.org>
>Cc: Kevin Braun <kbraun@obj-sys.com>; cbor@ietf.org
>Subject: Re: [Cbor] Validation of maps
>
>By the time CDDL makes it to an RFC, we should be answering questions
>like this by quoting normative text from
>https://tools.ietf.org/html/draft-greevenbosch-appsawg-cbor-cddl-11#section-3.5,
>not just pointing at examples.
>
>Jeffrey
>
>On Tue, Jul 18, 2017 at 11:18 PM, Carsten Bormann <cabo@tzi.org> wrote:
>
>Hi Kevin,
>
>> I know the question of more formally specifying validation rules
>> already came up.  One would think map validation would be fairly
>> obvious, but what happens when key types overlap?
>>
>> For example, I think the intention is that if you have
>>
>>   top = { 4 => int, *int => tstr }
>>
>> then the key 4 must be present with an integer value,
>
>Right, that is the only way to match the first field.
>(And there is no way to have that as well as another /4/ key with a
>text string value.)
>
>> and you can have any number of other integer keys with text string
>> values. Okay, but what about:
>>
>> top = { ? 4 => int, *int => tstr }
>>
>> We might say this means that if a key of 4 appears, then it must have
>> an int value.  Or, does it allow a key of 4 to appear with a text
>> string value while considering the optional "4 => int" as being absent?
>
>Yes, the latter is the semantics.  It is not always what a specifier
>might intend.
>
>The reason is that the map opens a choice point.  A member with key 4
>starts to match the first field.  If the value, however, does not match
>(because it is not an int), the matcher falls back to the choice point.
>It then tries the other field, and indeed, that matches.
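
A minimal Python sketch of the fallback described above, assuming a toy model in which each map entry is a pair of key and value predicates. The names ENTRIES, match_member and validate are made up for illustration and are not part of CDDL or of any tool. Occurrence limits are ignored, which is only safe here because one entry is optional, the other is zero-or-more, and map keys are unique:

```python
# Toy model of:  top = { ? 4 => int, * int => tstr }
# All names below are hypothetical, chosen for this illustration only.

def is_int(x):
    # bool is a subclass of int in Python; exclude it for this sketch
    return isinstance(x, int) and not isinstance(x, bool)

def is_tstr(x):
    return isinstance(x, str)

ENTRIES = [
    (lambda k: k == 4, is_int),   # ? 4 => int
    (is_int, is_tstr),            # * int => tstr
]

def match_member(key, value):
    """Return the index of the first entry matching key AND value, else None."""
    for index, (key_ok, value_ok) in enumerate(ENTRIES):
        if key_ok(key) and value_ok(value):
            return index
        # A key match with a value mismatch is not an error here:
        # fall back to the choice point and try the next entry.
    return None

def validate(data):
    """True if every member of the map is matched by some entry."""
    return all(match_member(k, v) is not None for k, v in data.items())
```

In this model, validate({4: "text"}) succeeds because the member falls through to the second entry, which is exactly the behaviour a specifier may not have intended.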
>
>In the research underlying CDDL, we have discussed “cuts” (a concept
>from error handling in Parsing Expression Grammars (PEGs)) as the
>solution to this.  If ^ represents a cut, write:
>
>top = { ? 4 ^ => int, *int => tstr }
>
>Once the 4 matches, there is no way back; for this member, another
>match is no longer tried.
>A nice side effect is that anything except an int after a key of 4 can
>give a definite error message of “int expected”.
>The cut proposal includes : as an abbreviation for ^=>, so you can
>simply write:
>
>top = { ? 4: int, *int => tstr }
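
A rough Python sketch of what the cut changes, under the same kind of toy model (entries as key and value predicates plus a cut flag). The names ENTRIES and match_member are hypothetical, and raising ValueError is just one assumed way to surface the definite error:

```python
# Toy model of:  top = { ? 4 ^ => int, * int => tstr }
# All names below are hypothetical, chosen for this illustration only.

def is_int(x):
    # bool is a subclass of int in Python; exclude it for this sketch
    return isinstance(x, int) and not isinstance(x, bool)

def is_tstr(x):
    return isinstance(x, str)

ENTRIES = [
    # (key predicate, value predicate, expected value type, cut?)
    (lambda k: k == 4, is_int, "int", True),   # ? 4 ^ => int
    (is_int, is_tstr, "tstr", False),          # * int => tstr
]

def match_member(key, value):
    """True if some entry matches; raise if a cut entry rejects the value."""
    for key_ok, value_ok, expected, cut in ENTRIES:
        if not key_ok(key):
            continue
        if value_ok(value):
            return True
        if cut:
            # The key matched an entry with a cut: no way back, so the
            # later entries are not tried and the error can be specific.
            raise ValueError(f"key {key!r}: {expected} expected")
    return False
```

Here match_member(4, "text") raises instead of falling through to the wildcard, while match_member(7, "x") still matches the second entry.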
>
>> Given the examples in the spec, I guess the intention is for such a
>> thing to mean the key 4, if present, has to have an int value.
>
>Which example leads you to this conclusion?
>
>> So, there is some kind of "match the most specific key" rule implied
>> (I guess).
>
>Actually, the PEG semantics we have borrowed here is that the *first*
>match is used.  But only rules that indeed match are considered!
>
>> How that rule applies in more complex situations (where there is some
>> kind of nesting) probably needs to be spelled out....  Given:
>>
>>   top = { 1 => 1, ? ( 5 => 5, 6 => 6 ), *int => tstr }
>>
>> Must keys 5 & 6 be present together,
>
>Yes.
>
>The whole group in the parentheses is optional.
>
>> or does the wildcard allow only one of them to appear?
>
>(That was an early semantics we tried, and it leads down the drain.
>It is much better to have a matcher that simply and stupidly follows
>what’s in the grammar.)
>
>> Or, given:
>>
>>   top = { 1 => 1, ( 5 => 5 // 6 => 6 ), *int => tstr }
>>
>> does this mean { 1 : 1, 5 : 5, 6 : "hi" }  is not valid?
>
>No, it is valid.  The first field eats the 1: 1, the second field only
>matches the 5: 5, so the third field gets to eat zero or more
>int: tstr, of which 6: "hi" is a match.
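
To make the "fields eat members in grammar order" reading concrete, here is a small Python sketch. It assumes a simplified model (a list of items that consume members of a dict, with the first matching alternative of a choice winning and no backtracking afterwards); consume, validate and the item kinds are invented for this illustration. It covers both the group choice above and the optional group from the earlier question:

```python
# Hypothetical mini-matcher; item kinds and names are illustrative only.

def eq(expected):
    return lambda value: value == expected

def is_int(x):
    return isinstance(x, int) and not isinstance(x, bool)

def is_tstr(x):
    return isinstance(x, str)

def consume(items, members):
    """Consume members according to items; return leftovers, or None on failure."""
    remaining = dict(members)
    for item in items:
        kind = item[0]
        if kind == "one":            # key => value, exactly once
            _, key, value_ok = item
            if key not in remaining or not value_ok(remaining[key]):
                return None
            del remaining[key]
        elif kind == "opt":          # ? ( group ): all of the group or none of it
            result = consume(item[1], remaining)
            if result is not None:
                remaining = result
        elif kind == "choice":       # ( a // b ): first alternative that matches
            for alternative in item[1]:
                result = consume(alternative, remaining)
                if result is not None:
                    remaining = result
                    break
            else:
                return None
        elif kind == "star":         # * keytype => valuetype
            _, key_ok, value_ok = item
            remaining = {k: v for k, v in remaining.items()
                         if not (key_ok(k) and value_ok(v))}
    return remaining

def validate(items, members):
    # Valid only if every member has been eaten by some field.
    return consume(items, members) == {}

WILDCARD = ("star", is_int, is_tstr)
# top = { 1 => 1, ( 5 => 5 // 6 => 6 ), *int => tstr }
WITH_CHOICE = [("one", 1, eq(1)),
               ("choice", [[("one", 5, eq(5))], [("one", 6, eq(6))]]),
               WILDCARD]
# top = { 1 => 1, ? ( 5 => 5, 6 => 6 ), *int => tstr }
WITH_OPTIONAL = [("one", 1, eq(1)),
                 ("opt", [("one", 5, eq(5)), ("one", 6, eq(6))]),
                 WILDCARD]
```

In this model, validate(WITH_CHOICE, {1: 1, 5: 5, 6: "hi"}) is true, while validate(WITH_OPTIONAL, {1: 1, 5: 5}) is false: the optional group does not match without the 6, and 5: 5 is not an int: tstr member.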
>
>> Is the 6 free to match the wildcard when the 5 has satisfied the
>> group choice?
>
>Yes.
>
>>
>> Then there are cases where "most specific key" has no meaning,
>
>(Again, we use “first match”.)
>
>> such as when two key types overlap each other and neither is a
>> single-value type.  Consider:
>>
>>   top = { * (0..10) => tstr, * (5..15) => int }
>>
>> Does this mean a key of 5 can have either a text string or an int
>> value?
>
>As long as there are no cuts here, yes.
>
>> Or, does it require that a key of 5, if present, must have a value
>> that is both a text string and an int at the same time (i.e. it
>> disallows 5 to appear)?
>
>That would never be the semantics — the fact that there are two
>branches in a choice that can be fulfilled is not an error.
>
>With a cut like this:
>
>top = { * (0..10) ^ => tstr, * (5..15) => int }
>
>this could mean that keys 0..10 cut the choice and therefore need to
>have a text string value, while the rest, 11..15, can have integer
>values, because the choice is cut after matching 0..10.
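
As a quick illustration of the overlap, the following Python sketch lists which value types a given key would accept with and without the cut. It assumes the reading given above; accepted_types is a made-up helper, not something from a CDDL implementation:

```python
# Hypothetical helper modelling:
#   top = { * (0..10) => tstr,   * (5..15) => int }    (cut=False)
#   top = { * (0..10) ^ => tstr, * (5..15) => int }    (cut=True)

def accepted_types(key, cut):
    """Return the value types accepted for an integer key, in entry order."""
    accepted = []
    if 0 <= key <= 10:
        accepted.append("tstr")
        if cut:
            # The key matched the first entry and the choice is cut,
            # so the second entry is never tried for this member.
            return accepted
    if 5 <= key <= 15:
        accepted.append("int")
    return accepted
```

So a key of 5 accepts a text string or an int without the cut, only a text string with it, and keys 11..15 accept an int either way.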
>
>So far, we haven’t seen a use case that actually needed the cut, but it
>is still nice to have that error message.
>(We also haven’t implemented it yet, although we will certainly do that
>over time.)
>
>Another example where a cut helps:
>
>message = orderbeer / orderwine
>
>orderbeer = {
>  type: "beer",
>  ferment: "bottom" / "top",
>}
>
>orderwine = {
>  type: "wine",
>  color: "red" / "white",
>}
>
>If you feed {"type": "wine", "ferment": "top"} into this, you get a
>rather unspecific error message that tells you things don’t match up:
>the matcher can’t really know whether the “type” value of “wine” or the
>“ferment” key is the “cause” of neither branch matching.
>
>If you add a cut:
>
>message = orderbeer / orderwine
>
>orderbeer = {
>  type: "beer" ^,
>  ferment: "bottom" / "top",
>}
>
>orderwine = {
>  type: "wine" ^,
>  color: "red" / "white",
>}
>
>the matcher can tell you right away that the key “ferment” is not
>allowed in an orderwine message.
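
A sketch of how the cut could sharpen the error message, in Python. It assumes a toy model where each alternative is a dict from key to allowed values and the cut takes effect once the "type" value has matched. SCHEMAS, match_order, validate and the exact message texts are invented for illustration:

```python
# Hypothetical model of:  message = orderbeer / orderwine
SCHEMAS = [
    ("orderbeer", {"type": ["beer"], "ferment": ["bottom", "top"]}),
    ("orderwine", {"type": ["wine"], "color": ["red", "white"]}),
]

def match_order(name, schema, data, cut_after_type):
    """Return (ok, committed, error) for one alternative of `message`."""
    if data.get("type") not in schema["type"]:
        return False, False, f"{name}: type does not match"
    # With a cut after the type value, this alternative is now the only
    # candidate, so any later mismatch is a definite error.
    committed = cut_after_type
    for key in data:
        if key not in schema:
            return False, committed, f'key "{key}" is not allowed in an {name} message'
    for key, allowed in schema.items():
        if key not in data:
            return False, committed, f'key "{key}" is missing in an {name} message'
        if data[key] not in allowed:
            return False, committed, f'value of "{key}" is not allowed in an {name} message'
    return True, committed, None

def validate(data, cut_after_type):
    """Return "ok" or an error message for the given map."""
    for name, schema in SCHEMAS:
        ok, committed, error = match_order(name, schema, data, cut_after_type)
        if ok:
            return "ok"
        if committed:
            return error          # specific: the cut pins down the branch
    return "no alternative of message matches"   # unspecific fallback
```

Feeding {"type": "wine", "ferment": "top"} into validate gives the unspecific message without the cut and the "ferment" message with it.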
>
>Grüße, Carsten
>
