Re: [Cbor] Validation of maps

Jim Schaad <ietf@augustcellars.com> Thu, 05 October 2017 16:15 UTC

From: Jim Schaad <ietf@augustcellars.com>
To: 'Francesca Palombini' <francesca.palombini@ericsson.com>, cbor@ietf.org
References: <e16da575-bbed-1f52-c754-9938237aa6bc@obj-sys.com> <3FFCD42B-C1DE-43E3-A06D-608CACD55D86@tzi.org> <CANh-dXnucjNP=eZfrEcrVC6HN0XHk0dcw-C+J56rksWxMbX8=A@mail.gmail.com> <HE1PR0701MB253924A5FBD83848583C053898700@HE1PR0701MB2539.eurprd07.prod.outlook.com> <HE1PR0701MB2539BFF4100C57A6C70CC94E98700@HE1PR0701MB2539.eurprd07.prod.outlook.com>
In-Reply-To: <HE1PR0701MB2539BFF4100C57A6C70CC94E98700@HE1PR0701MB2539.eurprd07.prod.outlook.com>
Date: Thu, 05 Oct 2017 09:15:23 -0700
Message-ID: <026b01d33df5$27919310$76b4b930$@augustcellars.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/lZ5r_Ty6PBzbKjy6FJ52q2i72YQ>
Subject: Re: [Cbor] Validation of maps

My opinion on this issue is that the open choice should match only if the map has no explicit choice for that key.  That is the intention of the specifier, and a fast parser that checks types should fail on a mismatch.  The code is going to presume that the value under key 4 is an int and should not get surprised in the middle of processing; if the parser is that smart, the failure should come during validation.

 

Thus { ? 4 => int, * any => any } should fail for { 4: "text" }

 

Jim

 

 

From: CBOR [mailto:cbor-bounces@ietf.org] On Behalf Of Francesca Palombini
Sent: Thursday, October 5, 2017 3:53 AM
To: Carsten Bormann <cabo@tzi.org>; cbor@ietf.org
Subject: Re: [Cbor] Validation of maps

 

Sorry, sent too early. The opinion I’d like to hear from the working group: are cuts something we want to consider putting in at this point?


Francesca

 

From: Francesca Palombini 
Sent: 5 October 2017 12:51
To: 'Jeffrey Yasskin' <jyasskin@chromium.org>; Carsten Bormann <cabo@tzi.org>
Cc: Kevin Braun <kbraun@obj-sys.com>; cbor@ietf.org
Subject: RE: [Cbor] Validation of maps

 

Agreed with Jeffrey.

 

Reviving this thread to ask for the working group's opinion:

 

From: CBOR [mailto:cbor-bounces@ietf.org] On Behalf Of Jeffrey Yasskin
Sent: 20 July 2017 03:16
To: Carsten Bormann <cabo@tzi.org>
Cc: Kevin Braun <kbraun@obj-sys.com>; cbor@ietf.org
Subject: Re: [Cbor] Validation of maps

 

By the time CDDL makes it to an RFC, we should be answering questions like this by quoting normative text from https://tools.ietf.org/html/draft-greevenbosch-appsawg-cbor-cddl-11#section-3.5, not just pointing at examples.

 

Jeffrey

 

On Tue, Jul 18, 2017 at 11:18 PM, Carsten Bormann <cabo@tzi.org> wrote:

Hi Kevin,

> I know the question of more formally specifying validation rules already came up.  One would think map validation would be fairly obvious, but what happens when key types overlap?
>
> For example, I think the intention is that if you have
>
>   top = { 4 => int, *int => tstr }
>
> then the key 4 must be present with an integer value,

Right, that is the only way to match the first field.
(And there is no way to have that as well as another /4/ key with a text string value.)

> and you can have any number of other integer keys with text string values. Okay, but what about:
>
> top = { ? 4 => int, *int => tstr }
>
> We might say this means that if a key of 4 appears, then it must have an int value.  Or, does it allow a key of 4 to appear with a text string value while considering the optional "4 => int" as being absent?

Yes, that is the semantics.  It is not always what a specifier might intend.

The reason is that the map opens a choice point.  A member with key 4 starts to match the first field.  If the value does not match, however (because it is not an int), the matcher falls back to the choice point.  It then tries the other field, and indeed, that matches.
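
A minimal Python sketch of this fallback behaviour, under stated assumptions: the rule layout (key predicate, value predicate, minimum and maximum occurrences) and the names `validate` and `RULES` are hypothetical and not taken from any CDDL tool. Each member is matched greedily against the fields in order, without backtracking across members, which is enough to show the point for `top = { ? 4 => int, *int => tstr }`.

```python
def is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


def is_tstr(v):
    return isinstance(v, str)


# Hypothetical model of: top = { ? 4 => int, *int => tstr }
# Each rule: (key predicate, value predicate, min occurrences, max occurrences)
RULES = [
    (lambda k: is_int(k) and k == 4, is_int, 0, 1),   # ? 4 => int
    (is_int, is_tstr, 0, float("inf")),               # * int => tstr
]


def validate(rules, data):
    """Return True if every member matches some field and all minimums are met."""
    counts = [0] * len(rules)
    for key, value in data.items():
        for i, (key_ok, value_ok, _lo, hi) in enumerate(rules):
            # A field matches only if key AND value match; otherwise the
            # matcher falls back to the choice point and tries the next field.
            if counts[i] < hi and key_ok(key) and value_ok(value):
                counts[i] += 1
                break
        else:
            return False  # no field accepted this member
    return all(counts[i] >= lo for i, (_k, _v, lo, _hi) in enumerate(rules))


print(validate(RULES, {4: 1}))        # True: matches "? 4 => int"
print(validate(RULES, {4: "text"}))   # True: falls back to "*int => tstr"
print(validate(RULES, {4: 1.5}))      # False: neither field accepts a float
```

The second call is the case under discussion: the optional field is treated as absent and the wildcard picks up the member.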

In the research underlying CDDL, we have discussed “cuts” (a concept from error handling in Parsing Expression Grammars (PEGs)) as the solution to this.  If ^ represents a cut, write:

top = { ? 4 ^ => int, *int => tstr }

Once the 4 matches, there is no way back; for this member, another match is no longer tried.
A nice side effect is that anything except an int after a key of 4 can give a definite error message of “int expected”.
The cut proposal includes : as an abbreviation for ^=>, so you can simply write:

top = { ? 4: int, *int => tstr }
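
To illustrate the difference a cut makes, here is a hedged Python sketch. The `Field` type and `check_member` function are hypothetical names for this example only, occurrence counts are ignored, and only a single map member is checked. With the cut flag set, a key match commits to that field, so a wrong value produces a definite error instead of a fallback.

```python
from typing import Any, Callable, NamedTuple, Optional


class Field(NamedTuple):
    key_ok: Callable[[Any], bool]
    value_ok: Callable[[Any], bool]
    expected: str  # name of the value type, used in error messages
    cut: bool      # True models "key ^ => value" (or "key: value")


def is_int(v):
    return isinstance(v, int) and not isinstance(v, bool)


def is_tstr(v):
    return isinstance(v, str)


def is_four(k):
    return is_int(k) and k == 4


def check_member(fields, key, value) -> Optional[str]:
    """Return None if the member matches a field, else an error message."""
    for f in fields:
        if not f.key_ok(key):
            continue
        if f.value_ok(value):
            return None
        if f.cut:
            # The key matched and the field has a cut: no way back.
            return f"key {key!r}: {f.expected} expected"
    return f"key {key!r}: no field matches"


# top = { ? 4 => int, *int => tstr }
WITHOUT_CUT = [Field(is_four, is_int, "int", False),
               Field(is_int, is_tstr, "tstr", False)]
# top = { ? 4 ^ => int, *int => tstr }
WITH_CUT = [Field(is_four, is_int, "int", True),
            Field(is_int, is_tstr, "tstr", False)]

print(check_member(WITHOUT_CUT, 4, "text"))  # None
print(check_member(WITH_CUT, 4, "text"))     # key 4: int expected
print(check_member(WITH_CUT, 9, "text"))     # None
```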

> Given the examples in the spec, I guess the intention is for such a thing to mean the key 4, if present, has to have an int value.

Which example leads you to this conclusion?

>  So, there is some kind of "match the most specific key" rule implied (I guess).

Actually, the PEG semantics we have borrowed here is that the *first* match is used.  But only rules are matched that indeed match!

> How that rule applies in more complex situations (where there is some kind of nesting) probably needs to be spelled out....  Given:
>
>   top = { 1 => 1, ? ( 5 => 5, 6 => 6 ), *int => tstr }
>
> Must keys 5 & 6 be present together,

Yes.

The whole group in the parentheses is optional.

> or does the wildcard allow only one of them to appear?

(That was an early semantics we tried, and it leads down the drain.
It is much better to have a matcher that simply and stupidly follows what’s in the grammar.)

> Or, given:
>
>   top = { 1 => 1, ( 5 => 5 // 6 => 6 ), *int => tstr }
>
> does this mean { 1 : 1, 5 : 5, 6 : "hi" }  is not valid?

No, it is valid.  The first field eats the 1: 1, the second field only matches the 5: 5, so the third field gets to eat zero or more int: tstr, of which 6: "hi" is a match.

> Is the 6 free to match the wildcard when the 5 has satisfied the group choice?

Yes.

>
> Then there are cases where "most specific key" has no meaning,

(Again, we use “first match”.)

> such as when two key types overlap each other and neither is a single-value type.  Consider:
>
>   top = { * (0..10) => tstr, * (5..15) => int }
>
> Does this mean a key of 5 can have either a text string or an int value?

As long as there are no cuts here, yes.

> Or, does it require that a key of 5, if present, must have a value that is both a text string and an int at the same time (i.e. it disallows 5 to appear)?

That would never be the semantics — the fact that there are two branches in a choice that can be fulfilled is not an error.

With a cut like this:

top = { * (0..10) ^ => tstr, * (5..15) => int }

this could mean that keys in 0..10 cut the choice and therefore need to have a text string value, while the rest, 11..15, can be integers, because the choice is cut after matching 0..10.
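
As a rough illustration of that reading, the following Python sketch tabulates which value types a given key may carry with and without the cut. The tuple layout and the name `accepted_types` are assumptions made for this example; CDDL's inclusive range 0..10 is modelled as `range(0, 11)`.

```python
def accepted_types(key, fields):
    """fields: list of (key range, value type name, cut flag), tried in order."""
    accepted = []
    for key_range, value_type, cut in fields:
        if key in key_range:
            accepted.append(value_type)
            if cut:
                break  # the cut forbids trying later fields for this key
    return accepted


# top = { * (0..10) => tstr, * (5..15) => int }
NO_CUT = [(range(0, 11), "tstr", False), (range(5, 16), "int", False)]
# top = { * (0..10) ^ => tstr, * (5..15) => int }
WITH_CUT = [(range(0, 11), "tstr", True), (range(5, 16), "int", False)]

for key in (3, 5, 12):
    print(key, accepted_types(key, NO_CUT), accepted_types(key, WITH_CUT))
# 3 ['tstr'] ['tstr']
# 5 ['tstr', 'int'] ['tstr']
# 12 ['int'] ['int']
```

Only keys in the overlap 5..10 change behaviour: without the cut they accept either type, with the cut they accept a text string only.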

So far, we haven’t seen a use case that actually needed the cut, but it is still nice to have that error message.
(We also haven’t implemented it yet, although we will certainly do that over time.)

Another example where a cut helps:

message = orderbeer / orderwine

orderbeer = {
  type: "beer",
  ferment: "bottom" / "top",
}

orderwine = {
  type: "wine",
  color: "red" / "white",
}

If you feed {"type": "wine", "ferment": "top"} into this, you get a rather unspecific error message that tells you things don’t match up: the matcher can’t really know whether the "type" value of "wine" or the "ferment" key is the “cause” of neither branch matching.

If you add a cut:

message = orderbeer / orderwine

orderbeer = {
  type: "beer" ^,
  ferment: "bottom" / "top",
}

orderwine = {
  type: "wine" ^,
  color: "red" / "white",
}

the matcher can tell you right away that the key “ferment” is not allowed in an orderwine message.
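
A small Python sketch of how such a cut could sharpen the error message. Everything here is an assumption for illustration: the `BRANCHES` table, the `branch_errors` and `validate` names, and the message wording are hypothetical, and the cut is modelled simply as "once the type value matches, commit to this branch".

```python
# Each branch maps a key to the tuple of values allowed for it.
BRANCHES = {
    "orderbeer": {"type": ("beer",), "ferment": ("bottom", "top")},
    "orderwine": {"type": ("wine",), "color": ("red", "white")},
}


def branch_errors(spec, message):
    """List everything that keeps `message` from matching one branch."""
    errors = []
    for key, value in message.items():
        if key not in spec:
            errors.append(f"key {key!r} is not allowed")
        elif value not in spec[key]:
            errors.append(f"value {value!r} is not allowed for key {key!r}")
    for key in spec:
        if key not in message:
            errors.append(f"key {key!r} is missing")
    return errors


def validate(message, use_cut):
    for name, spec in BRANCHES.items():
        errors = branch_errors(spec, message)
        if not errors:
            return f"ok: {name}"
        if use_cut and message.get("type") in spec["type"]:
            # The cut after the matching "type" value commits to this branch,
            # so its errors can be reported directly.
            return f"{name}: " + "; ".join(errors)
    return "error: message matches neither orderbeer nor orderwine"


msg = {"type": "wine", "ferment": "top"}
print(validate(msg, use_cut=False))
# error: message matches neither orderbeer nor orderwine
print(validate(msg, use_cut=True))
# orderwine: key 'ferment' is not allowed; key 'color' is missing
```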

Grüße, Carsten


_______________________________________________
CBOR mailing list
CBOR@ietf.org
https://www.ietf.org/mailman/listinfo/cbor