Re: [Cbor] Validation of maps

Jeffrey Yasskin <jyasskin@chromium.org> Thu, 20 July 2017 01:16 UTC

In-Reply-To: <3FFCD42B-C1DE-43E3-A06D-608CACD55D86@tzi.org>
References: <e16da575-bbed-1f52-c754-9938237aa6bc@obj-sys.com> <3FFCD42B-C1DE-43E3-A06D-608CACD55D86@tzi.org>
From: Jeffrey Yasskin <jyasskin@chromium.org>
Date: Thu, 20 Jul 2017 03:15:52 +0200
Message-ID: <CANh-dXnucjNP=eZfrEcrVC6HN0XHk0dcw-C+J56rksWxMbX8=A@mail.gmail.com>
To: Carsten Bormann <cabo@tzi.org>
Cc: Kevin Braun <kbraun@obj-sys.com>, cbor@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/cbor/60Jc-t0SwMA2x0MsnY1zx4fZaGI>
Subject: Re: [Cbor] Validation of maps

By the time CDDL makes it to an RFC, we should be answering questions like
this by quoting normative text from
https://tools.ietf.org/html/draft-greevenbosch-appsawg-cbor-cddl-11#section-3.5,
not just pointing at examples.

Jeffrey

On Tue, Jul 18, 2017 at 11:18 PM, Carsten Bormann <cabo@tzi.org> wrote:

> Hi Kevin,
>
> > I know the question of more formally specifying validation rules already
> > came up.  One would think map validation would be fairly obvious, but what
> > happens when key types overlap?
> >
> > For example, I think the intention is that if you have
> >
> >   top = { 4 => int, *int => tstr }
> >
> > then the key 4 must be present with an integer value,
>
> Right, that is the only way to match the first field.
> (And there is no way to have that as well as another /4/ key with a text
> string value.)
>
> > and you can have any number of other integer keys with text string
> > values. Okay, but what about:
> >
> > top = { ? 4 => int, *int => tstr }
> >
> > We might say this means that if a key of 4 appears, then it must have an
> > int value.  Or, does it allow a key of 4 to appear with a text string value
> > while considering the optional "4 => int" as being absent?
>
> Yes, that is the semantics.  It is not always what a specifier might
> intend.
>
> The reason is that the map opens a choice point.  A member with key 4
> starts to match the first field.  If the value does not match (because
> it is not an int), the matcher falls back to the choice point.  It then
> tries the other field, and indeed, that matches.
>
> In the research underlying CDDL, we have discussed “cuts” (a concept from
> error handling in Parsing Expression Grammars (PEGs)) as the solution to
> this.  If ^ represents a cut, write:
>
> top = { ? 4 ^ => int, *int => tstr }
>
> Once the 4 matches, there is no way back; for this member, another match
> is no longer tried.
> A nice side effect is that anything except an int after a key of 4 can
> give a definite error message of “int expected”.
> The cut proposal includes : as an abbreviation for ^=>, so you can simply
> write:
>
> top = { ? 4: int, *int => tstr }
>
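
The fallback and cut behaviour described above can be sketched in a few
lines of Python for this one rule. This is only an illustration under
stated assumptions: `match_top`, `is_int` and the `cut` flag are names
invented here, the map is taken to be already decoded into a Python dict,
and no existing CDDL tool is claimed to work this way.

```python
def is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


def match_top(data, cut=False):
    """Check a dict against  top = { ? 4 => int, *int => tstr }.

    With cut=True the first field is read as  ? 4 ^ => int  (i.e. ? 4: int).
    Returns a pair (ok, message).
    """
    for key, value in data.items():
        if is_int(key) and key == 4:
            if is_int(value):
                continue  # first field matched
            if cut:
                # The key matched and the cut forbids trying other fields.
                return False, "key 4: int expected"
            # No cut: fall back to the choice point and try the next field.
        if is_int(key) and isinstance(value, str):
            continue  # matched *int => tstr
        return False, f"key {key!r}: no field matches"
    return True, "ok"
```

Under these assumptions, `match_top({4: "x"})` is accepted, while
`match_top({4: "x"}, cut=True)` is rejected with the specific message.
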
> > Given the examples in the spec, I guess the intention is for such a
> > thing to mean the key 4, if present, has to have an int value.
>
> Which example leads you to this conclusion?
>
> > So, there is some kind of "match the most specific key" rule implied (I
> > guess).
>
> Actually, the PEG semantics we have borrowed here is that the *first*
> match is used.  But a rule only counts as a match if it matches in full,
> key and value!
>
> > How that rule applies in more complex situations (where there is some
> > kind of nesting) probably needs to be spelled out....  Given:
> >
> >   top = { 1 => 1, ? ( 5 => 5, 6 => 6 ), *int => tstr }
> >
> > Must keys 5 & 6 be present together,
>
> Yes.
>
> The whole group in the parentheses is optional.
>
> > or does the wildcard allow only one of them to appear?
>
> (That was an early semantics we tried, and it leads down the drain.
> It is much better to have a matcher that simply and stupidly follows
> what’s in the grammar.)
>
> > Or, given:
> >
> >   top = { 1 => 1, ( 5 => 5 // 6 => 6 ), *int => tstr }
> >
> > does this mean { 1 : 1, 5 : 5, 6 : "hi" }  is not valid?
>
> No, it is valid.  The first field eats the 1: 1, the second field only
> matches the 5: 5, so the third field gets to eat zero or more int: tstr,
> of which 6: "hi" is a match.
>
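
A rough Python sketch of the field-by-field consumption described above,
for this one rule only. The helpers `take` and `match_top` are
hypothetical names, the matcher is deliberately simple (fields are tried
in order, the first alternative of the group choice that matches wins),
and it is not meant as a statement of how a real CDDL validator is built.

```python
def is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


def take(rest, key, value):
    """Consume the member key => value from rest if present; report success."""
    if key in rest and is_int(rest[key]) and rest[key] == value:
        del rest[key]
        return True
    return False


def match_top(data):
    """Check a dict against
    top = { 1 => 1, ( 5 => 5 // 6 => 6 ), *int => tstr }
    """
    rest = dict(data)  # members not yet consumed by any field
    if not take(rest, 1, 1):  # 1 => 1
        return False
    # ( 5 => 5 // 6 => 6 ): the first alternative that matches is used.
    if not (take(rest, 5, 5) or take(rest, 6, 6)):
        return False
    # *int => tstr has to absorb every member that is left over.
    return all(is_int(k) and isinstance(v, str) for k, v in rest.items())
```

With this sketch, `match_top({1: 1, 5: 5, 6: "hi"})` is accepted because
the 6 is left for the wildcard once the 5 has satisfied the group choice.
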
> > Is the 6 free to match the wildcard when the 5 has satisfied the group
> > choice?
>
> Yes.
>
> >
> > Then there are cases where "most specific key" has no meaning,
>
> (Again, we use “first match”.)
>
> > such as when two key types overlap each other and neither is a
> > single-value type.  Consider:
> >
> >   top = { * (0..10) => tstr, * (5..15) => int }
> >
> > Does this mean a key of 5 can have either a text string or an int value?
>
> As long as there are no cuts here, yes.
>
> > Or, does it require that a key of 5, if present, must have a value that
> > is both a text string and an int at the same time (i.e. it disallows 5 to
> > appear)?
>
> That would never be the semantics — the fact that there are two branches
> in a choice that can be fulfilled is not an error.
>
> With a cut like this:
>
> top = { * (0..10) ^ => tstr, * (5..15) => int }
>
> this could mean that keys in 0..10 cut the choice and therefore need to
> have a text string value, while the rest, 11..15, can have integer values,
> because the choice is cut after matching 0..10.
>
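
The overlapping-range case can be sketched the same way. The Python below
is an assumption-laden illustration of the reading given above, with the
hypothetical names `match_top`, `is_int` and `cut`; it hard-codes the two
fields of this one rule.

```python
def is_int(v):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(v, int) and not isinstance(v, bool)


def match_top(data, cut=False):
    """Check a dict against  top = { * (0..10) => tstr, * (5..15) => int }.

    With cut=True the first field is read as  * (0..10) ^ => tstr.
    Returns a pair (ok, message).
    """
    for key, value in data.items():
        if not is_int(key):
            return False, f"key {key!r}: no field matches"
        if 0 <= key <= 10:
            if isinstance(value, str):
                continue  # first field matched
            if cut:
                # Key in 0..10 matched; the cut rules out the second field.
                return False, f"key {key}: tstr expected"
        if 5 <= key <= 15 and is_int(value):
            continue  # second field matched
        return False, f"key {key}: no field matches"
    return True, "ok"
```

Without the cut, a key of 5 may carry either a text string or an int; with
the cut, only keys 11..15 can still reach the second field.
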
> So far, we haven’t seen a use case that actually needed the cut, but it is
> still nice to have that error message.
> (We also haven’t implemented it yet, although we will certainly do that
> over time.)
>
> Another example where a cut helps:
>
> message = orderbeer / orderwine
>
> orderbeer = {
>   type: "beer",
>   ferment: "bottom" / "top",
> }
>
> orderwine = {
>   type: "wine",
>   color: "red" / "white",
> }
>
> If you feed {"type": "wine", "ferment": "top"} into this, you get a rather
> unspecific error message that tells you things don’t match up: the matcher
> can’t really know whether the "type" value of "wine" or the "ferment" key
> is the “cause” of neither branch matching.
>
> If you add a cut:
>
> message = orderbeer / orderwine
>
> orderbeer = {
>   type: "beer" ^,
>   ferment: "bottom" / "top",
> }
>
> orderwine = {
>   type: "wine" ^,
>   color: "red" / "white",
> }
>
> the matcher can tell you right away that the key “ferment” is not allowed
> in an orderwine message.
>
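
To make the difference in error messages concrete, here is a small Python
sketch. Everything in it is an assumption made for illustration: the
`BRANCHES` table, the `match_message` function, the `cut` flag and the
wording of the messages are invented here and do not come from any CDDL
implementation. All values are assumed to be strings.

```python
# Allowed values per key for each branch of  message = orderbeer / orderwine.
BRANCHES = {
    "orderbeer": {"type": {"beer"}, "ferment": {"bottom", "top"}},
    "orderwine": {"type": {"wine"}, "color": {"red", "white"}},
}


def match_message(msg, cut=False):
    """Return the name of the matching branch, or raise ValueError."""
    for name, fields in BRANCHES.items():
        if cut and msg.get("type") in fields["type"]:
            # Cut: the type matched, so commit to this branch and report
            # precise errors instead of falling back to other branches.
            for key in msg:
                if key not in fields:
                    raise ValueError(f"key {key!r} is not allowed in {name}")
            for key, allowed in fields.items():
                if msg.get(key) not in allowed:
                    raise ValueError(f"{name}: bad or missing value for {key!r}")
            return name
        # No cut: the branch either matches as a whole or is skipped.
        if msg.keys() == fields.keys() and all(msg[k] in fields[k] for k in fields):
            return name
    raise ValueError("message matches neither orderbeer nor orderwine")
```

Feeding `{"type": "wine", "ferment": "top"}` into `match_message` raises
the generic error by default, and the specific "not allowed in orderwine"
error when `cut=True`.
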
> Grüße, Carsten