Re: [art] Artart last call review of draft-ietf-core-links-json-07

Carsten Bormann <cabo@tzi.org> Tue, 25 April 2017 21:26 UTC

Subject: Re: [art] Artart last call review of draft-ietf-core-links-json-07
From: Carsten Bormann <cabo@tzi.org>
Date: Tue, 25 Apr 2017 23:25:48 +0200
Cc: IETF <ietf@ietf.org>, Julian Reschke <julian.reschke@gmx.de>, art@ietf.org, Herbert Van de Sompel <hvdsomp@gmail.com>, "core@ietf.org WG" <core@ietf.org>, Erik Wilde <erik.wilde@dret.net>, draft-ietf-core-links-json.all@ietf.org
To: "Roy T. Fielding" <fielding@gbiv.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/UogIo8QgKzJ2dkcQeAxjV6HyajU>

>> RFC 6690 says:
>> 
>>  In
>>  order to convert an HTTP Link Header field to this link format, first
>>  the "Link:" HTTP header is removed, any linear whitespace (LWS) is
>>  removed, the header value is converted to UTF-8, and any percent-
>>  encodings are decoded.
> 
> Well, that's broken.

OK, let me start typing that errata report then.

>> coap://example.com?stupid%3Dkey=4711
>> 
>> is not distinguishable from
>> 
>> coap://example.com?stupid=key=4711
>> 
>> (The typical reaction of an implementer is “then don’t do that!” [1,2].)
> 
> That isn't a "limitation".

For RFC 6690 users, it pretty much is, because certain URIs don’t work.
They tend to design their URIs so that they do work, probably because such designs come naturally to them rather than because they are fully aware of that limitation.
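
To make the collision concrete, here is a minimal sketch using Python's urllib.parse. The helper names decode_then_split and split_then_decode are made up for illustration, and parse_qsl applies HTML-form conventions for "&" and "=", which is only an assumption about how an application would read the query:

```python
from urllib.parse import parse_qsl, unquote, urlsplit

a = "coap://example.com?stupid%3Dkey=4711"
b = "coap://example.com?stupid=key=4711"

def decode_then_split(uri):
    # Order implied by the RFC 6690 conversion text: percent-decode the
    # whole string first, and only parse it afterwards.
    return parse_qsl(urlsplit(unquote(uri)).query, keep_blank_values=True)

def split_then_decode(uri):
    # Parse the still-encoded URI; parse_qsl decodes each key and value
    # only after it has split on "&" and on the first "=".
    return parse_qsl(urlsplit(uri).query, keep_blank_values=True)

# Decoding first turns the first URI into the second one.
print(unquote(a) == b)
print(decode_then_split(a), decode_then_split(b))
# Splitting first keeps "stupid=key" intact as a key.
print(split_then_decode(a), split_then_decode(b))
```

With decoding first, both URIs yield the key "stupid" with value "key=4711"; with splitting first, the first URI keeps "stupid=key" as its key.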

> It's a bug to decode pct-encoded octets in
> a URI before decomposing the reference into its parts.  

Well, percent-encoding is playing two roles in RFC 3986: hiding characters within syntactic elements from their delimiter roles, and encoding non-ASCII (and C0 etc.) characters.
The passage I cited from RFC 6690 nicely got rid of the latter, and broke the former (*).
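
The two roles can be pulled apart in code. The following is only a sketch of one possible approach, not anything RFC 6690 or the draft specifies: the hypothetical helper decode_non_ascii_only decodes percent-encoded runs that form valid UTF-8, but puts the escape back for every ASCII character so that hidden delimiters stay hidden. It is not a complete URI-to-IRI conversion.

```python
import re

# One or more consecutive percent-encoded octets.
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def decode_non_ascii_only(uri: str) -> str:
    def fix_run(match):
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            return match.group(0)  # not valid UTF-8: leave the run untouched
        # Keep non-ASCII characters decoded; re-escape ASCII ones so they
        # cannot turn into delimiters. Side effect: hex digits of the
        # re-escaped octets come out in upper case.
        return "".join(c if ord(c) > 0x7F else "%%%02X" % ord(c) for c in text)
    return _PCT_RUN.sub(fix_run, uri)

print(decode_non_ascii_only("coap://example.com/Gr%C3%BC%C3%9Fe?stupid%3Dkey=4711"))
```

The non-ASCII octets in the path are decoded, while %3D in the query stays encoded and therefore remains distinguishable from a literal "=".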

> ASCII is already
> in UTF-8.  Decoding a pct-encoding doesn't make it "more UTF-8"; it just
> means the string is no longer a URI reference.  That's broken.  So utterly
> broken that it obviously wasn't reviewed by the right people.

So what should I write into the errata report?

Or more generally speaking, how should we fix RFC 6690, without creating a need for constrained nodes to do full URI processing?

Maybe it is sufficient to document the limitation in the errata, for now?

And, more to the point of the subject line, how should we handle this on the JSON/CBOR level?

There will definitely be a round-tripping problem with RFC 6690 if the URIs run into the above limitation.  But that’s OK, because that limitation defines the subset.

More generally, not percent-decoding URIs at all when creating JSON/CBOR from scratch is probably the easy way, but it adds complexity when we want to phase out RFC 6690 on the constrained level by replacing it with JSON/CBOR.  Horribile dictu, but maybe IRIs are the right thing to do here.

Grüße, Carsten

(*) It may be worth pointing out that the breakage here is much larger than for CoAP itself, which percent-decodes only after decomposing a URI into what CoAP considers to be its components, so the URI parsing works properly: coap://example.com/foo%2fbar has one path segment, “foo/bar”.
But the application semantics of hiding application delimiters, which my example above breaks, are not supported in CoAP either.
Some people think that URIs should be carried around in that decomposed form throughout the constrained space, and I can’t blame them.
I don’t have data on how many URI libraries in active use in the non-constrained space get this particular detail right, either.
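
As a rough illustration of the decompose-then-decode order described for CoAP in the footnote, here is a sketch in Python. The function name path_segments is hypothetical, and this mimics only the path handling, not the complete CoAP option decomposition:

```python
from urllib.parse import unquote, urlsplit

def path_segments(uri):
    # Split the still-encoded path on "/" first, then percent-decode each
    # segment on its own, so an encoded slash cannot create a new segment.
    path = urlsplit(uri).path
    return [unquote(seg) for seg in path.split("/")[1:]] if path else []

uri = "coap://example.com/foo%2fbar"
print(path_segments(uri))           # one segment containing a slash
print(path_segments(unquote(uri)))  # decoding first yields two segments
```

Carrying the list of decoded segments around, instead of the reassembled string, is what keeps “foo/bar” a single segment.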