[T2TRG] IRIs in CoRAL (was: draft-hartke-t2trg-ciri-00 review)

Klaus Hartke <hartke@projectcool.de> Mon, 04 February 2019 11:46 UTC

From: Klaus Hartke <hartke@projectcool.de>
Date: Mon, 04 Feb 2019 12:46:07 +0100
To: Dave Thaler <dthaler@microsoft.com>
Cc: "T2TRG@irtf.org" <T2TRG@irtf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/t2trg/FyLCuSRKpXrttcsDFGHHwYlM7ko>
Subject: [T2TRG] IRIs in CoRAL (was: draft-hartke-t2trg-ciri-00 review)

> And I also agree that draft-hartke-t2trg-coral-06 likely has the same
> issues because it uses IRIs instead of URIs.

Some further thoughts:


* In CoAP, the request URI is transported as a sequence of CoAP
options that contain the different parts of a URI without
percent-encoding. For example, the URI
<http://example.com/city/Montr%C3%A9al> in a request to (http,
example.com, 80) would be encoded as a Uri-Path option containing
utf8(decode-percent-encodings("city")) = h'63697479' followed by a
Uri-Path option containing
utf8(decode-percent-encodings("Montr%C3%A9al")) =
h'4d6f6e7472c3a9616c'.
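
To make the mapping concrete, here is a minimal Python sketch of the
path part only. It is an illustration, not code from any CoAP library;
the helper name uri_path_options is made up, and host, port and query
handling are left out.

```python
from urllib.parse import urlsplit, unquote_to_bytes

def uri_path_options(uri):
    """Return the Uri-Path option values (as bytes) for the path of `uri`."""
    path = urlsplit(uri).path
    if path in ("", "/"):
        return []  # an empty path yields no Uri-Path options
    # Drop the leading "/" and split into segments; each segment becomes
    # one option whose value is the percent-decoded bytes of that segment.
    return [unquote_to_bytes(segment) for segment in path[1:].split("/")]

options = uri_path_options("http://example.com/city/Montr%C3%A9al")
print([value.hex() for value in options])
```

The two printed hex strings can be compared against the h'...' values
in the example above.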

CoAP does not require that any Unicode normalization be performed, so
if a client happens to make a request with a Uri-Path option with
utf8(nfd("Montréal")) = h'4d6f6e7472_65cc81_616c' where the server
expects a Uri-Path option with utf8(nfc("Montréal")) =
h'4d6f6e7472_c3a9_616c' (or vice versa), then the client will get a
4.04 Not Found error.
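
The mismatch can be reproduced with a few lines of Python. This is only
a sketch: the resources table and the respond function stand in for a
hypothetical server that matches Uri-Path values byte for byte.

```python
import unicodedata

name = "Montr\u00e9al"  # "Montréal" written with a precomposed e-acute
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

# Hypothetical server that stores its resource under the NFC path only.
resources = {(b"city", nfc): "2.05 Content"}

def respond(path_options):
    # Exact byte match on the Uri-Path values; no normalization is applied.
    return resources.get(tuple(path_options), "4.04 Not Found")

print(nfc.hex(), respond([b"city", nfc]))
print(nfd.hex(), respond([b"city", nfd]))
```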

CoAP defines a conversion from CoAP options to URIs (and vice versa).
This conversion is purely syntactic, so a Uri-Path option with
h'4d6f6e7472_65cc81_616c' in the request URI would become
<http://example.com/city/Montre%CC%81al>.
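
A rough Python sketch of the options-to-path direction follows. It
assumes the rule in RFC 7252, Section 6.5, as I read it: every octet is
percent-encoded unless it is an unreserved character, a sub-delim, ":"
or "@". In this reading, unreserved ASCII octets such as 0x65 ('e') are
emitted literally (the percent-encoded form %65 names the same octet),
and only the octets of the combining accent are percent-encoded. The
helper name path_from_options is made up for this example.

```python
from urllib.parse import quote

# Characters left literal in a path segment in addition to the unreserved
# set (letters, digits, "-", ".", "_", "~"): the sub-delims, ":" and "@".
PATH_SAFE = "!$&'()*+,;=:@"

def path_from_options(values):
    """Build a URI path from a list of Uri-Path option values (bytes)."""
    return "".join("/" + quote(value, safe=PATH_SAFE) for value in values)

nfd_value = bytes.fromhex("4d6f6e747265cc81616c")
print("http://example.com" + path_from_options([b"city", nfd_value]))
```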

CoRAL, in the binary format, does exactly the same for link targets
(except that the conversion of CIRI options is currently defined to be
to IRIs, which I'll replace with URIs in the upcoming
draft-hartke-t2trg-ciri-01).


* In Web Linking [RFC8288], the context and the target of a link are
IRIs. However, these are serialized as URIs on the wire in the "Link"
header field.

CoRAL, in the binary format, does exactly the same for link targets
(except that it uses CBOR instead of ASCII characters to delimit the
URI components on the wire).


* In RDF, concepts are named with globally unique Unicode strings. To
make the minting of these strings painless, they are restricted to the
syntax of an IRI. These IRIs are used purely as identity tokens (in
RFC3987 lingo) and are therefore compared character-by-character.

RDF recommends [1] that these IRIs avoid non-normalized forms such as
uppercase characters in scheme names, explicitly stated HTTP default
port, percent-encoding of characters where it is not required by IRI
syntax, and IRIs that are not in NFC. So, for example, the concept
identified by the IRI <http://example.com/city/Montréal> in NFC is not
the same as the concept identified by an IRI that renders identically
as <http://example.com/city/Montréal> but is not in NFC.

CoRAL, in the binary format, does exactly the same for link relation types.
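
As an illustration of the forms listed above, the following Python
sketch flags them in a given IRI. It is a partial check written for
this example (the function name deviations is made up), not an
implementation of anything in RDF or CoRAL. In particular, it only
detects unnecessary percent-encoding of ASCII unreserved characters.

```python
import re
import unicodedata
from urllib.parse import urlsplit

UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def deviations(iri):
    """List which of the discouraged forms appear in `iri`."""
    found = []
    scheme = iri.split(":", 1)[0]
    if scheme != scheme.lower():
        found.append("uppercase scheme")
    parts = urlsplit(iri)  # note: urlsplit lowercases the scheme itself
    if parts.scheme == "http" and parts.port == 80:
        found.append("explicit default port")
    # Escapes of non-ASCII characters (e.g. %C3%A9) are also unnecessary
    # in an IRI, but this sketch does not try to detect them.
    for escape in re.findall(r"%([0-9A-Fa-f]{2})", iri):
        if chr(int(escape, 16)) in UNRESERVED:
            found.append("unneeded percent-encoding")
            break
    if unicodedata.normalize("NFC", iri) != iri:
        found.append("not in NFC")
    return found

print(deviations("http://example.com/city/Montr\u00e9al"))
print(deviations("HTTP://example.com:80/c%69ty/Montre\u0301al"))
```

Two IRIs that differ in any of these ways compare as different
identity tokens, which is why avoiding the forms matters.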


Now, in M2M communication, I think we avoid all problems with
normalization etc.: Servers authoritatively manage the namespace of
their resources. If a client asks a server "Hey, server, what
resources do you have?" and the server responds with "I have a
resource at [6, h'63697479', 6, h'4d6f6e7472c3a9616c'].", then the
client can simply copy those bytes into its next request without ever
decoding them. As long as the server accepts its own output as input,
everything works.

When a client compares link relation types to locally stored strings,
it can use byte-for-byte comparison (as suggested by RFC3987) as long
as both the server and the client store the link relation types
exactly as they are defined.


The only issue left is when human users input IRIs as link relation
types or as link targets. In CoRAL, this happens in the textual
format. Interestingly, Turtle [2] doesn't seem to perform any kind of
input normalization, so human users are expected to write in perfect
NFC when they identify a concept by an IRI. (Maybe I'm missing
something?) If true, this seems like a bad user experience. However, I
think it would be a bad user experience as well if one has to write
<%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF> instead of <こんにちは>.
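
For reference, the percent-encoded form is just the UTF-8 octets of the
text, escaped one by one. A short Python sketch using the standard
library shows the round trip; it applies no Unicode normalization,
which is exactly the open question here.

```python
from urllib.parse import quote, unquote

iri_segment = "こんにちは"

# quote() encodes the string as UTF-8 and percent-encodes every octet
# that is not an ASCII letter, digit, or one of "-._~/".
uri_segment = quote(iri_segment)
print(uri_segment)

# unquote() reverses the mapping, giving back the original characters.
print(unquote(uri_segment) == iri_segment)
```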

I would prefer to not invent anything new here and to just follow the
consensus if possible.


Klaus

[1] https://www.w3.org/TR/rdf11-concepts/#note-iris
[2] https://www.w3.org/TR/turtle/