Re: [T2TRG] IRIs in CoRAL (was: draft-hartke-t2trg-ciri-00 review)

Dave Thaler <dthaler@microsoft.com> Mon, 04 February 2019 19:17 UTC

From: Dave Thaler <dthaler@microsoft.com>
To: Klaus Hartke <hartke@projectcool.de>
CC: "T2TRG@irtf.org" <T2TRG@irtf.org>
Date: Mon, 04 Feb 2019 19:17:38 +0000
Archived-At: <https://mailarchive.ietf.org/arch/msg/t2trg/sFWFOFsUa1s1CIiB1M1jN7zhMOU>
Subject: Re: [T2TRG] IRIs in CoRAL (was: draft-hartke-t2trg-ciri-00 review)

Inline below…



-----Original Message-----
From: Klaus Hartke <hartke@projectcool.de>
Sent: Monday, February 4, 2019 3:46 AM
To: Dave Thaler <dthaler@microsoft.com>
Cc: T2TRG@irtf.org
Subject: IRIs in CoRAL (was: draft-hartke-t2trg-ciri-00 review)



> And I also agree that draft-hartke-t2trg-coral-06 likely has the same
> issues because it uses IRIs instead of URIs.



Some further thoughts:



* In CoAP, the request URI is transported as a sequence of CoAP options that contain the different parts of a URI without percent-encoding.



In contrast, IRIs do have some percent-encoding, in three cases:

1) To escape reserved characters
2) When the byte sequence is not UTF-8 (e.g., binary bytes)
3) To escape characters not appropriate in an IRI

See section 3.2 of RFC 3987 for details.



For example, the URI <http://example.com/city/Montr%C3%A9al> in a request to (http, example.com, 80) would be encoded as a Uri-Path option containing utf8(decode-percent-encodings("city")) = h'63697479' followed by a Uri-Path option containing utf8(decode-percent-encodings("Montr%C3%A9al")) = h'4d6f6e7472c3a9616c'.
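The decoding step above can be sketched roughly as follows. This is only an illustration of the path handling, not the full CoAP URI decomposition; the function name uri_path_options is made up for this example.

```python
from urllib.parse import urlsplit, unquote_to_bytes

def uri_path_options(uri: str) -> list[bytes]:
    """Return one percent-decoded byte string per path segment of `uri`.

    Hypothetical helper: it covers only the Uri-Path part of the
    decomposition and ignores host, port, and query handling.
    """
    path = urlsplit(uri).path
    if path in ("", "/"):
        return []
    # Drop the single leading "/" and decode each segment separately,
    # so an encoded "%2F" inside a segment is not treated as a separator.
    return [unquote_to_bytes(segment) for segment in path[1:].split("/")]

options = uri_path_options("http://example.com/city/Montr%C3%A9al")
print([o.hex() for o in options])
# ['63697479', '4d6f6e7472c3a9616c']
```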



CoAP does not require any Unicode normalization to be performed, so if a client happens to make a request with a Uri-Path option with utf8(nfd("Montréal")) = h'4d6f6e7472_65cc81_616c' where the server expects a Uri-Path option with utf8(nfc("Montréal")) = h'4d6f6e7472_c3a9_616c' (or vice versa), then the client will get a 4.04 Not Found error.
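As a quick check of the two byte sequences, the following sketch uses Python's standard unicodedata module to produce both normalization forms of the same visible string:

```python
import unicodedata

name = "Montr\u00e9al"  # "Montréal" written with the precomposed e-acute

# NFC keeps the precomposed character; NFD splits it into "e" + U+0301.
nfc = unicodedata.normalize("NFC", name).encode("utf-8")
nfd = unicodedata.normalize("NFD", name).encode("utf-8")

print(nfc.hex())   # 4d6f6e7472c3a9616c
print(nfd.hex())   # 4d6f6e747265cc81616c
print(nfc == nfd)  # False: a byte-for-byte match fails
```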



Most people would say that’s a problem (as you point out at the bottom of your message).

IRIs (in RFC 3987), on the other hand, do require normalization.



CoAP defines a conversion from CoAP options to URIs (and vice versa).

This conversion is purely syntactic, so a Uri-Path option with h'4d6f6e7472_65cc81_616c' in the request URI would become <http://example.com/city/Montr%65%CC%81al>.
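A rough sketch of the reverse direction, assuming the path rule of RFC 7252 section 6.5 (percent-encode every byte except unreserved characters, sub-delims, ":" and "@"). The helper name options_to_uri is hypothetical, and only the path is handled. Note that this sketch leaves the unreserved "e" unencoded, so it prints "Montre%CC%81al"; under RFC 3986 that is equivalent to the "%65" spelling.

```python
from urllib.parse import quote

# sub-delims plus ":" and "@"; quote() never encodes unreserved characters.
PATH_SAFE = "!$&'()*+,;=:@"

def options_to_uri(scheme: str, host: str, path_options: list[bytes]) -> str:
    """Purely syntactic conversion: no decoding, no normalization."""
    path = "".join("/" + quote(opt, safe=PATH_SAFE) for opt in path_options)
    return f"{scheme}://{host}{path}"

nfd_option = bytes.fromhex("4d6f6e747265cc81616c")
print(options_to_uri("http", "example.com", [b"city", nfd_option]))
# http://example.com/city/Montre%CC%81al
```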



CoRAL, in the binary format, does exactly the same for link targets (except that the conversion of CIRI options is currently defined to be to IRIs, which I'll replace with URIs in the upcoming draft-hartke-t2trg-ciri-01).



* In Web Linking [RFC8288], the context and the target of a link are IRIs. However, these are serialized as URIs on the wire in the "Link" header field.



CoRAL, in the binary format, does exactly the same for link targets (except that it uses CBOR instead of ASCII characters to delimit the URI components on the wire).



* In RDF, concepts are named with globally unique Unicode strings. To make the minting of these strings painless, they are restricted to the syntax of an IRI. These IRIs are used purely as identity tokens (in RFC3987 lingo) and are therefore compared character-by-character.



RDF recommends [1] that these IRIs avoid non-normalized forms such as uppercase characters in scheme names, explicitly stated HTTP default port, percent-encoding of characters where it is not required by IRI syntax, and IRIs that are not in NFC. So for example the concept identified by the IRI <http://example.com/city/Montréal> is not the same as the concept identified by the IRI <http://example.com/city/Montréal> if one of them isn't in NFC.



CoRAL, in the binary format, does exactly the same for link relation types.



Now, in M2M communication, I think we avoid all problems with normalization etc.: Servers authoritatively manage the namespace of their resources. If a client asks a server "Hey, server, what resources do you have?" and the server responds with "I have a resource at [6, h'63697479', 6, h'4d6f6e7472c3a9616c'].", then the client can simply copy those bytes into its next request without ever decoding them. As long as the server accepts its own output as input, everything works.
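A toy model of this argument, with a made-up in-memory server (RESOURCES, discover, and handle_request are hypothetical names, not part of CoAP or CoRAL). It suggests that echoing the advertised bytes works, while decoding and re-normalizing them on the client side can break the lookup:

```python
import unicodedata

# Hypothetical server namespace keyed by the exact Uri-Path option bytes.
RESOURCES = {(b"city", "Montr\u00e9al".encode("utf-8")): "2.05 Content"}

def discover() -> list[list[bytes]]:
    """Server answer to "what resources do you have?"."""
    return [list(key) for key in RESOURCES]

def handle_request(path_options: list[bytes]) -> str:
    """Byte-for-byte lookup, with no normalization on the server."""
    return RESOURCES.get(tuple(path_options), "4.04 Not Found")

advertised = discover()[0]

# Client copies the bytes without decoding them.
print(handle_request(advertised))    # 2.05 Content

# Client decodes, re-normalizes to NFD, and re-encodes.
renormalized = [
    unicodedata.normalize("NFD", o.decode("utf-8")).encode("utf-8")
    for o in advertised
]
print(handle_request(renormalized))  # 4.04 Not Found
```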



When a client compares link relation types to locally stored strings, it can use byte-for-byte comparison (as suggested by RFC3987) as long as both the server and the client store the link relation types exactly as they are defined.



The only issue left is when human users input IRIs as link relation types or as link targets. In CoRAL, this happens in the textual format. Interestingly, Turtle [2] doesn't seem to perform any kind of input normalization, so human users are expected to write in perfect NFC when they identify a concept by the IRI. (Maybe I'm missing something?) If true, this seems like a bad user experience. However, I think it would be a bad user experience as well if one has to write <%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF> instead of <こんにちは>.
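One way a textual-format parser could take this burden off the user is sketched below. It assumes the parser normalizes input to NFC before percent-encoding, which is exactly the open design question here, so treat that step as an assumption rather than as anything CoRAL or Turtle specifies. The function name is hypothetical.

```python
import unicodedata
from urllib.parse import quote

def typed_segment_to_uri_segment(text: str) -> str:
    """Map a human-typed IRI path segment to its URI form.

    Assumption: input is normalized to NFC first, then every
    non-unreserved character is percent-encoded as UTF-8.
    """
    nfc = unicodedata.normalize("NFC", text)
    return quote(nfc, safe="")

print(typed_segment_to_uri_segment("こんにちは"))
# %E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF

# With the normalization step, both spellings of "Montréal" agree.
print(typed_segment_to_uri_segment("Montre\u0301al"))  # Montr%C3%A9al
```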



It’s not just user experience; it might cause security vulnerabilities if such a string is used for security purposes.

See RFC 6943.



I would prefer to not invent anything new here and to just follow the consensus if possible.



One typical consensus in the IETF is to put URIs, not IRIs, on the wire.



Dave



Klaus



[1] https://www.w3.org/TR/rdf11-concepts/#note-iris

[2] https://www.w3.org/TR/turtle/