[core] Possible and impossible CIRIs (was: Review of CoRAL)

Christian Amsüss <christian@amsuess.com> Fri, 09 November 2018 12:18 UTC

Date: Fri, 9 Nov 2018 13:17:58 +0100
From: Christian Amsüss <christian@amsuess.com>
To: draft-hartke-t2trg-coral@ietf.org
Cc: core@ietf.org
Message-ID: <20181109121757.GA17755@hephaistos.amsuess.com>
References: <20181031164534.GA4995@hephaistos.amsuess.com>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="yrj/dFKFPuw6o+aM"
Content-Disposition: inline
In-Reply-To: <20181031164534.GA4995@hephaistos.amsuess.com>
User-Agent: Mutt/1.10.1 (2018-07-13)
Archived-At: <https://mailarchive.ietf.org/arch/msg/core/uv4OiTHA2f8bgIoVRMJzrPF07JY>
Subject: [core] Possible and impossible CIRIs (was: Review of CoRAL)

While running larger tests of CoRAL conversion from RDF (because that's
what I have large quantities of), I ran into issues converting IRIs. I
think it makes sense to discuss which IRIs we can express and which we
cannot, and maybe to write that down as a section in what I only just
noticed is the standalone CIRI document.

Topics there are:

* userinfo (ftp://user:password@example.com/): Cannot be expressed; that
  way of transporting credentials is deprecated anyway.

* Some syntax-based denormalization in IRIs is implicitly normalized
  (following RFC 3986 section 6.2.2) when converted to CoRAL. In
  particular, this removes

  * the difference between an empty path and a single slash after the
    netloc (http://example.com vs. http://example.com/), and

  * any denormalization in percent encoding.

  Denormalization in the capitalization of scheme and host name, on the
  other hand, may be persisted.

* Any differences in IP literal representation: Cannot be expressed.
  (I'm not sure whether that falls into the syntax-based normalization
  category; at any rate, there is no guarantee that a CoRAL system
  actually outputs the normalized version.)

* "Using the scheme's default port" cannot be expressed.

  As this precludes to-CoRAL conversion (and normalized from-CoRAL) on
  schemes the processor has no explicit knowledge of, I already
  suggested that this be allowed. (It's also a form of normalization,
  but not syntax-based.)

* URIs that do not use a netloc are not supported.

  Anticipating interaction with the dev-urn draft, I think that the
  netloc parts in the CIRI should just be optional; micrurus can run the
  recompose code with just one more bit of state (has_netloc) and
  round-trip URNs with that as well [1]. I did not check whether relative
  resolution works correctly there, but don't see why it shouldn't if
  it's defined at all.

  (I ran into this converting the various mailto: and xmpp: addresses in
  my RDF files, but urn:dev: is probably a more suitable example.)
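
To make the list above easier to try out against a pile of IRIs, here is
a minimal sketch that flags which of these cases a given IRI falls into.
It is not taken from any CoRAL implementation: the function name
ciri_obstacles, the labels it returns, and the small DEFAULT_PORTS table
are assumptions made for illustration, and "has a netloc" is
approximated by urlsplit reporting a non-empty netloc. IP literal
representation and percent-encoding differences are not checked.

```python
from urllib.parse import urlsplit

# Assumed, incomplete table of well-known default ports (illustration only).
DEFAULT_PORTS = {"http": 80, "https": 443, "coap": 5683, "coaps": 5684}


def ciri_obstacles(iri):
    """Return labels for the cases from the list above that apply to iri.

    Hypothetical helper: the labels are made up for this sketch.
    """
    parts = urlsplit(iri)
    found = []
    # Credentials in the authority component cannot be carried over.
    if parts.username is not None or parts.password is not None:
        found.append("userinfo")
    # Schemes such as mailto:, xmpp: or urn: have no "//authority" part.
    if not parts.netloc:
        found.append("no netloc")
    # "http://example.com" and "http://example.com/" become the same thing.
    elif parts.path == "":
        found.append("empty path")
    # An explicitly written default port is indistinguishable from none;
    # this can only be detected for schemes listed in DEFAULT_PORTS.
    default = DEFAULT_PORTS.get(parts.scheme)
    if parts.port is not None and parts.port == default:
        found.append("explicit default port")
    return found


if __name__ == "__main__":
    for iri in (
        "ftp://user:password@example.com/",
        "http://example.com",
        "http://example.com:80/",
        "urn:dev:mac:0024befffe804ff1",
        "coap://example.com/sensors?id=1",
    ):
        print(iri, ciri_obstacles(iri))
```

Note that the default port check only works for schemes in the table,
which is exactly the limitation described above for processors that have
no explicit knowledge of a scheme.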

I'm currently expanding my test section that decomposes RFC 3986 example
URIs and tests recomposition, but so far, I'm confident that CIRIs can
be used for everything I'd ever intend to use them for :-)
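
A decompose/recompose round-trip test of this kind could look roughly
like the following sketch. It is not the micrurus code and does not
produce CIRI options; it only splits a URI with the regular expression
from RFC 3986 Appendix B and reassembles it as in section 5.3, keeping a
has_netloc flag so that URIs without an authority survive the round
trip. The names Parts, decompose and recompose are made up for this
example; the test URIs are the examples from RFC 3986 section 1.1.1.

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Parts(NamedTuple):
    scheme: Optional[str]
    has_netloc: bool  # True if the URI contained "//", even with empty authority
    netloc: str
    path: str
    query: Optional[str]
    fragment: Optional[str]


def decompose(uri: str) -> Parts:
    """Split a URI into components, remembering whether a netloc was present."""
    m = URI_RE.match(uri)  # every part is optional, so this always matches
    return Parts(
        scheme=m.group(2),
        has_netloc=m.group(3) is not None,
        netloc=m.group(4) or "",
        path=m.group(5),
        query=m.group(7),
        fragment=m.group(9),
    )


def recompose(p: Parts) -> str:
    """Reassemble the components, following RFC 3986 section 5.3."""
    out = ""
    if p.scheme is not None:
        out += p.scheme + ":"
    if p.has_netloc:
        out += "//" + p.netloc
    out += p.path
    if p.query is not None:
        out += "?" + p.query
    if p.fragment is not None:
        out += "#" + p.fragment
    return out


# Example URIs from RFC 3986 section 1.1.1.
EXAMPLES = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "http://www.ietf.org/rfc/rfc2396.txt",
    "ldap://[2001:db8::7]/c=GB?objectClass?one",
    "mailto:John.Doe@example.com",
    "news:comp.infosystems.www.servers.unix",
    "tel:+1-816-555-1212",
    "telnet://192.0.2.16:80/",
    "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
]

if __name__ == "__main__":
    for uri in EXAMPLES:
        assert recompose(decompose(uri)) == uri, uri
    print("all examples round-trip")
```

Because this sketch keeps every component verbatim, it round-trips
exactly; an actual CIRI conversion normalizes along the way, so its
tests would have to compare against the normalized form instead.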

Christian

[1]: https://gitlab.com/chrysn/micrurus/commit/b734de8fdf6cfb701ca0f662c70562574048f621

-- 
To use raw power is to make yourself infinitely vulnerable to greater powers.
  -- Bene Gesserit axiom