Re: [apps-discuss] draft-ietf-iri-comparison

"Roy T. Fielding" <> Tue, 27 January 2015 02:05 UTC

To: David Singer
Cc: Graham Klyne

On Jan 26, 2015, at 10:21 AM, David Singer wrote:
>> On Jan 26, 2015, at 15:56 , Graham Klyne <> wrote:
>> IF this issue gets further discussion, I'd be very concerned that a comparison of URI beyond simple character comparison would not be universally implemented.  E.g., I do from time-to-time use URIs as index values for dictionary lookups - that depends implicitly on character-based equality.
>> Further I don't believe it is possible to completely define URI equivalence of different URI strings in a meaningful way, because the notion of equivalence depends on context of use.  When working with, say, RDF, when it is important to be sure that two references are denoting the same thing, the easy way is to just use the same referring string;  alternatively, use some extrinsic assertion that two different URIs are denoting the same thing.
>> In other words, why bother?  Just keep it simple.
> I guess it depends on what you mean by simple. 
> The simplest for the receiver is to say it’s byte-by-byte equality.  The simplest for each environment that the URI might pass through is to insist it’s in the normal form for that environment.
> These are in conflict.
> So, if at the terminal the URIs arrive, hypothetically one by a route that wants UTF-8 and escaping, and the other by a route that handles everything in UTF-16, the terminal won’t find byte equality when the user expects it. The URIs need to be put in a canonical form.  Which means that that needs defining, alas.

Technically speaking, what namespaces use is character-by-character
equality, not byte-by-byte equality.  Hence, a difference in character
encoding between two routes (UTF-8 versus UTF-16) is not a problem.
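
To illustrate the character-versus-byte distinction, here is a minimal
sketch in Python.  The URI is made up for the example and nothing here
comes from a spec; it only shows that one string serialized two ways
differs in octets but not in characters.

```python
# Hypothetical namespace URI, used only for illustration.
uri = "http://example.org/ns/caf\u00e9#term"

# The same characters as they might travel over two different routes.
as_utf8 = uri.encode("utf-8")
as_utf16 = uri.encode("utf-16-le")

# The octets differ between the two serializations...
bytes_equal = as_utf8 == as_utf16

# ...but decoding each one back to characters gives the same string.
chars_equal = as_utf8.decode("utf-8") == as_utf16.decode("utf-16-le")

print(bytes_equal, chars_equal)  # False True
```

Note that this covers only the character encoding.  A percent-escaped
spelling of the same URI is a different sequence of characters and
would not compare equal under character-by-character comparison.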

> What am I missing?

I think people assume there exists some canonical form, and that it
makes some difference to the application that it knows that form (or at
least is able to derive it from a non-canonical URI).

The reality is that a canonical form (if any) is only known by the person
who configures the software to provide a service that is associated with
a given URI.  All other forms are aliases. Some of those aliases might
be mechanically expected (like hostname case), but others could be
just as canonical as the first.

In some cases, the preponderance of use in one form over another causes
the original owner to change their mind about which URI is canonical.

Regardless, what Graham noted is true: the definition of canonical for
caching/reusing a GET is very different from the definition of canonical
for something like namespaces.  That was a deliberate decision, whether
we agree with it or not, so it is fair to say that there is no single
standard for canonicalizing a URI (or URL).

This is, of course, why we have a link relation for canonical.

Choosing the canonical URI is not an algorithm that can be written, in
code or in spec, so its absence is not a bug that needs fixing in
RFC 3986 (or anything that pretends to supplant it).  The URI spec
defines normalization of URIs in ways that are not scheme-specific.
The scheme specs are supposed to define what can be normalized within
each scheme.
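
For concreteness, the scheme-independent part can be sketched roughly
as follows, based on my reading of the syntax-based normalization in
RFC 3986 (case, percent-encoding, dot segments).  This is an
illustrative sketch in Python, not a complete or authoritative
implementation; the function names are hypothetical, and it does
nothing scheme-specific such as dropping a default port.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)

def normalize_pct(text):
    """Decode escapes of unreserved characters; uppercase the hex of the rest."""
    def fix(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else "%" + match.group(1).upper()
    return re.sub(r"%([0-9A-Fa-f]{2})", fix, text)

def remove_dot_segments(path):
    """Resolve "." and ".." segments, following the loop in RFC 3986 5.2.4."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./") or inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../"):
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:
            # Move the first segment (with its leading "/", if any) to the output.
            start = 1 if inp.startswith("/") else 0
            end = inp.find("/", start)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)

def normalize(uri):
    """Scheme-independent normalization only (hypothetical helper)."""
    parts = urlsplit(uri)
    # Lowercase the host (and port digits) but leave any userinfo untouched.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    path = remove_dot_segments(normalize_pct(parts.path))
    return urlunsplit((parts.scheme.lower(), netloc, path,
                       normalize_pct(parts.query), normalize_pct(parts.fragment)))

a = "HTTP://Example.COM/a/./b/../c/%7euser?q=%3a"
b = "http://example.com/a/c/~user?q=%3A"
print(normalize(a) == normalize(b))  # True: equivalent after normalization
print(a == b)                        # False: distinct as namespace names
```

The last two lines are the point: a cache might reasonably treat the two
spellings as the same resource, while a namespace comparison treats them
as two different names.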


p.s.  Am I the only one that finds it incredibly annoying that the
      APPS area uses this umbrella list for technical discussions about
      topics for which we still have the original working group mailing
      list active, archived, and populated with folks still capable of
      answering these questions?  Ya know,