Re: [apps-discuss] Fun with URLs and regex

Roy T. Fielding, Thu, 29 January 2015 01:24 UTC

To: Matthew Kerwin
Cc: Julian Reschke, Mark Nottingham, IETF Apps Discuss

On Jan 28, 2015, at 3:59 PM, Matthew Kerwin wrote:

> On 29/01/2015, Julian Reschke wrote:
>> I agree that the fragment is part of the URI; the question, as far as I
>> understand, is whether the *scheme* definition should include the
>> fragment, given the fact that you can attach a fragment to any URI anyway.
> The answer to this affects what I write in the 'file' scheme draft. I
> was advised early on to not mention fragments (which I took as
> "disallow by omission") because, while it's easy to define syntax, the
> scheme also has to define semantics, and fragment semantics are tied
> to content type, and dereferenced 'file' URIs don't have a
> well-defined content type.
> Whether or not I mention it comes back to the definition and intended
> use-case of RFC 3986; if it defines an 'abstract' syntax - in the OOP
> sense - then there's no such thing as a universal parser (i.e. it's
> impossible to parse a URI with an unknown scheme). If it defines a
> low-level structure, then any URI can be parsed, but the individual
> components can't be validated without deferring to scheme-specific
> machinery.
> If the former and I don't include the fragment in 'file', it isn't
> allowed. If the latter, I just leave a hole in the spec.

It isn't that black and white.  The grammar for the scheme is what
excludes a fragment.  That doesn't prevent the scheme docs from
talking about fragments (in reference to RFC3986) and using them
within examples.
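
As a rough illustration of that split between the generic syntax and the
scheme grammar, here is a minimal Python sketch that uses the regular
expression from RFC 3986, Appendix B.  The function name
split_uri_reference and the sample URIs are made up for this example.
It only separates the five generic components; it does not validate any
of them against a scheme's own rules.

```python
import re

# Regular expression from RFC 3986, Appendix B.  Every part is optional,
# so it matches any string and never returns None.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_uri_reference(ref):
    """Split a URI reference into its generic components.

    Hypothetical helper for illustration only.  A component that is
    absent comes back as None; one that is present but empty comes
    back as "".
    """
    m = URI_RE.match(ref)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


# The fragment is split off the same way whether or not the scheme
# is known to the parser.
known = split_uri_reference("file:///etc/hosts#line=3")
unknown = split_uri_reference("x-unknown:foo/bar#frag")
```

Nothing in the sketch looks at the scheme name before peeling off the
fragment, which is the sense in which a fragment can be attached to any
URI without the scheme grammar mentioning it.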

That part of the URI spec was written specifically to address
issues created by folks who thought they could redefine the meaning
of fragments within individual schemes, or forbid them entirely,
when in fact the meaning and use of fragments are independent of

What makes you think that dereferenced files don't have a well-defined
content type?  The client might not know what it is, but that doesn't
mean the content type doesn't exist, and any decision to process the
file is basically an assumption of some content type (and its rules
for processing fragments).
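
To make that concrete, here is a small Python sketch of a client that
picks fragment rules from whatever content type it assumes for a file.
The extension-based guess via the standard mimetypes module, the
function name fragment_rules, and the descriptive strings are all
assumptions for illustration; a real client might sniff the content or
take the type from somewhere else entirely.

```python
import mimetypes


def fragment_rules(path):
    """Describe which fragment rules apply under an assumed content type.

    Hypothetical helper: the content type is guessed from the file
    extension, which stands in for whatever assumption a client makes
    when it decides to process the file.
    """
    media_type, _encoding = mimetypes.guess_type(path)
    if media_type is None:
        # No assumption made, so there are no rules to apply yet.
        return "unknown: no content type assumed"
    if media_type == "text/html":
        return "text/html: fragment names an element in the document"
    if media_type == "text/plain":
        # RFC 5147 defines position and range fragments for text/plain.
        return "text/plain: RFC 5147 position or range, e.g. line=10"
    return media_type + ": defined by that media type's registration"


html_rules = fragment_rules("/tmp/notes.html")
text_rules = fragment_rules("/tmp/notes.txt")
no_rules = fragment_rules("/tmp/README")
```

The same fragment string can mean different things depending on which
branch is taken, and none of the branches depend on the URI scheme.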