Re: [apps-discuss] I-D Action: draft-ietf-appsawg-file-scheme-09.txt

Matthew Kerwin <> Wed, 25 May 2016 22:05 UTC

From: Matthew Kerwin <>
To: Mark Nottingham <>
Cc: IETF Apps Discuss <>

Hi Mark, sorry for sitting on this for another week. I've been called up
for jury duty.

On 20 May 2016 at 18:55, Mark Nottingham <> wrote:

> > On 20 May 2016, at 4:37 PM, Matthew Kerwin <> wrote:
> >
> >> >    Without other encoding information, percent-encoded octets in a
> >> >    file URI ([RFC3986], Section 2.1) MAY be interpreted according to
> >> >    the preferred or configured encoding of the system on which the
> >> >    URI is being interpreted.
> >>
> >> Do the current implementations of file:// do this -- i.e., use the
> >> filesystem's encoding for the URI?
> >
> > Apparently. I don't have a spare drive lying around where I can
> > reformat a partition to test it for myself, though. A discussion I had
> > with Dave Thaler back at the very start of this draft revolved around
> > the fact that percent-encoded URIs are ambiguous (apparently a real
> > issue for Windows), which was why for a very long time the draft
> > contained advice to use an IRI instead, or at the least normalize.
>
> VMs are good for testing.

But some operating systems don't come cheap, if you don't happen to
already have a Windows VM to hand. ;)
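
For illustration, here is a minimal Python sketch of the ambiguity mentioned above: the same percent-encoded octets give different file names depending on which encoding the interpreting system assumes. The helper name `interpret_path` and the sample path are hypothetical, and UTF-8 and Windows-1252 are used only as two example encodings. It is not a description of what any particular implementation does.

```python
from urllib.parse import unquote_to_bytes


def interpret_path(uri_path: str, encoding: str) -> str:
    """Undo percent-encoding, then decode the octets with the given encoding.

    Hypothetical helper: `encoding` stands in for the "preferred or
    configured encoding" of the system interpreting the URI.
    """
    raw = unquote_to_bytes(uri_path)  # percent-decoding only yields octets
    # errors="replace" marks undecodable octets with U+FFFD instead of raising
    return raw.decode(encoding, errors="replace")


if __name__ == "__main__":
    # Same URI path, two assumed system encodings, two different names.
    for enc in ("utf-8", "cp1252"):
        print(enc, interpret_path("/tmp/caf%C3%A9.txt", enc))
```

Nothing in the URI itself says which of the two readings was intended, which is the point of the ambiguity.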

> It appears that both Windows and OSX have used UTF-8 for file name
> encoding for some time (since NT for the former, 10.0 for the latter,
> AIUI). See: <

I thought NTFS used UCS-2 for storing file and directory names; and the
Windows API functions I've seen (like PathCreateFromUrl) return either UCS-2
or Windows-1252 strings, depending on how you use them. I can't speak to

> Linux uses whatever locale is set. However, it appears that both Gnome and
> Firefox (at least) try to be 'smart' and will recognise UTF-8 even if
> ISO-8859-1 is set as the locale. Having said that, it's not too smart; if I
> try to open a file with a UTF-8 encoded name, Firefox can't find it when
> the locale isn't UTF-8 (although the file chooser *does* see it).
>
> This is an important point; the advice above that they "MAY be interpreted
> according to the preferred or configured encoding of the system on which
> the URI is being interpreted" doesn't account for the fact that a single
> filesystem might have several users who have different encodings set*.

Not explicitly, although I had it in mind when I wrote it. Maybe your
"heuristics" bit below covers it well enough.

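As a rough illustration of the "recognise UTF-8 even if ISO-8859-1 is set as the locale" behaviour described in the quoted text, one plausible shape for such a heuristic is sketched below in Python. The function name `decode_name` is hypothetical, and this is an assumption about how a heuristic could work, not a description of what Gnome or Firefox actually do.

```python
def decode_name(raw: bytes, locale_encoding: str) -> str:
    """Guess the characters behind a file name stored as raw bytes.

    Assumed heuristic: strict UTF-8 is tried first, because byte strings
    in other encodings rarely happen to be valid UTF-8; otherwise the
    user's locale encoding is used.
    """
    try:
        return raw.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        # Not valid UTF-8, so fall back to the configured locale encoding.
        return raw.decode(locale_encoding, errors="replace")


if __name__ == "__main__":
    # Two users on one filesystem, one name stored per encoding.
    print(decode_name(b"caf\xc3\xa9.txt", "iso-8859-1"))  # stored as UTF-8
    print(decode_name(b"caf\xe9.txt", "iso-8859-1"))      # stored as ISO-8859-1
```

The guess can still be wrong: a short ISO-8859-1 name whose bytes happen to form valid UTF-8 would be misread.
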
> Other encodings seem to just be percent-encoded straight into the file
> URI, and Firefox doesn't make any attempt to display them as IRIs.
>
> (This seems to mirror how most browsers handle non-ascii characters in
> HTTP headers, e.g., Location; they just percent-encode them, since that's
> an encoding of bytes, not characters).
>
> Can we say something more like this?
>
> ---%<---
> When a file URI is produced, characters not allowed by the ABNF MUST be
> percent-encoded as characters using UTF-8 encoding, as per
> RFC3986 Section 2.5.

That's strengthening the requirement. RFC3986 §2.5 only says that if
characters are part of the UCS they "should first be encoded" as UTF-8
before being percent-encoded. The MUST-level requirement is just that
octets not allowed by the ABNF must be percent-encoded.

> However, encoding information for file and/or directory names might not be
> available. In these cases, implementations MAY use heuristics to determine
> the encoding. If that fails, they SHOULD percent-encode the raw bytes of
> the label directly.

This says this part well.
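
To make the proposed wording concrete, here is a minimal Python sketch of producing one path label of a file URI under those rules. The function name `encode_label` and the `encoding` and `guesses` parameters are hypothetical. The "heuristics" are reduced to trying a list of candidate encodings in strict mode, which is only one possible reading of the text.

```python
from typing import Optional, Sequence
from urllib.parse import quote


def encode_label(name: bytes,
                 encoding: Optional[str] = None,
                 guesses: Sequence[str] = ("utf-8",)) -> str:
    """Percent-encode one file or directory name for a file URI path.

    `encoding` is the known encoding of `name`, if any; `guesses` are
    encodings to try heuristically when it is unknown (both hypothetical).
    """
    candidates = ([encoding] if encoding else []) + list(guesses)
    for candidate in candidates:
        try:
            text = name.decode(candidate)  # strict decoding
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding, try the next one
        # Characters recovered: quote() encodes them as UTF-8, then
        # percent-encodes everything outside the unreserved set.
        return quote(text, safe="")
    # No encoding fits: percent-encode the raw bytes of the label directly.
    return quote(name, safe="")


if __name__ == "__main__":
    print(encode_label(b"caf\xe9", encoding="iso-8859-1"))  # known encoding
    print(encode_label(b"caf\xe9"))                         # raw-byte fallback
```

With only UTF-8 in `guesses`, the heuristic branch and the raw-byte fallback differ just in whether the octets are known to represent characters; the emitted URI text is the same bytes either way.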

> --->%---
>
> Cheers,
>
> * Possible but unlikely, since most people are going to be using UTF-8.
> Still...

  Matthew Kerwin