Re: [apps-discuss] I-D Action: draft-ietf-appsawg-file-scheme-09.txt

Matthew Kerwin <matthew@kerwin.net.au> Wed, 25 May 2016 22:05 UTC

In-Reply-To: <6F0FF7E8-D04B-458E-8E57-3B1257024C09@mnot.net>
References: <20160515051508.2444.90815.idtracker@ietfa.amsl.com> <c52370bf-dffa-4b57-9f33-52a49456b3a8@seantek.com> <CACweHNBuR8X_ub6J-yOtvoV7CjZyDC5__qKNHWGtxsjZbvyB0w@mail.gmail.com> <AEA52E18-8ECF-4497-A6C7-AD7F1B4B47DD@mnot.net> <CACweHNASbgVNvHM67UD_gDsc7L1GakLao2PsYqjZ9oiz_+v7wQ@mail.gmail.com> <6F0FF7E8-D04B-458E-8E57-3B1257024C09@mnot.net>
Date: Thu, 26 May 2016 08:05:44 +1000
X-Google-Sender-Auth: XgwzhAv-uoEqXV9p3Jp_iwCOEkk
Message-ID: <CACweHND3aZsFXbN9QY3Bn4CTVYpBq7Rv41thso5SfDcbCH_fZw@mail.gmail.com>
From: Matthew Kerwin <matthew@kerwin.net.au>
To: Mark Nottingham <mnot@mnot.net>
Content-Type: multipart/alternative; boundary=001a11351f660331e90533b1e046
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/IRHqtmnY-bVo_hFFu_wFKAegKXI>
Cc: IETF Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] I-D Action: draft-ietf-appsawg-file-scheme-09.txt

Hi Mark, sorry for sitting on this for another week. I've been called up
for jury duty.

On 20 May 2016 at 18:55, Mark Nottingham <mnot@mnot.net> wrote:

>
> > On 20 May 2016, at 4:37 PM, Matthew Kerwin <matthew@kerwin.net.au> wrote:
> >
> >> >    Without other encoding information, percent-encoded octets in a file
> >> >    URI ([RFC3986], Section 2.1) MAY be interpreted according to the
> >> >    preferred or configured encoding of the system on which the URI is
> >> >    being interpreted.
> >>
> >>
> >> Do the current implementations of file:// do this -- i.e., use the
> >> filesystem's encoding for the URI?
> >
> >
> > Apparently. I don't have a spare drive lying around where I can
> > reformat a partition to test it for myself, though. A discussion I had with
> > Dave Thaler back at the very start of this draft revolved around the fact
> > that percent-encoded URIs are ambiguous (apparently a real issue for
> > Windows), which was why for a very long time the draft contained advice to
> > use an IRI instead, or at the least normalize.
>
> VMs are good for testing.
>
>
But some operating systems don't come cheap, if you don't happen to
already have a Windows VM to hand. ;)



> It appears that both Windows and OSX have used UTF-8 for file name
> encoding for some time (since NT for the former, 10.0 for the latter,
> AIUI). See:
> <https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding>
>

I thought NTFS used UCS2 for storing file and directory names; and the
Windows API functions I've seen (like PathCreateFromURL) return either UCS2
or Windows-1252 strings, depending on how you use them. I can't speak to
HFS+.
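
For anyone who wants to check what the Python documentation linked above
reports on a given machine, a minimal sketch follows. Note that it shows
the encoding Python uses when converting file names at the API boundary,
which need not be how the filesystem stores names on disk. The file name
is a made-up example.

```python
import os
import sys

# Encoding Python uses to convert between str file names and the bytes
# handed to the operating system. The value depends on platform and
# configuration, so no particular output is assumed here.
fs_encoding = sys.getfilesystemencoding()
print(fs_encoding)

# os.fsencode / os.fsdecode apply that encoding and its error handler.
name = "report.txt"  # hypothetical file name
raw = os.fsencode(name)
round_tripped = os.fsdecode(raw)
print(raw, round_tripped)
```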



>
> Linux uses whatever locale is set. However, it appears that both Gnome and
> Firefox (at least) try to be 'smart' and will recognise UTF-8 even if
> ISO-8859-1 is set as the locale. Having said that, it's not too smart; if I
> try to open a file with a UTF-8 encoded name, Firefox can't find it when
> the locale isn't UTF-8 (although the file chooser *does* see it).
>
> This is an important point; the advice above that they "MAY be interpreted
> according to the preferred or configured encoding of the system on which
> the URI is being interpreted" doesn't account for the fact that a single
> filesystem might have several users who have different encodings set*.
>
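
To make the ambiguity concrete, here is a small sketch of what
"interpreted according to the configured encoding" can mean for two users
with different locales. The path is a made-up example and interpret() is
a hypothetical helper, not something from the draft.

```python
from urllib.parse import unquote_to_bytes


def interpret(path: str, encoding: str) -> str:
    """Percent-decode to octets, then decode those octets with `encoding`."""
    return unquote_to_bytes(path).decode(encoding)


uri_path = "/tmp/caf%C3%A9.txt"  # hypothetical path from a file URI

# The same percent-encoded octets yield different names depending on the
# encoding the interpreting user happens to have configured.
as_utf8 = interpret(uri_path, "utf-8")
as_latin1 = interpret(uri_path, "iso-8859-1")
print(as_utf8)    # /tmp/café.txt
print(as_latin1)  # /tmp/cafÃ©.txt
```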
>
Not explicitly, although I had it in mind when I wrote it. Maybe your
"heuristics" bit below covers it well enough.



> Other encodings seem to just be percent-encoded straight into the file
> URI, and Firefox doesn't make any attempt to display them as IRIs.
>
> (This seems to mirror how most browsers handle non-ascii characters in
> HTTP headers, e.g., Location; they just percent-encode them, since that's
> an encoding of bytes, not characters).
>
> Can we say something more like this?
>
> ---%<---
> When a file URI is produced, characters not allowed by the ABNF MUST be
> percent-encoded as characters using UTF-8 encoding, as per
> RFC3986 Section 2.5.
>
>
That's strengthening the requirement. RFC3986 §2.5 only says that data
made up of UCS characters "should first be encoded" as UTF-8 octets before
being percent-encoded. The MUST-level requirement is just that octets not
allowed by the ABNF must be percent-encoded.



> However, encoding information for file and/or directory names might not be
> available. In these cases, implementations MAY use heuristics to determine
> the encoding. If that fails, they SHOULD percent-encode the raw bytes of
> the label directly.
>

This part says it well.
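
As a rough sketch of how a producer might follow that wording, assuming
the only heuristic is "try a list of candidate encodings in order":
file_uri_path() and its guesses parameter are hypothetical names, and a
real implementation would likely want better detection than this.

```python
from urllib.parse import quote


def file_uri_path(raw: bytes, guesses=("utf-8",)) -> str:
    """Percent-encode a raw file name for use in a file URI path.

    If one of the guessed encodings decodes the name cleanly, re-encode
    the characters as UTF-8 and percent-encode those octets. Otherwise
    fall back to percent-encoding the raw bytes directly.
    """
    for enc in guesses:
        try:
            text = raw.decode(enc)  # strict: raises on invalid input
        except UnicodeDecodeError:
            continue
        return quote(text.encode("utf-8"))
    return quote(raw)


print(file_uri_path(b"/tmp/caf\xc3\xa9.txt"))  # /tmp/caf%C3%A9.txt
print(file_uri_path(b"/tmp/caf\xe9.txt"))      # /tmp/caf%E9.txt
```

One caveat with this approach: a single-byte encoding such as ISO-8859-1
decodes any byte sequence without error, so putting it in the guesses
list means the raw-bytes fallback is never reached.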



> --->%---
>
> Cheers,
>
>
> * Possible but unlikely, since most people are going to be using UTF-8.
> Still...
>
>
Cheers
-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/