Re: [apps-discuss] New information relating to draft-ietf-appsawg-file-scheme

Graham Klyne <gk@ninebynine.org> Mon, 18 April 2016 08:03 UTC

Message-ID: <571492EF.30703@ninebynine.org>
Date: Mon, 18 Apr 2016 08:55:27 +0100
From: Graham Klyne <gk@ninebynine.org>
To: Matthew Kerwin <matthew@kerwin.net.au>
References: <570D4C99.1030405@dcrocker.net> <CACweHND-OX+5okkJ+oE=6UN84x+CFtPBpMnU8HqaPbgQgJ_oWA@mail.gmail.com> <570E267A.2070801@ninebynine.org> <CACweHNDry0HaLUqTCRh7tpR6VwgmJnaaa9mLoKqRHV6j8j2AdQ@mail.gmail.com> <5710CD2D.6040103@ninebynine.org> <CACweHNBMXOozGNC2p5c77xWyyv2F3=EgSx02oYnkVZ-oVSea3Q@mail.gmail.com>
In-Reply-To: <CACweHNBMXOozGNC2p5c77xWyyv2F3=EgSx02oYnkVZ-oVSea3Q@mail.gmail.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/AO4a6huJTEsCrmlhm35JkDELhv0>
Cc: Dave Crocker <dcrocker@bbiw.net>, Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] New information relating to draft-ietf-appsawg-file-scheme

Matthew,

(FWIW, I think you've done sterling work on this spec, and I'd hate to see it 
die like previous efforts did - my suggestions are offered in the hope of a 
route to sufficient consensus to allow it to proceed.  I spoke to a few 
developers last week at the WWW conference about this, and the general feeling I 
get is that they'd like to see file:// URIs "back in the fold", primarily for 
their use as a unifying mechanism for local and web access.  My suggested 
approach to achieve this has been to stick very close to the intent of earlier 
file:// specs, while bringing it in line with RFC3986.  To the extent that 
documented extensions have community consensus I see no cause to pull them; my 
comments have come from a place of uncertainty about this.)

On 18/04/2016 06:13, Matthew Kerwin wrote:
> Hi,
>
> Thanks for some Monday reading to keep my mind occupied :)
>
> On 15 April 2016 at 21:14, Graham Klyne <gk@ninebynine.org
> <mailto:gk@ninebynine.org>> wrote:
>
>     Yesterday, in discussion with one of the original authors of RFC 1738, I
>     learned something about file:// URIs that I had not previously realized.
>
>     The upshot of this for me, in particular with references to Mark
>     Nottingham's comment
>     (https://mailarchive.ietf.org/arch/msg/apps-discuss/pZG6iMoerYpa5qq6BkHa8j0Nfjo),
>     is that there may be parts of this spec that go beyond the clarification
>     exercise required to maintain file:// URI scheme as a standard-track
>     specification.
>
>     I offer a new (not detailed) review with suggestions for elements that might
>     be dropped, in the hope that what remains is truly just a clarification
>     rather than a modification of the RFC1738 description of file:// URIs.
>
>     (My own experience and use of file:// URIs is not affected by what I
>     learned, and in my previous reviews I have assumed that contributions
>     relating to UNC files in particular were based on appropriate knowledge
>     or experience.)
>
>
> This is toeing the edge of a catch-22; when I first came up with a draft it was
> very minimal update from RFC 1738's ABNF to match RFC 3986. The very strong
> suggestion then was that this wouldn't be enough, and I should address all the
> other cruft that's built up over the decades.
>
> Part of that cruft is in handling file URIs with a non-localhost hostname like
> UNC paths.

Yeah. These discussions have been running a long while now, and some of the rest 
of the world is moving on. Addressing that cruft seemed reasonable to me at the 
time, as long as it wasn't controversial - but if it now looks like divergence 
from current practice then I'd suggest it be more clearly discursive, not 
normative.

>
>     ...
>
>     I have learned that the intent of including a hostname component was *not*
>     to allow some kind of cross-host file access to be invoked, but to act as a
>     signal to software using the URI that was not running on the specified host
>     that the corresponding resource was not dereferencable.
>
>
>     This is supported by, or at least consistent with, a close reading of the
>     relevant text in https://tools.ietf.org/html/rfc1738#section-3.10:
>
>     [[
>         A file URL takes the form:
>
>             file://<host>/<path>
>
>         where <host> is the fully qualified domain name of the system on
>         which the <path> is accessible, and <path> is a hierarchical
>         directory path of the form <directory>/<directory>/.../<name>.
>     ]]
>
>
> I entirely agree. That's why I have this text: "A file URI can be dependably
> dereferenced or translated to a local file path only if it is local. A file URI
> is considered “local” if it has no file-auth, or the file-auth is the special
> string “localhost”."

Yes.  For me, the new understanding calls for something stronger, 
maybe along the lines of:

"A file URI is intended to be dereferenced for local file access only.  The 
presence of a host component in the URI is intended to be used as a signal to 
implementations that are running on hosts other than the one named that the file 
is not accessible to them."

and maybe add (possibly in an appendix):

"Some implementations allow a host component in file URIs to access files on 
non-local hosts - such behaviour is beyond the scope of this specification."

>
> I should note that there is some historical baggage here. If I'm on a machine
> called "fred", can I dereference this URL? "file://fred/tmp/foo"
>
> Surely, according to the spec, the answer is "yes". Otherwise putting anything
> other than "localhost" in the 'host' part of the URL makes it undereferenceable.

Agreed. I think that's clear in the original RFC1630 text (but not so clear in 
RFC1738).
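
To make that reading concrete, here is a rough sketch in Python (illustrative 
only, not proposed spec text).  The function name and the `local_names` 
parameter are my own inventions for the example; the check treats an empty 
host, "localhost", or one of the machine's own names as local, and anything 
else as a signal that the file is not accessible here.

```python
import socket
from urllib.parse import urlsplit


def is_local_file_uri(uri, local_names=None):
    """Return True if a file URI names a file on this host.

    local_names is a hypothetical parameter: the set of host names this
    machine answers to. It defaults to the name reported by the OS.
    """
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI: %r" % uri)

    # urlsplit() lower-cases the host and returns None when it is absent.
    host = parts.hostname or ""
    if host in ("", "localhost"):
        return True

    if local_names is None:
        local_names = {socket.gethostname()}
    return host in {name.lower() for name in local_names}
```

On a machine called "fred", `is_local_file_uri("file://fred/tmp/foo", {"fred"})` 
is true; on any other host the same URI is reported as not local.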

> However as I understand it most implementations actually went with "no" (the
> heuristic presumably being: only a URL with a blank or "localhost" host can be
> dereferenced).

Maybe so - I think that is something that got obscured in RFC1738.  For myself, 
and developers I've spoken to, this is somewhat moot as the file:// URI is 
mostly used as a relative reference resolved against a locally generated base 
URI, or just specified locally.

>
> Then on top of that some (notably Windows and IE) extended the heuristic to be
> "if the host is blank, it's local; otherwise it's a network share." So in
> Windows "file://localhost/tmp/foo" is accessed via SMB. On this final point,
> after much discussion and back and forth, I went with RFC 1738's specification
> of "localhost", leaving Windows and IE noncompliant.

I recall that, and I agree with the outcome re "localhost".  (Implementations 
are always free to be non-compliant - it can hurt interop when they are, but in 
this case I think maybe not so much.)

>
> So I'm not as far from 1738 as I could be.

Sure.  I think it's the adaptations to bless remote file access that may be 
problematic - the rest looks fine to me.

>
>
>     Also, RFC 1630 is more explicit that the file scheme is for *local* file
>     access.  The following from https://tools.ietf.org/html/rfc1630 is much
>     clearer about this than RFC1738:
>
>     [[
>     file
>
>         The other URI schemes (except nntp) share the property that they are
>         equally valid at any geographical place.
>
>         There is however a real practical requirement to be able to generate
>         a URL for an object in a machine's local file system.
>
>         The syntax is similar to the ftp syntax, but in this case the slash
>         is used to donate boundaries between directory levels of a
>         hierarchical file system is used.  The "client" software converts the
>         file URL into a file name in the local file name conventions.  This
>         allows local files to be treated just as network objects without any
>         necessity to use a network server for access.  This may be used for
>         example for defining a user's "home" document in WWW.
>
>         There is clearly a danger of confusion that a link made to a local
>         file should be followed by someone on a different system, with
>         unexpected and possibly harmful results.  Therefore, the convention
>         is that even a "file" URL is provided with a host part.  This allows
>         a client on another system to know that it cannot access the file
>         system, or perhaps to use some other local mecahnism to access the
>         file.
>
>         The special value "localhost" is used in the host field to indicate
>         that the filename should really be used on whatever host one is.
>         This for example allows links to be made to files which are
>         distribted on many machines, or to "your unix local password file"
>         subject of course to consistency across the users of the data.
>
>         A void host field is equivalent to "localhost".
>     ]]
>
>
> This text was dropped when it was effectively replaced by RFC 1738. It also says
> "...or perhaps to use some other local mechanism to access the file."
>
> In either case, I don't believe I've violated the intent (or at least the
> letter) of 1738 too badly. It may be my failure to communicate -- we'll see as I
> address the comments below.
>
>
>
>     Reviewing https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-06
>     again in light of this, I have the following comments:
>
>     ...
>
>     Section 1:
>
>     [[
>     This document defines a syntax that is compatible with most extant
>         implementations, while attempting to push towards a stricter subset
>         of "ideal" constructs.  In many cases it simultaneously acknowledges
>         and deprecates some less common or outdated constructs.
>     ]]
>
>     I don't think "deprecates" is right here, as it doesn't discourage any
>     behaviours specifically allowed by RFC1738.  (cf. Mark's comment.)
>
>
> As per my replies to Dave and Mark, this wording is already changed.

Ack.  (I haven't seen the new text yet.)

>
>
>     ...
>
>     Section 1.2:
>
>     It now seems to me that the role of a host name in a UNC name is quite
>     different to its (original) role in a file:// URI.  In light of this, should
>     this section be dropped?
>
>
> They both name files, and potentially allow files to be dereferenced. That's
> what I mean by "similar technologies." If it was "identical but for syntax" I
> wouldn't bother working on the file URI scheme.

But the UNC form, from my perspective, is specifically about accessing files on 
different hosts - something which was not part of the original intent of file:.

This is why I suggest it might be dropped (or moved to an informative 
appendix?).  In the face of Mark's push-back, I think that in the absence of 
hard evidence of widespread implementation we need to focus on clarifying what 
is already stated in RFC1738, and bringing it in line with RFC3986.

My understanding is that the primary purpose of this spec is to bring file:// 
URIs back onto the standards track following obsoleting of RFC1738, and that is 
something I'd really like to see myself.  My suggestions are attempting to 
remove the obstacles to this.

>
>     ...
>
>     Section 2:
>
>     [[
>            file-auth      = [ userinfo "@" ] host
>     ]]
>
>     The inclusion of "userinfo @" is an extension to previous *specifications*
>     of the file URI.  As such, I question whether this change should be
>     included. Dropping it doesn't affect compatibility with RFC3986.
>
>
> Hmm, possibly. I'm trying to recall what discussions and ideas led to it
> looking the way it does in the draft. It may just be because that's what is
> currently allowed/expected by those who do "use some other local mechanism to
> access the file."
>
> It could be removed to the relevant appendices, I think, without any issues.

I could live with that.  I have never personally come across an implementation 
of file:// that uses that form (or if I did I didn't notice it).

(See also comments at the end of this message.)

>     ...
>
>     Section 3:
>
>     [[
>         This specification neither defines nor forbids a mechanism for
>         accessing non-local files.  See SMB [MS-SMB], NFS [RFC7530], NCP
>         [NOVELL] for examples of protocols that can be used to access files
>         over a network.  Also see Appendix C.2 for a non-normative discussion
>         on translating non-local file URIs to and from UNC strings.
>     ]]
>
>     Given the new information noted, this seems extraneous to me.  Suggest dropping.
>
>
> I admit it might come across that having the references here is a weaselly way
> of saying, "I won't tell you what to do, but you should do this." I think they
> belong in the document somewhere, though; perhaps in a relevant appendix.
>

Now it's clearer to me that file:// was never intended to allow remote access, I 
think an appendix that positions it more clearly as a non-standard extension 
would be more appropriate if the text is to remain.

>
>     ...
>
>     Section 3.1:
>
>     There is an option to include the host name even for local files, as an
>     indication that the same file should not be expected to exist on other hosts.
>
>     I think the default position, in practice, is to not specify a host name.
>     But if the application expects that the full absolute URI may be passed to
>     another system it may make sense to include it to avoid dereferencing the
>     value in an inappropriate context.
>
>
> I might just remove section 3.1 altogether; it should be enough to have the ABNF
> syntax and then describe in prose the way the segments map to their equivalents
> in a file path. Lazily, that also puts the burden of identifying and dealing
> with all these edge cases on not-me.
>
> All I really wanted to emphasise was that you can't use backslashes, even in Windows.

Ack.
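
For what it's worth, a small Python sketch of the backslash point, using the 
standard pathlib module.  The path is just an example, and this shows one 
library's behaviour rather than anything the draft mandates.

```python
from pathlib import PureWindowsPath

# A Windows path written with backslashes...
path = PureWindowsPath(r"C:\tmp\foo")

# ...is rendered as a file URI that uses only forward slashes.
uri = path.as_uri()
print(uri)
```

The backslashes are a local file name convention; they don't survive into the 
URI form.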

>
>     ...
>
>     Section 3.2:
>
>     Given the new information noted, this section seems extraneous to me.
>     Suggest dropping.
>
>
> ...or moving to an appendix, since in my experience lots of file URL minters
> expect this to be a thing that works.
>

Ack.

>
>     ...
>
>     Section 4:
>
>     Given the new information, the discussion of encoding when sending a file
>     URI to a different system seems less relevant.  I would be inclined to drop
>     all but the first paragraph of this section.
>
>
> Except that there's no rule that says "you MUST NOT send a file URI to another
> computer".

Sure, but the intent of the original file:// spec is that if you do this, it not 
be dereferencable on that other computer.

But I take your point, and agree, that encoding is still relevant for relative 
references.

>
> Imagine you're into modding your favourite obscure video game, but most of the
> cool developers are Ukrainian. You unzip a self-contained bundle of libraries
> and some very beautiful hand-crafted HTML-based documentation where all the
> relative URLs look like "%E3%B3%E4/123.jpg". A clever person knows the
> percent-encoded bit is in Windows-1251 and matches the directory called "гід",
> but a dumb computer doesn't.
>
> So we do still need to document cross-system encoding. That said, on re-reading
> I do think the first sentence isn't so much an introductory statement for the
> other two as a condensed and less-specified restatement. I should work on it.
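
A quick Python sketch of that example, assuming only the standard 
urllib.parse module: the same percent-encoded bytes give the directory name 
when read as Windows-1251, but not when read as UTF-8.

```python
from urllib.parse import unquote

encoded = "%E3%B3%E4/123.jpg"

# Decoding the bytes as Windows-1251 recovers the directory name.
as_cp1251 = unquote(encoded, encoding="cp1251")

# The same bytes are not valid UTF-8, so a UTF-8 decoder cannot recover
# the name; with errors="replace" it substitutes U+FFFD characters.
as_utf8 = unquote(encoded, encoding="utf-8", errors="replace")

print(as_cp1251)
print(as_utf8)
```

Nothing in the URI itself says which encoding was used, which is the case for 
still documenting it.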
>
>
>     ...
>
>     Section 6:
>
>     If the userinfo option is removed (see above), the final paragraph becomes
>     moot.  Suggest drop.
>
>     ...
>
>     Appendix A:
>
>     The characterization of "local" and "non-local" files isn't really germane.
>     Suggest drop the sub-categorization and just list the examples.
>
>     ...
>
>     (I'm not reviewing Appendices B and C at this time, as they aren't affected
>     by my new perspective.)
>
>     ...
>
>
> So in summary you're suggesting:
>
> 1. if the hostname is given and resolves to the local machine, it's the same as
> "localhost"

Yes.  (I believe that's the intent of the original spec.)

>
> 2. remove the userinfo from the authority

I was, but see below.

>
> 3. don't mention UNC or network shares or "non-local" files in the main text
>

Yes.

> Is that right (if glib)?
>
> Regarding #1, we might have issues with Windows, IE, and Chrome, among others.

The point here, IMO, is not to *specify* all extension behaviours.  Not 
including extension behaviours in the spec doesn't prohibit them.

>
> For #2 I'll need to ask around and see what is expected here.

Hmmm, I can see your point here.  This is an incompatible extension to the 
original spec, in the sense that including a userinfo would be a syntax 
violation.  In light of my comment above, I see that may be too proscriptive.

Maybe keeping it in the syntax (for compatibility), but being clear that its 
effect is not defined by this specification?
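
A sketch of what that could mean for a consumer, again in Python and purely 
illustrative (`split_file_auth` is a made-up helper, and the URI is an invented 
example): the userinfo parses without error, but nothing here acts on it.

```python
from urllib.parse import urlsplit


def split_file_auth(uri):
    """Split the authority of a file URI into (userinfo, host).

    Hypothetical helper: it accepts a userinfo part syntactically and
    leaves any interpretation of it to the caller.
    """
    parts = urlsplit(uri)
    return parts.username, parts.hostname


print(split_file_auth("file://alice@fred/tmp/foo"))
print(split_file_auth("file:///tmp/foo"))
```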

>
> And as to #3, I don't know. I might be able to clean it up much more as part of
> my effort to make the whole thing more readable and clear of purpose. I'll give
> it some solid work this afternoon and tomorrow.

I think the key test here is whether it's seen as diverging from current 
practice - if there's consensus that this material does reflect current practice 
then I think there's no cause to pull it.  I was responding mainly to the 
concern that the spec was defining extension behaviours that do not reflect 
current widespread practice.

Thanks for your efforts!

#g
--