Re: [apps-discuss] New information relating to draft-ietf-appsawg-file-scheme

Matthew Kerwin <matthew@kerwin.net.au> Mon, 18 April 2016 05:13 UTC

In-Reply-To: <5710CD2D.6040103@ninebynine.org>
References: <570D4C99.1030405@dcrocker.net> <CACweHND-OX+5okkJ+oE=6UN84x+CFtPBpMnU8HqaPbgQgJ_oWA@mail.gmail.com> <570E267A.2070801@ninebynine.org> <CACweHNDry0HaLUqTCRh7tpR6VwgmJnaaa9mLoKqRHV6j8j2AdQ@mail.gmail.com> <5710CD2D.6040103@ninebynine.org>
Date: Mon, 18 Apr 2016 15:13:01 +1000
X-Google-Sender-Auth: F5G6qfXUSG97x86mmB5FbMkFVWk
Message-ID: <CACweHNBMXOozGNC2p5c77xWyyv2F3=EgSx02oYnkVZ-oVSea3Q@mail.gmail.com>
From: Matthew Kerwin <matthew@kerwin.net.au>
To: Graham Klyne <gk@ninebynine.org>
Content-Type: multipart/alternative; boundary="047d7bb0414e21f3be0530bb6aa4"
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/QbQZLr489-EYjqzBhSFJeSESWPE>
Cc: Dave Crocker <dcrocker@bbiw.net>, Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] New information relating to draft-ietf-appsawg-file-scheme

Hi,

Thanks for some Monday reading to keep my mind occupied :)

On 15 April 2016 at 21:14, Graham Klyne <gk@ninebynine.org> wrote:

> Yesterday, in discussion with one of the original authors of RFC 1738, I
> learned something about file:// URIs that I had not previously realized.
>
> The upshot of this for me, in particular with references to Mark
> Nottingham's comment (
> https://mailarchive.ietf.org/arch/msg/apps-discuss/pZG6iMoerYpa5qq6BkHa8j0Nfjo),
> is that there may be parts of this spec that go beyond the clarification
> exercise required to maintain file:// URI scheme as a standard-track
> specification.
>
> I offer a new (not detailed) review with suggestions for elements that
> might be dropped, in the hope that what remains is truly just a
> clarification rather than a modification of the RFC 1738 description of
> file:// URIs.
>
> (My own experience and use of file:// URIs is not affected by what I
> learned, and in my previous reviews have assumed that contributions
> relating to UNC files in particular were based on appropriate knowledge or
> experience.)
>
>
This is toeing the edge of a catch-22; when I first came up with a draft it
was a very minimal update of RFC 1738's ABNF to match RFC 3986. The very
strong suggestion then was that this wouldn't be enough, and that I should
address all the other cruft that's built up over the decades.

Part of that cruft is the handling of file URIs with a non-localhost
hostname, as with UNC paths.


...
>
> I have learned that the intent of including a hostname component was *not*
> to allow some kind of cross-host file access to be invoked, but to act as a
> signal to software using the URI that was not running on the specified host
> that the corresponding resource was not dereferencable.
>
> This is supported by, or at least consistent with, a close reading of the
> relevant text in https://tools.ietf.org/html/rfc1738#section-3.10:
>
> [[
>    A file URL takes the form:
>
>        file://<host>/<path>
>
>    where <host> is the fully qualified domain name of the system on
>    which the <path> is accessible, and <path> is a hierarchical
>    directory path of the form <directory>/<directory>/.../<name>.
> ]]
>
>
I entirely agree. That's why I have this text: "A file URI can be
dependably dereferenced or translated to a local file path only if it is
local. A file URI is considered “local” if it has no file-auth, or the
file-auth is the special string “localhost”."
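As a minimal sketch of that rule (not text from the draft), the check could
look like this in Python. The function name is hypothetical, and it assumes
urllib.parse's idea of the authority matches the draft's file-auth closely
enough for illustration:

```python
from urllib.parse import urlsplit

def is_local_file_uri(uri: str) -> bool:
    """Return True if the file URI is "local" in the sense quoted above:
    it has no file-auth, or the file-auth host is "localhost"."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError("not a file URI: %r" % uri)
    # hostname is None when the authority is absent or empty
    host = parts.hostname
    return host is None or host == "localhost"
```

With this, "file:///tmp/foo" and "file://localhost/tmp/foo" count as local,
while "file://fred/tmp/foo" does not, whatever machine you are on.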

I should note that there is some historical baggage here. If I'm on a
machine called "fred", can I dereference this URL? "file://fred/tmp/foo"

Surely, according to the spec, the answer is "yes". Otherwise putting
anything other than "localhost" in the 'host' part of the URL makes it
undereferenceable. However, as I understand it, most implementations actually
went with "no" (the heuristic presumably being: only a URL with a blank or
"localhost" host can be dereferenced).

Then on top of that some (notably Windows and IE) extended the heuristic to
be "if the host is blank, it's local; otherwise it's a network share." So
in Windows "file://localhost/tmp/foo" is accessed via SMB. On this final
point, after much discussion and back and forth, I went with RFC 1738's
specification of "localhost", leaving Windows and IE noncompliant.

So I'm not as far from 1738 as I could be.
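To make the three readings concrete, here is a rough sketch in Python. The
policy names ("rfc1738", "common", "windows") and the result labels are made
up for illustration, and the Windows branch only models the heuristic
described above, not any actual Windows or IE code:

```python
import socket
from urllib.parse import urlsplit

def classify(uri, policy, this_host=None):
    """Classify a file URI's host under one of three hypothetical policies."""
    host = urlsplit(uri).hostname or ""
    # Assumption: the machine's own name is whatever gethostname() reports
    this_host = (this_host or socket.gethostname()).lower()
    if policy == "rfc1738":
        # Literal reading: the named host is where the path is accessible
        local = host in ("", "localhost", this_host)
        return "local" if local else "not-dereferenceable"
    if policy == "common":
        # What most implementations seem to do
        local = host in ("", "localhost")
        return "local" if local else "not-dereferenceable"
    if policy == "windows":
        # Blank host is local; any other host is treated as a network share
        return "local" if host == "" else "network-share"
    raise ValueError("unknown policy: %r" % policy)
```

On a machine called "fred", classify("file://fred/tmp/foo", "rfc1738", "fred")
gives "local", the "common" policy gives "not-dereferenceable", and the
"windows" policy gives "network-share" (as it does for
"file://localhost/tmp/foo" too).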


> Also, RFC 1630 is more explicit that the file scheme is for *local* file
> access.  The following from https://tools.ietf.org/html/rfc1630 is much
> clearer about this than RFC1738:
>
> [[
> file
>
>    The other URI schemes (except nntp) share the property that they are
>    equally valid at any geographical place.
>
>    There is however a real practical requirement to be able to generate
>    a URL for an object in a machine's local file system.
>
>    The syntax is similar to the ftp syntax, but in this case the slash
>    is used to donate boundaries between directory levels of a
>    hierarchical file system is used.  The "client" software converts the
>    file URL into a file name in the local file name conventions.  This
>    allows local files to be treated just as network objects without any
>    necessity to use a network server for access.  This may be used for
>    example for defining a user's "home" document in WWW.
>
>    There is clearly a danger of confusion that a link made to a local
>    file should be followed by someone on a different system, with
>    unexpected and possibly harmful results.  Therefore, the convention
>    is that even a "file" URL is provided with a host part.  This allows
>    a client on another system to know that it cannot access the file
>    system, or perhaps to use some other local mecahnism to access the
>    file.
>
>    The special value "localhost" is used in the host field to indicate
>    that the filename should really be used on whatever host one is.
>    This for example allows links to be made to files which are
>    distribted on many machines, or to "your unix local password file"
>    subject of course to consistency across the users of the data.
>
>    A void host field is equivalent to "localhost".
> ]]
>
>
This text was dropped when it was effectively replaced by RFC 1738. It also
says "...or perhaps to use some other local mechanism to access the file."

In either case, I don't believe I've violated the intent (or at least the
letter) of 1738 too badly. It may be my failure to communicate -- we'll see
as I address the comments below.



> Reviewing https://tools.ietf.org/html/draft-ietf-appsawg-file-scheme-06
> again in light of this, I have the following comments:
>
> ...
>
> Section 1:
>
> [[
> This document defines a syntax that is compatible with most extant
>    implementations, while attempting to push towards a stricter subset
>    of "ideal" constructs.  In many cases it simultaneously acknowledges
>    and deprecates some less common or outdated constructs.
> ]]
>
> I don't think "deprecates" is right here, as it doesn't discourage any
> behaviours specifically allowed by RFC 1738.  (cf. Mark's comment.)
>
>
As per my replies to Dave and Mark, this wording is already changed.


...
>
> Section 1.2:
>
> It now seems to me that the role of a host name in UNC name is quite
> different to its (original) role in a file:// URI.  In light of this,
> should this section be dropped?
>
>
They both name files, and potentially allow files to be dereferenced.
That's what I mean by "similar technologies." If it was "identical but for
syntax" I wouldn't bother working on the file URI scheme.


...
>
> Section 2:
>
> [[
>       file-auth      = [ userinfo "@" ] host
> ]]
>
> The inclusion of "userinfo @" is an extension to previous *specifications*
> of the file URI.  As such, I question whether this change should be
> included. Dropping it doesn't affect compatibility with RFC3986.
>
>
Hmm, possibly. I'm trying to recall what discussions and ideas led to it
looking the way it does in the draft. It may just be because that's what is
currently allowed/expected by those who do "use some other local mechanism
to access the file."

It could be removed to the relevant appendices, I think, without any issues.
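For illustration only, this is roughly how a consumer might pull the two
parts of file-auth apart today. It is a sketch using Python's urllib.parse
rather than the draft's ABNF, and the function name, user name, and host name
are invented:

```python
from urllib.parse import urlsplit

def split_file_auth(uri):
    """Return (userinfo, host) from a file URI; either may be None."""
    parts = urlsplit(uri)
    # username covers the userinfo before "@" (up to any ":" separator)
    return parts.username, parts.hostname
```

So split_file_auth("file://fred@fileserver.example/share/foo.txt") yields
("fred", "fileserver.example"), and a URI with no authority yields
(None, None). Dropping userinfo from the main syntax would only affect the
first element.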


...
>
> Section 3:
>
> [[
>    This specification neither defines nor forbids a mechanism for
>    accessing non-local files.  See SMB [MS-SMB], NFS [RFC7530], NCP
>    [NOVELL] for examples of protocols that can be used to access files
>    over a network.  Also see Appendix C.2 for a non-normative discussion
>    on translating non-local file URIs to and from UNC strings.
> ]]
>
> Given the new information noted, this seems extraneous to me.  Suggest
> dropping.
>
>
I admit it might come across that having the references here is a weaselly
way of saying, "I won't tell you what to do, but you should do this." I
think they belong in the document somewhere, though; perhaps in a relevant
appendix.


...
>
> Section 3.1:
>
> There is an option to include the host name even for local files, as an
> indication that the same file should not be expected to exist on other
> hosts.
>
> I think the default position, in practice, is to not specify a host name.
> But if the application expects that the full absolute URI may be passed to
> another system it may make sense to include it to avoid dereferencing the
> value in an inappropriate context.
>
>
I might just remove section 3.1 altogether; it should be enough to have the
ABNF syntax and then describe in prose the way the segments map to their
equivalents in a file path. Lazily, that also puts the burden of
identifying and dealing with all these edge cases on not-me.

All I really wanted to emphasise was that you can't use backslashes, even in
Windows.
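A small sketch of what that means in practice, assuming an absolute
drive-letter path (the helper name and the example path are hypothetical, and
UNC paths are deliberately out of scope here):

```python
from urllib.parse import quote

def windows_path_to_file_uri(path: str) -> str:
    """Turn an absolute drive-letter path like C:\\dir\\file into a file URI."""
    # Backslashes are not path separators in a URI; use "/" instead
    forward = path.replace("\\", "/")
    # Keep "/" and the drive-letter ":" literal; percent-encode the rest
    return "file:///" + quote(forward, safe="/:")
```

For example, r"C:\Users\fred\my notes.txt" becomes
"file:///C:/Users/fred/my%20notes.txt", with no backslash anywhere in the
result.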


...
>
> Section 3.2:
>
> Given the new information noted, this section seems extraneous to me.
> Suggest dropping.
>
>
...or moving to an appendix, since in my experience lots of file URL
minters expect this to be a thing that works.


...
>
> Section 4:
>
> Given the new information, the discussion of encoding when sending a file
> URI to a different system seems less relevant.  I would be inclined to drop
> all but the first paragraph of this section.
>
>
Except that there's no rule that says "you MUST NOT send a file URI to
another computer".

Imagine you're into modding your favourite obscure video game, but most of
the cool developers are Ukrainian. You unzip a self-contained bundle of
libraries and some very beautiful hand-crafted HTML-based documentation
where all the relative URLs look like "%E3%B3%E4/123.jpg". A clever person
knows the percent-encoded bit is in Windows-1251 and matches the directory
called "гід", but a dumb computer doesn't.

So we do still need to document cross-system encoding. That said, on
re-reading I do think the first sentence isn't so much an introductory
statement for the other two as a condensed and less-specified restatement.
I should work on it.
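The "dumb computer" problem can be seen with Python's urllib.parse, which
assumes UTF-8 unless told otherwise. This is just a sketch of the example
above, not a proposed algorithm:

```python
from urllib.parse import quote, unquote

encoded = "%E3%B3%E4/123.jpg"

# Decoded with the encoding the minter actually used (Windows-1251)
as_cp1251 = unquote(encoded, encoding="cp1251")

# Decoded with the default assumption (UTF-8); the bytes are not valid
# UTF-8, so they come out as U+FFFD replacement characters
as_utf8 = unquote(encoded, encoding="utf-8", errors="replace")

# Going the other way: the same directory name, encoded two ways
from_cp1251 = quote("гід", encoding="cp1251")
from_utf8 = quote("гід", encoding="utf-8")
```

Here as_cp1251 is "гід/123.jpg" and from_cp1251 is "%E3%B3%E4", while the
UTF-8 variants differ in both directions. Nothing in the URL itself says
which encoding was intended.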


...
>
> Section 6:
>
> If the userinfo option is removed (see above), the final paragraph becomes
> moot.  Suggest drop.
>
> ...
>
> Appendix A:
>
> The characterization of "local" and "non-local" files isn't really
> germane. Suggest drop the sub-categorization and just list the examples.
>
> ...
>
> (I'm not reviewing Appendices B and C at this time, as they aren't
> affected by my new perspective.)
>
> ...
>
>
So in summary you're suggesting:

1. if the hostname is given and resolves to the local machine, it's the
same as "localhost"

2. remove the userinfo from the authority

3. don't mention UNC or network shares or "non-local" files in the main text

Is that right (if glib)?

Regarding #1, we might have issues with Windows, IE, and Chrome, among
others.

For #2 I'll need to ask around and see what is expected here.

And as to #3, I don't know. I might be able to clean it up much more as
part of my effort to make the whole thing more readable and clear of
purpose. I'll give it some solid work this afternoon and tomorrow.

Cheers
-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/