Re: [apps-discuss] URL definitions and draft-ruby-url-problem
Sam Ruby <rubys@intertwingly.net> Fri, 19 December 2014 19:47 UTC
Message-ID: <549480BD.6080309@intertwingly.net>
Date: Fri, 19 Dec 2014 14:47:09 -0500
From: Sam Ruby <rubys@intertwingly.net>
To: Bjoern Hoehrmann <derhoermi@gmx.net>
References: <B53877D1-0996-448F-982D-4536805F2B1E@vpnc.org> <00o89a147re95aor21u3l9a7aarrhg0vts@hive.bjoern.hoehrmann.de>
In-Reply-To: <00o89a147re95aor21u3l9a7aarrhg0vts@hive.bjoern.hoehrmann.de>
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/itrBpLovEnKxe_Rka9WO_vnisJg
Cc: Paul Hoffman <paul.hoffman@vpnc.org>, Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] URL definitions and draft-ruby-url-problem

On 12/19/2014 01:41 PM, Bjoern Hoehrmann wrote:
> * Paul Hoffman wrote:
>> This seems like an important document for us to look at, and possibly
>> adopt. Section 3 is pretty scary, and section 4 seems like a very
>> reasonable solution.
>
> I have reviewed this document. Sections 1 and 2 seem reasonable to me.

Thanks!

> Section 3 has
>
>    The main problem is conflicting specifications that overlap but don't
>    match each other.
>
>    Additionally, the following are issues that need to be resolved to
>    make URL processing unambiguous and stable.
>
>    o  Nomenclature: over the years, a number of different sets of
>       terminology has been used. URL / URI / IRI is not the only
>       difference. [tantek-slice] chronicles a number of differences.
>
> The latter refers to differences among APIs for manipulating resource
> identifiers. I do not think that is a problem and does not need solving.

Can I ask why not? To be clear, I'm not suggesting that everybody adopt
the same terms. I am suggesting that somewhere we document what terms are
in use and how they map.

>    o  Parameterization: standards in this area need to define such
>       matters as normalization forms and values for parameters such as
>       UseSTD3ASCIIRules.
>
> Where the relevant standards allow implementations to choose options, it
> is usually because there are good reasons to do so. Implementers ought
> to document their choices properly, and it is a good thing when similar
> implementations make the same choices, it might even be useful to have a
> specification saying "Web browsers must use these options: A, B, C", but
> that would just be a matter of doing it. So I am not sure what problem
> needs solving in this regard.

First, it is not just web browsers. It is runtime libraries that
accompany various programming languages. It is code embedded in word
processors and web servers. I don't believe that having URLs mean
different things depending on context is in anybody's best interests.
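
To make the cost of differing parameter choices concrete: the same URL can end up at two different hosts depending on how its internationalized hostname is processed. Under UTS #46, the Transitional_Processing option decides whether "ß" is mapped to "ss" (IDNA 2003 behavior) or kept (IDNA 2008 behavior), so `https://faß.de/` may be sent to `fass.de` or to `xn--fa-hia.de`. The Python sketch below illustrates this. `host_idna2003` wraps Python's built-in `idna` codec (IDNA 2003); `host_keep_sharp_s` is a hypothetical, greatly simplified stand-in for IDNA 2008 processing, not a real implementation.

```python
from urllib.parse import urlsplit


def host_idna2003(host: str) -> str:
    """Convert a hostname using Python's built-in 'idna' codec, which
    implements IDNA 2003 (RFC 3490); its nameprep step maps 'ss' for 'ß'."""
    return host.encode("idna").decode("ascii")


def host_keep_sharp_s(host: str) -> str:
    """Hypothetical, deliberately simplified converter that keeps 'ß', as
    IDNA 2008 and UTS #46 non-transitional processing do. It only lowercases
    and Punycode-encodes non-ASCII labels; it skips every validity check a
    real IDNA 2008 / UTS #46 implementation performs."""
    labels = []
    for label in host.lower().split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


# urlsplit() performs no IDNA processing; it returns the Unicode host, lowercased.
host = urlsplit("https://faß.de/").hostname
print(host)                     # faß.de
print(host_idna2003(host))      # fass.de
print(host_keep_sharp_s(host))  # xn--fa-hia.de
```

Both conversions are "reasonable" choices in isolation; the point is that two libraries making different choices send the same URL to different hosts.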

At a minimum, we should consider clearly documenting the set of URLs that
may be interpreted differently in different contexts. Perhaps we should
identify those URLs as non-conforming. However, that's not enough.
Consider ICANN-approved non-ASCII domain names. Different
RFC 3986-compliant libraries handle https URLs with such hostnames
differently. We can't simply tell people to avoid such names.

>    o  Interoperability: even after accounting for the above, there is a
>       demonstrable lack of interoperability across popular libraries and
>       browsers. [whatwg-interop] identifies a number of such
>       differences.
>
> There are different classes of problems in this regard, e.g. there may
> be existing requirements that are widely ignored, there may be ambiguous
> requirements interpreted differently across implementations, there may
> be implementation-defined behavior that varies across implementations in
> a harmful manner, or there may be widely deployed behavior where further
> standardisation might be useful, to mention a few. I think it would be
> helpful to discuss these classes of problems separately.

I agree! To seed the discussion, I offer the following web page with a
number of interesting test cases:

https://url.spec.whatwg.org/interop/test-results/

Feel free to propose additional test cases, suggest categorizations or
changes to the specs or to the libraries, or even suggest additional
libraries that should be included.

>    o  Specific scheme definitions: some UR* scheme definitions are
>       woefully out of date, incomplete, or don't correspond to current
>       practice, but updating their definitions is unclear. This
>       includes "file:", for which there is a current effort, but there
>       are others which need review (including 'ftp:', 'data').
>
> An open question here seems to be how to separate concerns. Can updating
> specifications for individual schemes be done independently? If so, that
> also would seem to simply require somebody doing it, so I am not sure of
> the problem indicated here.

One thing that is important to recognize is that every modern programming
language is going to have a URI or URL parse function, method,
subroutine, or whatever. While it may make sense to allow new schemes to
impose additional validity requirements or additional semantics specific
to their scheme, it is in everybody's best interests to define a standard
way in which URLs that use unknown URL schemes are to be parsed.

Note: I am *NOT* suggesting that what is currently in the WHATWG URL
Living Standard is that definition. I think we need something a bit
closer to what RFC 3986 defines. I welcome input on what that behavior
should be. Let's work on it together. Meanwhile,
https://www.w3.org/Bugs/Public/show_bug.cgi?id=27233 is open on this
issue.

> As for the "Outline of Potential Solution" in section 4, I agree that a
> plan should be built. How many specifications, for instance, should we
> have, what would be their scope, and what would be their contents?
> Without a plan, considering the other points in the section seems
> premature; perhaps some of them are reasonable things to do
> independently of any plan; in that case, they could simply be done.

There is indeed a big chicken-and-egg problem here. My best guess at this
moment is that RFC 3987 needs to be retired and work needs to start on an
RFC 3986bis, which probably needs a new Working Group. And that requires
the discussion we are currently having.

I will say that I'm willing to work with anybody, anywhere. If work on an
RFC 3986bis starts, I will contribute. I also am not looking to make this
somebody else's problem. If it means I need to step up to be that editor,
I will do that too. And in the process, I will actively encourage others
to contribute, and by that I mean GitHub pull requests.
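
On the point about parsing unknown schemes in a way "closer to what RFC 3986 defines": RFC 3986 Appendix B already gives a regular expression that splits any URI reference into scheme, authority, path, query, and fragment without consulting the scheme at all. The Python sketch below applies that expression. The names `split_generic` and `Components` are hypothetical, and this is only an illustration of RFC 3986's generic split (no validation, normalization, or percent-decoding), not a proposal for the behavior under discussion. It also touches the nomenclature point: RFC 3986 says "authority" where Python's `urllib.parse` says "netloc".

```python
import re
from typing import NamedTuple, Optional

# Regular expression from RFC 3986, Appendix B. It splits any URI reference
# into its five generic components without knowing anything about the scheme.
URI_REFERENCE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


class Components(NamedTuple):
    scheme: Optional[str]
    authority: Optional[str]  # Python's urllib.parse calls this "netloc"
    path: str                 # always present, possibly empty
    query: Optional[str]
    fragment: Optional[str]


def split_generic(uri: str) -> Components:
    # Every group is optional or may match empty, so match() always succeeds.
    m = URI_REFERENCE.match(uri)
    return Components(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9))


print(split_generic("foo://example.com:8042/over/there?name=ferret#nose"))
# Components(scheme='foo', authority='example.com:8042', path='/over/there', query='name=ferret', fragment='nose')
```

Because the expression never looks at the scheme, an unknown scheme such as `foo:` splits exactly like `https:`; any scheme-specific validity rules would have to be layered on top of this generic step.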

But I will also say that while we should be having the high-level
discussion now, in parallel with the technical discussions about how
various classes of input strings should be parsed, it is really the
latter (parser) discussion where we should be doing the deep dive. Once
we have a starter set of proposed changes, we can circle back and
determine whether a set of errata is sufficient or whether the overhead
of an entire Working Group is required. However, that is just my
preference.

Should you be interested in suggesting changes to draft-ruby-url-problem,
I encourage pull requests for that too. Julian Reschke has already
submitted a few. You could be next!

- Sam Ruby