Re: [apps-discuss] Fun with URLs and regex
Sam Ruby <rubys@intertwingly.net> Thu, 08 January 2015 16:55 UTC
Return-Path: <rubys@intertwingly.net>
X-Original-To: apps-discuss@ietfa.amsl.com
Delivered-To: apps-discuss@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id C385C1A87EF for <apps-discuss@ietfa.amsl.com>; Thu, 8 Jan 2015 08:55:01 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.898
X-Spam-Level:
X-Spam-Status: No, score=-1.898 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, NORMAL_HTTP_TO_IP=0.001, RCVD_IN_DNSWL_NONE=-0.0001, WEIRD_PORT=0.001] autolearn=ham
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fzmlQEscCsG7 for <apps-discuss@ietfa.amsl.com>; Thu, 8 Jan 2015 08:54:59 -0800 (PST)
Received: from cdptpa-oedge-vip.email.rr.com (cdptpa-outbound-snat.email.rr.com [107.14.166.227]) by ietfa.amsl.com (Postfix) with ESMTP id 5B7541A87EC for <apps-discuss@ietf.org>; Thu, 8 Jan 2015 08:54:59 -0800 (PST)
Received: from [98.27.51.253] ([98.27.51.253:62337] helo=rubix) by cdptpa-oedge02 (envelope-from <rubys@intertwingly.net>) (ecelerity 3.5.0.35861 r(Momo-dev:tip)) with ESMTP id D9/EB-31080-266BEA45; Thu, 08 Jan 2015 16:54:58 +0000
Received: from [192.168.1.115] (unknown [192.168.1.115]) (using TLSv1 with cipher ECDHE-RSA-AES128-SHA (128/128 bits)) (No client certificate requested) (Authenticated sender: rubys) by rubix (Postfix) with ESMTPSA id 8BFC3140CFD; Thu, 8 Jan 2015 11:54:57 -0500 (EST)
Message-ID: <54AEB660.1020701@intertwingly.net>
Date: Thu, 08 Jan 2015 11:54:56 -0500
From: Sam Ruby <rubys@intertwingly.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Mark Nottingham <mnot@mnot.net>, Matthew Kerwin <matthew@kerwin.net.au>
References: <C5B10293-E6F6-4348-9782-C9C00A4476CE@mnot.net> <CACweHNBVOrVMesB7HOjPNHe5FtzL1k9XDGAHUXAx5DbOSYv5jA@mail.gmail.com> <A1E5B0EC-FAD5-4178-8C7B-540BEB61DC06@mnot.net>
In-Reply-To: <A1E5B0EC-FAD5-4178-8C7B-540BEB61DC06@mnot.net>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 8bit
X-RR-Connecting-IP: 107.14.168.130:25
X-Cloudmark-Score: 0
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/pLPD4BSGWWmqw52m9dsDrJug0-U>
Cc: Alex Russell <slightlyoff@google.com>, Domenic Denicola <d@domenic.me>, IETF Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Fun with URLs and regex
X-BeenThere: apps-discuss@ietf.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: General discussion of application-layer protocols <apps-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/apps-discuss>, <mailto:apps-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/apps-discuss/>
List-Post: <mailto:apps-discuss@ietf.org>
List-Help: <mailto:apps-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/apps-discuss>, <mailto:apps-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 08 Jan 2015 16:55:02 -0000
On 01/08/2015 10:07 AM, Mark Nottingham wrote:
> Fixed, thanks.
>
> I’m hoping to put this into a proper github project soon and refactor it, to make reading it and making contributions easier.

First, Mark: thanks for doing this!

Would you consider putting it in https://github.com/webspecs/url, perhaps in the evaluate directory? Wherever it is placed, I plan to write scripts that make use of this information.

For context (and in full disclosure, I discussed much of this F2F with Mark yesterday at the TAG meeting, but am including the background here for everybody's benefit):

Different people care about different things, and that's all right.

As an example, I was able to surprise Alex Russell and Domenic Denicola (both Google employees) by showing that the following expression produced different results on Chrome on Windows as compared to Chrome on OS X:

  new URL("file:c|foo").pathname

Why does this matter? Well, in the case of web browsers, developers of those web browsers want content that shows up on the web to behave interoperably, both across platforms and across competing implementations. More specifically, if you can construct a web page (including JavaScript) that produces different results on Internet Explorer on Windows vs Apple's Safari on OS X, then that's a problem.

In the case of libraries like Node.js, there is a desire to parse URLs just like web browsers do. Given that Domenic cares about node.js and Chrome, I can point him to data such as the following:

  https://url.spec.whatwg.org/interop/test-results/?select=nodejs&baseline=chrome

Looks like there is quite a bit of yellow on that page.

Since I care about the URL Standard, I converted the prose into executable form so that it could be compared.
This allows me to very easily produce a similar page comparing nodejs and (by proxy) the URL Standard:

  https://url.spec.whatwg.org/interop/test-results/?select=nodejs

With this data, we can triangulate and have a discussion with real data as to whether node.js should change or whether the URL Standard should change.

Mark cares about valid URIs. He's certainly not alone in that. What he has done is express his interests not merely in high-level prose, but in concrete, executable form. Given that he has done that, I can pose some interesting questions. For example, if you consider the process of canonicalizing an href value on an <a> element and stringifying the result, an implementation like Chrome will produce something that will be sent across the wire. I've captured the results here:

  https://raw.githubusercontent.com/webspecs/url/develop/evaluate/useragent-results/chrome

Given that data and Mark's script, I can produce a list of outputs that Mark doesn't consider valid:

  "http://f:21/%20b%20?%20d%20# e": false,
  "http://example.com/foo%": false,
  "http://2001::1]/": false,
  "http://[www.google.com]/": false,
  "http://f:fifty-two/c": false,
  "http://foo:-80/": false,
  "http://example.com/foo/.%2": false,
  "http://2001::1/": false,
  "http://[google.com]/": false,
  "http://example.org/foo/bar#\\": false,
  "http://example.org/foo/[61:24:74]:98": false,
  "gopher://example.com/": false,
  "http://example.org/foo/bar#\u03b2": false,
  "http://example.com/foo%2%C3%82%C2%A9zbar": false,
  "gopher://foo/": false,
  "http://example.com/foo%2zbar": false,
  "http://f:b/c": false,
  "a: foo.com": false,
  "http://[1::2]:3:4/": false,
  "http://example.org/foo/[61:27]/:foo": false,
  "http://f:%2021%20/%20b%20?%20d%20# e": false,
  "http://example.com/foo%2": false,
  "http://foo/path;a??e#f#g": false,
  "data:example.com/": false,
  "http://www.google.com/foo?bar=baz# \u00bb": false,
  "http://f:%20/c": false,
  "http://:www.example.com/": false,
  "data:/example.com/": false

With this data, we can have a discussion as
to whether Mark's script should be updated, or Chrome should change, or some spec should change.

I have taken this a step further. I wrote a script that will convert Mark's script from Python to JavaScript. This means that he can maintain his script in Python and I can include an equivalent script on web pages and use that information to filter out things that aren't interesting or highlight things that are.

Some people here may have different interests, and that is OK too. I encourage everybody to find a way to express their interests in the form of code. While my preference is JavaScript, I'm otherwise pretty agnostic. XSLT, Perl, PHP, C#: doesn't matter to me. If you do so, I'll find a way to include that as a filter or as a base for comparison.

Doing so means that as the test suite grows, or implementation results change(*), you will be able to get instant results to the questions that interest you.

- Sam Ruby

(*) As an example, I'm trying to get updated results for IE. Current status: http://intertwingly.net/blog/2015/01/08/Ununzippable-Modern-IE
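
The filtering step described in the message (run each captured output through a validity check and list the ones that fail) can be sketched roughly as follows. This is not Mark's script: it is a minimal stand-in that splits with the RFC 3986 Appendix B regex and then checks only the scheme, a numeric port, and the characters allowed in the path, query, and fragment. The names `looks_valid`, `invalid_outputs`, and `SAMPLE_OUTPUTS` are hypothetical, and the last sample entry was added here purely for contrast. Because host syntax and scheme-specific rules are not checked, entries such as "http://[www.google.com]/" or "gopher://example.com/" are not caught by this sketch.

```python
import re

# RFC 3986, Appendix B: splits a URI reference into its components.
SPLIT = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?",
    re.DOTALL,
)

# Character classes from the RFC 3986 ABNF (written as regex fragments).
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*\Z")
PORT_RE = re.compile(r"[0-9]*\Z")
PATH_RE = re.compile(rf"(?:{PCHAR}|/)*\Z")
QUERY_OR_FRAGMENT_RE = re.compile(rf"(?:{PCHAR}|[/?])*\Z")


def looks_valid(uri: str) -> bool:
    """Partial validity check; an assumption-laden stand-in, not a full parser."""
    scheme, authority, path, query, fragment = SPLIT.match(uri).group(2, 4, 5, 7, 9)

    if scheme is not None and not SCHEME_RE.match(scheme):
        return False

    if authority is not None:
        # Drop any userinfo, then look for ":port" after a closing bracket (if any).
        host_port = authority.rpartition("@")[2]
        tail = host_port.rpartition("]")[2]
        if ":" in tail and not PORT_RE.match(tail.rpartition(":")[2]):
            return False

    if not PATH_RE.match(path):
        return False
    if query is not None and not QUERY_OR_FRAGMENT_RE.match(query):
        return False
    if fragment is not None and not QUERY_OR_FRAGMENT_RE.match(fragment):
        return False
    return True


def invalid_outputs(outputs):
    """Return the outputs that fail the check, preserving input order."""
    return [uri for uri in outputs if not looks_valid(uri)]


# A few outputs taken from the list in the message, plus one plain URL for contrast.
SAMPLE_OUTPUTS = [
    "http://example.com/foo%",
    "http://f:fifty-two/c",
    "http://example.org/foo/bar#\u03b2",
    "http://foo/path;a??e#f#g",
    "http://example.com/",
]

if __name__ == "__main__":
    for uri in invalid_outputs(SAMPLE_OUTPUTS):
        print(uri)
```

Keeping the check behind a single function is what makes it usable as a filter: any other notion of "interesting" can be swapped in without touching the code that walks the captured results.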