Re: [apps-discuss] Fun with URLs and regex
Sam Ruby <rubys@intertwingly.net> Wed, 28 January 2015 12:32 UTC
Message-ID: <54C8D6EC.4030306@intertwingly.net>
Date: Wed, 28 Jan 2015 07:32:44 -0500
From: Sam Ruby <rubys@intertwingly.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0
MIME-Version: 1.0
To: Mark Nottingham <mnot@mnot.net>
References: <C5B10293-E6F6-4348-9782-C9C00A4476CE@mnot.net> <CACweHNBVOrVMesB7HOjPNHe5FtzL1k9XDGAHUXAx5DbOSYv5jA@mail.gmail.com> <A1E5B0EC-FAD5-4178-8C7B-540BEB61DC06@mnot.net> <54AEB660.1020701@intertwingly.net> <F122ADA8-4A96-4F88-BB9F-3C5C6A544067@mnot.net> <54C84872.5040902@intertwingly.net> <EF1E36FA-6A30-4A65-9520-5A31571EE445@mnot.net>
In-Reply-To: <EF1E36FA-6A30-4A65-9520-5A31571EE445@mnot.net>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/Trhz66nTmHxjFes-Yyd1g4ELs-k>
Cc: IETF Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Fun with URLs and regex

On 01/28/2015 12:36 AM, Mark Nottingham wrote:
>
>> On 28 Jan 2015, at 1:24 pm, Sam Ruby <rubys@intertwingly.net> wrote:
>>
>> On 1/27/15 8:53 PM, Mark Nottingham wrote:
>>> Hi Sam,
>>>
>>>> On 9 Jan 2015, at 3:54 am, Sam Ruby <rubys@intertwingly.net> wrote:
>>>>
>>>> Mark cares about valid URIs. He's certainly not alone in that. What he has done is express his interests not merely in high level prose, but in concrete, executable form. Given that he has done that, I can pose some interesting questions. For example, if you consider the process of canonicalizing a href value on an <a> element and stringifying the result, an implementation like Chrome will produce something that will be sent across the wire. I've captured the results here:
>>>>
>>>> https://raw.githubusercontent.com/webspecs/url/develop/evaluate/useragent-results/chrome
>>>>
>>>> Given that data and Mark's script, I can produce a list of outputs that Mark doesn't consider valid:
>>>
>>> [...]
>>>
>>>> With this data, we can have a discussion as to whether Mark's script should be updated, or Chrome should change, or some spec should change.
>>>
>>> What I was hoping for was an update of the "valid URI" filter to take this into account at <https://url.spec.whatwg.org/interop/test-results/?filter=valid>.
>>>
>>> E.g., that list currently includes test case 63, "https:/example.com/", which is valid according to the generic syntax in RFC3986, but not when you consider the scheme-specific constraints for HTTPS in RFC7230.
>>>
>>> By filtering out these cases, we can see the places we potentially need to pay attention to in the RFCs.
>>>
>>> Is that possible?
>>
>> Since that code runs filters on the browser, it would be easiest for me integrate code written in JavaScript.
>>
>> Can you review the generated JavaScript?
>>
>> https://url.spec.whatwg.org/reference-implementation/uri-validate.js
>> https://url.spec.whatwg.org/reference-implementation/uri-validate.html
>
> It looks like a reasonable transcription. I do notice you copied my error:
>
>     return new RegExp("^" + known[scheme] + "(#|$)").test(string)

Not a copy: it's actually different. My version checks for a match on the regular expression followed by either a hash mark or the end of the string.

> I think it should just be:
>
>     return new RegExp("^" + known[scheme] + "$").test(string)
>
> ... which brings about another interesting observation -- only http and https define fragments in their syntax; the other schemes do not.

I can make that change.

>> Also, I would like to discuss where this code should live:
>>
>> http://www.ietf.org/mail-archive/web/apps-discuss/current/msg13635.html
>
> I want to keep it going in Python, so I'll probably create a separate repo at some point; is it bothersome to just keep the JS in your repo?

I'm not proposing changing it from Python. What I have is a script that does the conversion:

https://gist.github.com/rubys/b9ccaf304f06cf3e2e88

> If bugs are found in the regex themselves, I'll make sure to notify you (and would appreciate the same).

I'd suggest putting them both in the same repository. In any case, later today I'll update my filter to replace the validation check with yours.

> Cheers,
>
> --
> Mark Nottingham   https://www.mnot.net/

- Sam Ruby
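
For readers following along, here is a minimal sketch of how the two anchorings discussed above differ. The `known` table is a hypothetical, deliberately simplified stand-in (a single `mailto` pattern with no fragment in it), not the contents of uri-validate.js, and the function names are made up for illustration.

```javascript
// Hypothetical, simplified pattern table. The real per-scheme patterns
// are in uri-validate.js; this entry is an assumption for illustration.
const known = {
  // No fragment is part of this pattern: it stops at '#' or whitespace.
  mailto: "mailto:[^#\\s]*",
};

// Anchoring as in the "(#|$)" version: the scheme pattern must be
// followed by either a hash mark or the end of the string.
function validAllowingHash(scheme, string) {
  return new RegExp("^" + known[scheme] + "(#|$)").test(string);
}

// Anchoring as in the "$" version: the scheme pattern must match the
// whole string, so a fragment passes only if the pattern itself allows one.
function validStrict(scheme, string) {
  return new RegExp("^" + known[scheme] + "$").test(string);
}

const plain = "mailto:someone@example.com";
const withFragment = "mailto:someone@example.com#frag";

console.log(validAllowingHash("mailto", plain), validStrict("mailto", plain));
console.log(validAllowingHash("mailto", withFragment), validStrict("mailto", withFragment));
```

With this toy pattern the two checks agree on the plain address and disagree only on the one carrying a fragment. That matches the observation above: under the `$` form, a fragment is accepted only for schemes whose own pattern defines one.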