Re: [apps-discuss] Fun with URLs and regex

Sam Ruby <rubys@intertwingly.net> Wed, 28 January 2015 02:24 UTC

Message-ID: <54C84872.5040902@intertwingly.net>
Date: Tue, 27 Jan 2015 21:24:50 -0500
From: Sam Ruby <rubys@intertwingly.net>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Mark Nottingham <mnot@mnot.net>
References: <C5B10293-E6F6-4348-9782-C9C00A4476CE@mnot.net> <CACweHNBVOrVMesB7HOjPNHe5FtzL1k9XDGAHUXAx5DbOSYv5jA@mail.gmail.com> <A1E5B0EC-FAD5-4178-8C7B-540BEB61DC06@mnot.net> <54AEB660.1020701@intertwingly.net> <F122ADA8-4A96-4F88-BB9F-3C5C6A544067@mnot.net>
In-Reply-To: <F122ADA8-4A96-4F88-BB9F-3C5C6A544067@mnot.net>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/wib8HqJsEf8s4Hl6JUBntXMnMoQ>
Cc: IETF Apps Discuss <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] Fun with URLs and regex

On 1/27/15 8:53 PM, Mark Nottingham wrote:
> Hi Sam,
>
>> On 9 Jan 2015, at 3:54 am, Sam Ruby <rubys@intertwingly.net> wrote:
>>
>> Mark cares about valid URIs.  He's certainly not alone in that.  What he has done is express his interests not merely in high level prose, but in concrete, executable form.  Given that he has done that, I can pose some interesting questions.  For example, if you consider the process of canonicalizing a href value on an <a> element and stringifying the result, an implementation like Chrome will produce something that will be sent across the wire.  I've captured the results here:
>>
>> https://raw.githubusercontent.com/webspecs/url/develop/evaluate/useragent-results/chrome
>>
>> Given that data and Mark's script, I can produce a list of outputs that Mark doesn't consider valid:
>
> [...]
>
>> With this data, we can have a discussion as to whether Mark's script should be updated, or Chrome should change, or some spec should change.
>
> What I was hoping for was an update of the "valid URI" filter to take this into account at <https://url.spec.whatwg.org/interop/test-results/?filter=valid>.
>
> E.g., that list currently includes test case 63, "https:/example.com/", which is valid according to the generic syntax in RFC3986, but not when you consider the scheme-specific constraints for HTTPS in RFC7230.
>
> By filtering out these cases, we can see the places we potentially need to pay attention to in the RFCs.
>
> Is that possible?

Since that code runs its filters in the browser, it would be easiest for me to
integrate code written in JavaScript.

Can you review the generated JavaScript?

https://url.spec.whatwg.org/reference-implementation/uri-validate.js
https://url.spec.whatwg.org/reference-implementation/uri-validate.html
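
As a rough illustration of the scheme-specific check discussed above (test
case 63, "https:/example.com/"), a filter could do something like the sketch
below. This is not the generated uri-validate.js. The function name
meetsHttpSchemeConstraints is hypothetical, and the host extraction is a
simplification. The sketch assumes only the component-splitting regex from
RFC 3986 Appendix B and the RFC 7230 rule that http and https URIs carry a
"//" authority with a non-empty host.

```javascript
// Component-splitting regex from RFC 3986, Appendix B.
// It separates a reference into parts; it does not validate them.
const COMPONENTS =
  /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper, not taken from uri-validate.js.
// Returns false when an http/https URI lacks the "//" authority or has an
// empty host, which RFC 7230 (sections 2.7.1 and 2.7.2) does not allow.
// Other schemes are not checked here and simply return true.
function meetsHttpSchemeConstraints(uri) {
  const m = COMPONENTS.exec(uri); // every group is optional, so m is never null
  const scheme = (m[2] || "").toLowerCase();
  if (scheme !== "http" && scheme !== "https") {
    return true;
  }
  const authority = m[4]; // undefined when "//" is absent
  if (authority === undefined) {
    return false;
  }
  // Simplified host extraction: drop userinfo and a trailing port.
  const host = authority.replace(/^[^@]*@/, "").replace(/:\d*$/, "");
  return host.length > 0;
}

console.log(meetsHttpSchemeConstraints("https:/example.com/"));
console.log(meetsHttpSchemeConstraints("https://example.com/"));
```

With this check, "https:/example.com/" is rejected because the regex finds no
authority component at all, only a path of "/example.com/".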

Also, I would like to discuss where this code should live:

http://www.ietf.org/mail-archive/web/apps-discuss/current/msg13635.html

> Cheers,
>
> --
> Mark Nottingham   https://www.mnot.net/

- Sam Ruby