Re: [apps-discuss] Fun with URLs and regex

Sam Ruby <rubys@intertwingly.net> Thu, 29 January 2015 15:25 UTC

To: Mark Nottingham <mnot@mnot.net>
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/bJ7Psj2gMAEezrPiOzkj5el9UEw>
Cc: IETF Apps Discuss <apps-discuss@ietf.org>

On 01/27/2015 08:53 PM, Mark Nottingham wrote:
>
> What I was hoping for was an update of the "valid URI" filter to take
> this into account at
> <https://url.spec.whatwg.org/interop/test-results/?filter=valid>.
>
> E.g., that list currently includes test case 63,
> "https:/example.com/", which is valid according to the generic syntax
> in RFC3986, but not when you consider the scheme-specific constraints
> for HTTPS in RFC7230.
>
> By filtering out these cases, we can see the places we potentially
> need to pay attention to in the RFCs.
>
> Is that possible?

Done and deployed.  Let me know if you see any problems.

For now, I decided to match your Python script's behavior with respect
to fragments and known but non-HTTP schemes.  This doesn't affect any of
the existing items in the current urltestdata set.
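
For readers following along, here is a rough sketch of the kind of
scheme-specific check discussed above. It is not the deployed filter
and not Mark's script. The helper name passes_http_constraints is made
up for illustration, and passing non-HTTP schemes through untouched and
ignoring fragments are assumptions of the sketch only. It splits the
input with the regex from RFC 3986 Appendix B, then requires http and
https URIs to have "//" followed by a non-empty host, as RFC 7230
section 2.7.1 does.

```python
import re

# Component-splitting regex from RFC 3986, Appendix B.
URI_RE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def passes_http_constraints(uri):
    """Hypothetical helper: reject http(s) URIs that lack '//' + host."""
    m = URI_RE.match(uri)  # every group is optional, so this always matches
    scheme = (m.group(2) or "").lower()
    if scheme not in ("http", "https"):
        # Assumption: other schemes are not judged by this check.
        return True
    authority = m.group(4)  # None when the URI has no "//" part
    if not authority:
        return False
    # Drop any userinfo, then any port, to isolate the host.
    hostport = authority.rsplit("@", 1)[-1]
    if hostport.startswith("["):
        host = hostport.split("]", 1)[0].lstrip("[")  # IP literal
    else:
        host = hostport.split(":", 1)[0]
    return bool(host)


if __name__ == "__main__":
    for candidate in ("https:/example.com/", "https://example.com/"):
        print(candidate, passes_http_constraints(candidate))
```

With this check, test case 63 ("https:/example.com/") is rejected
because there is no "//" after the scheme, while
"https://example.com/" is accepted.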

- Sam Ruby