Re: [apps-discuss] Parsing text into URLs that doesn't conform to RFC 3986

<darrel.miller@gmail.com> Sun, 11 January 2015 21:49 UTC

Message-ID: <54b2efd7.c1308c0a.2b65.47f7@mx.google.com>
MIME-Version: 1.0
From: darrel.miller@gmail.com
To: "apps-discuss@ietf.org" <apps-discuss@ietf.org>
Importance: Normal
Date: Sun, 11 Jan 2015 21:24:27 +0000
In-Reply-To: <54B2C6FA.80802@intertwingly.net>
References: <20140926010029.26660.82167.idtracker@ietfa.amsl.com> <CACweHNBEYRFAuw9-vfeyd_wf703cvM3ykZoRMqAokRFYG_O7hQ@mail.gmail.com> <DM2PR0201MB09602B351692D424A49C6B0DC3650@DM2PR0201MB0960.namprd02.prod.outlook.com> <CACweHNBN_Bv=jeXQ_VwXi2HzHKNEwZJ1NiF-BJJo_9-mhO60gQ@mail.gmail.com> <54A557E1.6050502@intertwingly.net> <CACweHNCQZg1U1u8U=-f6h0+BPnp6Wr_T=r_wGiPAbhTbuMCGWQ@mail.gmail.com> <54A94109.5010901@intertwingly.net> <00cf01d02cc7$d5dba4c0$4001a8c0@gateway.2wire.net> <54B16C2B.9050604@seantek.com> <54B17BBE.4000900@intertwingly.net> <54B18B61.8010308@seantek.com> <54B19435.8070401@intertwingly.net> <54B1B211.3050807@seantek.com> <54B1B682.3070609@intertwingly.net> <54B28E0F.8070306@gmx.de> <54B2936B.7030805@intertwingly.net> <05AD7DE2-1C54-45CD-B33A-13766D771E57@mnot.net> <54B2A2CD.5080502@gmx.de> <1A5BBD25-FEBD-49B1-9EFB-4EF8877BF0E7@mnot.net> <54B2A4F9.2070909@gmx.de> <54B2A894.4020201@intertwingly.net> <54B2ABA8.6030205@gmx.de>,<54B2C6FA.80802@intertwingly.net>
Content-Type: multipart/alternative; boundary="_DE553726-AC86-491D-99F0-0C7C09D63C16_"
Archived-At: <http://mailarchive.ietf.org/arch/msg/apps-discuss/de0rHxzOdY7AlSBscgt41ElxD1c>
Subject: Re: [apps-discuss] Parsing text into URLs that doesn't conform to RFC 3986

What really concerns me about the apparent objective of making all URL parsers produce identical results for the set of tests produced by the WHATWG is that it seems to ignore the fact that URLs are created and consumed in very different contexts.

When text is entered in the address bar of a web browser and parsed as a URL, I understand the desire to be as tolerant of errors as possible. I see minimal danger in making an educated guess at what the user intended, because the browser will likely correct the error and show the user the corrected URL before making the safe request.

However, when I use the System.Uri class to parse some text that I find in an HTTP response or a configuration file, it would really concern me if the parser started making significant assumptions about how to generate valid URLs from text that doesn’t conform to the syntax of RFC 3986.
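
To make the distinction concrete, here is a rough sketch of the "reject, don't guess" behaviour I would expect from a parser used in that second context. It is written in Python rather than .NET purely for brevity, and parse_strict is a hypothetical name of my own, not an API from any library. It only checks the RFC 3986 character repertoire and splits components with the regular expression from Appendix B of the RFC; it is not a full validation against the ABNF, and requiring a scheme is my assumption for this example (RFC 3986 also allows relative references).

```python
import re

# Component-splitting expression from RFC 3986, Appendix B.
_COMPONENTS = re.compile(
    r"(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

# Coarse repertoire check: unreserved and reserved characters, plus
# well-formed percent-encoded triplets. This is NOT the full ABNF.
_ALLOWED = re.compile(
    r"(?:[A-Za-z0-9\-._~:/?#\[\]@!$&'()*+,;=]|%[0-9A-Fa-f]{2})*"
)

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*")


def parse_strict(text):
    """Hypothetical helper: split text into URI components or raise.

    Returns (scheme, authority, path, query, fragment). Raises ValueError
    instead of repairing the input, so the caller decides what to do.
    """
    if not _ALLOWED.fullmatch(text):
        raise ValueError("characters outside the RFC 3986 repertoire: %r" % text)
    match = _COMPONENTS.fullmatch(text)
    scheme, authority, path, query, fragment = match.group(2, 4, 5, 7, 9)
    # Assumption for this sketch: only absolute URIs are acceptable.
    if scheme is None or not _SCHEME.fullmatch(scheme):
        raise ValueError("missing or invalid scheme: %r" % text)
    return scheme, authority, path, query, fragment
```

With this, "http://example.com/a%20b?x=1#f" splits cleanly, while "http://example.com/a b" and "http:\\example.com" raise ValueError. A browser address bar might reasonably fix up both of those; code reading a configuration file gets an error it can report instead.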

I would be really interested in seeing the development of a specification that describes the optimum way of converting a string that doesn't conform to RFC 3986 syntax into one that does.  That I can get behind.  Changing .NET's System.Uri to make best guesses, I can't.

Darrel