Re: [apps-discuss] URI parsing tests: userinfo handling

Sam Ruby <rubys@intertwingly.net> Fri, 02 January 2015 14:40 UTC

Message-ID: <54A6ADCD.1090500@intertwingly.net>
Date: Fri, 02 Jan 2015 09:40:13 -0500
From: Sam Ruby <rubys@intertwingly.net>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Julian Reschke <julian.reschke@gmx.de>, Graham Klyne <gk@ninebynine.org>
References: <20140926010029.26660.82167.idtracker@ietfa.amsl.com> <EAACE200D9B0224D94BF52CF2DD166A425A68A90@ex10mb6.qut.edu.au> <CACweHNBEYRFAuw9-vfeyd_wf703cvM3ykZoRMqAokRFYG_O7hQ@mail.gmail.com> <DM2PR0201MB09602B351692D424A49C6B0DC3650@DM2PR0201MB0960.namprd02.prod.outlook.com> <CACweHNBN_Bv=jeXQ_VwXi2HzHKNEwZJ1NiF-BJJo_9-mhO60gQ@mail.gmail.com> <54A5730C.8040501@ninebynine.org> <54A583DD.9010602@intertwingly.net> <54A59651.4060306@ninebynine.org> <54A59B26.5000408@intertwingly.net> <54A66C73.7000508@gmx.de> <54A67659.1070207@intertwingly.net> <54A68CEE.1060701@gmx.de> <54A69DB0.7010001@intertwingly.net> <54A6A0C0.4020405@gmx.de>
In-Reply-To: <54A6A0C0.4020405@gmx.de>
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/77v83R64MI42Fg82-SEVBZKAJ2E
Cc: apps-discuss@ietf.org

On 01/02/2015 08:44 AM, Julian Reschke wrote:
> On 2015-01-02 14:31, Sam Ruby wrote:
>>
>> On 01/02/2015 07:19 AM, Julian Reschke wrote:
>>> On 2015-01-02 11:43, Sam Ruby wrote:
>>>> On 01/02/2015 05:01 AM, Julian Reschke wrote:
>>>>> On 2015-01-01 20:08, Sam Ruby wrote:
>>>>>> ...
>>>>>> I have evidence that RFC 3986 doesn't match the behavior of a variety
>>>>>> of user agents.  These agents aren't limited to browsers; they also
>>>>>> include libraries used by what you would consider "middleware".
>>>>>>
>>>>>> Here is a filtered list of test results that only considers RFC 3986
>>>>>> valid URI references as inputs:
>>>>>>
>>>>>> https://url.spec.whatwg.org/interop/test-results/?filter=valid
>>>>>> ...
>>>>>
>>>>>
>>>>> So I'm now looking at the first valid entry with UA differences:
>>>>>
>>>>> <https://url.spec.whatwg.org/interop/test-results/19b44e58a2?filter=valid>
>>>>>
>>>>> which tests:
>>>>>
>>>>>    http://user:pass@foo:21/bar;par?b#c
>>>>>
>>>>> This one is valid per the RFC 3986 ABNF and it contains a "userinfo"
>>>>> component.
>>>>>
>>>>> RFC 3986:
>>>>>
>>>>> "...Use of the format "user:password" in the userinfo field is
>>>>> deprecated. Applications should not render as clear text any data
>>>>> after the first colon (":") character found within a userinfo
>>>>> subcomponent unless the data after the colon is the empty string
>>>>> (indicating no password). Applications may choose to ignore or reject
>>>>> such data when it is received as part of a reference and should
>>>>> reject the storage of such data in unencrypted form. The passing of
>>>>> authentication information in clear text has proven to be a security
>>>>> risk in almost every case where it has been used...." --
>>>>> <http://greenbytes.de/tech/webdav/rfc3986.html#rfc.section.3.2.1.p.1>
>>>>>
>>>>> RFC 7230:
>>>>>
>>>>> "The URI generic syntax for authority also includes a deprecated
>>>>> userinfo subcomponent ([RFC3986], Section 3.2.1) for including user
>>>>> authentication information in the URI. Some implementations make use
>>>>> of the userinfo component for internal configuration of
>>>>> authentication information, such as within command invocation
>>>>> options, configuration files, or bookmark lists, even though such
>>>>> usage might expose a user identifier or password. A sender MUST NOT
>>>>> generate the userinfo subcomponent (and its "@" delimiter) when an
>>>>> "http" URI reference is generated within a message as a request
>>>>> target or header field value. Before making use of an "http" URI
>>>>> reference received from an untrusted source, a recipient SHOULD parse
>>>>> for userinfo and treat its presence as an error; it is likely being
>>>>> used to obscure the authority for the sake of phishing attacks." --
>>>>> <http://greenbytes.de/tech/webdav/rfc7230.html#rfc.section.2.7.1.p.8>
>>>>>
>>>>> Looking at the test results suggests that
>>>>>
>>>>> a) only the Python implementation is really broken (it probably
>>>>> follows RFC 2396, which treated params as a separate component;
>>>>> Firefox had a similar problem until a few years ago, but I fixed that)
>>>>
>>>> FWIW, there was a problem with my script that captured test results: I
>>>> wasn't capturing username and password.  This is now fixed:
>>>>
>>>> https://github.com/webspecs/url/commit/72b286722209771b4a18a53a3493c20e88d95736
>>>>
>>>> But that is likely separate from what you are commenting on, namely the
>>>> problem with an incomplete pathname.
>>>> ...
>>>
>>> FWIW, it could also be an incorrect use of Python's urllib.parse
>>> (<https://docs.python.org/3.0/library/urllib.parse.html>) which is
>>> documented to extract the last path segment's parameter as a separate
>>> component. So test code will need to properly reconstruct the path with
>>> that information.
>>
>> I'm inclined to think otherwise as Python's implementation treats the
>> ';' as a delimiter and not a part of the path.  As an example, the
>> following two calls produce identical results:
>>
>> urlparse('http://foo/a/b')
>> urlparse('http://foo/a/b;')
>
> Actually I was wrong; special handling of parameters dates back to RFC
> 1808 and was removed in RFC 2396, see
> <https://tools.ietf.org/html/rfc1808#section-2.1>:
>
>> <scheme>://<net_loc>/<path>;<params>?<query>#<fragment>
>
> So this is what these implementations try to do.
>
> To reconstruct the path, given that API, you'll need to re-append the
> params. And yes, the problem with Python's implementation is probably
> that you can't distinguish an empty params component from the absence
> of that component. This should be raised as a bug report against the
> Python libraries.
>
> (My point being that this -- when properly used -- affects only the edge
> case of a path ending in a ";")

Given that it isn't documented as such, I would consider that to be more 
of a workaround than proper use, but it isn't worth arguing over:

https://github.com/webspecs/url/commit/020175339c39166201f2a30109ca0b48371b455c
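
For anyone following along, the workaround amounts to roughly the
following. This is a minimal sketch under my own assumptions, not the
code in the commit above: full_path is a hypothetical helper name, and
the behavior shown is that of urllib.parse.urlparse in Python 3.

```python
from urllib.parse import urlparse


def full_path(url):
    """Return the path of `url` with any RFC 1808-style params re-appended.

    Hypothetical helper for illustration; it is not part of urllib.
    """
    parts = urlparse(url)
    # urlparse splits ";params" off the last path segment, so glue it back on.
    if parts.params:
        return parts.path + ";" + parts.params
    # An empty params string means either "no params" or "a bare trailing ;".
    return parts.path


print(full_path("http://user:pass@foo:21/bar;par?b#c"))  # /bar;par
print(full_path("http://foo/a/b;"))                      # /a/b
print(urlparse("http://foo/a/b") == urlparse("http://foo/a/b;"))  # True
```

The second call is the edge case discussed above: params is the empty
string both with and without the trailing ";", so that character cannot
be recovered from the parse result.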

Thanks for opening the issue against Python!

> Best regards, Julian

- Sam Ruby