Re: Interpreting "+" in query component of https:// scheme URIs.

Matthew Kerwin <matthew@kerwin.net.au> Wed, 14 September 2016 10:27 UTC

From: Matthew Kerwin <matthew@kerwin.net.au>
Date: Wed, 14 Sep 2016 20:12:17 +1000
To: Matt Randall <matthew.a.randall@gmail.com>
Cc: "ietf-http-wg@w3.org" <ietf-http-wg@w3.org>

On 10 September 2016 at 08:35, Matt Randall <matthew.a.randall@gmail.com>
wrote:

> Hopefully this is a quick question with a straightforward answer.  The
> https URI scheme (RFC7230) denotes that it simply follows the definition of
> the query component from the base URI RFC (RFC3986).  Query seems to allow
> for all reserved and unreserved characters (with some caveats around "?"
> and "/") in the value, and reserves none of the reserved characters as
> delimiters.
>
> From purely a specifications perspective, my assumption (absent de-facto
> legacy behaviors of certain clients and www-form-urlencoded query string
> behaviors) would be to treat the plus sign literally, just as I would in
> the path component.  Would this be a correct interpretation given the
> following statement in section 2.2?:
>
> If a reserved character is found in a URI component and
> no delimiting role is known for that character, then it must be
> interpreted as representing the data octet corresponding to that
> character's encoding in US-ASCII.
>
> I couldn't find anything in the current specifications that would indicate that "+" has a
> defined delimiting role for the https:// URI scheme.
>
> Thank you in advance,
>
> Matt Randall
>

From [1]:

   ... other
   subcomponents may be defined by a URI scheme's specification, or *by
   the implementation-specific syntax of a URI's dereferencing
   algorithm*, provided that such subcomponents are delimited by
   characters in the reserved set allowed within that component.

The plus sign is used in application/x-www-form-urlencoded data[2][3],
which -- by design -- can be used directly in the query component of a
URI.  If your application follows the HTML specs, it falls under the
implementation-specific category: "+" is treated according to its
reserved sub-delim status (in form data it encodes a space), and is
therefore different from, say, "%2B" (a literal plus sign).

And if you don't care about HTML, then yeah, it's just a plus sign.
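
As a rough illustration of the two readings, here is a minimal sketch
using Python's standard urllib.parse module.  The query string is a
made-up example, not something from the original question.

```python
from urllib.parse import parse_qs, unquote, unquote_plus

# Hypothetical query component, chosen only for illustration.
query = "q=a+b%2Bc"
_, _, raw_value = query.partition("=")

# Generic RFC 3986 reading: "+" has no known delimiting role,
# so it is plain data; only percent-encoded octets are decoded.
generic_value = unquote(raw_value)

# HTML form reading (application/x-www-form-urlencoded):
# "+" stands for a space, while "%2B" is a literal plus sign.
form_value = unquote_plus(raw_value)
form_fields = parse_qs(query)

print(repr(generic_value))  # 'a+b+c'
print(repr(form_value))     # 'a b+c'
print(form_fields)          # {'q': ['a b+c']}
```

The same bytes on the wire give two different values, which is why the
answer depends on whether the application opts in to the HTML rules.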

It also depends on what you're doing; if you're writing HTTP middleware
then sure, ignore the plus sign (the higher-level application will deal
with it).  If you're writing a cache, then you have choices to make.
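
One way to picture the cache choice is as two possible cache-key
functions.  This is only a sketch under my own assumptions: cache_key
and its form_aware flag are hypothetical names, and whether the second
variant is safe depends entirely on how the origin server reads its
query strings.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url: str, form_aware: bool) -> str:
    """Hypothetical cache-key function showing the two choices."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    if not form_aware:
        # Choice 1: treat the query as opaque, so "+" and "%20"
        # produce different keys.
        return f"{base}?{parts.query}"
    # Choice 2: assume form encoding, decode the pairs, and re-encode
    # them canonically, so "+" and "%20" collapse to the same key.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    return f"{base}?{urlencode(pairs)}"


plus = "https://example.com/s?q=a+b"
space = "https://example.com/s?q=a%20b"
literal = "https://example.com/s?q=a%2Bb"

print(cache_key(plus, False) == cache_key(space, False))   # False
print(cache_key(plus, True) == cache_key(space, True))     # True
print(cache_key(plus, True) == cache_key(literal, True))   # False
```

Either way "%2B" stays distinct from "+", which matches the RFC 3986
rule that a reserved character and its percent-encoded form are not
equivalent.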

Unless I've misunderstood something.

Cheers

[1]: https://tools.ietf.org/html/rfc3986#section-2.2
[2]: https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1
[3]: https://www.w3.org/TR/html5/forms.html#url-encoded-form-data
-- 
  Matthew Kerwin
  http://matthew.kerwin.net.au/