Re: [OAUTH-WG] Preliminary OAuth Core draft -29

Julian Reschke <julian.reschke@gmx.de> Thu, 12 July 2012 08:31 UTC

Message-ID: <4FFE8B56.6030306@gmx.de>
Date: Thu, 12 Jul 2012 10:31:18 +0200
From: Julian Reschke <julian.reschke@gmx.de>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:13.0) Gecko/20120614 Thunderbird/13.0.1
MIME-Version: 1.0
To: Mike Jones <Michael.Jones@microsoft.com>
References: <4E1F6AAD24975D4BA5B16804296739436657C93A@TK5EX14MBXC283.redmond.corp.microsoft.com> <4FFAE2C8.5000109@gmx.de> <4E1F6AAD24975D4BA5B16804296739436657CE30@TK5EX14MBXC283.redmond.corp.microsoft.com> <4FFAF24D.5050805@gmx.de>
In-Reply-To: <4FFAF24D.5050805@gmx.de>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
Cc: "oauth@ietf.org" <oauth@ietf.org>
Subject: Re: [OAUTH-WG] Preliminary OAuth Core draft -29

On 2012-07-09 17:01, Julian Reschke wrote:
> On 2012-07-09 16:48, Mike Jones wrote:
>> HTML5 is not cited because it's a working draft - not an approved
>> standard.  In what way is "the definition of the media type in HTML4
>> is known to be insufficient"?  People have been successfully
>> implementing form-urlencoding with it for quite some time. :-)  Is
>> there a specific wording change that you'd suggest that we make that
>> doesn't involve citing a working draft, rather than an approved standard?
>
> For instance, the HTML4 "definition" doesn't even mention what to do
> with non-ASCII characters.
>
> I understand that it's not particularly attractive, but citing HTML4
> just because it's a "standard" isn't really helpful for people who
> actually follow the link and try to understand what needs to be
> implemented.
> ...

Here's an attempt to describe the encoding in terms of HTML4, plus
additional instructions. The spec would need to reference this text
wherever it currently refers to the HTML4 media type definition:

-- snip --
Appendix X. Use of the application/x-www-form-urlencoded Media Type

At the time of publication of this specification, the 
"application/x-www-form-urlencoded" media type was defined in Section 
17.13.4 of [HTML4], but not registered in the IANA media types registry 
(<http://www.iana.org/assignments/media-types/index.html>). Furthermore, 
the definition is incomplete as it does not consider non-US-ASCII 
characters.

To address this shortcoming, when generating payloads using this media 
type, names and values MUST be encoded using the "UTF-8" character 
encoding scheme ([RFC3629]) first; the resulting octet sequence then 
needs to be further encoded using the escaping rules defined in [HTML4].

When parsing data from a payload using this media type, the names and 
values resulting from reversing the name/value encoding consequently 
need to be treated as octet sequences, to be decoded using the "UTF-8" 
character encoding scheme.

Example: A value consisting of the six Unicode code points (1) U+0020 
(SPACE), (2) U+0025 (PERCENT SIGN), (3) U+0026 (AMPERSAND), (4) U+002B 
(PLUS SIGN), (5) U+00A3 (POUND SIGN), and (6) U+20AC (EURO SIGN) would 
be encoded into the octet sequence below (using hexadecimal notation):

   20 25 26 2B C2 A3 E2 82 AC

and then represented in the payload as:

   +%25%26%2B%C2%A3%E2%82%AC

-- snip --
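
For anyone who wants to check the example, here is a minimal Python sketch of the two steps (UTF-8 first, then the HTML4 escaping) and of the reverse direction. The helper names encode_component and decode_component are made up for illustration. It assumes that Python's urllib.parse.quote_plus is close enough to the HTML4 escaping rules for this input; quote_plus leaves a few characters such as "-", "_", "." and "~" unescaped, which does not matter for the six code points used here.

```python
from urllib.parse import quote_plus, unquote_plus

# The six code points from the example: SPACE, PERCENT SIGN, AMPERSAND,
# PLUS SIGN, POUND SIGN, EURO SIGN.
value = "\u0020\u0025\u0026\u002B\u00A3\u20AC"


def encode_component(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then escape for the payload."""
    # quote_plus encodes to UTF-8 octets, turns SPACE into "+", and
    # percent-escapes everything else that is not in its unreserved set.
    return quote_plus(text, safe="", encoding="utf-8", errors="strict")


def decode_component(escaped: str) -> str:
    """Hypothetical helper: reverse the escaping, then decode as UTF-8."""
    # errors="strict" raises on octets that are not valid UTF-8 instead
    # of silently substituting replacement characters.
    return unquote_plus(escaped, encoding="utf-8", errors="strict")


# Step 1: the intermediate octet sequence, in hexadecimal notation.
octets_hex = value.encode("utf-8").hex(" ").upper()
print(octets_hex)  # 20 25 26 2B C2 A3 E2 82 AC

# Step 2: the representation in the payload.
payload = encode_component(value)
print(payload)  # +%25%26%2B%C2%A3%E2%82%AC

# Parsing direction: the round trip recovers the original value.
print(decode_component(payload) == value)  # True
```

Note that a literal "+" in the value has to be escaped as %2B, since an unescaped "+" in the payload stands for SPACE.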

Best regards, Julian