Re: [ietf-types] The application/www-form-urlencoded format

Bjoern Hoehrmann <derhoermi@gmx.net> Sun, 26 September 2010 20:19 UTC

* Anne van Kesteren wrote:
>I think it is unfortunate it still allows encoding in various ways. So  
>while things could be more readable as you pointed out in the past user  
>agents are still allowed to obscure most everything.

I do think that encoder implementers can make reasonable choices about
that. If an implementer decides it's best to escape, say, the zero-width
space character because it's invisible then I see nothing wrong with
that. If another implementer decides to not escape it, that's fine too.
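
To illustrate why either choice is harmless to a decoder, here is a
minimal sketch in Python. It assumes the standard library's
urllib.parse as a stand-in for an encoder and decoder; the variable
names are hypothetical and nothing here is taken from the draft itself.

```python
from urllib.parse import quote, unquote

ZWSP = "\u200b"  # ZERO WIDTH SPACE, invisible when rendered

# Encoder A escapes the character (its UTF-8 bytes are E2 80 8B).
escaped = "a" + quote(ZWSP, safe="") + "b"

# Encoder B leaves the character as-is.
literal = "a" + ZWSP + "b"

# A decoder recovers the same string from both forms.
same = unquote(escaped) == unquote(literal) == "a\u200bb"
print(escaped, same)
```

The escaped form is only more visible to a human reader; the decoded
data is identical.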

>The bug about + seems to be still be there. Escapes are first decoded and  
>then + is replaced with U+0020. Also application/x-www-form-urlencoded is  
>on its way of being standardized as part of HTML5 now.

I don't think there is anything that "HTML5" could standardize about it,
other than to define how form submission using that type should be
implemented in HTML implementations. As for the +, the text refers to
the abstract syntax tree you get when parsing a string using the
grammar, so an escaped plus sign is not an instance of `plus`. I take it
that's one of the text's rough edges I should smooth out.
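
The intended reading can be sketched as follows, assuming Python's
urllib.parse.unquote for the percent-decoding step. The function names
are hypothetical: one replaces literal plus signs before resolving
escapes, the other follows the order described in the quoted objection.

```python
from urllib.parse import unquote

def decode_component(raw: str) -> str:
    # Only a literal '+' in the input stands for a space, so replace
    # those first and resolve percent-escapes afterwards.
    return unquote(raw.replace("+", " "), encoding="utf-8", errors="strict")

def decode_component_escapes_first(raw: str) -> str:
    # The problematic order: '%2B' becomes '+' and is then turned into
    # a space, so an escaped plus sign cannot survive decoding.
    return unquote(raw, encoding="utf-8", errors="strict").replace("+", " ")

print(repr(decode_component("a+b%2Bc")))                # 'a b+c'
print(repr(decode_component_escapes_first("a+b%2Bc")))  # 'a b c'
```

With the first order an escaped plus sign round-trips as a plus sign,
which matches the parse-tree reading.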

>>       Note: The media type does not have a 'charset' parameter, it
>>       is incorrect specify one and to associate any significance to
>>       it if specified. The character encoding is always UTF-8. The
>>       Unicode encoding form signature is not supported; a leading
>>       U+FEFF character will be considered part of a <name>.
>
>Most other such formats ignore a leading U+FEFF.

I can't think of any format that's always UTF-8 encoded yet allows for a
leading U+FEFF. In any case, treating it as part of a name is simpler
than ignoring it, and I think people are more likely to implement that
correctly than if I were to require recognizing it.
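
A minimal sketch of that behaviour, assuming a simplified parser that
splits fields on "&" and "=" (the draft's actual grammar is not
reproduced here, and parse_form and _decode are hypothetical names).
The point is only that the decoder does no special-casing at all:

```python
from urllib.parse import unquote_to_bytes

def _decode(raw: bytes) -> str:
    # Literal '+' means space; then resolve escapes and decode as UTF-8.
    # The plain "utf-8" codec does not strip a leading U+FEFF.
    return unquote_to_bytes(raw.replace(b"+", b" ")).decode("utf-8")

def parse_form(body: bytes) -> list:
    pairs = []
    for field in body.split(b"&"):
        if not field:
            continue  # skip empty fields such as a trailing '&'
        name, _, value = field.partition(b"=")
        pairs.append((_decode(name), _decode(value)))
    return pairs

# A body that starts with the UTF-8 encoding of U+FEFF (EF BB BF).
pairs = parse_form(b"\xef\xbb\xbfname=value&x=1")
print(pairs[0][0] == "\ufeffname")  # the signature stays in the name
```

Ignoring the signature instead would need an extra check on the first
field only, which is the kind of step that is easy to forget.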
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/