[apps-discuss] Feedback about "Update to MIME regarding Charset Parameter Handling in Textual Media Types"
Henri Sivonen <hsivonen@iki.fi> Wed, 22 February 2012 13:25 UTC
Date: Wed, 22 Feb 2012 15:25:02 +0200
From: Henri Sivonen <hsivonen@iki.fi>
To: apps-discuss@ietf.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Cc: Anne van Kesteren <annevk@opera.com>
Subject: [apps-discuss] Feedback about "Update to MIME regarding Charset Parameter Handling in Textual Media Types"

In reference to
https://svn.tools.ietf.org/svn/wg/appsawg/draft-ietf-appsawg-mime-default-charset/latest/draft-ietf-appsawg-mime-default-charset.html

First of all, thank you for finally taking on the much-needed update to RFC 2046 rules. Unfortunately, the draft doesn't address the problem the right way in my opinion.

Quotes from the draft:

> Each subtype of the "text" media type which uses the "charset" parameter can define its own default value for the "charset" parameter, including absence of any default.

Additionally, media types should be able to define circumstances where in-band indicators override the charset parameter even if the charset parameter is present. In particular, media types should be allowed to override the charset parameter if the first two or three bytes of the payload look like a UTF-16 or UTF-8 BOM. See:
https://www.w3.org/Bugs/Public/show_bug.cgi?id=15359
https://bugzilla.mozilla.org/show_bug.cgi?id=716579
https://bugzilla.mozilla.org/show_bug.cgi?id=687859

> In order to improve interoperability with deployed agents, "text/*" media type definitions SHOULD either a) specify that the "charset" parameter is not used for the defined subtype, because the charset information is transported inside the payload (as in "text/xml")

This seems wrong. If the charset parameter is present, it has an effect for text/xml.

> or b) require explicit unconditional inclusion of the "charset" parameter eliminating the need for a default value.

This seems naïve. Formats need to specify what happens when a charset parameter is missing, since no matter how much the format says it's "required", the party sending data can omit the charset parameter.

> In accordance with option (a), above, "text/*" media types that can transport charset information inside the corresponding payloads, specifically including "text/html" and "text/xml", SHOULD NOT specify the use of a "charset" parameter, nor any default value, in order to avoid conflicting interpretations should the charset parameter value and the value specified in the payload disagree.

For backwards compatibility, pretty much every existing text/* type will have to violate this "SHOULD NOT".

> New subtypes of the "text" media type, thus, SHOULD NOT define a default "charset" value. If there is a strong reason to do so despite this advice, they SHOULD use the "UTF-8" [RFC3629] charset as the default.

Seems reasonable.

> Specifications of how to specify the "charset" parameter, and what default value, if any, is used, are subtype-specific, NOT protocol-specific.

Seems reasonable.

> Protocols that use MIME, therefore, MUST NOT override default charset values for "text/*" media types to be different for their specific protocol. The protocol definitions MUST leave that to the subtype definitions.

Seems reasonable.

> The default charset parameter value for text/plain is unchanged from [RFC2046] and remains as "US-ASCII".

This is incompatible with reality. Web browsers, for instance, assume a configuration-dependent default (which correlates with browser localization) and may also (depending on configuration which, again, correlates with localization by default) perform a heuristic analysis on the payload.

I suggest specifying the following instead of sections 3 and 4 of the draft:

3. New rules for determining the character encoding for text/* media types

Each text/* media type MUST specify an algorithm for establishing the character encoding of the entity body from the entity body (or preferably the first N bytes thereof, preferably with N = 1024), the charset parameter and Other Information.

Other Information MAY include configuration, an encoding label supplied by the referrer, the previous encoding of an entity body retrieved from the same location or the encoding of the referrer.

New text/* media types MUST NOT use Other Information in the algorithms they specify.

New text/* media types SHOULD use the following algorithm:

The character encoding is UTF-8. Terminate these steps.

4. Determining the character encoding for text/plain

If the first 2 octets of the entity body are 0xFE followed by 0xFF, the character encoding is big-endian UTF-16. Terminate these steps.

If the first 2 octets of the entity body are 0xFF followed by 0xFE, the character encoding is little-endian UTF-16. Terminate these steps.

If the first 3 octets of the entity body are 0xEF followed by 0xBB followed by 0xBF, the character encoding is UTF-8. Terminate these steps.

If the value of the charset parameter is an ASCII case-insensitive[1] match for a label[2] of a supported encoding, the character encoding is the encoding whose label was matched. Terminate these steps.

If the entity is being navigated to in a browsing context[3] and the previous document had the same origin[4] as the text/plain entity, the character encoding is the encoding of the referring document. Terminate these steps. (Disclaimer: I'm not 100% sure that this step is in the right order relative to the others.)

Optional: If a heuristic detector recognizes the octets of the entity body as being encoded according to an encoding, the character encoding is that encoding. Terminate these steps. This step SHOULD NOT be implemented for locales where it has not been implemented traditionally.

If the entity is being loaded into a nested browsing context that has the same origin as the parent browsing context, the encoding is the encoding of the document loaded in the parent browsing context. Terminate these steps.

If the entity is being loaded into a browsing context and is being fetched from a location from which an entity has been loaded before and the previous character encoding has been cached, the character encoding is the cached encoding. Terminate these steps.

If the entity is being loaded via a non-browsing context mechanism (such as XMLHttpRequest) that defines a fallback encoding, use that encoding. Terminate these steps.

Otherwise, the character encoding is a configuration-dependent encoding. The default configuration SHOULD depend on the locale of the user agent according to the table given in step 8 in [5]. Terminate these steps.

[1] http://www.whatwg.org/specs/web-apps/current-work/#ascii-case-insensitive
[2] http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#concept-encoding-label
[3] http://www.whatwg.org/specs/web-apps/current-work/#browsing-context
[4] http://tools.ietf.org/html/rfc6454
[5] http://www.whatwg.org/specs/web-apps/current-work/#determining-the-character-encoding

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
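
For illustration only, the text/plain steps proposed in section 4 above might be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not part of the proposal: the function and parameter names are hypothetical, the browsing-context inputs are reduced to optional strings supplied by the caller, and Python's `codecs.lookup` stands in for the label table of the Encoding Standard (its set of accepted labels is not the same).

```python
import codecs
from typing import Callable, Optional


def sniff_bom(body: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    if body[:2] == b"\xfe\xff":
        return "utf-16-be"
    if body[:2] == b"\xff\xfe":
        return "utf-16-le"
    if body[:3] == b"\xef\xbb\xbf":
        return "utf-8"
    return None


def lookup_label(label: Optional[str]) -> Optional[str]:
    """Map a charset label to a supported codec name, ignoring ASCII case.

    Assumption: codecs.lookup approximates "label of a supported encoding".
    """
    if not label:
        return None
    try:
        return codecs.lookup(label.strip().lower()).name
    except LookupError:
        return None


def text_plain_encoding(
    body: bytes,
    charset: Optional[str] = None,
    referrer_encoding: Optional[str] = None,   # same-origin navigation only
    detector: Optional[Callable[[bytes], Optional[str]]] = None,
    parent_encoding: Optional[str] = None,     # same-origin nested context only
    cached_encoding: Optional[str] = None,
    api_fallback: Optional[str] = None,        # e.g. defined by XMLHttpRequest
    locale_default: str = "windows-1252",      # hypothetical configured default
) -> str:
    """Walk the proposed steps in order; the first step that matches wins."""
    # BOM steps come first, so a BOM overrides the charset parameter.
    bom = sniff_bom(body)
    if bom is not None:
        return bom
    # Charset parameter, if it names a supported encoding.
    declared = lookup_label(charset)
    if declared is not None:
        return declared
    # Encoding of the same-origin referring document.
    if referrer_encoding:
        return referrer_encoding
    # Optional heuristic detection.
    if detector is not None:
        guessed = detector(body)
        if guessed:
            return guessed
    # Parent browsing context, cached encoding, non-browsing fallback.
    for hint in (parent_encoding, cached_encoding, api_fallback):
        if hint:
            return hint
    # Otherwise, the configuration-dependent default.
    return locale_default
```

With this sketch, a body that starts with 0xFE 0xFF is treated as big-endian UTF-16 even when the caller passes `charset="utf-8"`, which mirrors the point that in-band indicators should be able to override the parameter. An unrecognized charset label simply falls through to the later steps.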