[apps-discuss] Thoughts on text/* encoding defaults
Julian Reschke <julian.reschke@gmx.de> Mon, 06 June 2011 12:42 UTC
Hi there.

In Prague, we had a few hallway conversations about the default encoding of text/* media types. Below are my notes (references to the relevant spec sections, information about a recent change in HTTPbis, and a rough proposal for how to proceed).

Best regards, Julian

-- snip --

1) RFC 2046 says that the default is US-ASCII

   "Note that the character set used, if anything other than US-ASCII, must always be explicitly specified in the Content-Type field."

   -- <http://greenbytes.de/tech/webdav/rfc2046.html#rfc.section.4.1.2.p.18>

2) RFC 2616 says it's ISO-8859-1

   "The "charset" parameter is used with some media types to define the character set (Section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See Section 3.4.1 for compatibility problems."

   -- <http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.3.7.1.p.4>

3) For text/xml, RFC 3023 says it's US-ASCII, no matter what 2616 says :-)

   "Conformant with [RFC2046], if a text/xml entity is received with the charset parameter omitted, MIME processors and XML processors MUST use the default charset value of "us-ascii"[ASCII]. In cases where the XML MIME entity is transmitted via HTTP, the default charset value is still "us-ascii". (Note: There is an inconsistency between this specification and HTTP/1.1, which uses ISO-8859-1[ISO8859] as the default for a historical reason. Since XML is a new format, a new default should be chosen for better I18N. US-ASCII was chosen, since it is the intersection of UTF-8 and ISO-8859-1 and since it is already used by MIME.)"

   -- <http://tools.ietf.org/html/rfc3023#section-3.1>

The problem

Recipients do not implement this; they take the absence of encoding information as an indicator to inspect the payload. This is at least true for text/xml and text/html (see <http://www.w3.org/TR/REC-xml/#sec-guessing> and <http://www.w3.org/TR/2011/WD-html5-20110405/parsing.html#determining-the-character-encoding>).

Current development:

HTTPbis, P3 has dropped the default and now delegates to the relevant media type definitions (see <http://trac.tools.ietf.org/wg/httpbis/trac/ticket/20>, <http://greenbytes.de/tech/webdav/draft-ietf-httpbis-p3-payload-14.html>).

Left to do:

a) Revise RFC 2046; allow text/* types that carry encoding information inline to do the expected thing (overriding the US-ASCII default); warn against doing so in new registrations (recommend supporting only UTF-8, and requiring that the charset parameter always be included explicitly, as text/vcard is going to do?)

b) Revise RFC 3023 to delegate text/xml charset defaults to the revision of 2046?

Best regards, Julian
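
To make the inconsistency between the three specifications concrete, here is a minimal Python sketch of what a recipient would compute if it followed them literally. The function names (parse_charset_param, spec_default_charset) and the via_http flag are hypothetical, introduced only for illustration, and the parameter parsing is deliberately naive (it does not handle quoted strings containing semicolons). As noted above, real recipients of text/xml and text/html tend to inspect the payload instead of applying these defaults.

```python
from typing import Optional


def parse_charset_param(content_type: str) -> Optional[str]:
    """Return the explicit charset parameter of a Content-Type value, if any.

    Naive parser for illustration only: splits on ';' and '='.
    """
    for param in content_type.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip('"').lower() or None
    return None


def spec_default_charset(content_type: str, via_http: bool) -> Optional[str]:
    """Charset a recipient would assume when following the specs literally.

    Returns None for non-text/* types without a charset parameter, since the
    defaults discussed here only cover text/*.
    """
    explicit = parse_charset_param(content_type)
    if explicit is not None:
        return explicit

    media_type = content_type.split(";")[0].strip().lower()
    if not media_type.startswith("text/"):
        return None
    if media_type == "text/xml":
        # RFC 3023, section 3.1: "us-ascii", even when transmitted via HTTP.
        return "us-ascii"
    if via_http:
        # RFC 2616, section 3.7.1: "ISO-8859-1" when received via HTTP.
        return "iso-8859-1"
    # RFC 2046, section 4.1.2: US-ASCII unless explicitly specified.
    return "us-ascii"
```

With this sketch, the same unlabeled text/plain entity yields "us-ascii" when received by mail and "iso-8859-1" when received via HTTP, while text/xml stays "us-ascii" in both cases.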