Re: [apps-discuss] I-D Action: draft-ietf-appsawg-mime-default-charset-01.txt

Julian Reschke <julian.reschke@gmx.de> Wed, 18 April 2012 16:54 UTC

Message-ID: <4F8EF1D0.50001@gmx.de>
Date: Wed, 18 Apr 2012 18:54:40 +0200
From: Julian Reschke <julian.reschke@gmx.de>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:11.0) Gecko/20120327 Thunderbird/11.0.1
MIME-Version: 1.0
To: Bill McQuillan <McQuilWP@pobox.com>
References: <20120330125228.15497.35035.idtracker@ietfa.amsl.com> <1271382236.20120330141948@pobox.com>
In-Reply-To: <1271382236.20120330141948@pobox.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
X-Y-GMX-Trusted: 0
Cc: Apps-Discusssion <discuss@apps.ietf.org>
Subject: Re: [apps-discuss] I-D Action: draft-ietf-appsawg-mime-default-charset-01.txt

On 2012-03-30 23:19, Bill McQuillan wrote:
> In section 3:
>
> ----------
>     In order to improve interoperability with deployed agents, "text/*"
>     media type registrations SHOULD either
>
>     a.  specify that the "charset" parameter is not used for the defined
>         subtype, because the charset information is transported inside
>         the payload (such as in "text/xml"), or
>     b.  require explicit unconditional inclusion of the "charset"
>         parameter eliminating the need for a default value.
>
>     In accordance with option (a), above, registrations for "text/*"
>     media types that can transport charset information inside the
>     corresponding payloads (such as "text/html" and "text/xml") SHOULD
>     NOT specify the use of a "charset" parameter, nor any default value,
>     in order to avoid conflicting interpretations should the charset
>     parameter value and the value specified in the payload disagree.
> ----------
>
> Doesn't option (a) actually mean that a new default charset is
> now defined, perhaps called "embedded-ascii", in which all octets
> with values less than 128 must have the same meaning as the
> corresponding ASCII values and that all octet values greater
> than 127 may be ignored? This would allow naively processing a
> newly specified text/* type by displaying the content first using
> the "embedded-ascii" charset (ignoring non-ascii octets) and,
> hopefully, finding, by eye, the actual charset specified within
> and then re-displaying the content using that discovered charset.
>
> For instance how would a newly specified type similar to
> text/html with a document using the internal charset of "ebcdic"
> be handled? The current specification would deal with this merely
> by ensuring that a "charset=ebcdic" appeared in the Content-Type
> Mime field and also within the document itself.

I'm not sure I understand the question.

Types that transport charset information in-line will need to define how 
to detect it. An example would be the algorithm in

   http://www.w3.org/TR/xml/#sec-guessing

And yes, that works best if the encoding is compatible with US-ASCII 
(that is, octets 0..127 represent the same characters as in the US-ASCII 
encoding).
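
To make that concrete, here is a rough sketch of in-line detection for an XML-like payload. It is a simplification for illustration, not the full algorithm from the XML spec: it checks only the UTF-8 and UTF-16 byte order marks and then looks for an encoding declaration written in ASCII-compatible octets. The function name, the 1024-octet window, and the fallback parameter are my own assumptions.

```python
import codecs
import re

# Byte order marks checked first. UTF-32 is deliberately left out of this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches an XML declaration only when its octets are ASCII-compatible.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(payload: bytes, fallback: str = "utf-8") -> str:
    """Hypothetical helper: guess the charset label carried inside the payload."""
    for bom, name in _BOMS:
        if payload.startswith(bom):
            return name
    # Only the start of the payload is inspected (window size is an assumption).
    match = _DECL.match(payload[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no readable declaration: fall back (XML's default is UTF-8).
    return fallback
```

An EBCDIC-encoded declaration does not begin with the octets of "<?xml", so this sketch never sees the label and returns the fallback. A real detector for such a type would need extra rules for that case, which is why ASCII compatibility makes things so much simpler.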

Do you think there's something we need to clarify here?

Best regards, Julian