Re: [apps-discuss] I-D Action: draft-ietf-appsawg-mime-default-charset-01.txt

Bill McQuillan <McQuilWP@pobox.com> Thu, 19 April 2012 07:59 UTC

Date: Thu, 19 Apr 2012 00:59:23 -0700
From: Bill McQuillan <McQuilWP@pobox.com>
X-Priority: 3 (Normal)
Message-ID: <427958429.20120419005923@pobox.com>
To: Apps-Discusssion <discuss@apps.ietf.org>
In-Reply-To: <4F8EF1D0.50001@gmx.de>
References: <20120330125228.15497.35035.idtracker@ietfa.amsl.com> <1271382236.20120330141948@pobox.com> <4F8EF1D0.50001@gmx.de>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Pobox-Relay-ID: 94EEF164-89F5-11E1-9A7F-9DB42E706CDE-02871704!b-pb-sasl-quonix.pobox.com
Subject: Re: [apps-discuss] I-D Action: draft-ietf-appsawg-mime-default-charset-01.txt

On Wed, 2012-04-18, Julian Reschke wrote:
> On 2012-03-30 23:19, Bill McQuillan wrote:
>> In section 3:
>>
>> ----------
>>     In order to improve interoperability with deployed agents, "text/*"
>>     media type registrations SHOULD either
>>
>>     a.  specify that the "charset" parameter is not used for the defined
>>         subtype, because the charset information is transported inside
>>         the payload (such as in "text/xml"), or
>>     b.  require explicit unconditional inclusion of the "charset"
>>         parameter eliminating the need for a default value.
>>
>>     In accordance with option (a), above, registrations for "text/*"
>>     media types that can transport charset information inside the
>>     corresponding payloads (such as "text/html" and "text/xml") SHOULD
>>     NOT specify the use of a "charset" parameter, nor any default value,
>>     in order to avoid conflicting interpretations should the charset
>>     parameter value and the value specified in the payload disagree.
>> ----------
>>
>> Doesn't option (a) actually mean that a new default charset is
>> now defined, perhaps called "embedded-ascii", in which all octets
>> with values less than 128 must have the same meaning as the
>> corresponding ASCII values and that all octet values greater
>> than 127 may be ignored? This would allow naively processing a
>> newly specified text/* type by displaying the content first using
>> the "embedded-ascii" charset (ignoring non-ascii octets) and,
>> hopefully, finding, by eye, the actual charset specified within
>> and then re-displaying the content using that discovered charset.
>>
>> For instance how would a newly specified type similar to
>> text/html with a document using the internal charset of "ebcdic"
>> be handled? The current specification would deal with this merely
>> by ensuring that a "charset=ebcdic" appeared in the Content-Type
>> MIME field and also within the document itself.

> I'm not sure I understand the question.

> Types that transport charset information in-line will need to define how
> to detect it. An example would be the algorithm in

>    http://www.w3.org/TR/xml/#sec-guessing

> And yes, that works best if the encoding is compatible with US-ASCII (that
> is, octets 0..127 represent the same characters as in the US-ASCII
> encoding).

> Do you think there's something we need to clarify here?
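
To make the in-line detection point concrete, here is a minimal sketch,
not the algorithm from the XML specification. It assumes the payload is
ASCII-compatible and carries an XML-style encoding="..." declaration near
the start. The function name sniff_declared_charset and the 1024-octet
window are hypothetical choices for illustration only.

```python
import re
from typing import Optional

# XML-style declaration, matched directly on the raw octets.
# This only works when octets 0..127 mean the same as in US-ASCII.
_DECL = re.compile(rb'encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

def sniff_declared_charset(data: bytes, window: int = 1024) -> Optional[str]:
    """Return the charset named inside the payload, or None if none is found."""
    match = _DECL.search(data[:window])
    return match.group(1).decode("ascii") if match else None

# ASCII-compatible payload: the declaration is readable before decoding.
latin1 = b'<?xml version="1.0" encoding="iso-8859-1"?><p>caf\xe9</p>'
print(sniff_declared_charset(latin1))

# EBCDIC payload (Python's cp500 codec): the same declaration is not
# visible to an ASCII-based scan, so the sniffer finds nothing.
ebcdic = '<?xml version="1.0" encoding="cp500"?>'.encode("cp500")
print(sniff_declared_charset(ebcdic))
```

The second call illustrates the "ebcdic" concern raised earlier: without an
external charset parameter, an ASCII-based scan cannot find the declaration
in such a payload.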

As I understand it, the reason, originally, for having a major
type of "text" is that if the minor type is unknown to the end
user, it can be treated as "text/plain" and examined by a simple
text editor to perhaps discover the content or a hint as to the
appropriate application to process it. If the proposal here
doesn't have this characteristic, it shouldn't be called "text",
IMHO. It should be labeled "application/octet-stream".

Now the most straightforward way to accomplish this capability
seems to be that one could examine the content with a text
processor that displays octets below 128 as ASCII and, at
least, does not self-destruct when it encounters octets above 127.

This differs from the previous guarantee of 7-bit ASCII only as the
default, but many modern text-handling programs should cope with it.
Hence my conclusion that there is a de facto default charset,
"embedded-ascii", as alluded to above.
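
As a rough illustration of what such a viewer would do, here is a minimal
sketch. "embedded-ascii" is not a registered charset, and the function name
embedded_ascii_view and the "?" placeholder are hypothetical choices made
for this example only.

```python
def embedded_ascii_view(data: bytes, placeholder: str = "?") -> str:
    """Show octets 0..127 as ASCII; show a placeholder for octets above 127."""
    return "".join(chr(b) if b < 128 else placeholder for b in data)

# Hypothetical payload of an unknown text/* subtype that names its own
# charset in-line; the octet 0xE9 is not ASCII and is only marked.
sample = b'<?xml version="1.0" encoding="iso-8859-1"?>\n<p>caf\xe9</p>\n'
print(embedded_ascii_view(sample))
```

A reader can spot the declared charset by eye in this view and then
re-display the content with it, for example via
sample.decode("iso-8859-1").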

-- 
Bill McQuillan <McQuilWP@pobox.com>