Re: [apps-discuss] Feedback about "Update to MIME regarding Charset Parameter Handling in Textual Media Types"

"Martin J. Dürst" <duerst@it.aoyama.ac.jp> Thu, 23 February 2012 05:53 UTC

Message-ID: <4F45D452.9010702@it.aoyama.ac.jp>
Date: Thu, 23 Feb 2012 14:53:22 +0900
From: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Organization: Aoyama Gakuin University
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.9) Gecko/20100722 Eudora/3.0.4
MIME-Version: 1.0
To: Henri Sivonen <hsivonen@iki.fi>
References: <CAJQvAudekOKa2mzas-igD_6pa2je000Darin2HDNda-sk9TLCQ@mail.gmail.com>
In-Reply-To: <CAJQvAudekOKa2mzas-igD_6pa2je000Darin2HDNda-sk9TLCQ@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 8bit
Cc: Anne van Kesteren <annevk@opera.com>, apps-discuss@ietf.org
Subject: Re: [apps-discuss] Feedback about "Update to MIME regarding Charset Parameter Handling in Textual Media Types"

Hello Henri, Alex,

Some comments to Henri and some to Alex inline.

On 2012/02/22 22:25, Henri Sivonen wrote:
> In reference to
> https://svn.tools.ietf.org/svn/wg/appsawg/draft-ietf-appsawg-mime-default-charset/latest/draft-ietf-appsawg-mime-default-charset.html
>
> First of all, thank you for finally taking on the much-needed update
> to RFC 2046 rules.
>
> Unfortunately, the draft doesn't address the problem the right way in
> my opinion. Quotes from the draft.
>
>> Each subtype of the "text" media type which uses the "charset" parameter can define its own default value for the "charset" parameter, including absence of any default.

I'm not opposed to this in principle, but I'm worried that as written, 
it will lead to more and more unnecessarily different 'algorithms' and 
heuristics. The spec should make clear that where possible, established 
patterns should be followed to reduce variety.


>> In order to improve interoperability with deployed agents, "text/*" media type definitions SHOULD either a) specify that the "charset" parameter is not used for the defined subtype, because the charset information is transported inside the payload (as in "text/xml")
>
> This seems wrong. If the charset parameter is present, it has an
> effect for text/xml.

Yes indeed.
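
To make the point concrete, here is a minimal Python sketch of a receiver that lets an external "charset" parameter take effect for text/xml before it looks at the XML declaration. The function names and the simplified declaration regex are assumptions made for illustration, not text from any spec. What should happen when the parameter is absent is exactly what this thread is debating; the sketch falls back to the in-payload declaration only to show the contrast.

```python
import re
from email.message import Message

# Simplified pattern for an ASCII-compatible XML declaration (illustrative only;
# a real parser must also handle BOMs and UTF-16/UTF-32 encoded declarations).
XML_DECL = re.compile(rb'^<\?xml[^>]*\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def charset_from_content_type(header_value):
    """Return the lowercased charset parameter, or None if it is absent."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()

def xml_encoding(header_value, payload):
    """Hypothetical helper: external charset first, then the XML declaration."""
    external = charset_from_content_type(header_value)
    if external is not None:
        return external  # the parameter has an effect when present
    match = XML_DECL.match(payload)
    if match:
        return match.group(1).decode("ascii").lower()
    return None  # no information at all; the caller needs some fallback

doc = b'<?xml version="1.0" encoding="UTF-8"?><a/>'
print(xml_encoding('text/xml; charset="ISO-8859-1"', doc))
print(xml_encoding("text/xml", doc))
```

With the parameter present, the two sources disagree and the external label is the one returned, which is the conflict the draft's "SHOULD NOT" is trying to define away.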

>> or b) require explicit unconditional inclusion of the "charset" parameter eliminating the need for a default value.
>
> This seems naïve. Formats need to specify what happens when a charset
> parameter is missing, since no matter how much the format says it's
> "required", the party sending data can omit the charset parameter.
>
>> In accordance with option (a), above, "text/*" media types that can transport charset information inside the corresponding payloads, specifically including "text/html" and "text/xml", SHOULD NOT specify the use of a "charset" parameter, nor any default value, in order to avoid conflicting interpretations should the charset parameter value and the value specified in the payload disagree.
>
> For backwards compatibility, pretty much every existing text/* type
> will have to violate this "SHOULD NOT".

Yes. I think it's inappropriate to give such advice on existing formats. 
If we agree that changes are needed (which I think in the cases at hand 
would be somewhat wishful thinking), they should be specified directly, 
either in this document or preferably in an update to the spec of the 
format itself.


>> The default charset parameter value for text/plain is unchanged from [RFC2046] and remains as "US-ASCII".
>
> This is incompatible with reality. Web browsers, for instance, assume
> a configuration-dependent default (which correlates with browser
> localization) and may also (depending on configuration which, again,
> correlates with localization by default) perform a heuristic analysis
> on the payload.
>
> I suggest specifying the following instead of sections 3 and 4 of the draft:

> 4. Determining the character encoding for text/plain

This looks like a very long algorithm. It may work pretty well in a Web 
context (definitely better than a US-ASCII default), but what about 
other contexts (e.g. mail)?
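
As a small illustration of why a fixed US-ASCII default is a poor fit for real data, here is a Python sketch that decodes a text/plain payload with the declared charset when there is one and with a default otherwise. The function name and the choice of errors="replace" are assumptions made for the example; they are not taken from the draft or from any mail client.

```python
def decode_text_plain(payload, charset=None, default="us-ascii"):
    """Hypothetical helper: decode with the declared charset, else a default.

    Undecodable bytes become U+FFFD so the damage is visible instead of fatal.
    """
    label = charset if charset is not None else default
    return payload.decode(label, errors="replace")

payload = "Dürst".encode("utf-8")

# Charset parameter present: the text survives.
print(decode_text_plain(payload, charset="utf-8"))

# Charset parameter missing: the RFC 2046 default mangles every non-ASCII byte.
print(decode_text_plain(payload))
```

Whether the missing-parameter case should instead trigger sniffing, a locale-dependent default, or something else may well differ between the Web and mail, which is the question above.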


[Part of the algorithm cut out to save electrons.]

> If the entity is being loaded into a browsing context and is being
> fetched from a location from which an entity has been loaded before
> and the previous character encoding has been cached, the character
> encoding is the cached encoding. Terminate these steps.
>
> If the entity is being loaded via a non-browsing context mechanism
> (such as XMLHttpRequest) that defines a fallback encoding, use that
> encoding. Terminate these steps.
>
> Otherwise, the character encoding is a configuration-dependent
> encoding. The default configuration SHOULD depend on the locale of the
> user agent according to the table given in step 8 in [5]. Terminate
> these steps.

What is missing here is that there is sometimes a need for user 
intervention if all else fails. While it may be possible to look at this 
as something outside of this algorithm, or as something subsumed under 
"configuration" or whatever, I'm afraid that the current text will give 
many implementers the impression that user overrides are not allowed. So 
I suggest adding something like:

    In an interactive context, the user may occasionally want to
    override the character encoding determined by this algorithm.
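
A rough Python sketch of the three quoted fallback steps, with a user override added, might look as follows. Everything named here (LoadContext, fallback_encoding, the cache dictionary, the locale table) is hypothetical. The locale values are placeholders, not the table from [5], and checking the override first is just one possible way to let the user's choice win over the algorithm's result.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Placeholder locale defaults for illustration only; not the table from [5].
LOCALE_DEFAULTS = {"ja": "shift_jis", "ru": "windows-1251"}
GENERIC_DEFAULT = "windows-1252"

@dataclass
class LoadContext:
    url: str
    is_browsing_context: bool
    api_fallback: Optional[str] = None   # fallback defined by a mechanism such as XMLHttpRequest
    locale: str = "en"
    user_override: Optional[str] = None  # set when the user picks an encoding by hand

def fallback_encoding(ctx: LoadContext, cache: Dict[str, str]) -> str:
    # Assumption: an explicit user choice wins over everything below.
    if ctx.user_override is not None:
        return ctx.user_override
    if ctx.is_browsing_context:
        # Browsing context: reuse the encoding cached for this location, if any.
        if ctx.url in cache:
            return cache[ctx.url]
    elif ctx.api_fallback is not None:
        # Non-browsing-context mechanism that defines its own fallback.
        return ctx.api_fallback
    # Otherwise: configuration-dependent default keyed on the locale.
    return LOCALE_DEFAULTS.get(ctx.locale, GENERIC_DEFAULT)

cache = {"http://example.org/a.txt": "euc-jp"}
print(fallback_encoding(LoadContext("http://example.org/a.txt", True), cache))
print(fallback_encoding(LoadContext("http://example.org/a.txt", True, user_override="utf-8"), cache))
```

Keeping the override as an explicit input makes it clear to implementers that it is allowed, rather than leaving it to be read into "configuration".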


Regards,    Martin.

>
> [1] http://www.whatwg.org/specs/web-apps/current-work/#ascii-case-insensitive
> [2] http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#concept-encoding-label
> [3] http://www.whatwg.org/specs/web-apps/current-work/#browsing-context
> [4] http://tools.ietf.org/html/rfc6454
> [5] http://www.whatwg.org/specs/web-apps/current-work/#determining-the-character-encoding
>