Re: [apps-discuss] Feedback about "Update to MIME regarding Charset Parameter Handling in Textual Media Types"

Henri Sivonen <hsivonen@iki.fi> Thu, 23 February 2012 09:51 UTC

From: Henri Sivonen <hsivonen@iki.fi>
To: apps-discuss@ietf.org
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Cc: Anne van Kesteren <annevk@opera.com>
Subject: Re: [apps-discuss] Feedback about "Update to MIME regarding Charset Parameter Handling in Textual Media Types"

On Thu, Feb 23, 2012 at 7:53 AM, "Martin J. Dürst"
<duerst@it.aoyama.ac.jp> wrote:
> I'm not opposed to this in principle, but I'm worried that as written, it
> will lead to more and more unnecessarily different 'algorithms' and
> heuristics. The spec should make clear that where possible, established
> patterns should be followed to reduce variety.

I'm suggesting that new stuff be UTF-8 only.

For old stuff, everything is a bit different. Off the top of my head:

 * text/javascript uses the HTTP charset, or the BOM, or the declaration
   from the referrer (the charset="" attribute on <script>), or the
   encoding of the referrer.
 * text/css uses what text/javascript uses, or an in-band declaration.
 * text/plain uses what I described in my message (well, strictly, what
   I said was a synthesis of what Gecko, IE and WebKit do; Gecko will do
   it when I get around to changing the precedence of the BOM to work as
   in WebKit and IE).
 * text/html uses what text/plain uses, or an in-band declaration.
 * text/xml uses the HTTP charset, or the BOM, or an in-band declaration.
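
To make the variety concrete, the list above can be written down as a small
table. This is only an illustrative summary: the constant and function names
are made up for the sketch, the order within each list mirrors the wording
above and is not a claim about precedence, and the text/plain algorithm is
left as an opaque placeholder because it is defined in the earlier message,
not here.

```python
# Illustrative summary of the legacy types listed above.
# All names are hypothetical; list order follows the prose, not precedence.
HTTP_CHARSET = "HTTP charset"
BOM = "BOM"
REFERRER_DECLARATION = 'charset="" attribute on the referring <script>'
REFERRER_ENCODING = "encoding of the referrer"
IN_BAND_DECLARATION = "in-band declaration"
# Placeholder: the text/plain steps are described in the earlier message.
TEXT_PLAIN_ALGORITHM = "text/plain algorithm (see earlier message)"

LEGACY_SOURCES = {
    "text/javascript": [
        HTTP_CHARSET,
        BOM,
        REFERRER_DECLARATION,
        REFERRER_ENCODING,
    ],
}
# text/css: whatever text/javascript uses, plus an in-band declaration.
LEGACY_SOURCES["text/css"] = LEGACY_SOURCES["text/javascript"] + [IN_BAND_DECLARATION]
LEGACY_SOURCES["text/plain"] = [TEXT_PLAIN_ALGORITHM]
# text/html: whatever text/plain uses, plus an in-band declaration.
LEGACY_SOURCES["text/html"] = LEGACY_SOURCES["text/plain"] + [IN_BAND_DECLARATION]
LEGACY_SOURCES["text/xml"] = [HTTP_CHARSET, BOM, IN_BAND_DECLARATION]


def distinct_patterns(sources):
    """Count how many different source lists appear in the table."""
    return len({tuple(value) for value in sources.values()})


if __name__ == "__main__":
    for media_type, sources in LEGACY_SOURCES.items():
        print(media_type, "->", "; ".join(sources))
    print("distinct patterns:", distinct_patterns(LEGACY_SOURCES))
```

In this summary, no two of the five types share the same set of inputs.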

text/cache-manifest is sane and always uses UTF-8 (HTTP charset is ignored).

For new stuff, I'd much rather recommend the text/cache-manifest
pattern than the BOM/HTTP charset/in-band commonality of text/css,
text/html and text/xml.
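
A minimal sketch of the text/cache-manifest pattern, assuming a consumer
that receives the raw bytes together with whatever charset the HTTP layer
reported. The function name and the choice to replace malformed sequences
(rather than reject the input) are assumptions made for the example.

```python
from typing import Optional


def decode_utf8_only(body: bytes, http_charset: Optional[str] = None) -> str:
    """Decode as UTF-8 regardless of any declared charset.

    http_charset is accepted only to show that it is deliberately ignored.
    Malformed byte sequences become U+FFFD (an assumption of this sketch).
    """
    return body.decode("utf-8", errors="replace")


if __name__ == "__main__":
    data = "caf\u00e9".encode("utf-8")
    # The mislabelled charset has no effect on the result.
    print(decode_utf8_only(data, http_charset="iso-8859-1"))
```

The point of the pattern is that there is nothing to negotiate: a wrong or
missing label cannot change how the bytes are interpreted.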

>> 4. Determining the character encoding for text/plain
>
> This looks like a very long algorithm. It may work pretty well in a Web
> context (definitely better than a US-ASCII default), but what about other
> contexts (e.g. mail)?

I don't see why mail would need to be different. There are just some
steps in the algorithm (same-origin navigation and XMLHttpRequest or
similar) that never apply to mail.

>> If the entity is being loaded into a browsing context and is being
>> fetched from a location from which an entity has been loaded before
>> and the previous character encoding has been cached, the character
>> encoding is the cached encoding. Terminate these steps.
>>
>> If the entity is being loaded via a non-browsing context mechanism
>> (such as XMLHttpRequest) that defines a fallback encoding, use that
>> encoding. Terminate these steps.
>>
>> Otherwise, the character encoding is a configuration-dependent
>> encoding. The default configuration SHOULD depend on the locale of the
>> user agent according to the table given in step 8 in [5]. Terminate
>> these steps.
>
>
> What is missing here is that there is sometimes a need for user intervention
> if all else fails. While it may be possible to look at this as something
> outside of this algorithm, or as something subsumed under "configuration" or
> whatever, I'm afraid that the current text will give many implementers the
> impression that user overrides are not allowed. So I suggest adding
> something like:
>
>   In an interactive context, the user may occasionally want to
>   override the character encoding determined by this algorithm.

Right. User override should come right after the BOM steps.
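
As a rough sketch of that ordering: the BOM check runs first, a user
override (if any) is consulted next, and only then do the remaining steps
of the algorithm run. The function names, the `remaining_steps` callback,
and the set of BOMs checked (UTF-8 and UTF-16) are assumptions made for
the example; the full step list is in the earlier message.

```python
import codecs
from typing import Callable, Optional


def sniff_bom(data: bytes) -> Optional[str]:
    """Return an encoding name if the data starts with a known BOM."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None


def determine_encoding(
    data: bytes,
    user_override: Optional[str],
    remaining_steps: Callable[[bytes], str],
) -> str:
    """BOM first, then the user override, then everything else."""
    bom_encoding = sniff_bom(data)
    if bom_encoding is not None:
        return bom_encoding
    if user_override is not None:
        return user_override
    # Stand-in for the HTTP charset, cache, and locale-default steps.
    return remaining_steps(data)


if __name__ == "__main__":
    fallback = lambda _data: "windows-1252"
    print(determine_encoding(codecs.BOM_UTF8 + b"hi", "shift_jis", fallback))
    print(determine_encoding(b"hi", "shift_jis", fallback))
    print(determine_encoding(b"hi", None, fallback))
```

With this ordering a BOM still wins over the user's choice, while the
override takes priority over every later step.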

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/