Re: [apps-discuss] Feedback about "Update to MIME regarding Charset Parameter Handling in Textual Media Types"

Ned Freed <ned.freed@mrochek.com> Thu, 23 February 2012 17:34 UTC

Date: Thu, 23 Feb 2012 08:42:33 -0800
From: Ned Freed <ned.freed@mrochek.com>
In-reply-to: "Your message dated Thu, 23 Feb 2012 12:11:56 +0200" <CAJQvAufpNOJ85QpQs5DgWO1dztdxi-8DtQv0ZdVS-fs7reYB4Q@mail.gmail.com>
MIME-version: 1.0
Content-type: TEXT/PLAIN; charset="utf-8"
References: <CAJQvAudekOKa2mzas-igD_6pa2je000Darin2HDNda-sk9TLCQ@mail.gmail.com> <01OCB41ZCJES00ZUIL@mauve.mrochek.com> <CAJQvAufpNOJ85QpQs5DgWO1dztdxi-8DtQv0ZdVS-fs7reYB4Q@mail.gmail.com>
To: Henri Sivonen <hsivonen@iki.fi>
Cc: Anne van Kesteren <annevk@opera.com>, apps-discuss@ietf.org
Subject: Re: [apps-discuss] Feedback about "Update to MIME regarding Charset Parameter Handling in Textual Media Types"

> On Thu, Feb 23, 2012 at 5:59 AM, Ned Freed <ned.freed@mrochek.com> wrote:
> >> Additionally, media types should be able to define circumstances where
> >> in-band indicators override the charset parameter even if the charset
> >> parameter is present.
> >
> > That's a terrible way to do it

> It's reality in WebKit and IE for text/html.

You continue to confuse what implementations are allowed to do with what
registrations are allowed to contain. This is the fundamental error you make
repeatedly.

Again, if the labelling is wrong, the object is noncompliant and
*implementations* are free to handle that case however they wish. But this
doesn't mean we should codify specific implementation practice in our
registration process. In fact, as I said previously, this has the effect of
*limiting* implementations' options, so when the world changes, as it always
does, we end up with inflexible standards incapable of handling that new
reality.

If you want a specific example of why this sort of thing is a bad idea, you
need look no further than the original HTTP specifications. Those
specifications changed the default charset for text/plain from us-ascii to
iso-8859-1 for HTTP. It may have been a good idea at the time (or not), but how
about now?
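
As a rough illustration of what a protocol-specific default means for an
unlabelled text/plain body, here is a minimal Python sketch. The sample bytes
and the helper name decode_with_default are assumptions made for this example;
they do not come from any specification.

```python
# Hypothetical unlabelled text/plain body: "caf" followed by byte 0xE9.
body = b"caf\xe9"


def decode_with_default(data: bytes, default_charset: str) -> str:
    """Decode an unlabelled body using the given default charset.

    Undecodable bytes become U+FFFD so the mismatch stays visible.
    """
    return data.decode(default_charset, errors="replace")


# The HTTP-specific default turns 0xE9 into e-acute.
as_http_default = decode_with_default(body, "iso-8859-1")

# The MIME default for text/plain cannot represent 0xE9 at all.
as_mime_default = decode_with_default(body, "us-ascii")

# A UTF-8 default, which is what much current content would expect,
# also rejects the lone 0xE9 byte.
as_utf8_default = decode_with_default(body, "utf-8")
```

The same bytes yield different text depending on which default a consumer was
told to assume, which is why a default written into a specification is hard to
revisit later.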

> So far, evidence suggests
> that Gecko would serve users better if it changed to treat the BOM as
> having higher precedence than the HTTP-level charset parameter for
> text/html and text/javascript.

You're missing a critical qualifier here: *current* evidence. It may be true
now. What happens when practices change and we end up stuck with a bunch of
advice that doesn't match whatever practices show up in the future?

> (text/plain is loaded using the HTML
> parser per the HTML spec and per reality, so it makes sense to do the
> same for text/plain even though I don't have evidence to show about
> compatibility issues either way.)

> > - if the type is self-identifying in terms of
> > charset, a charset parameter should simply not be defined for the type -
> > exactly what the current specification says to do.

> Logically, yes, but that's not how text/html, text/xml and text/css work.

> >> In particular, media types should be allowed to override the charset
> >> parameter if the first two or three bytes of the payload look like an
> >> UTF-16 or UTF-8 BOM.
> >
> > There are quite a few charsets in existence where it is perfectly permissible
> > for the first few bytes to match a BOM, except that it means something entirely
> > different.

> Seems to work for IE and WebKit in practice.

It doesn't work well at all in my experience, but even if you're generally
right at present, what guarantees can you offer for tomorrow?
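
To make the ambiguity concrete, here is a minimal Python sketch of one such
collision. The sniff_bom helper and the sample payload are assumptions made
for this example, not a description of what IE, WebKit or Gecko actually do.

```python
# Byte prefixes that a BOM-sniffing consumer would look for.
BOMS = {
    b"\xef\xbb\xbf": "utf-8",
    b"\xff\xfe": "utf-16-le",
    b"\xfe\xff": "utf-16-be",
}


def sniff_bom(payload: bytes):
    """Return the charset implied by a leading BOM-like prefix, or None."""
    for prefix, charset in BOMS.items():
        if payload.startswith(prefix):
            return charset
    return None


# A correctly labelled ISO-8859-1 payload whose first three characters
# (U+00EF, U+00BB, U+00BF) encode to the bytes EF BB BF.
text = "\u00ef\u00bb\u00bf is legal Latin-1 text"
payload = text.encode("iso-8859-1")

# Honouring the label recovers the original text.
by_label = payload.decode("iso-8859-1")

# Letting the in-band bytes override the label reads them as a UTF-8 BOM.
sniffed = sniff_bom(payload)
by_sniffing = payload.decode(sniffed)
```

Here the label is correct and the sniffing is wrong: the first three characters
of the text are replaced by a single U+FEFF once the prefix is treated as a
BOM.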

Anyway, I see no point in continuing to argue this further. I've registered my
strong objections to all but one of your proposed changes, and that's
sufficient. I'm done.

				Ned