Re: [apps-discuss] RFC 6657 on Update to MIME regarding "charset" Parameter Handling in Textual Media Types

Graham Klyne <Graham.Klyne@zoo.ox.ac.uk> Tue, 10 July 2012 23:25 UTC

Message-ID: <4FFCB395.9030400@zoo.ox.ac.uk>
Date: Tue, 10 Jul 2012 23:58:29 +0100
From: Graham Klyne <Graham.Klyne@zoo.ox.ac.uk>
Organization: Oxford University
To: Ned Freed <ned.freed@mrochek.com>
References: <20120710000754.6BF59B1E006@rfc-editor.org> <4FFBE454.1020601@zoo.ox.ac.uk> <01OHOK4TIDIW0006TF@mauve.mrochek.com>
In-Reply-To: <01OHOK4TIDIW0006TF@mauve.mrochek.com>
Cc: apps-discuss@ietf.org

On 10/07/2012 18:18, Ned Freed wrote:
>> On 10/07/2012 01:07, rfc-editor@rfc-editor.org wrote:
>> >
>> > A new Request for Comments is now available in online RFC libraries.
>> >
>> >
>> > RFC 6657
>> >
>> > Title: Update to MIME regarding "charset"
>> > Parameter Handling in Textual Media Types
>
>> I didn't see this one coming.
>
> It was discussed at considerable length both here and on the IETF list.

Sure, I just meant that I missed it.

>> I'm a bit confused by the specification.
>
> You need to keep in mind that this only applies to subtypes of text.

Ack.

>> If we define a media type that is *always* UTF-8, does this count as
>> transporting its own charset information?
>
> That's one approach you can use. The alternatives are to allow or require
> a charset parameter, always with the value utf-8. The best approach depends
> on the specifics of the type.
>
>> Should we say that the media type
>> SHOULD NOT be included, or that it SHOULD be included with value UTF-8?
>
> Included where? Within the content? If so, that's up to the registration to
> say. There are plenty of utf-8 based formats that don't provide for inclusion
> of media type information - and that includes some that use XML syntax.

Doh... I meant the media type "charset" parameter.

There's no character encoding information in the content.

>> Section
>> 3 implies the latter, but it also talks about media types defining their own
>> default encoding.
>
> Relying on defaults is discouraged for historical reasons - they don't work
> very well. As such, if it's possible for the type to explicitly say what the
> charset is, that's probably the best way to do it. If the type isn't capable of
> that for whatever reason, your options are to simply say it's always utf-8 or
> alternately allow or require a charset parameter with utf-8 as the only value.
> The best approach depends on the situation, which is why the document is full
> of SHOULDs, not MUSTs.

Yeah, one of those.  I expect it will come up for review very soon, so you can 
comment if we've made the wrong call.
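
To make the options Ned describes concrete, here is a minimal Python sketch of how a receiver might resolve the charset for a text/... type that is defined as always UTF-8. It is an illustration under stated assumptions, not anything taken from RFC 6657 or from the registration under discussion: the media type name "text/example-utf8", the set UTF8_ONLY_TYPES and the function effective_charset are all hypothetical. Only Python's standard email.message.Message is used to parse the Content-Type value.

```python
from email.message import Message
from typing import Optional

# Hypothetical media type, assumed to be registered as "always UTF-8".
UTF8_ONLY_TYPES = {"text/example-utf8"}


def effective_charset(content_type: str) -> Optional[str]:
    """Return the charset a receiver should use, or None if none is declared.

    For a UTF-8-only type the charset parameter is optional, but if it is
    present it must say utf-8 (the "allow a charset parameter with utf-8 as
    the only value" option).
    """
    msg = Message()
    msg["Content-Type"] = content_type

    media_type = msg.get_content_type()     # lowercased type/subtype
    explicit = msg.get_content_charset()    # lowercased charset, or None

    if media_type in UTF8_ONLY_TYPES:
        if explicit is not None and explicit != "utf-8":
            raise ValueError(
                f"{media_type} is UTF-8 only, got charset={explicit!r}"
            )
        return "utf-8"

    # For any other type, report only what was declared explicitly; what to
    # do when nothing is declared is up to that type's registration.
    return explicit
```

With this sketch, both "text/example-utf8" and 'text/example-utf8; charset="UTF-8"' resolve to utf-8, while a conflicting charset value is rejected rather than silently overridden. Requiring the parameter instead of merely allowing it would be a one-line change in the UTF-8-only branch.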

>> (This is not an academic question - a W3C group I'm involved with is about to
>> submit a registration for a UTF-8 only text/... media type)
>
> Does this type actually meet the criteria for text specified in RFC 2046
> section 4.1? I rather suspect it doesn't. If not, it really has no business
> being a text subtype, and all of this is moot.

I believe it does.  We're not talking XML or anything like that.  It's a textual 
notation for provenance information, intended for human and occasional machine 
consumption.

#g
--