Re: [apps-discuss] Last Call: <draft-ietf-appsawg-mime-default-charset-03.txt> (Update to MIME regarding Charset Parameter Handling in Textual Media Types) to Proposed Standard

Ned Freed <ned.freed@mrochek.com> Tue, 08 May 2012 21:28 UTC

Message-id: <01OF8RSPPS320006TF@mauve.mrochek.com>
Date: Tue, 08 May 2012 14:09:32 -0700
From: Ned Freed <ned.freed@mrochek.com>
In-reply-to: "Your message dated Tue, 08 May 2012 10:06:30 -0400" <CAC4RtVDZfXi1JwGJLGwOVgsGuU-1dH-uj8bXTGCmjrva80mNhg@mail.gmail.com>
References: <20120423132812.32410.11259.idtracker@ietfa.amsl.com> <CAC4RtVDZfXi1JwGJLGwOVgsGuU-1dH-uj8bXTGCmjrva80mNhg@mail.gmail.com>
To: Barry Leiba <barryleiba@computer.org>
Cc: apps-discuss@ietf.org

> > Abstract
> >   This document changes RFC 2046 rules regarding default charset
> >   parameter values for text/* media types to better align with common
> >   usage by existing clients and servers.
> ...
> > The file can be obtained via
> > http://datatracker.ietf.org/doc/draft-ietf-appsawg-mime-default-charset/
> >
> > IESG discussion can be tracked via
> > http://datatracker.ietf.org/doc/draft-ietf-appsawg-mime-default-charset/ballot/

> This document sailed through IETF last call with no comments, but
> something has come up in IESG evaluation -- Robert chatted with me
> about this, and I suggested that he put a DISCUSS ballot in until we
> resolve it.

> The document makes it very clear that "existing registrations" are not
> affected (and therefore retain their RFC 2046 default of US-ASCII for
> charset).  But how does someone TELL that a subtype is "existing"?

This was brought up and resolved during last call comments.

The current rules allow a text/* media type to say anything about charsets,
and if it allows an optional charset parameter, there's no need to state what
the default is.

The new rules say that a media type either

(1) Specifies that no charset parameter is used and that the charset is
    determined from inspection of the content, or

(2) Requires inclusion of a charset parameter specifying what the charset
    is, or

(3) Explicitly states what the default charset is. (Either with or without
    allowing an optional charset parameter as a means of overriding the
    default.)

The last option is discouraged (SHOULD NOT).

No other options are given, so it follows that a media type that fails to do
one of these things is supposed to be rejected by the reviewer.
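
To make the reviewer's check concrete, here is a minimal sketch in Python. It
is only an illustration: the TextRegistration record, its field names, and the
review() function are made up for this example and are not part of the draft
or of any registry format. Treating "no charset parameter" combined with
"charset parameter required" as contradictory is likewise my reading of
options (1) and (2), not wording from the document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TextRegistration:
    """Hypothetical summary of what a text/* registration says about charset."""
    subtype: str
    charset_in_content: bool = False        # option (1): determined by inspecting the content
    charset_param_required: bool = False    # option (2): charset parameter is mandatory
    default_charset: Optional[str] = None   # option (3): explicit default (SHOULD NOT)


def review(reg: TextRegistration) -> str:
    """Classify a new registration under the three options, or reject it."""
    # Assumption: option (1) means no charset parameter is used at all,
    # so it cannot be combined with a required parameter.
    if reg.charset_in_content and reg.charset_param_required:
        return "reject"
    if reg.charset_in_content:
        return "option 1"
    if reg.charset_param_required:
        return "option 2"
    if reg.default_charset is not None:
        return "option 3"
    # Says nothing about how the charset is determined: not acceptable
    # for a new registration.
    return "reject"
```

A registration that is silent about charset ends up in the final branch, which
is the case the reviewer is supposed to bounce.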

As such, there is no possibility of confusion. An old text/* media type can get
away with not specifying charset information and relying instead on the
language in RFC 2045; a new one cannot.

> Chasing documentation pointers and checking dates is not a reasonable
> way.  Five years from now, when 60 more text subtypes have been added
> to the 60 that are there, how will anyone know *which* of those 120
> are affected by this spec and which are not?

A nonissue. See above.

> Further, both (a) and (b) in section 3 are things that SHOULD be done;
> what if a new registration does neither, violating both SHOULDs?  How
> will someone know what its default is?

And this is precisely why it is so important to be able to use compliance
language in registration RFCs. SHOULD means "do it unless you have a good
reason not to". In this case this isn't a judgement call made by an
implementor; the question is whether or not you can convince the reviewer you
have sufficient cause to violate the SHOULD.

> I think the right way to fix this is to put new text in near the end
> of section 3, just to make things absolutely clear:
> --------
> OLD
>   New subtypes of the "text" media type, thus, SHOULD NOT define a
>   default "charset" value.  If there is a strong reason to do so
>   despite this advice, they SHOULD use the "UTF-8" [RFC3629] charset as
>   the default.

> NEW
>   New subtypes of the "text" media type, thus, SHOULD NOT define a
>   default "charset" value.  If there is a strong reason to do so
>   despite this advice, they SHOULD use the "UTF-8" [RFC3629] charset as
>   the default.

>   To maintain compatibility with existing registrations, this fallback rule
>   applies: any subtype of the "text" media type that does not comply with
>   the rules above retains US-ASCII as its default, as originally specified
>   in RFC 2046.
> --------

That's a really bad idea for all sorts of reasons, including but not limited to
the fact that it makes the document self-contradictory. You shouldn't remove a
rule and then say it's OK to fall back on it.

Now, if you want to add something that says:

  Regardless of the approach chosen, all text/* registrations MUST clearly
  specify how the charset of the content is determined and MUST NOT rely
  on the RFC 2045 rule.

then I don't have a problem with that. I think this already falls out of the
existing text, but adding it would make it crystal clear.

> The document editors are OK with this text, but we want to pass it by
> the working group for comment.  Does anyone object to this suggestion?

I strongly object.

>  Does anyone think the issue should be addressed differently?

See above.

				Ned