Re: [apps-discuss] Last Call: <draft-ietf-appsawg-mime-default-charset-03.txt> (Update to MIME regarding Charset Parameter Handling in Textual Media Types) to Proposed Standard
Ned Freed <ned.freed@mrochek.com> Wed, 09 May 2012 05:37 UTC
Message-id: <01OF98WFXDKI0006TF@mauve.mrochek.com>
Date: Tue, 08 May 2012 19:33:43 -0700
From: Ned Freed <ned.freed@mrochek.com>
In-reply-to: "Your message dated Tue, 08 May 2012 21:18:20 -0400" <CALaySJLFrKSF9JPBC54j0EaTQ6SNXM2+tag2uU2SmVjWxE7Erg@mail.gmail.com>
References: <20120423132812.32410.11259.idtracker@ietfa.amsl.com> <CAC4RtVDZfXi1JwGJLGwOVgsGuU-1dH-uj8bXTGCmjrva80mNhg@mail.gmail.com> <01OF8RSPPS320006TF@mauve.mrochek.com> <CALaySJLFrKSF9JPBC54j0EaTQ6SNXM2+tag2uU2SmVjWxE7Erg@mail.gmail.com>
To: Barry Leiba <barryleiba@computer.org>
Cc: Ned Freed <ned.freed@mrochek.com>, apps-discuss@ietf.org
Subject: Re: [apps-discuss] Last Call: <draft-ietf-appsawg-mime-default-charset-03.txt> (Update to MIME regarding Charset Parameter Handling in Textual Media Types) to Proposed Standard
> > The new rules say that a media type either
> >
> > (1) Specifies that no charset parameter is used and that the charset is
> >     determined from inspection of the content, or
> >
> > (2) Requires inclusion of a charset parameter specifying what the charset
> >     is, or
> >
> > (3) Explicitly states what the default charset is. (Either with or without
> >     allowing an optional charset parameter as a means of overriding the
> >     default.)

> The document makes all of those SHOULD. From section 3:

>    In order to improve interoperability with deployed agents, "text/*"
>    media type registrations SHOULD either

>    a. specify that the "charset" parameter is not used for the defined
>       subtype, because the charset information is transported inside
>       the payload (such as in "text/xml"), or

>    b. require explicit unconditional inclusion of the "charset"
>       parameter eliminating the need for a default value.

> There's nothing that I read in this document that says that
> registrations MUST do anything with regard to charset parameters.

I never said it did. The requirement falls out of the options this document
provides coupled with something I thought was obvious: The requirement that
registrations not be obviously ambiguous.

> Now, you may say that the designated expert would never accept a
> registration that didn't (and well you might say that), but someone
> looking at the registrations and their associated documentation won't
> know that.

So what you're saying is that someone could look at this document and conclude
that it allows inherently ambiguous registrations? I guess I give the people
who'd actually bother to read these specifications at this level of detail a
bit more credit than that, but fair enough.

> > As such, there is no possibility of confusion. An old text/* media type can get
> > away without specifying charset information and relying on the language in RFC
> > 2045; a new one cannot.

> There is absolutely a possibility of confusion.
> Someone who doesn't
> know you and the process well may see something registered and be
> completely uncertain whether it's old (and therefore gets US-ASCII as
> its default) or whether it's new and doesn't comply with the SHOULDs
> here (and therefore gets &deity only knows what as its default).

This has nothing to do with me. I would expect any reviewer to perform this
level of checking. In fact one of the reasons expert review was instituted was
that IANA internal review wasn't familiar enough with the technical details and
thus was incapable of performing these sorts of checks.

> > the question is whether or not you can convince the reviewer you
> > have sufficient cause to violate the SHOULD.

> And if you do so convince the reviewer, what does the default value
> wind up being? And, again, how's an arbitrary person to know?

Well, as it happens a similar case has come up in the past, so I don't have to
guess. There was a text media type that was intended for use with video
subtitles, where the format of the text was controlled by bilateral agreement
between the sender and the receiver. So the security considerations could not
be specified because they were entirely dependent on the format arranged by
that agreement (could contain dangerous active content, might be nothing but
ASCII, or anything in between).

But the closest thing we allow for is to say "security considerations have not
been assessed", which doesn't really fit. In that case I allowed the
registration to say "... the security considerations of this type cannot be
assessed because the format is determined by bilateral agreement ...".

But what would never be allowed is to do something similar with a charset
definition without saying *why* it was being done.
> >> To maintain compatibility with existing registrations, this fallback rule
> >> applies: any subtype of the "text" media type that does not comply with
> >> the rules above retains US-ASCII as its default, as originally specified
> >> in RFC 2046.
> >
> > That's a really bad idea for all sorts of reasons, including but not limited to
> > it makes the document self-contradictory. You shouldn't remove a rule then say
> > it's OK to fall back on it.

> Well, I disagree with that. No matter, see below.

> > Now, if you want to add something that says:
> >
> >   Regardless of the approach chosen, all text/* registrations MUST clearly
> >   specify how the charset of the content is determined and MUST NOT rely
> >   on the RFC 2045 rule.

> I'd change that to "all new text/* registrations", and I still want to
> add a sentence that says that existing text/* registrations that don't
> specify this retain the 2045 rule.

It's still a bad idea to continue to rely on the old rule in any way.

The other problem with the RFC 2045 rule is that it fell apart the minute HTTP
overrode it and said that text/html without a charset parameter defaults to
iso-8859-1. So right there you have what is likely the most commonly used
subtype of text at this point breaking the rule you're now saying is what old
types fall back to. Talk about confusing! (Really, in regards to that rule, all
this document is doing is formalizing something that's been true for almost two
decades.)

So how about:

  Regardless of the approach chosen, all new text/* registrations MUST clearly
  specify how the charset of the content is determined; relying on the RFC 2045
  rule is no longer permitted. However, existing text/* registrations that fail
  to specify how the charset is determined still default to US-ASCII.

Note that this is different than the old rule, which says that if no charset
parameter is present the charset must default to us-ascii.

And we probably want to update the main registration document with something
similar.
Ned
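
For illustration only, the three registration options and the proposed
fallback for existing registrations can be sketched in Python. This is a rough
model under stated assumptions, not text from the draft or from any
registration: the names CharsetRule, TextRegistration, review and
effective_charset are hypothetical, as is the is_new flag used to tell old
registrations from new ones. Honoring a charset parameter on an existing
registration that says nothing about charsets is also an assumption of the
sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class CharsetRule(Enum):
    IN_CONTENT = 1        # option (1): no charset parameter, inspect the payload
    PARAM_REQUIRED = 2    # option (2): charset parameter is mandatory
    EXPLICIT_DEFAULT = 3  # option (3): registration states its own default
    UNSPECIFIED = 4       # registration is silent about charset handling


@dataclass
class TextRegistration:
    subtype: str
    rule: CharsetRule
    default: Optional[str] = None  # used only with EXPLICIT_DEFAULT
    param_overrides: bool = True   # may a charset parameter override the default?
    is_new: bool = True            # hypothetical flag: registered under the new text


def review(reg: TextRegistration) -> None:
    """Reject registrations that leave the charset ambiguous."""
    if reg.rule is CharsetRule.UNSPECIFIED and reg.is_new:
        raise ValueError(
            f"text/{reg.subtype}: a new registration must specify how the "
            "charset is determined"
        )
    if reg.rule is CharsetRule.EXPLICIT_DEFAULT and not reg.default:
        raise ValueError(f"text/{reg.subtype}: no default charset stated")


def effective_charset(
    reg: TextRegistration, charset_param: Optional[str] = None
) -> Optional[str]:
    """Return the charset label, or None if it must be read from the payload."""
    review(reg)
    if reg.rule is CharsetRule.IN_CONTENT:
        return None
    if reg.rule is CharsetRule.PARAM_REQUIRED:
        if charset_param is None:
            raise ValueError(f"text/{reg.subtype}: charset parameter is required")
        return charset_param.lower()
    if reg.rule is CharsetRule.EXPLICIT_DEFAULT:
        if charset_param is not None and reg.param_overrides:
            return charset_param.lower()
        return reg.default.lower()
    # Existing registration that never specified charset handling: it keeps
    # the US-ASCII default. Honoring an explicit parameter is an assumption.
    return (charset_param or "us-ascii").lower()
```

In this model an existing registration that is silent about charsets yields
"us-ascii", while the same silence in a new registration is rejected by
review(), which is the distinction the proposed text is trying to draw.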