Re: [apps-discuss] Working Group Last Call: draft-ietf-appsawg-xml-mediatypes
Julian Reschke <julian.reschke@gmx.de> Tue, 17 September 2013 20:22 UTC
Message-ID: <5238B9E9.7010204@gmx.de>
Date: Tue, 17 Sep 2013 22:22:01 +0200
From: Julian Reschke <julian.reschke@gmx.de>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.0
MIME-Version: 1.0
To: "Murray S. Kucherawy" <superuser@gmail.com>, "apps-discuss@ietf.org" <apps-discuss@ietf.org>
References: <828708BA-E4BF-48DE-9E44-3C21063AA3D8@gmail.com>
In-Reply-To: <828708BA-E4BF-48DE-9E44-3C21063AA3D8@gmail.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Subject: Re: [apps-discuss] Working Group Last Call: draft-ietf-appsawg-xml-mediatypes
X-BeenThere: apps-discuss@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: General discussion of application-layer protocols <apps-discuss.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/apps-discuss>, <mailto:apps-discuss-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/apps-discuss>
List-Post: <mailto:apps-discuss@ietf.org>
List-Help: <mailto:apps-discuss-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/apps-discuss>, <mailto:apps-discuss-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 17 Sep 2013 20:22:20 -0000

On 2013-07-29 10:08, Murray S. Kucherawy wrote:
> This note begins a Working Group Last Call for
> draft-ietf-appsawg-xml-mediatypes, ending on Friday, August 16.
> Please provide reviews and comments on this list or privately to the
> authors as soon as possible.
> ...

Here's my late feedback (IETF, interim meetings, vacation, etc pp):

   Updates: 4289, 6839 (if approved)

Really?

   Major differences from [RFC3023] are alignment of charset handling
   for text/xml and text/xml-external-parsed-entity with
   application/xml, the addition of XPointer and XML Base as fragment
   identifiers and base URIs, respectively, mention of the XPointer
   Registry, and updating of many references.

I don't think this needs to be in the Abstract. Also, references are
discouraged here because the abstract should be usable stand-alone. So
maybe move into the Introduction.

   document entities  The media types application/xml or text/xml MAY
      be used

s/used/used./

   Application/xml and application/xml-external-parsed-entity are
   recommended.  Compared to [RFC2376] or [RFC3023], this
   specification alters the charset handling of text/xml and
   text/xml-external-parsed-entity, treating them no differently from
   the respective application/ types.  The reasons are as follows:

s/Application/application/

Also, avoid lowercase "recommended" if it's not a "RECOMMENDED".

   Conflicting specifications regarding the character encoding have
   caused confusion.  On the one hand, [RFC2046] specifies "The
   default character set, which must be assumed in the absence of a
   charset parameter, is US-ASCII.", [RFC2616] Section 3.7.1, defines
   that "media subtypes of the 'text' type are defined to have a
   default charset value of 'ISO-8859-1'", and [RFC2376] as well as
   [RFC3023] specify the default charset is US-ASCII.

I think this just repeats history already captured in RFC 6657. Do we
really need to repeat it over here?

   The current situation, reflected in this specification, has been
   simplified by [RFC6657] updating [RFC2046] to remove the US-ASCII
   default.  Furthermore, in accordance with [RFC6657]'s other
   recommendations, [HTTPbis] changes [RFC2616] by removing the
   ISO-8859-1 default and not defining any default at all.

This is a bit misleading as the change in httpbis predates RFC6657
significantly.

   The top-level media type "text" has some restrictions on MIME
   entities and they are described in [RFC2045] and [RFC2046].  In
   particular, for transports other than HTTP [RFC2616] or HTTPS
   (which uses a MIME-like mechanism). the UTF-16 family, UCS-4, and

It would be helpful if the reference to 2045/6 were a bit more
specific. I'd also prefer to get rid of all RFC2616 references except
when referring to the specification's history.

   However, developers of such media types are STRONGLY RECOMMENDED to
   use this specification as a basis for their registration.  In
   particular, the charset parameter, if used, MUST agree with the
   in-band XML encoding of the XML entity, as described in Section
   3.6, in order to enhance interoperability.

There's no "STRONGLY" keyword. In general, I'd avoid using BCP14
keywords for recommendations to people.

   Encoding considerations:  This media type MAY be encoded as
      appropriate for the charset and the capabilities of the
      underlying MIME transport.  For 7-bit transports, data in either
      UTF-8 or

I don't understand the "MAY" here.

   Published specification:  Extensible Markup Language (XML) 1.0
      (Fifth Edition) [XML], Extensible Markup Language (XML) 1.1
      (Second Edition) [XML1.1].

OK, so I can use the same media type for both XML 1.0 and 1.1.
However, the way this is phrased makes it appear as if XML 1.1 is
somehow more ... recent when in fact it was a dead-end. I recommend
dropping the references to 1.1 from everywhere, and just having a
single place that points out that what's said about 1.0 is also true
for 1.1.
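
To make the quoted requirement that the charset parameter "MUST agree with the in-band XML encoding" concrete, here is a minimal sketch in Python (standard library only). The function names are hypothetical and not taken from the draft. It assumes the entity is in an ASCII-compatible encoding, so it ignores BOMs and the UTF-16 family, and it reports "unknown" when either side is missing, which is exactly the case raised further down ("what if there is no in-band declaration?").

```python
import re
from email.message import Message

# Encoding pseudo-attribute of an XML declaration at the very start of the
# entity. Assumption: ASCII-compatible bytes, no BOM (UTF-16 is not handled).
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def charset_param(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_param("charset")


def declared_encoding(body):
    """Return the encoding named in the XML declaration, or None."""
    match = _ENCODING_DECL.match(body)
    return match.group(1).decode("ascii") if match else None


def charset_agrees(content_type, body):
    """True/False when both values are present, None when either is missing."""
    param = charset_param(content_type)
    decl = declared_encoding(body)
    if param is None or decl is None:
        return None
    return param.lower() == decl.lower()
```

For instance, `charset_agrees('application/xml; charset=utf-8', b'<?xml version="1.0" encoding="iso-8859-1"?><a/>')` evaluates to `False`, which is the conflicting situation the draft's examples describe.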

   Interoperability considerations:  XML DTDs have proven to be
      interoperable by DTD authoring tools and XML browsers, among
      others.

What is an "XML browser"? If this is about web browsers I really have
my doubts that they work interoperably :-)

   The charset parameter MUST only be used, when the charset is
   reliably known and agrees with the in-band XML encoding
   declaration.  This

s/used,/used/

Also, what if there is no in-band declaration?

   authoritatively the charset of the XML MIME entity.  The charset
   parameter can also be used to provide protocol-specific operations,
   such as charset-based content negotiation in HTTP.

That's misleading. Charset-based content negotiation happens by use of
Accept-Charset, not the charset parameter.

   There are several reasons that the charset parameter is optionally
   allowed.  First, recent web servers have been improved so that
   users

That text is 12 years old. We may want to drop or rephrase it :-)

   can specify the charset parameter.  Second, [RFC2130] (informative)
   specifies that the recommended specification scheme is the
   "charset" parameter.

That refers to a document from 1996. Is this really relevant here?

   On the other hand, it has been argued that the charset parameter
   should be omitted and the mechanism described in Appendix F of
   [XML] (which is non-normative) should be solely relied on.  This
   approach would allow users to avoid configuration of the charset
   parameter; an XML document stored in a file is likely to contain a
   correct encoding declaration or BOM (if necessary), since the
   operating system does not typically provide charset information for
   files.  If users would like to rely on the in-band XML encoding
   declaration or BOM and/or to conceal charset information from
   non-XML processors, they can omit the parameter.

This now is really the recommended approach, no? Maybe the whole of
3.6.1 should be removed then.

   Uniform Resource Identifiers (URIs) may contain fragment
   identifiers (see Section 3.5 of [RFC3986]).
   Likewise, Internationalized Resource Identifiers (IRIs) [RFC3987]
   may contain fragment identifiers.

s/may/can/

Also, the reference to RFC3987 really doesn't add anything useful
here.

   See Section 8.1 for additional rquirements which apply when an
   XML-based MIME media type follows the naming convention '+xml'.

s/rquirements/requirements/

   If [XPointerFramework] and [XPointerElement] are inappropriate for
   some XML-based media type, it SHOULD NOT follow the naming
   convention '+xml'.

Really? Why not? What about application/xhtml+xml?

   When a URI has a fragment identifier, it is encoded by a limited
   subset of the repertoire of US-ASCII [ASCII] characters, as defined
   in [RFC3986].  When an IRI contains a fragment identifier, it is
   encoded by a much wider repertoire of characters.  The conversion
   between IRI fragment identifiers and URI fragment identifiers is
   presented in Section 7 of [RFC3987].

I recommend dropping the IRI-specific part. This is not specific to
XML types.

   Note that the base URI may be embedded in a different MIME entity,
   since the default value for the xml:base attribute may be specified
   in an external DTD subset or external parameter entity.

s/may/might/
s/may/can/

   application/xml, application/xml-external-parsed-entity, and
   application/xml-dtd, text/xml and text/xml-external-parsed-entity
   are to be used with [XML] In all examples herein where
   version="1.0" is

s/[XML]/[XML]./

   This specification recommends the use of a naming convention (a
   suffix of '+xml') for identifying XML-based MIME media types,

s/MIME// (there may be more instances of this)

   whatever their particular content may represent, in line with the

What is the "whatever their particular content may represent" about?

   When a new media type is introduced for an XML-based format, the
   name of the media type SHOULD end with '+xml'.
   This convention will allow

Which may be in conflict with the SHOULD NOT I complained about
earlier on :-)

   NOTE: Section 14.1 of HTTP [RFC2616] does not support Accept
   headers of the form "Accept: */*+xml" and so this header MUST NOT
   be used in this way.  Instead, content negotiation [RFC2703] could
   potentially be used if an XML-based MIME type were needed.

Please cite HTTPbis P2. Also, content negotiation is defined by HTTP,
not RFC 2703.

   XML generic processing is not always appropriate for XML-based
   media types.  For example, authors of some such media types may
   wish that the types remain entirely opaque except to applications
   that are specifically designed to deal with that media type.  By
   NOT following the naming convention '+xml', such media types can
   avoid XML-generic processing.  Since generic processing will be
   useful in many cases, however -- including in some situations that
   are difficult to predict ahead of time -- those registering media
   types SHOULD use the '+xml' convention unless they have a
   particularly compelling reason not to.

I recommend avoiding the use of SHOULD here. Just explain the pros and
cons.

   The registration process for specific '+xml' media types is
   described in [RFC6838] and [RFC6839].  The registrar for the IETF
   tree will

Just RFC6838, as far as I can tell.

   The use of the charset parameter is STRONGLY RECOMMENDED, since
   this information can be used by XML processors to determine
   authoritatively the charset of the XML MIME entity.  If there are
   some reasons not to follow this advice, they SHOULD be included as
   part of the registration.  As shown above, two such reasons are
   "UTF-8 only" or "UTF-8 or UTF-16 only".

That's misleading. People may read it as saying that the *presence* of
the charset parameter is RECOMMENDED.
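
As an illustration of the '+xml' trade-off discussed above (generic processing versus staying opaque), here is a minimal sketch in Python of how a generic processor might decide whether to treat a media type as XML. The function name is hypothetical, and restricting the fixed set to application/xml and text/xml is an assumption made for brevity, not something the draft prescribes.

```python
# Media types treated as generic XML regardless of suffix.
# Assumption: only the two document-entity types are listed here.
GENERIC_XML_TYPES = {"application/xml", "text/xml"}


def is_xml_media_type(content_type):
    """Return True if generic XML processing would apply to this type."""
    # Drop parameters such as charset and normalise case; media type
    # names are case-insensitive.
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in GENERIC_XML_TYPES or essence.endswith("+xml")
```

A type registered without the suffix is simply invisible to such a check, which is the "remain entirely opaque" option the quoted text mentions.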

   In practice these constraints imply that for a fragment identifier
   addressed to an instance of a specific "xxx/yyy+xml" type, there
   are three cases:

      For fragment identifiers matching the syntax defined in Section
      5, where the fragment identifier resolves per the rules
      specified there, then process as specified there;

Section 5 does not define the syntax (other than referencing
XPointer). So this is a bit hard to process.

      For fragment identifiers _not_ matching the syntax defined in
      Section 5, then process as specified in "xxx/yyy+xml".

What would be an example for this case?

   All the examples below apply to all five media types declared above
   in Section 3, as well as to any media types declared using the
   '+xml' convention.  See the XML MIME entities table (Section 3,
   Paragraph 2)

Well, unless that type does not define the charset parameter, right?

   This section is non-normative.  In particular, note that all "MUST"
   language herein reproduces or summarizes the consequences of
   normative statement already made above, and have no independent
   normative force.

Can we avoid the use of MUST here, then? :-)

   Content-type charset: charset="utf-8"

Maybe it would be less confusing to say: "charset specified in
content-type:"

   printable or base64.  For an 8-bit clean transport (e.g., 8BITMIME
   ESMTP or NNTP), or a binary clean transport (e.g., HTTP), no
   content-transfer-encoding is necessary.

...as HTTP does not even define content-transfer-encoding.

(same applies to parts below)

   As described in [RFC2781], the UTF-16 family MUST NOT be used with
   media types under the top-level type "text" except over HTTP or
   HTTPS (see section 19.4.1 of [RFC2616] for details).  Hence this
   example is

Not sure how that section of 2616 is relevant here.

   Omitting the charset parameter is NOT RECOMMENDED for
   application/... when used with transports other than HTTP or
   HTTPS---text/...
   SHOULD NOT be used for 16-bit MIME with transports other than HTTP
   or HTTPS (see discussion above (Section 9.2, Paragraph 6)).

Please avoid uppercasing non-BCP14 keywords :-)

   Since the charset parameter is provided in the Content-Type header
   and differs from the XML encoding declaration, MIME and XML
   processors will not interoperate.  MIME processors will treat the
   enclosed entity as UTF-8 encoded.  That is, the "iso-8859-1"
   encoding will be ignored.  XML processors on the other hand will
   ignore the charset parameter and treat the XML entity as encoded in
   iso-8859-1.

Do we have a definition of "MIME processor"?

   As described in Section 8, this specification updates the [RFC6838]
   and [RFC6839] registration process for XML-based MIME types.

My understanding is that the registration process is defined in 6838
only.

   the most dangerous option available to crackers is redefining
   default

s/crackers/attackers/

   Fourth, many references are updated, and the existence and
   relevance of XML 1.1 acknowledged.  Finally, a number of
   justifications and

As far as I can tell, XML 1.1 is totally irrelevant...

Best regards, Julian