Re: [apps-discuss] Working Group Last Call: draft-ietf-appsawg-xml-mediatypes

Julian Reschke <julian.reschke@gmx.de> Tue, 17 September 2013 20:22 UTC

Message-ID: <5238B9E9.7010204@gmx.de>
Date: Tue, 17 Sep 2013 22:22:01 +0200
From: Julian Reschke <julian.reschke@gmx.de>
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.0
MIME-Version: 1.0
To: "Murray S. Kucherawy" <superuser@gmail.com>, "apps-discuss@ietf.org" <apps-discuss@ietf.org>
References: <828708BA-E4BF-48DE-9E44-3C21063AA3D8@gmail.com>
In-Reply-To: <828708BA-E4BF-48DE-9E44-3C21063AA3D8@gmail.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Subject: Re: [apps-discuss] Working Group Last Call: draft-ietf-appsawg-xml-mediatypes
Precedence: list

On 2013-07-29 10:08, Murray S. Kucherawy wrote:
> This note begins a Working Group Last Call for draft-ietf-appsawg-xml-mediatypes, ending on Friday, August 16. Please provide reviews and comments on this list or privately to the authors as soon as possible.
> ...

Here's my late feedback (IETF, interim meetings, vacation, etc.):

Updates: 4289, 6839 (if approved)

Really?

Major differences from [RFC3023] are alignment of charset handling
for text/xml and text/xml-external-parsed-entity with application/
xml, the addition of XPointer and XML Base as fragment identifiers
and base URIs, respectively, mention of the XPointer Registry, and
updating of many references.

I don't think this needs to be in the Abstract. Also, references are
discouraged here because the abstract should be usable stand-alone. So
maybe move into the Introduction.

document entities The media types application/xml or text/xml MAY be
used

s/used/used./

Application/xml and application/xml-external-parsed-entity are
recommended. Compared to [RFC2376] or [RFC3023], this specification
alters the charset handling of text/xml and text/xml-external-parsed-
entity, treating them no differently from the respective application/
types. The reasons are as follows:

s/Application/application/

Also, avoid lowercase "recommended" if it's not a "RECOMMENDED".

Conflicting specifications regarding the character encoding have
caused confusion. On the one hand, [RFC2046] specifies "The
default character set, which must be assumed in the absence of a
charset parameter, is US-ASCII.", [RFC2616] Section 3.7.1, defines
that "media subtypes of the 'text' type are defined to have a
default charset value of 'ISO-8859-1'", and [RFC2376] as well as
[RFC3023] specify the default charset is US-ASCII.

I think this just repeats history already captured in RFC 6657. Do we
really need to repeat it over here?

The current situation, reflected in this specification, has been
simplified by [RFC6657] updating [RFC2046] to remove the US-ASCII
default. Furthermore, in accordance with [RFC6657]'s other
recommendations, [HTTPbis] changes [RFC2616] by removing the
ISO-8859-1 default and not defining any default at all.

This is a bit misleading as the change in httpbis predates RFC6657
significantly.

The top-level media type "text" has some restrictions on MIME
entities and they are described in [RFC2045] and [RFC2046]. In
particular, for transports other than HTTP [RFC2616] or HTTPS
(which uses a MIME-like mechanism). the UTF-16 family, UCS-4, and

It would be helpful if the reference to RFC 2045/2046 were a bit more
specific.

I'd also prefer to get rid of all RFC2616 references except when
referring to the specification's history.

However, developers of such media types are STRONGLY RECOMMENDED to
use this specification as a basis for their registration. In
particular, the charset parameter, if used, MUST agree with the in-
band XML encoding of the XML entity, as described in Section 3.6, in
order to enhance interoperability.

There's no "STRONGLY" keyword. In general, I'd avoid using BCP14
keywords for recommendations to people.

Encoding considerations: This media type MAY be encoded as
appropriate for the charset and the capabilities of the underlying
MIME transport. For 7-bit transports, data in either UTF-8 or

I don't understand the "MAY" here.

Published specification: Extensible Markup Language (XML) 1.0 (Fifth
Edition) [XML], Extensible Markup Language (XML) 1.1 (Second
Edition) [XML1.1].

OK, so I can use the same media type for both XML 1.0 and 1.1. However,
the way this is phrased makes it appear as if XML 1.1 is somehow more
... recent when in fact it was a dead-end.

I recommend dropping the references about 1.1 from everywhere, and just
have a single place that points out that what's said about 1.0 is also
true for 1.1.

Interoperability considerations: XML DTDs have proven to be
interoperable by DTD authoring tools and XML browsers, among
others.

What is an "XML browser"? If this is about web browsers I really have my
doubts that they work interoperably :-)

The charset parameter MUST only be used, when the charset is reliably
known and agrees with the in-band XML encoding declaration. This

s/used,/used/

Also, what if there is no in-band declaration?
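
To make the question concrete, here is a minimal Python sketch of the check the quoted text seems to require. The helper names are hypothetical and not taken from the draft, and the sketch assumes an ASCII-compatible entity without a BOM. It separates the "no in-band declaration" case from a real conflict, which is the case the draft text leaves open:

```python
import re
from email.message import Message

# Assumption: the entity is in an ASCII-compatible encoding and has no BOM,
# so the XML declaration (if any) can be matched on the raw bytes.
_DECL = re.compile(
    rb'^<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_encoding(body):
    """Return the in-band encoding name in lowercase, or None if absent."""
    m = _DECL.match(body)
    return m.group(1).decode("ascii").lower() if m else None

def charset_param(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    cs = msg.get_param("charset")
    return cs.lower() if isinstance(cs, str) else None

def compare_charset(content_type, body):
    """Classify how the external charset relates to the in-band declaration."""
    external = charset_param(content_type)
    inband = declared_encoding(body)
    if external is None:
        return "no charset parameter"
    if inband is None:
        return "no in-band declaration"
    return "agree" if external == inband else "conflict"
```

With this sketch, a charset parameter of "utf-8" on a body declaring encoding="iso-8859-1" is classified as "conflict", while a body with no encoding declaration falls into a separate category that the draft would need to address.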

authoritatively the charset of the XML MIME entity. The charset
parameter can also be used to provide protocol-specific operations,
such as charset-based content negotiation in HTTP.

That's misleading. Charset-based content negotiation happens by use of
Accept-Charset, not the charset parameter.
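
As an illustration only: in HTTP, the client states its charset preferences in the Accept-Charset request header, and the charset parameter on Content-Type merely labels whatever the server selected. A rough Python sketch of server-side selection follows; the function name is made up, and the parsing is simplified (quality values only, no other parameters):

```python
def pick_charset(accept_charset, available):
    """Pick the server charset the client prefers most, or None.

    accept_charset: value of the Accept-Charset request header.
    available: charsets the server can produce, in server preference order.
    """
    prefs = {}
    for item in accept_charset.split(","):
        parts = [p.strip() for p in item.split(";")]
        name = parts[0].lower()
        if not name:
            continue
        q = 1.0  # default quality when no q= parameter is given
        for p in parts[1:]:
            if p.lower().startswith("q="):
                try:
                    q = float(p[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[name] = q

    best, best_q = None, 0.0
    for cs in available:
        # Fall back to the "*" wildcard entry if the charset is not listed.
        q = prefs.get(cs.lower(), prefs.get("*", 0.0))
        if q > best_q:
            best, best_q = cs, q
    return best
```

The result would then be echoed in the response as the charset parameter, which is output of the negotiation rather than its mechanism.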

There are several reasons that the charset parameter is optionally
allowed. First, recent web servers have been improved so that users

That text is 12 years old. We may want to drop or rephrase it :-)

can specify the charset parameter. Second, [RFC2130] (informative)
specifies that the recommended specification scheme is the "charset"
parameter.

That refers to a document from 1996. Is this really relevant here?

On the other hand, it has been argued that the charset parameter
should be omitted and the mechanism described in Appendix F of [XML]
(which is non-normative) should be solely relied on. This approach
would allow users to avoid configuration of the charset parameter; an
XML document stored in a file is likely to contain a correct encoding
declaration or BOM (if necessary), since the operating system does
not typically provide charset information for files. If users would
like to rely on the in-band XML encoding declaration or BOM and/or to
conceal charset information from non-XML processors, they can omit
the parameter.

This now is really the recommended approach, no? Maybe the whole of
3.6.1 should be removed then.

Uniform Resource Identifiers (URIs) may contain fragment identifiers
(see Section 3.5 of [RFC3986]). Likewise, Internationalized Resource
Identifiers (IRIs) [RFC3987] may contain fragment identifiers.

s/may/can/

Also, the reference to RFC3987 really doesn't add anything useful here.

See Section 8.1 for additional rquirements which apply when an XML-
based MIME media type follows the naming convention '+xml'.

s/rquirements/requirements/

If [XPointerFramework] and [XPointerElement] are inappropriate for
some XML-based media type, it SHOULD NOT follow the naming convention
'+xml'.

Really? Why not? What about application/xhtml+xml?

When a URI has a fragment identifier, it is encoded by a limited
subset of the repertoire of US-ASCII [ASCII] characters, as defined
in [RFC3986]. When an IRI contains a fragment identifier, it is
encoded by a much wider repertoire of characters. The conversion
between IRI fragment identifiers and URI fragment identifiers is
presented in Section 7 of [RFC3987].

I recommend dropping the IRI-specific part. This is not specific to XML
types.

Note that the base URI may be embedded in a different MIME entity,
since the default value for the xml:base attribute may be specified
in an external DTD subset or external parameter entity.

s/may/might/ s/may/can/

application/xml, application/xml-external-parsed-entity, and
application/xml-dtd, text/xml and text/xml-external-parsed-entity are
to be used with [XML] In all examples herein where version="1.0" is

s/[XML]/[XML]./

This specification recommends the use of a naming convention (a
suffix of '+xml') for identifying XML-based MIME media types,

s/MIME// (there may be more instances of this)

whatever their particular content may represent, in line with the

What is the "whatever their particular content may represent" about?

When a new media type is introduced for an XML-based format, the name
of the media type SHOULD end with '+xml'. This convention will allow

Which may be in conflict with the SHOULD NOT I complained about earlier
on :-)

NOTE: Section 14.1 of HTTP [RFC2616] does not support Accept
headers of the form "Accept: */*+xml" and so this header MUST NOT
be used in this way. Instead, content negotiation [RFC2703] could
potentially be used if an XML-based MIME type were needed.

Please cite HTTPbis P2. Also, content negotiation is defined by HTTP,
not RFC 2703.
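
For what it's worth, the reason "*/*+xml" cannot work is that HTTP media ranges only know three shapes: "*/*", "type/*", and "type/subtype". A small Python sketch of that matching rule (hypothetical function name, parameters such as q-values ignored) shows that a suffix wildcard is just compared literally and never matches:

```python
def range_matches(media_range, media_type):
    """Return True if an Accept media range covers the given media type.

    Only the three wildcard shapes defined for HTTP are honoured:
    "*/*", "type/*" and an exact "type/subtype".
    """
    r_type, _, r_sub = media_range.strip().lower().partition("/")
    m_type, _, m_sub = media_type.strip().lower().partition("/")
    if r_type == "*" and r_sub == "*":
        return True
    if r_sub == "*":
        return r_type == m_type
    # Anything else, including "*/*+xml", is compared literally.
    return (r_type, r_sub) == (m_type, m_sub)
```

Under this rule, "application/*" covers "application/atom+xml", but "*/*+xml" covers nothing a server would actually send.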

XML generic processing is not always appropriate for XML-based media
types. For example, authors of some such media types may wish that
the types remain entirely opaque except to applications that are
specifically designed to deal with that media type. By NOT following
the naming convention '+xml', such media types can avoid XML-generic
processing. Since generic processing will be useful in many cases,
however -- including in some situations that are difficult to predict
ahead of time -- those registering media types SHOULD use the '+xml'
convention unless they have a particularly compelling reason not to.

I recommend avoiding the use of SHOULD here. Just explain the pros and cons.
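
To illustrate what is at stake for implementers: a generic processor typically decides whether to attempt XML processing from the media type name alone. A minimal Python sketch, where the function name and the choice of base types are my assumptions rather than anything the draft specifies:

```python
# Assumption: only these two base types plus the '+xml' suffix trigger
# generic XML processing; a real processor may use a different set.
GENERIC_XML_TYPES = {"application/xml", "text/xml"}

def wants_generic_xml_processing(content_type):
    """Return True if the media type name signals XML-generic processing."""
    # Drop parameters such as charset and normalise case.
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in GENERIC_XML_TYPES or essence.endswith("+xml")
```

A type registered without the suffix stays opaque to such a dispatcher, which is the trade-off the text could simply describe instead of mandating.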

The registration process for specific '+xml' media types is described
in [RFC6838] and [RFC6839]. The registrar for the IETF tree will

Just RFC6838, as far as I can tell.

The use of the charset parameter is STRONGLY RECOMMENDED, since this
information can be used by XML processors to determine
authoritatively the charset of the XML MIME entity. If there are
some reasons not to follow this advice, they SHOULD be included as
part of the registration. As shown above, two such reasons are
"UTF-8 only" or "UTF-8 or UTF-16 only".

That's misleading. People may read it as saying that the *presence* of
the charset parameter is RECOMMENDED.

In practice these constraints imply that for a fragment
identifier addressed to an instance of a specific "xxx/yyy+xml"
type, there are three cases:

For fragment identifiers matching the syntax defined in
Section 5, where the fragment identifier resolves per the
rules specified there, then process as specified there;

Section 5 does not define the syntax (other than referencing XPointer).
So this is a bit hard to process.

For fragment identifiers _not_ matching the syntax defined
in Section 5, then process as specified in "xxx/yyy+xml".

What would be an example for this case?

All the examples below apply to all five media types declared above
in Section 3, as well as to any media types declared using the '+xml'
convention. See the XML MIME entities table (Section 3, Paragraph 2)

Well, unless that type does not define the charset parameter, right?

This section is non-normative. In particular, note that all "MUST"
language herein reproduces or summarizes the consequences of
normative statement already made above, and have no independent
normative force.

Can we avoid the use of MUST here, then? :-)

Content-type charset: charset="utf-8"

Maybe it would be less confusing to say: "charset specified in
content-type:"

printable or base64. For an 8-bit clean transport (e.g., 8BITMIME
ESMTP or NNTP), or a binary clean transport (e.g., HTTP), no content-
transfer-encoding is necessary.

...as HTTP does not even define content-transfer-encoding. (same applies
to parts below)

As described in [RFC2781], the UTF-16 family MUST NOT be used with
media types under the top-level type "text" except over HTTP or HTTPS
(see section 19.4.1 of [RFC2616] for details). Hence this example is

Not sure how that section of 2616 is relevant here.

Omitting the charset parameter is NOT RECOMMENDED for application/...
when used with transports other than HTTP or HTTPS---text/... SHOULD
NOT be used for 16-bit MIME with transports other than HTTP or HTTPS
(see discussion above (Section 9.2, Paragraph 6)).

Please avoid uppercasing non-BCP14 keywords :-)

Since the charset parameter is provided in the Content-Type header
and differs from the XML encoding declaration, MIME and XML
processors will not interoperate. MIME processors will treat the
enclosed entity as UTF-8 encoded. That is, the "iso-8859-1" encoding
will be ignored. XML processors on the other hand will ignore the
charset parameter and treat the XML entity as encoded in iso-8859-1.

Do we have a definition of "MIME processor"?

As described in Section 8, this specification updates the [RFC6838]
and [RFC6839] registration process for XML-based MIME types.

My understanding is that the registration process is defined in 6838 only.

the most dangerous option available to crackers is redefining default

s/crackers/attackers/

Fourth, many references are updated, and the existence and relevance
of XML 1.1 acknowledged. Finally, a number of justifications and

As far as I can tell, XML 1.1 is totally irrelevant...

Best regards, Julian
