Re: [apps-discuss] Working Group Last Call: draft-ietf-appsawg-xml-mediatypes

Julian Reschke <julian.reschke@gmx.de> Tue, 17 September 2013 20:22 UTC

On 2013-07-29 10:08, Murray S. Kucherawy wrote:
> This note begins a Working Group Last Call for draft-ietf-appsawg-xml-mediatypes, ending on Friday, August 16.  Please provide reviews and comments on this list or privately to the authors as soon as possible.
> ...

Here's my late feedback (IETF, interim meetings, vacation, etc.):

Updates: 4289, 6839 (if approved)

Really?

    Major differences from [RFC3023] are alignment of charset handling
    for text/xml and text/xml-external-parsed-entity with application/
    xml, the addition of XPointer and XML Base as fragment identifiers
    and base URIs, respectively, mention of the XPointer Registry, and
    updating of many references.

I don't think this needs to be in the Abstract. Also, references are 
discouraged here because the abstract should be usable stand-alone. So 
maybe move into the Introduction.


    document entities  The media types application/xml or text/xml MAY be
       used

s/used/used./

    Application/xml and application/xml-external-parsed-entity are
    recommended.  Compared to [RFC2376] or [RFC3023], this specification
    alters the charset handling of text/xml and text/xml-external-parsed-
    entity, treating them no differently from the respective application/
    types.  The reasons are as follows:

s/Application/application/

Also, avoid lowercase "recommended" if it's not a "RECOMMENDED".

       Conflicting specifications regarding the character encoding have
       caused confusion.  On the one hand, [RFC2046] specifies "The
       default character set, which must be assumed in the absence of a
       charset parameter, is US-ASCII.", [RFC2616] Section 3.7.1, defines
       that "media subtypes of the 'text' type are defined to have a
       default charset value of 'ISO-8859-1'", and [RFC2376] as well as
       [RFC3023] specify the default charset is US-ASCII.

I think this just repeats history already captured in RFC 6657. Do we 
really need to repeat it over here?

       The current situation, reflected in this specification, has been
       simplified by [RFC6657] updating [RFC2046] to remove the US-ASCII
       default.  Furthermore, in accordance with [RFC6657]'s other
       recommendations, [HTTPbis] changes [RFC2616] by removing the
       ISO-8859-1 default and not defining any default at all.

This is a bit misleading as the change in httpbis predates RFC6657 
significantly.

       The top-level media type "text" has some restrictions on MIME
       entities and they are described in [RFC2045] and [RFC2046].  In
       particular, for transports other than HTTP [RFC2616] or HTTPS
       (which uses a MIME-like mechanism).  the UTF-16 family, UCS-4, and

It would be helpful if the reference to 2045/6 were a bit more 
specific.

I'd also prefer to get rid of all RFC2616 references except when 
referring to the specification's history.

    However, developers of such media types are STRONGLY RECOMMENDED to
    use this specification as a basis for their registration.  In
    particular, the charset parameter, if used, MUST agree with the in-
    band XML encoding of the XML entity, as described in Section 3.6, in
    order to enhance interoperability.

There's no "STRONGLY" keyword. In general, I'd avoid using BCP14 
keywords for recommendations to people.

    Encoding considerations:  This media type MAY be encoded as
       appropriate for the charset and the capabilities of the underlying
       MIME transport.  For 7-bit transports, data in either UTF-8 or

I don't understand the "MAY" here.

    Published specification:  Extensible Markup Language (XML) 1.0 (Fifth
       Edition) [XML], Extensible Markup Language (XML) 1.1 (Second
       Edition) [XML1.1].

OK, so I can use the same media type for both XML 1.0 and 1.1. However, 
the way this is phrased makes it appear as if XML 1.1 is somehow more 
... recent when in fact it was a dead-end.

I recommend dropping the references about 1.1 from everywhere, and just 
have a single place that points out that what's said about 1.0 is also 
true for 1.1.

    Interoperability considerations:  XML DTDs have proven to be
       interoperable by DTD authoring tools and XML browsers, among
       others.

What is an "XML browser"? If this is about web browsers I really have my 
doubts that they work interoperably :-)

    The charset parameter MUST only be used, when the charset is reliably
    known and agrees with the in-band XML encoding declaration.  This

s/used,/used/

Also, what if there is no in-band declaration?
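
To make the question concrete, here is a rough sketch (not text from the draft) of what a consumer might do when there is neither a charset parameter nor an encoding declaration, in the spirit of Appendix F of [XML]: look for a BOM first, then for an encoding declaration, and otherwise fall back to UTF-8. The name sniff_xml_encoding is hypothetical, and the sketch assumes an ASCII-compatible encoding when no BOM is present; it deliberately ignores UTF-32 and BOM-less UTF-16.

```python
import re

# Matches an encoding declaration inside a leading XML declaration.
# Assumption: the entity is ASCII-compatible when it carries no BOM.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes (simplified)."""
    if data.startswith(b'\xef\xbb\xbf'):
        return 'utf-8'                      # UTF-8 BOM
    if data.startswith(b'\xfe\xff') or data.startswith(b'\xff\xfe'):
        return 'utf-16'                     # UTF-16 BOM, either byte order
    match = _DECL.match(data)
    if match:
        return match.group(1).decode('ascii').lower()
    return 'utf-8'                          # no BOM, no declaration
```

With this sketch, an entity with no BOM and no declaration is treated as UTF-8, which is the case the quoted sentence does not seem to address.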

    authoritatively the charset of the XML MIME entity.  The charset
    parameter can also be used to provide protocol-specific operations,
    such as charset-based content negotiation in HTTP.

That's misleading. Charset-based content negotiation happens by use of 
Accept-Charset, not the charset parameter.
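
As an illustration of the distinction: in HTTP the client states its preference in the Accept-Charset request header, the server selects a representation, and only then does the charset parameter on Content-Type label the result. A simplified sketch of the selection step follows; pick_charset is a hypothetical helper, and it assumes well-formed "name;q=value" items without extra whitespace inside them.

```python
from typing import List, Optional

def pick_charset(accept_charset: str, available: List[str]) -> Optional[str]:
    """Return the available charset with the highest q-value, or None."""
    prefs = {}
    for item in accept_charset.split(','):
        name, _, params = item.strip().partition(';')
        params = params.strip()
        # A missing q parameter means q=1.
        q = float(params[2:]) if params.startswith('q=') else 1.0
        prefs[name.strip().lower()] = q

    best, best_q = None, 0.0
    for charset in available:
        # Fall back to the "*" wildcard, then to "not acceptable" (0).
        q = prefs.get(charset.lower(), prefs.get('*', 0.0))
        if q > best_q:
            best, best_q = charset, q
    return best
```

The response would then carry something like "Content-Type: application/xml; charset=utf-8", so the parameter describes the outcome and does not take part in the negotiation.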

    There are several reasons that the charset parameter is optionally
    allowed.  First, recent web servers have been improved so that users

That text is 12 years old. We may want to drop or rephrase it :-)

    can specify the charset parameter.  Second, [RFC2130] (informative)
    specifies that the recommended specification scheme is the "charset"
    parameter.

That refers to a document from 1996. Is this really relevant here?

    On the other hand, it has been argued that the charset parameter
    should be omitted and the mechanism described in Appendix F of [XML]
    (which is non-normative) should be solely relied on.  This approach
    would allow users to avoid configuration of the charset parameter; an
    XML document stored in a file is likely to contain a correct encoding
    declaration or BOM (if necessary), since the operating system does
    not typically provide charset information for files.  If users would
    like to rely on the in-band XML encoding declaration or BOM and/or to
    conceal charset information from non-XML processors, they can omit
    the parameter.

This now is really the recommended approach, no? Maybe the whole of 
3.6.1 should be removed then.

    Uniform Resource Identifiers (URIs) may contain fragment identifiers
    (see Section 3.5 of [RFC3986]).  Likewise, Internationalized Resource
    Identifiers (IRIs) [RFC3987] may contain fragment identifiers.

s/may/can/

Also, the reference to RFC3987 really doesn't add anything useful here.

    See Section 8.1 for additional rquirements which apply when an XML-
    based MIME media type follows the naming convention '+xml'.

s/rquirements/requirements/

    If [XPointerFramework] and [XPointerElement] are inappropriate for
    some XML-based media type, it SHOULD NOT follow the naming convention
    '+xml'.

Really? Why not? What about application/xhtml+xml?

    When a URI has a fragment identifier, it is encoded by a limited
    subset of the repertoire of US-ASCII [ASCII] characters, as defined
    in [RFC3986].  When an IRI contains a fragment identifier, it is
    encoded by a much wider repertoire of characters.  The conversion
    between IRI fragment identifiers and URI fragment identifiers is
    presented in Section 7 of [RFC3987].

I recommend dropping the IRI-specific part. This is not specific to XML 
types.

    Note that the base URI may be embedded in a different MIME entity,
    since the default value for the xml:base attribute may be specified
    in an external DTD subset or external parameter entity.

s/may/might/ s/may/can/

    application/xml, application/xml-external-parsed-entity, and
    application/xml-dtd, text/xml and text/xml-external-parsed-entity are
    to be used with [XML]  In all examples herein where version="1.0" is

s/[XML]/[XML]./

    This specification recommends the use of a naming convention (a
    suffix of '+xml') for identifying XML-based MIME media types,

s/MIME// (there may be more instances of this)

    whatever their particular content may represent, in line with the

What is the "whatever their particular content may represent" about?

    When a new media type is introduced for an XML-based format, the name
    of the media type SHOULD end with '+xml'.  This convention will allow

Which may be in conflict with the SHOULD NOT I complained about earlier 
on :-)
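
For context, the practical effect of the '+xml' convention is that a generic processor can recognize XML-based types without a registry lookup. A minimal sketch, assuming the processor only looks at the type/subtype and ignores parameters (is_xml_media_type and the GENERIC_XML_TYPES set are names made up for this example):

```python
# The generic XML document types named in the draft; the DTD and
# external-parsed-entity types are left out of this sketch on purpose.
GENERIC_XML_TYPES = {'application/xml', 'text/xml'}

def is_xml_media_type(content_type: str) -> bool:
    """True if the Content-Type value names a generic or '+xml' media type."""
    # Drop parameters such as charset, then normalize case.
    essence = content_type.split(';', 1)[0].strip().lower()
    return essence in GENERIC_XML_TYPES or essence.endswith('+xml')
```

A type such as application/xhtml+xml is picked up by the suffix test, which is why a SHOULD NOT on the suffix and a SHOULD on the suffix pull in opposite directions.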

       NOTE: Section 14.1 of HTTP [RFC2616] does not support Accept
       headers of the form "Accept: */*+xml" and so this header MUST NOT
       be used in this way.  Instead, content negotiation [RFC2703] could
       potentially be used if an XML-based MIME type were needed.

Please cite HTTPbis P2. Also, content negotiation is defined by HTTP, 
not RFC 2703.

    XML generic processing is not always appropriate for XML-based media
    types.  For example, authors of some such media types may wish that
    the types remain entirely opaque except to applications that are
    specifically designed to deal with that media type.  By NOT following
    the naming convention '+xml', such media types can avoid XML-generic
    processing.  Since generic processing will be useful in many cases,
    however -- including in some situations that are difficult to predict
    ahead of time -- those registering media types SHOULD use the '+xml'
    convention unless they have a particularly compelling reason not to.

I recommend avoiding the use of SHOULD here. Just explain the pros and cons.

    The registration process for specific '+xml' media types is described
    in [RFC6838] and [RFC6839].  The registrar for the IETF tree will

Just RFC6838, as far as I can tell.

    The use of the charset parameter is STRONGLY RECOMMENDED, since this
    information can be used by XML processors to determine
    authoritatively the charset of the XML MIME entity.  If there are
    some reasons not to follow this advice, they SHOULD be included as
    part of the registration.  As shown above, two such reasons are
    "UTF-8 only" or "UTF-8 or UTF-16 only".

That's misleading. People may read it as saying that the *presence* of 
the charset parameter is RECOMMENDED.

          In practice these constraints imply that for a fragment
          identifier addressed to an instance of a specific "xxx/yyy+xml"
          type, there are three cases:

             For fragment identifiers matching the syntax defined in
             Section 5, where the fragment identifier resolves per the
             rules specified there, then process as specified there;

Section 5 does not define the syntax (other than referencing XPointer). 
So this is a bit hard to process.

             For fragment identifiers _not_ matching the syntax defined
             in Section 5, then process as specified in "xxx/yyy+xml".

What would be an example for this case?

    All the examples below apply to all five media types declared above
    in Section 3, as well as to any media types declared using the '+xml'
    convention.  See the XML MIME entities table (Section 3, Paragraph 2)

Well, unless that type does not define the charset parameter, right?


    This section is non-normative.  In particular, note that all "MUST"
    language herein reproduces or summarizes the consequences of
    normative statement already made above, and have no independent
    normative force.

Can we avoid the use of MUST here, then? :-)

    Content-type charset: charset="utf-8"

Maybe it would be less confusing to say: "charset specified in 
content-type:"

    printable or base64.  For an 8-bit clean transport (e.g., 8BITMIME
    ESMTP or NNTP), or a binary clean transport (e.g., HTTP), no content-
    transfer-encoding is necessary.

...as HTTP does not even define content-transfer-encoding. (same applies 
to parts below)

    As described in [RFC2781], the UTF-16 family MUST NOT be used with
    media types under the top-level type "text" except over HTTP or HTTPS
    (see section 19.4.1 of [RFC2616] for details).  Hence this example is

Not sure how that section of 2616 is relevant here.

    Omitting the charset parameter is NOT RECOMMENDED for application/...
    when used with transports other than HTTP or HTTPS---text/... SHOULD
    NOT be used for 16-bit MIME with transports other than HTTP or HTTPS
    (see discussion above (Section 9.2, Paragraph 6)).

Please avoid uppercasing non-BCP14 keywords :-)

    Since the charset parameter is provided in the Content-Type header
    and differs from the XML encoding declaration, MIME and XML
    processors will not interoperate.  MIME processors will treat the
    enclosed entity as UTF-8 encoded.  That is, the "iso-8859-1" encoding
    will be ignored.  XML processors on the other hand will ignore the
    charset parameter and treat the XML entity as encoded in iso-8859-1.

Do we have a definition of "MIME processor"?
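
Whatever the term ends up meaning, the interoperability failure itself is easy to reproduce. The sketch below decodes the same bytes once with the charset from the Content-Type header and once with the encoding named in the XML declaration. It assumes the entity really was serialized as ISO-8859-1 and contains a non-ASCII character; decode_both is a hypothetical helper.

```python
def decode_both(body: bytes, header_charset: str, declared: str) -> dict:
    """Decode body with each candidate encoding; None marks a decode failure."""
    results = {}
    for label, encoding in (('header', header_charset), ('declared', declared)):
        try:
            results[label] = body.decode(encoding)
        except UnicodeDecodeError:
            results[label] = None
    return results

# Assumed sample: bytes actually written as ISO-8859-1, labeled utf-8 in the header.
body = '<?xml version="1.0" encoding="iso-8859-1"?><a>\u00e9</a>'.encode('iso-8859-1')
outcome = decode_both(body, 'utf-8', 'iso-8859-1')
```

Here a consumer that trusts the header cannot decode the entity at all, while one that trusts the declaration reads it correctly; with other byte sequences both decodes can succeed and silently disagree.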

    As described in Section 8, this specification updates the [RFC6838]
    and [RFC6839]  registration process for XML-based MIME types.

My understanding is that the registration process is defined in 6838 only.

    the most dangerous option available to crackers is redefining default

s/crackers/attackers/

    Fourth, many references are updated, and the existence and relevance
    of XML 1.1 acknowledged.  Finally, a number of justifications and

As far as I can tell, XML 1.1 is totally irrelevant...


Best regards, Julian