Re: [apps-discuss] Working Group Last Call: draft-ietf-appsawg-xml-mediatypes

Erik Wilde <dret@berkeley.edu> Tue, 15 October 2013 07:45 UTC

On 2013-09-17 10:22 , Julian Reschke wrote:
> On 2013-07-29 10:08, Murray S. Kucherawy wrote:
>> This note begins a Working Group Last Call for
>> draft-ietf-appsawg-xml-mediatypes, ending on Friday, August 16.
>> Please provide reviews and comments on this list or privately to the
>> authors as soon as possible.
> Here's my late feedback (IETF, interim meetings, vacation, etc pp):

here's my even later feedback for draft-ietf-appsawg-xml-mediatypes

first of all, i'd like to thank julian for his very thorough review, and 
i think for all the charset-related issues, he's definitely in a much 
better position to provide feedback than i am. therefore i am supporting 
the comments julian made in his review, and my review is focusing more 
on the XML side of things.

> Network Working Group                                          C. Lilley
> Internet-Draft                                                       W3C
> Obsoletes: 3023 (if approved)                                  M. Murata
> Updates: 4289, 6839 (if approved)      International University of Japan

i guess i have the same question here as julian: why would it update RFC 
6839, instead of just referencing it? if this is just the technicality 
of RFC 6839 referencing this draft, then that would probably answer my 
question.

>    This specification standardizes three media types -- application/xml,
>    application/xml-external-parsed-entity, and application/xml-dtd --
>    for use in exchanging network entities that are related to the
>    Extensible Markup Language (XML) while defining text/xml and text/
>    xml-external-parsed-entity as aliases for the respective application/
>    types.  This specification also standardizes a convention (using the
>    suffix '+xml') for naming media types outside of these five types
>    when those media types represent XML MIME entities.

isn't that convention standardized by RFC 6839 already? i guess the 
problem is that instead of defining a registry that can be updated, any 
change to the suffixes needs to update RFC 6839? maybe rephrase this to 
say "updates the convention", because when reading this abstract, people 
knowing RFC 6839 may be wondering what's going on.

>    Major differences from [RFC3023] are alignment of charset handling
>    for text/xml and text/xml-external-parsed-entity with application/
>    xml, the addition of XPointer and XML Base as fragment identifiers
>    and base URIs, respectively, mention of the XPointer Registry, and
>    updating of many references.

agree with julian that this should not be part of the abstract. it's 
useful and could probably be moved easily to someplace else.

> 3.  XML Media Types
>    This specification standardizes three media types related to XML MIME
>    entities: application/xml (with text/xml as an alias), application/
>    xml-external-parsed-entity (with text/xml-external-parsed-entity as
>    an alias), and application/xml-dtd.  Registration information for
>    these media types is described in the sections below.

it would be useful to add application/xsd+xml in the updated spec, since 
XSD does not have its own media type.

>       The top-level media type "text" has some restrictions on MIME
>       entities and they are described in [RFC2045] and [RFC2046].  In
>       particular, for transports other than HTTP [RFC2616] or HTTPS
>       (which uses a MIME-like mechanism).  the UTF-16 family, UCS-4, and
>       UTF-32 are not allowed However, section 4.3.3 of [XML] says:

s/mechanism). the/mechanism), the

i am not quite understanding the parenthesis after HTTPS. isn't HTTPS 
doing exactly the same as HTTP, only over a safe transport? what does 
"which uses a MIME-like mechanism" refer to?

>    XML provides a general framework for defining sequences of structured
>    data.  In some cases, it may be desirable to define new media types
>    that use XML but define a specific application of XML, perhaps due to
>    domain-specific display, editing, security considerations or runtime
>    information.

the "perhaps" part seems a bit modest. there's quite a large set of web 
service designers thinking that unless you are exposing generic XML 
facilities (such as an XML database), you shouldn't be using a generic 
XML media type. so i would not make this sound as restricted as it is 
sounding now.

>    Interoperability considerations:  XML has proven to be interoperable
>       across both generic and task-specific applications and for import
>       and export from multiple XML authoring and editting tools.  For

s/editting/editing/

>    Applications that use this media type:  XML is device-, platform-,
>       and vendor-neutral and is supported by a wide range of generic XML
>       tools (editors, parsers, Web agents, ...) and task-specific
>       applications.

i am not sure this needs to be spelled out, but in between generic XML 
and task-specific applications, there's a large set of generic XML-based 
formats (such as Atom) which does not really seem to fit well into this 
characterization of applications?

>          Although no byte sequences can be counted on to always be
>          present, XML MIME entities in ASCII-compatible charsets

s/charsets/character sets/

>          (including UTF-8) often begin with hexadecimal 3C 3F 78 6D 6C
>          ("<?xml"), and those in UTF-16 often begin with hexadecimal FE
>          FF 00 3C 00 3F 00 78 00 6D 00 6C or FF FE 3C 00 3F 00 78 00 6D
>          00 6C 00 (the Byte Order Mark (BOM) followed by "<?xml").  For
>          more information, see Appendix F of [XML].

maybe turn the reference [XML] into [XML1.0], so that it's always easy 
to see which version you're referencing?
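
to make the byte patterns quoted above a bit more concrete, here is a minimal python sketch. the helper name sniff_xml_prefix and the labels are made up for illustration, and it only checks the three prefixes listed in the draft text; it is not the full detection procedure from Appendix F.

```python
# hypothetical helper, not part of the draft: checks only the byte
# prefixes quoted above, not the full Appendix F detection algorithm.
XML_DECL_PREFIXES = {
    "ascii-compatible": bytes.fromhex("3C 3F 78 6D 6C"),
    "utf-16be with BOM": bytes.fromhex("FE FF 00 3C 00 3F 00 78 00 6D 00 6C"),
    "utf-16le with BOM": bytes.fromhex("FF FE 3C 00 3F 00 78 00 6D 00 6C 00"),
}


def sniff_xml_prefix(data: bytes):
    """Return a label for the first matching prefix, or None if none match."""
    for label, prefix in XML_DECL_PREFIXES.items():
        if data.startswith(prefix):
            return label
    return None
```

as the draft says, none of these sequences can be counted on, so a None result here says nothing about whether the entity is XML.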

>    The syntax and semantics of fragment identifiers for the XML media
>    types defined in this specification are based on the
>    [XPointerFramework] W3C Recommendation.  It allows simple names, and
>    more complex constructions based on named schemes.  When the syntax
>    of a fragment identifier part of any URI or IRI with a retrieved
>    media type governed by this specification conforms to the syntax
>    specified in [XPointerFramework], conformant applications MUST

not sure: s/conformant/conforming/

>    interpret such fragment identifiers as designating that part of the
>    retrieved representation specified by [XPointerFramework] and
>    whatever other specifications define any XPointer schemes used.
>    Conformant applications MUST support the 'element' scheme as defined

again, not sure: s/Conformant/Conforming/

>    See Section 8.1 for additional rquirements which apply when an XML-

s/rquirements/requirements/

>    When a URI has a fragment identifier, it is encoded by a limited
>    subset of the repertoire of US-ASCII [ASCII] characters, as defined
>    in [RFC3986].  When an IRI contains a fragment identifier, it is

this reads a bit odd. what about saying:

"URIs [RFC3986] are encoded in a limited subset of the repertoire of 
US-ASCII [ASCII], and therefore this encoding applies to fragment 
identifier parts of URIs as well."

an editorial note: sometimes the text is written like this:

- "For more information, see Appendix F of [XML]"

and sometimes like this:

- "a limited subset of the repertoire of US-ASCII [ASCII]"

maybe make the entire text consistent by either using references as 
nouns or not.

>    Section 5.1 of [RFC3986] specifies that the semantics of a relative
>    URI reference embedded in a MIME entity is dependent on the base URI.
>    The base URI is either (1) the base URI embedded in context, (2) the
>    base URI from the encapsulating entity, (3) the base URI from the
>    Retrieval URI, or (4) the default base URI, where (1) has the highest
>    precedence.

s/where (1) has the highest precedence./sorted by declining precedence./
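
the "declining precedence" reading can be sketched in a few lines of python. the function name effective_base_uri, its parameter names, and the example URIs are assumptions made up for illustration; only urljoin is a real standard library call.

```python
from urllib.parse import urljoin


def effective_base_uri(embedded=None, encapsulating=None,
                       retrieval=None, default=None):
    """Pick the first available base URI, in declining precedence."""
    # order follows (1) to (4) in the quoted paragraph
    for candidate in (embedded, encapsulating, retrieval, default):
        if candidate is not None:
            return candidate
    return None


# hypothetical example: an embedded base URI wins over the retrieval URI
base = effective_base_uri(embedded="http://example.org/docs/",
                          retrieval="http://example.com/feed.xml")
resolved = urljoin(base, "img/logo.png")
```

this glosses over the case where the embedded base URI is itself relative and has to be resolved against the next level down.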

>    The media type dependent mechanism for embedding the base URI in a
>    MIME entity of type application/xml, text/xml, application/xml-
>    external-parsed-entity or text/xml-external-parsed-entity is to use
>    the xml:base attribute described in detail in [XBase].

maybe rename the reference to [XMLBase] to reflect the name of the spec?

not that i think this matters in practice, but this means that it is 
impossible to use anything other than XML Base for this, right? XML 
itself really does not say anything about it, so it seems that text/xml 
goes a little bit in the direction of the infoset, specifying 
additional (but fewer) constraints. is the intention to make this a 
MUST? if so, wouldn't it be appropriate to use normative language? and 
if not, wouldn't it be good to say that using XML Base probably is a 
Really Good Idea, but not actually required?

>    Note that the base URI may be embedded in a different MIME entity,
>    since the default value for the xml:base attribute may be specified
>    in an external DTD subset or external parameter entity.

so this means that the base URI changes depending on whether the used 
XML processor is validating or not, right? maybe it would be worth 
spelling this out, because it may come as a surprise to some.

>    MIME entities by comparing the subtype to the pattern '*/*+xml'.  (Of
>    course, 4 of the 5 media types defined in this specification -- text/
>    xml, application/xml, text/xml-external-parsed-entity, and
>    application/xml-external-parsed-entity -- also represent XML MIME
>    entities while not conforming to the '*/*+xml' pattern.)

maybe: s/Of course/For historical reasons/
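
for illustration, this is roughly what a check against the '*/*+xml' pattern plus the four exceptions might look like in python. the helper name is_xml_media_type is hypothetical; the list of exceptions is taken from the quoted text.

```python
# hypothetical helper: the four names below are the ones the quoted text
# lists as XML MIME entities that do not match the '*/*+xml' pattern
LEGACY_XML_TYPES = {
    "text/xml",
    "application/xml",
    "text/xml-external-parsed-entity",
    "application/xml-external-parsed-entity",
}


def is_xml_media_type(media_type: str) -> bool:
    """True for the four legacy names and for any type/subtype+xml."""
    # drop parameters such as "; charset=utf-8" and normalize case
    essence = media_type.split(";", 1)[0].strip().lower()
    if essence in LEGACY_XML_TYPES:
        return True
    type_, sep, subtype = essence.partition("/")
    # require a non-empty type and something in front of the suffix
    return bool(type_) and bool(sep) and subtype.endswith("+xml") \
        and len(subtype) > len("+xml")
```

note that application/xml-dtd, the fifth type, is deliberately not in the list, since the quoted text only names four.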

>    The registration process for specific '+xml' media types is described
>    in [RFC6838] and [RFC6839].

just [RFC6838], i think.

> 8.2.  +xml Structured Syntax Suffix Registration

maybe a formality, but does it need to be registered if it has been an 
integral part of RFC 6839, and will be updated by RFC 3023bis? doesn't 
this update mean it exists without having to be registered?

>    For application... cases, if sent using a 7-bit transport (e.g.,

s/application.../application\/.../

>    Omitting the charset parameter is NOT RECOMMENDED for application/...
>    when used with transports other than HTTP or HTTPS---text/... SHOULD
>    NOT be used for 16-bit MIME with transports other than HTTP or HTTPS
>    (see discussion above (Section 9.2, Paragraph 6)).

s/HTTPS---text/HTTPS. text/

>    XML MIME entities contain information which may be parsed and further
>    processed by the recipient's XML system.

s/XML system/system/

>    These entities may contain
>    and such systems may permit explicit system level commands to be
>    executed while processing the data.  To the extent that an XML system

s/XML system/XML-based system/

i guess i am struggling here to understand what "XML system" is supposed 
to mean. just the XML processing parts? or the complete system that 
works with XML-based data? clarifying this upfront might help.

>    will execute arbitrary command strings, recipients of XML MIME
>    entities may be a risk.  In general, it may be possible to specify
>    commands that perform unauthorized file operations or make changes to
>    the display processor's environment that affect subsequent
>    operations.

where do these two rather specific command types (file access, display 
processor) come from? probably from resolving references within XML 
content, and trying to render it, but maybe either make that a little 
more explicit, or keep the warning more general?

>    The simplest attack involves adding declarations that break
>    validation.  Adding extraneous declarations to a list of character
>    XML-entities can effectively "break the contract" used by documents.
>    A tiny change that produces a fatal error in a DTD could halt XML
>    processing on a large scale.  Extraneous declarations are fairly
>    obvious, but more sophisticated tricks, like changing attributes from
>    being optional to required, can be difficult to track down.  Perhaps
>    the most dangerous option available to crackers is redefining default
>    values for attributes: e.g., if developers have relied on defaulted
>    attributes for security, a relatively small change might expose
>    enormous quantities of information.

this of course only matters if the processing model actually uses a 
schema language that supports default values. that in itself is 
something that has been discussed for a long time in terms of benefits 
and risks. maybe it would be worth pointing out that the security 
problem only exists when a certain processing model (in this case 
defined by the choice of schema language) is chosen.
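
a small python sketch of the processing-model dependency: the standard library parser (expat underneath) applies attribute defaults declared in the internal DTD subset, so an application reading the attribute cannot tell whether it was written in the document or defaulted. the element and attribute names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# hypothetical document: the instance carries no 'access' attribute,
# the value comes only from the ATTLIST default in the internal subset
DOC = """<!DOCTYPE record [
  <!ATTLIST record access CDATA "restricted">
]>
<record/>"""

root = ET.fromstring(DOC)
access = root.get("access")  # supplied by the DTD default, not the instance
```

a processing model that never reads DTD declarations would see no access attribute at all, which is the point: the risk exists only where defaulting is part of the model.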

>    Apart from the structural possibilities, another option, "XML-entity
>    spoofing," can be used to insert text into documents, vandalizing and
>    perhaps conveying an unintended message.  Because XML permits
>    multiple XML-entity declarations, and the first declaration takes
>    precedence, it's possible to insert malicious content where an XML-

s/it's/it is/
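
the "first declaration takes precedence" rule behind XML-entity spoofing can be seen with a minimal python sketch using the standard library parser. the entity name and replacement texts are made up for illustration.

```python
import xml.etree.ElementTree as ET

# hypothetical document: the same entity is declared twice in the
# internal subset; per XML, the first declaration encountered is binding
DOC = """<!DOCTYPE note [
  <!ENTITY sig "injected text">
  <!ENTITY sig "intended text">
]>
<note>&sig;</note>"""

root = ET.fromstring(DOC)
text = root.text  # replacement text of the first declaration
```

so whoever manages to get a declaration in ahead of the intended one controls what the reference expands to.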

kind regards,

dret.

-- 
erik wilde | mailto:dret@berkeley.edu  -  tel:+1-510-2061079 |
            | UC Berkeley  -  School of Information (ISchool) |
            | http://dret.net/netdret http://twitter.com/dret |