Re: [apps-discuss] I-D Action: draft-ietf-appsawg-xml-mediatypes-07.txt

ht@inf.ed.ac.uk (Henry S. Thompson) Mon, 10 February 2014 10:07 UTC

To: Tim Bray <tbray@textuality.com>
References: <20140206183642.28098.24139.idtracker@ietfa.amsl.com> <f5bsirvjf27.fsf@troutbeck.inf.ed.ac.uk> <CAHBU6iuTLWFDV-2qM-FDMQeK1ONS8x4hUOGg2ssYRXQGnTZNTA@mail.gmail.com>
From: ht@inf.ed.ac.uk
Date: Mon, 10 Feb 2014 10:07:29 +0000
In-Reply-To: <CAHBU6iuTLWFDV-2qM-FDMQeK1ONS8x4hUOGg2ssYRXQGnTZNTA@mail.gmail.com> (Tim Bray's message of "Sat\, 8 Feb 2014 11\:46\:01 -0800")
Message-ID: <f5bwqh3b7ry.fsf@troutbeck.inf.ed.ac.uk>
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>

Tim Bray writes:

> Disclosure: This is the first draft of xml-mediatypes I’ve read in years,
> so probably is a little lacking in context.

Thanks for taking the time to review.

> 2.2: “[UNICODE] defines three "encoding forms", which are independent of
> serialization” - what does “independent of serialization” mean?  I think
> the UTF-* are actually serializations of unicode codepoints.  I suppose
> UTF-16 is sort-of semi-independent of serialization, but UTF-8 never is.

The three encoding forms (UTF-8, -16 and -32) allow for at least 7
serializations between them.  And the text does go on to say that
UTF-8 has only one serialization.  So I don't think there's anything
wrong here, and it is following the UNICODE spec itself.
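
To make the count concrete, here is a minimal Python sketch that
enumerates seven serializations across the three encoding forms.  The
codec names are Python's own, and the SCHEMES table and serializations()
helper are illustrative names, not anything taken from the draft.

```python
# Seven Unicode encoding schemes, grouped by encoding form.
# Codec names are Python's; "utf-16" and "utf-32" write a BOM
# followed by the platform's native byte order when encoding.
SCHEMES = {
    "UTF-8": ["utf-8"],
    "UTF-16": ["utf-16", "utf-16-be", "utf-16-le"],
    "UTF-32": ["utf-32", "utf-32-be", "utf-32-le"],
}


def serializations(text):
    """Return {codec name: encoded bytes} for every scheme above."""
    return {
        codec: text.encode(codec)
        for codecs_for_form in SCHEMES.values()
        for codec in codecs_for_form
    }


if __name__ == "__main__":
    # Show how one character (U+00E9) is laid out under each scheme.
    for codec, data in serializations("\u00e9").items():
        print(f"{codec:10} {data.hex(' ')}")
```

Note that UTF-8 contributes exactly one entry, which is the point the
draft text goes on to make.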

> 2nd last para of 3.1, beginning “XML MIME producers are RECOMMENDED to
> provide means for XML MIME entity authors to determine what value” baffles
> me.  I just read it 3 times and I don’t get it.   Could we have an example
> or something?

Sure, sorry if this was obscure.  I was trying to avoid having to give
an application-specific example, but .htaccess AddType or IIS Mime
type configuration is what this is about.  I'll try to improve this.

> I also think I disagree with my guess as what it’s trying to
> say.  I tend to think the tools are going to do a better job of figuring
> out the right charset labeling than your typical document author.

Really?  Neither of the above-mentioned tools will do the 'right'
thing with my XHTML by default, for example.

> The crucial “ this specification sets the priority as follows:” indented
> para in 3.2.  I think a little more is needed. The crucial corner case is
> when you’ve got a MIME-header charset that is just wrong but an XML-aware
> receiver can in fact sort things out based on the encoding declaration.

There's nothing this document, or any processor, can do to always get
it right when there is conflicting encoding information.  The overall
aim of section 3 is to focus on MIME, and to push hard on the
proposition that a charset parameter should only be supplied if it is
known to be correct.  It is then consistent to say that when it _is_
present (and there is no BOM), it is authoritative.

>  What does being “authoritative” mean concretely? Is it the RFC’s
> recommendation that the receiver SHOULD refuse to parse the XML even
> though it could?  If so, we should say so explicitly.

No -- 'authoritative' means 'answers the question "how to determine
the encoding with which to attempt to process the entity"'.  So the
RFC is telling you how to process the entity, which is its job, after
all.
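
As a rough illustration of what 'authoritative' amounts to, the
priority can be sketched in Python as below.  This is only a sketch
under stated assumptions: the function name, the returned labels and
the inclusion of UTF-32 BOMs are mine, not the draft's, and the final
fallback simply signals "use XML's own rules" rather than implementing
them.

```python
import codecs
from typing import Optional, Tuple

# BOM signatures to check. The UTF-32 entries come first because the
# UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def authoritative_encoding(
    body: bytes, charset: Optional[str]
) -> Tuple[Optional[str], str]:
    """Return (encoding, source) for an XML MIME entity.

    Hypothetical helper: a BOM wins if present, otherwise the charset
    parameter wins if present, otherwise the caller falls back to
    XML's own rules (encoding declaration or default).
    """
    for bom, name in _BOMS:
        if body.startswith(bom):
            return name, "bom"
    if charset:
        return charset.lower(), "charset"
    return None, "xml-rules"
```

The result names the encoding with which to attempt processing; it
says nothing about whether that attempt will succeed.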

> Then in the example in 9.8, draft says “all processors will treat the
> enclosed entity as iso-8859-1 encoded.   That is, the "UTF-8" encoding
> declaration will be ignored.”  Is this really true in practice?  I suspect
> not; so perhaps you should say “all processors which conform to this
> specification will”.

I could make it 'conformant processors', but the whole point of a
media type specification is to describe the behaviour of conformant
processors. . .
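
For what it's worth, a small Python sketch of the 9.8 situation.  The
payload is hypothetical: I am assuming the bytes really are
iso-8859-1, as the charset parameter claims, while the encoding
declaration says UTF-8.  The helper names are illustrative only.

```python
# Hypothetical entity body: the declaration says UTF-8, but the bytes
# are ISO-8859-1 (so U+00E9 is the single byte 0xE9).
PAYLOAD = (
    '<?xml version="1.0" encoding="UTF-8"?><p>caf\u00e9</p>'
).encode("iso-8859-1")

# Hypothetical charset parameter from the Content-Type header.
CHARSET = "iso-8859-1"


def decode_per_charset(body, charset):
    """Decode using the charset parameter, ignoring the declaration."""
    return body.decode(charset)


def decode_per_declaration(body):
    """Decode as the (here incorrect) encoding declaration claims."""
    return body.decode("utf-8")


if __name__ == "__main__":
    print(decode_per_charset(PAYLOAD, CHARSET))
    try:
        decode_per_declaration(PAYLOAD)
    except UnicodeDecodeError as err:
        print("declaration-based decode failed:", err)
```

With this payload, following the charset parameter recovers the text,
whereas trusting the declaration fails on the 0xE9 byte.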

>  Hm, or perhaps the real issue is that in this case, you can’t
> predict what will happen; some implementations will ignore the MIME
> header, others will drop-kick the XML because of the inconsistency.

In practice, that's right.  The crucial point, as section 3 concludes
by emphasising, is that this whole problem _only_ arises when a
non-conforming _producer_ has screwed up.  The primary thrust of the
spec is to clarify what producers must do, so that the practical
relevance of the impossibility of consumers getting it right is
minimised.

> Section 3.3 typo, “thatUTF-16”, space needed, also “entitiesnot” in the
> same sentence.
>
> Also some spacing problems in the NOTE: in 8.1

Thanks, will be fixed in next draft.

ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]