Re: [apps-discuss] I-D Action: draft-ietf-appsawg-xml-mediatypes-07.txt

Tim Bray <tbray@textuality.com> Tue, 11 February 2014 18:53 UTC

Date: Tue, 11 Feb 2014 10:53:54 -0800
From: Tim Bray <tbray@textuality.com>
To: "Henry S. Thompson" <ht@inf.ed.ac.uk>
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>

On Mon, Feb 10, 2014 at 2:07 AM, Henry S. Thompson <ht@inf.ed.ac.uk> wrote:

>
> > 2.2: “[UNICODE] defines three "encoding forms", which are independent of
> > serialization” - what does “independent of serialization” mean?  I think
> > the UTF-* are actually serializations of unicode codepoints.  I suppose
> > UTF-16 is sort-of semi-independent of serialization, but UTF-8 never is.
>
> The three encoding forms (UTF-8, -16 and -32) allow for at least 7
> serializations between them.  And the text does go on to say that
> UTF-8 has only one serialization.  So I don't think there's anything
> wrong here, and it is following the UNICODE spec itself.
>

The sentence “[UNICODE] defines three "encoding forms", which are
independent of serialization, namely UTF-8, UTF-16 and UTF-32.” is really
horribly misleading. UTF-8 is a serialization. UTF-16 and -32 are not
independent of serialization in the slightest; you can’t process them
successfully unless you know whether you’re looking at the _LE or _BE
serialization.  How about something like:

[UNICODE] defines several “encoding forms”, namely UTF-8, UTF-16, and
UTF-32.  UTF-8 is a serialization. Note that UTF-16 XML documents may be
serialised into MIME entities in ... [also loses the information-free
sentence about the spec following the “precedent”]
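
To make the byte-order point concrete, here is a minimal Python sketch (an
illustration only, not taken from the draft or from [UNICODE]). The sample
character is an arbitrary choice. It shows one byte sequence for UTF-8, two
for UTF-16, and what happens when UTF-16 bytes are read with the wrong byte
order.

```python
text = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, chosen arbitrarily

utf8 = text.encode("utf-8")
utf16_be = text.encode("utf-16-be")
utf16_le = text.encode("utf-16-le")

# UTF-8 yields a single byte sequence; there is no byte order to choose.
print(utf8.hex())      # c3a9

# UTF-16 yields two different byte sequences for the same code point.
print(utf16_be.hex())  # 00e9
print(utf16_le.hex())  # e900

# Reading little-endian bytes as big-endian silently gives another character.
misread = utf16_le.decode("utf-16-be")
print(hex(ord(misread)))  # 0xe900
```

The misread does not raise an error, which is why the receiver has to know
the serialization in advance (or find a BOM).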

> > I also think I disagree with my guess as what it’s trying to
> > say.  I tend to think the tools are going to do a better job of figuring
> > out the right charset labeling than your typical document author.
>
> Really?  Neither of the above-mentioned tools will do the 'right'
> thing with my XHTML by default, for example.
>

Ah, I got it finally; as the other thread said, what this is *really*
talking about is configuring your web server and so on.  So this is OK,
except that I think the word “author” is misleading, since document authors
shouldn’t be expected to understand Unicode encodings or webserver
considerations.  So maybe something like:

XML MIME producers are RECOMMENDED to provide means to control what value,
if any, is given to charset parameters for XML MIME entities, for example
by enabling Web server configuration of filename-to-Content-Type-header
mappings on a file-by-file or suffix basis.
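
As a rough sketch of what such a mapping could look like, assuming a server
that consults a per-file table before a per-suffix table: the table names,
paths and header values below are all hypothetical and are not taken from
the draft or from any particular server.

```python
import posixpath

# Hypothetical configuration tables; every entry here is illustrative only.
PER_FILE = {
    "/docs/legacy.xml": "application/xml; charset=iso-8859-1",
}
PER_SUFFIX = {
    ".xml": "application/xml",  # no charset parameter at all
    ".xhtml": "application/xhtml+xml; charset=utf-8",
}

def content_type_for(path, default="application/octet-stream"):
    """Return the Content-Type header value configured for a request path."""
    # A file-by-file entry takes precedence over a suffix rule.
    if path in PER_FILE:
        return PER_FILE[path]
    suffix = posixpath.splitext(path)[1].lower()
    return PER_SUFFIX.get(suffix, default)

print(content_type_for("/docs/legacy.xml"))
print(content_type_for("/docs/new.xml"))
```

The point of the ".xml" entry is that "if any" includes sending no charset
parameter, which is a choice the person configuring the server can make
without the document author being involved.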

> > What does being “authoritative” mean concretely? Is it the RFC’s
> > recommendation that the receiver SHOULD refuse to parse the XML even
> > though it could?  If so, we should say so explicitly.
>
> No -- 'authoritative' means 'answers the question "how to determine
> the encoding with which to attempt to process the entity"'.  So the
> RFC is telling you how to process the entity, which is its job, after
> all.
>

Right, but it feels bizarre and sort of against the spirit of Postel’s law
to remain completely silent about what happens when there are conflicts.  I
suggest you simply say that in the case of conflict, interoperability can
suffer since the observed behavior of receiving software is unpredictable.
This reinforces your central thrust, which is: Don’t do this.
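
For what it's worth, the kind of conflict meant here can be detected
mechanically. Below is a minimal Python sketch, assuming the body is in an
ASCII-compatible encoding and ignoring BOMs entirely. The function names are
hypothetical and the regular expression is a simplification, not a full XML
declaration parser.

```python
import codecs
import re
from email.message import Message

# Simplified pattern for the encoding pseudo-attribute of an XML declaration.
XML_DECL_ENCODING = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def header_charset(content_type):
    """Extract the charset parameter from a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()

def declared_encoding(body):
    """Extract the encoding named in the XML declaration, or None."""
    m = XML_DECL_ENCODING.match(body)
    return m.group(1).decode("ascii") if m else None

def same_encoding(a, b):
    """Compare two encoding labels, treating known aliases as equal."""
    try:
        return codecs.lookup(a).name == codecs.lookup(b).name
    except LookupError:
        return a.lower() == b.lower()

def labels_conflict(content_type, body):
    """True only when both labels are present and name different encodings."""
    charset = header_charset(content_type)
    declared = declared_encoding(body)
    if charset is None or declared is None:
        return False
    return not same_encoding(charset, declared)

body = b'<?xml version="1.0" encoding="UTF-8"?><a/>'
print(labels_conflict("application/xml; charset=iso-8859-1", body))
print(labels_conflict("application/xml; charset=utf-8", body))
```

A producer-side check like this is cheap; what a given receiver does with a
conflicting pair is the part nobody can predict.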

> > Then in the example in 9.8, draft says “all processors will treat the
> > enclosed entity as iso-8859-1 encoded.   That is, the "UTF-8" encoding
> > declaration will be ignored.”  Is this really true in practice?  I
> > suspect not; so perhaps you should say “all processors which conform
> > to this specification will”.
>
> I could make it 'conformant processors', but the whole point of a
> media type specification is to describe the behaviour of conformant
> processors. . .
>

It bothers me that the assertion, as stated, is simply wrong, and I don’t
think RFCs should contain assertions which are empirically false.  It’s
fairly common in the RFCs that I see to note that conformant
implementations will do thus and so.
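
To show why it matters which label a processor actually honours, here is a
small Python sketch. The sample bytes are made up for illustration and are
not the draft's 9.8 example. The same bytes are decoded once per label, and
the two readings differ as soon as a non-ASCII character appears.

```python
# Hypothetical entity body: declares UTF-8 and really is UTF-8 on the wire.
body = '<?xml version="1.0" encoding="UTF-8"?><p>\u00e9</p>'.encode("utf-8")

# A processor that follows a charset=iso-8859-1 parameter decodes this way.
as_latin1 = body.decode("iso-8859-1")

# A processor that follows the XML encoding declaration decodes this way.
as_utf8 = body.decode("utf-8")

print(as_latin1 == as_utf8)  # False
print(len(as_latin1), len(as_utf8))
```

Neither decode raises an error, so two processors can disagree about the
content without either of them noticing.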
