Re: [apps-discuss] I-D Action: draft-ietf-appsawg-xml-mediatypes-07.txt

ht@inf.ed.ac.uk (Henry S. Thompson) Fri, 14 February 2014 14:03 UTC

To: Tim Bray <tbray@textuality.com>
References: <20140206183642.28098.24139.idtracker@ietfa.amsl.com> <f5bsirvjf27.fsf@troutbeck.inf.ed.ac.uk> <CAHBU6iuTLWFDV-2qM-FDMQeK1ONS8x4hUOGg2ssYRXQGnTZNTA@mail.gmail.com> <f5bwqh3b7ry.fsf@troutbeck.inf.ed.ac.uk> <CAHBU6iv8nS6Qh+98udWCEFc=U8WAiAFhoFsoEnxAzoYHpzyYvg@mail.gmail.com>
From: ht@inf.ed.ac.uk
Date: Fri, 14 Feb 2014 14:03:12 +0000
In-Reply-To: <CAHBU6iv8nS6Qh+98udWCEFc=U8WAiAFhoFsoEnxAzoYHpzyYvg@mail.gmail.com> (Tim Bray's message of "Tue, 11 Feb 2014 10:53:54 -0800")
Message-ID: <f5b4n41vljz.fsf@troutbeck.inf.ed.ac.uk>
User-Agent: Gnus/5.101 (Gnus v5.10.10) XEmacs/21.5-b33 (linux)
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/BrqYCE2ivW3JinE7zWKqblBVWAM
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>
Subject: Re: [apps-discuss] I-D Action: draft-ietf-appsawg-xml-mediatypes-07.txt
Precedence: list

Tim Bray writes:

> [wrt 2.2]
> How about something like
>
> [UNICODE] defines several “encoding forms”, namely UTF-8, UTF-16, and
> UTF-32.  UTF-8 is a serialization. Note that UTF-16 XML documents may be
> serialised into MIME entities in ... [also loses the information-free
> sentence about the spec following the “precedent”]

How about

  [UNICODE] defines three "encoding forms", namely UTF-8, UTF-16, and
  UTF-32. As UTF-8 can only be serialized in one way, the only possible
  label for UTF-8-encoded documents when serialized into MIME entities
  is "utf-8".  UTF-16 XML documents, however, can be serialized into
  MIME entities in one of two ways: either big-endian, labelled
  (optionally) "utf-16" or "utf-16be", or little-endian, labelled
  (optionally) "utf-16" or "utf-16le".
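A minimal Python sketch of the distinction drawn in the proposed text, using only the standard codecs module: encoding a document as UTF-8 gives exactly one byte sequence, while UTF-16 gives two, differing in byte order. The function name utf16_serializations is hypothetical, and prefixing each serialization with a byte order mark (BOM) is an assumption of the sketch; the BOM is what lets a receiver recover the byte order when the label is just "utf-16".

```python
import codecs

def utf16_serializations(doc: str) -> dict:
    """Return both UTF-16 byte serializations of an XML document.

    Each one starts with a byte order mark (BOM) so that a receiver given
    only the label "utf-16" can still tell which byte order was used.
    """
    return {
        "big-endian": codecs.BOM_UTF16_BE + doc.encode("utf-16-be"),
        "little-endian": codecs.BOM_UTF16_LE + doc.encode("utf-16-le"),
    }

doc = '<?xml version="1.0" encoding="utf-16"?><a/>'

# UTF-8 has a single serialization, so the label "utf-8" says all there is to say.
print("utf-8", doc.encode("utf-8")[:6])

for order, data in utf16_serializations(doc).items():
    # Python's generic "utf-16" decoder reads the BOM to choose the byte order.
    assert data.decode("utf-16") == doc
    print(order, data[:6])
```

The two UTF-16 byte sequences differ from the first byte on, which is why a label alone ("utf-16") is not enough without the BOM.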

and add the following (removing the earlier version from 3.1), per my
reply to SM:

  UTF-32 has four potential serializations, of which only two
  (UTF-32BE and UTF-32LE) are given names in [UNICODE]. Support
  for the various serializations varies widely, and security concerns
  about their use have been raised.  The use of UTF-32 is NOT
  RECOMMENDED for XML MIME entities.
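To make the "four potential serializations" concrete, here is a small Python sketch that lays out each 32-bit code unit in the four byte orders that XML 1.0's Appendix F (autodetection of character encodings) calls 1234, 4321, 2143 and 3412, and then recognizes which order was used from a leading '<' with no BOM. The helper names are hypothetical. Python's standard codecs cover only the two named forms, "utf-32-be" (1234) and "utf-32-le" (4321), which is one small instance of the uneven support mentioned above.

```python
from typing import Optional

# Each digit string lists, for successive bytes in the stream, which byte of
# the big-endian code unit (1 = most significant) appears there; the tuples
# give the same positions as zero-based indices.
ORDERS = {
    "1234": (0, 1, 2, 3),  # big-endian, i.e. UTF-32BE
    "4321": (3, 2, 1, 0),  # little-endian, i.e. UTF-32LE
    "2143": (1, 0, 3, 2),  # unusual order, unnamed in Unicode
    "3412": (2, 3, 0, 1),  # unusual order, unnamed in Unicode
}

def serialize_utf32(text: str, order: str) -> bytes:
    """Encode text as 32-bit code units using one of the four byte orders."""
    perm = ORDERS[order]
    out = bytearray()
    for ch in text:
        be = ord(ch).to_bytes(4, "big")
        out.extend(be[i] for i in perm)
    return bytes(out)

def detect_utf32_order(data: bytes) -> Optional[str]:
    """Guess the byte order from a leading '<' (no BOM), or return None."""
    for order in ORDERS:
        if data[:4] == serialize_utf32("<", order):
            return order
    return None

doc = "<a/>"
for order in ORDERS:
    data = serialize_utf32(doc, order)
    print(order, data[:4].hex(), detect_utf32_order(data))

# Only the two named forms have standard Python codecs.
assert serialize_utf32(doc, "1234") == doc.encode("utf-32-be")
assert serialize_utf32(doc, "4321") == doc.encode("utf-32-le")
```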

>> > I also think I disagree with my guess as to what it’s trying to
>> > say.  I tend to think the tools are going to do a better job of figuring
>> > out the right charset labeling than your typical document author.
>>
>> Really?  Neither of the above-mentioned tools will do the 'right'
>> thing with my XHTML by default, for example.
>>
>
> Ah, I got it finally; as the other thread said, what this is *really*
> talking about is configuring your web server and so on.  So this is OK,
> except for I think the word “author” is misleading since document authors
> shouldn’t be expected to understand Unicode encodings or webserver
> considerations.   So maybe something like:
>
> XML MIME producers are RECOMMENDED to provide means to control what value,
> if any, is given to charset parameters for XML MIME entities, for example
> by enabling Web server configuration of filename-to-Content-Type-header
> mappings on a file-by-file or suffix basis.

Thanks, that works.
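As an illustration of the kind of control this recommendation asks for, here is a Python sketch of a per-suffix Content-Type table with per-file overrides, in which the charset parameter is optional. The table contents, the file paths and the function content_type_for are all hypothetical; they do not reflect any particular web server's configuration syntax.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple

# Hypothetical suffix mappings: (media type, charset, or None to omit it).
SUFFIX_MAP = {
    ".xml": ("application/xml", None),
    ".xhtml": ("application/xhtml+xml", "utf-8"),
    ".atom": ("application/atom+xml", "utf-8"),
}

# Hypothetical per-file overrides, consulted before the suffix table.
FILE_OVERRIDES = {
    "legacy/report.xml": ("application/xml", "iso-8859-1"),
}

def content_type_for(path: str) -> Optional[str]:
    """Return the Content-Type header value to send for path, if any."""
    entry: Optional[Tuple[str, Optional[str]]] = FILE_OVERRIDES.get(path)
    if entry is None:
        entry = SUFFIX_MAP.get(PurePosixPath(path).suffix.lower())
    if entry is None:
        return None  # not a file type this table knows about
    media_type, charset = entry
    # Omitting charset leaves the encoding to be determined from the document.
    return f"{media_type}; charset={charset}" if charset else media_type

print(content_type_for("legacy/report.xml"))
print(content_type_for("pages/index.xhtml"))
print(content_type_for("data/feed.xml"))
```

Keeping the charset optional per entry is the point: the person configuring the server, not the document author, decides whether a label is sent at all.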

>> >  What does being “authoritative” mean concretely? Is it the RFC’s
>> > recommendation that the receiver SHOULD refuse to parse the XML even
>> > though it could?  If so, we should say so explicitly.
>>
>> No -- 'authoritative' means 'answers the question "how to determine
>> the encoding with which to attempt to process the entity"'.  So the
>> RFC is telling you how to process the entity, which is its job, after
>> all.
>>
>
> Right, but it feels bizarre and sort of against the spirit of Postel’s law
> to remain completely silent about what happens when there are conflicts.  I
> suggest you simply say that in the case of conflict, interoperability can
> suffer since the observed behavior of receiving software is unpredictable.
>  This reinforces your central thrust, which is: Don’t do this.

OK - will do, by clarifying that by 'authoritative' is meant 'do it
this way', while acknowledging that this will not (cannot) _always_ do
the 'right' thing.
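A simplified Python sketch of what "authoritative" amounts to, assuming only the precedence discussed in this thread: a charset parameter, when present, decides the encoding, and the in-document encoding declaration is consulted only when the parameter is absent. The sketch also flags the conflicting case, where receiving software may not behave predictably. BOM handling and the draft's other rules are deliberately left out, and the function name effective_encoding is hypothetical.

```python
import re
from email.message import Message
from typing import Optional, Tuple

# Matches the encoding pseudo-attribute of an XML declaration in an
# ASCII-compatible byte stream (e.g. UTF-8 or ISO-8859-1).
ENCODING_DECL = re.compile(
    rb"""^<\?xml\s[^>]*?\bencoding\s*=\s*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)

def effective_encoding(content_type: str, body: bytes) -> Tuple[str, bool]:
    """Return (encoding to use, whether label and declaration conflict)."""
    msg = Message()
    msg["Content-Type"] = content_type
    charset: Optional[str] = msg.get_content_charset()  # lower-cased, or None

    match = ENCODING_DECL.match(body)
    declared = match.group(1).decode("ascii").lower() if match else None

    if charset is not None:
        # The charset parameter is authoritative; a differing declaration is
        # ignored but reported. Names are compared literally, so aliases
        # such as "latin-1" vs "iso-8859-1" would also be reported.
        return charset, declared is not None and declared != charset
    # No charset parameter: fall back to the declaration, else UTF-8.
    return (declared or "utf-8"), False

body = b'<?xml version="1.0" encoding="UTF-8"?><doc/>'
print(effective_encoding("application/xml; charset=iso-8859-1", body))
print(effective_encoding("application/xml", body))
```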

>> > Then in the example in 9.8, draft says “all processors will treat the
>> > enclosed entity as iso-8859-1 encoded.   That is, the "UTF-8" encoding
>> > declaration will be ignored.”  Is this really true in practice?  I
>> > suspect not; so perhaps you should say “all processors which conform
>> > to this specification will”.
>>
>> I could make it 'conformant processors', but the whole point of a
>> media type specification is to describe the behaviour of conformant
>> processors. . .
>>
>
> It bothers me that the assertion, as stated, is simply wrong, and I don’t
> think RFCs should contain assertions which are empirically false.   It’s
> fairly common in the RFCs that I say to note that conformant
> implementations will do thus and so.

Happy to make the change.  Will see if it is needed/feels right in
other examples.
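To see what "conformant processors will treat the enclosed entity as iso-8859-1 encoded" means for the text itself, here is a short Python illustration (the sample document is hypothetical): bytes written as UTF-8 but labelled charset=iso-8859-1 decode without any error, yet each non-ASCII character comes out garbled as several Latin-1 characters.

```python
# A document actually written in UTF-8, with a matching encoding declaration.
text = '<?xml version="1.0" encoding="UTF-8"?><p>café</p>'
body = text.encode("utf-8")

# A processor following the charset parameter (iso-8859-1) decodes the same
# bytes differently; ISO-8859-1 maps every byte, so no error is raised.
as_labelled = body.decode("iso-8859-1")
print(as_labelled)  # the "é" (bytes C3 A9) appears as "Ã©"

# The declaration still says UTF-8, but in this example it is ignored.
assert "café" not in as_labelled
```

Because the mis-decoding is silent, the conflict can go unnoticed, which is one reason to avoid sending a charset label that contradicts the document.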

ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]
