Re: [apps-discuss] I-D Action: draft-ietf-appsawg-xml-mediatypes-07.txt

ht@inf.ed.ac.uk (Henry S. Thompson) Fri, 14 February 2014 14:03 UTC

To: Tim Bray <tbray@textuality.com>
From: ht@inf.ed.ac.uk
Date: Fri, 14 Feb 2014 14:03:12 +0000
In-Reply-To: <CAHBU6iv8nS6Qh+98udWCEFc=U8WAiAFhoFsoEnxAzoYHpzyYvg@mail.gmail.com> (Tim Bray's message of "Tue, 11 Feb 2014 10:53:54 -0800")
Archived-At: http://mailarchive.ietf.org/arch/msg/apps-discuss/BrqYCE2ivW3JinE7zWKqblBVWAM
Cc: "apps-discuss@ietf.org" <apps-discuss@ietf.org>

Tim Bray writes:

> [wrt 2.2]
> How about something like
>
> [UNICODE] defines several “encoding forms”, namely UTF-8, UTF-16, and
> UTF-32.  UTF-8 is a serialization. Note that UTF-16 XML documents may be
> serialised into MIME entities in ... [also loses the information-free
> sentence about the spec following the “precedent”]

How about

  [UNICODE] defines three "encoding forms", namely UTF-8, UTF-16, and
  UTF-32. As UTF-8 can only be serialized in one way, the only possible
  label for UTF-8-encoded documents when serialised into MIME entities
  is "utf-8".  UTF-16 XML documents, however, can be serialised into
  MIME entities in one of two ways: either big-endian, labelled
  (optionally) "utf-16" or "utf-16be", or little-endian, labelled
  (optionally) "utf-16" or "utf-16le".
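To make the byte-order point concrete, here is a minimal Python sketch, purely illustrative (the helper names `utf16_serializations` and `utf16_byte_order` are hypothetical, not from the draft). It produces the two BOM-prefixed UTF-16 serializations of the same text and tells them apart by the leading byte order mark, using only the standard `codecs` module. UTF-8, by contrast, has no byte-order variants.

```python
import codecs
from typing import Dict, Optional


def utf16_serializations(text: str) -> Dict[str, bytes]:
    """Return both byte-order serializations of `text` in UTF-16.

    Each is prefixed with the matching byte order mark (BOM), so a
    receiver can tell which one it was given.
    """
    return {
        "big-endian": codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
        "little-endian": codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    }


def utf16_byte_order(data: bytes) -> Optional[str]:
    """Infer the byte order of UTF-16 data from its BOM, or None if absent."""
    if data.startswith(codecs.BOM_UTF16_BE):
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "little-endian"
    return None


doc = "<a/>"
forms = utf16_serializations(doc)
for order, data in forms.items():
    # Same characters, different bytes; the generic "utf-16" codec reads the BOM.
    print(order, data.hex(), utf16_byte_order(data), data.decode("utf-16") == doc)

# UTF-8 has no byte-order variants: one text, one byte sequence.
print("utf-8", doc.encode("utf-8").hex())
```

Both byte sequences decode to the same document, which is why a bare "utf-16" label is enough once a BOM is present.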

and add the following (removing the earlier version from 3.1), per my
reply to SM:

  UTF-32 has four potential serializations, of which only two
  (UTF-32BE and UTF-32LE) are given names in [UNICODE]. Support
  for the various serializations varies widely, and security concerns
  about their use have been raised.  The use of UTF-32 is NOT
  RECOMMENDED for XML MIME entities.
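Assuming the "four potential serializations" are the four octet orders that XML 1.0 (Appendix F) lists for UCS-4, namely big-endian (1234), little-endian (4321) and the two unusual orders 2143 and 3412, a minimal Python sketch can build each by permuting the octets of every code point. The `utf32_serializations` helper and the `OCTET_ORDERS` table are hypothetical. Python's standard codecs themselves only offer the two named forms, `utf-32-be` and `utf-32-le`, a small example of the uneven support mentioned above.

```python
from typing import Dict, Tuple

# Octet positions (0 = most significant) emitted for each order.
OCTET_ORDERS: Dict[str, Tuple[int, int, int, int]] = {
    "1234 (UTF-32BE)": (0, 1, 2, 3),
    "4321 (UTF-32LE)": (3, 2, 1, 0),
    "2143 (unnamed)": (1, 0, 3, 2),
    "3412 (unnamed)": (2, 3, 0, 1),
}


def utf32_serializations(text: str) -> Dict[str, bytes]:
    """Serialize `text` as UTF-32 in each of the four octet orders (no BOM)."""
    result = {}
    for name, order in OCTET_ORDERS.items():
        out = bytearray()
        for ch in text:
            octets = ord(ch).to_bytes(4, "big")  # canonical 1234 order
            out.extend(octets[i] for i in order)
        result[name] = bytes(out)
    return result


for name, data in utf32_serializations("<").items():
    print(name, data.hex())
```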

>> > I also think I disagree with my guess as to what it’s trying to
>> > say.  I tend to think the tools are going to do a better job of figuring
>> > out the right charset labeling than your typical document author.
>>
>> Really?  Neither of the above-mentioned tools will do the 'right'
>> thing with my XHTML by default, for example.
>>
>
> Ah, I got it finally; as the other thread said, what this is *really*
> talking about is configuring your web server and so on.  So this is OK,
> except that I think the word “author” is misleading since document authors
> shouldn’t be expected to understand Unicode encodings or webserver
> considerations.   So maybe something like:
>
> XML MIME producers are RECOMMENDED to provide means to control what value,
> if any, is given to charset parameters for XML MIME entities, for example
> by enabling Web server configuration of filename-to-Content-Type-header
> mappings on a file-by-file or suffix basis.

Thanks, that works.
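As a rough illustration of the kind of control that wording asks producers to offer, here is a Python sketch of a filename-to-Content-Type lookup in which per-file overrides take precedence over per-suffix defaults, and a charset of None means "omit the parameter". The table contents and the `content_type_for` name are hypothetical and not taken from any particular Web server.

```python
from pathlib import PurePosixPath
from typing import Dict, Optional, Tuple

# Hypothetical configuration: (media type, charset or None to omit it).
Entry = Tuple[str, Optional[str]]

SUFFIX_MAP: Dict[str, Entry] = {
    ".xml": ("application/xml", "utf-8"),
    ".xhtml": ("application/xhtml+xml", "utf-8"),
    ".svg": ("image/svg+xml", None),  # leave the encoding to the document itself
}

FILE_OVERRIDES: Dict[str, Entry] = {
    "legacy/catalog.xml": ("application/xml", "iso-8859-1"),
}


def content_type_for(path: str) -> Optional[str]:
    """Build a Content-Type header value for `path`, or None if unmapped."""
    entry = FILE_OVERRIDES.get(path)  # file-by-file setting wins
    if entry is None:
        entry = SUFFIX_MAP.get(PurePosixPath(path).suffix.lower())
    if entry is None:
        return None
    media_type, charset = entry
    return media_type if charset is None else f"{media_type}; charset={charset}"


for p in ("index.xhtml", "legacy/catalog.xml", "icons/logo.svg", "notes.txt"):
    print(p, "->", content_type_for(p))
```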

>> >  What does being “authoritative” mean concretely? Is it the RFC’s
>> > recommendation that the receiver SHOULD refuse to parse the XML even
>> > though it could?  If so, we should say so explicitly.
>>
>> No -- 'authoritative' means 'answers the question "how to determine
>> the encoding with which to attempt to process the entity"'.  So the
>> RFC is telling you how to process the entity, which is its job, after
>> all.
>>
>
> Right, but it feels bizarre and sort of against the spirit of Postel’s law
> to remain completely silent about what happens when there are conflicts.  I
> suggest you simply say that in the case of conflict, interoperability can
> suffer since the observed behavior of receiving software is unpredictable.
>  This reinforces your central thrust, which is: Don’t do this.

OK - will do, by clarifying that by 'authoritative' is meant 'do it
this way', while acknowledging that this will not (cannot) _always_ do
the 'right' thing.
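A minimal Python sketch of the precedence under discussion, assuming (as in the draft text) that a charset parameter, when present, wins over the XML encoding declaration. BOM handling and non-ASCII-compatible encodings are deliberately left out, and `choose_encoding` is a hypothetical name. It also reports whether the two disagree, which is exactly the case where, as Tim notes, real-world receivers may behave unpredictably; the body used below mirrors the iso-8859-1 vs "UTF-8" example from 9.8.

```python
import codecs
import re
from typing import Optional, Tuple

# Matches an encoding declaration in an XML declaration at the very start of
# an ASCII-compatible entity body, e.g. <?xml version="1.0" encoding="UTF-8"?>
ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def _normalize(label: str) -> str:
    """Map aliases such as 'latin-1' and 'ISO-8859-1' to one canonical name."""
    try:
        return codecs.lookup(label).name
    except LookupError:
        return label.lower()


def choose_encoding(body: bytes, charset: Optional[str]) -> Tuple[str, bool]:
    """Return (encoding to use, whether charset and declaration conflict)."""
    match = ENCODING_DECL.match(body)
    declared = match.group(1).decode("ascii") if match else None
    if charset is not None:
        conflict = declared is not None and _normalize(declared) != _normalize(charset)
        return charset, conflict  # the charset parameter is authoritative
    # No charset parameter: fall back to the declaration, then to UTF-8.
    return (declared if declared is not None else "utf-8"), False


body = b'<?xml version="1.0" encoding="UTF-8"?><p>caf\xe9</p>'
encoding, conflict = choose_encoding(body, "iso-8859-1")
print(encoding, conflict, body.decode(encoding))
```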

>> > Then in the example in 9.8, draft says “all processors will treat the
>> > enclosed entity as iso-8859-1 encoded.   That is, the "UTF-8" encoding
>> > declaration will be ignored.”  Is this really true in practice?  I suspect
>> > not; so perhaps you should say “all processors which conform to this
>> > specification will”.
>>
>> I could make it 'conformant processors', but the whole point of a
>> media type specification is to describe the behaviour of conformant
>> processors. . .
>>
>
> It bothers me that the assertion, as stated, is simply wrong, and I don’t
> think RFCs should contain assertions which are empirically false.   It’s
> fairly common in the RFCs that I see to note that conformant
> implementations will do thus and so.

Happy to make the change.  Will see if it is needed/feels right in
other examples.

ht
-- 
       Henry S. Thompson, School of Informatics, University of Edinburgh
      10 Crichton Street, Edinburgh EH8 9AB, SCOTLAND -- (44) 131 650-4440
                Fax: (44) 131 650-4587, e-mail: ht@inf.ed.ac.uk
                       URL: http://www.ltg.ed.ac.uk/~ht/
 [mail from me _always_ has a .sig like this -- mail without it is forged spam]