Re: [apps-discuss] I-D Action: draft-ietf-appsawg-xml-mediatypes-07.txt

Tim Bray <tbray@textuality.com> Tue, 11 February 2014 18:53 UTC

On Mon, Feb 10, 2014 at 2:07 AM, Henry S. Thompson <ht@inf.ed.ac.uk> wrote:

>
> > 2.2: “[UNICODE] defines three "encoding forms", which are independent of
> > serialization” - what does “independent of serialization” mean?  I think
> > the UTF-* are actually serializations of unicode codepoints.  I suppose
> > UTF-16 is sort-of semi-independent of serialization, but UTF-8 never is.
>
> The three encoding forms (UTF-8, -16 and -32) allow for at least 7
> serializations between them.  And the text does go on to say that
> UTF-8 has only one serialization.  So I don't think there's anything
> wrong here, and it is following the UNICODE spec itself.
>

The sentence “[UNICODE] defines three "encoding forms", which are
independent of serialization, namely UTF-8, UTF-16 and UTF-32.” is really
horribly misleading. UTF-8 is a serialization. UTF-16 and -32 are not
independent of serialization in the slightest; you can’t process them
successfully unless you know whether you’re looking at the _LE or _BE
serializations.  How about something like:

[UNICODE] defines several “encoding forms”, namely UTF-8, UTF-16, and
UTF-32.  UTF-8 is a serialization. Note that UTF-16 XML documents may be
serialised into MIME entities in ... [also loses the information-free
sentence about the spec following the “precedent”]
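
To make the byte-order point concrete, here is a minimal Python sketch
(mine, not from the draft). It uses only the standard codec names and an
arbitrarily chosen sample character, and shows that one code point has a
single UTF-8 byte sequence but different UTF-16 and UTF-32 byte sequences
depending on byte order.

```python
# Sample text is an arbitrary choice: U+00E9 (e with acute accent).
text = "\u00e9"

utf8 = text.encode("utf-8")          # only one possible byte sequence
utf16_le = text.encode("utf-16-le")  # little-endian code units, no BOM
utf16_be = text.encode("utf-16-be")  # big-endian code units, no BOM
utf32_le = text.encode("utf-32-le")
utf32_be = text.encode("utf-32-be")

# Reading big-endian bytes as little-endian does not fail here; it quietly
# yields a different character, which is why the byte order must be known.
misread = utf16_be.decode("utf-16-le")

for name, data in [("utf-8", utf8),
                   ("utf-16-le", utf16_le), ("utf-16-be", utf16_be),
                   ("utf-32-le", utf32_le), ("utf-32-be", utf32_be)]:
    print(name, data.hex())
print("misread as", hex(ord(misread)))
```

The misread case is the one that matters: nothing raises an error, the
receiver just ends up with the wrong text.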


> > I also think I disagree with my guess as what it’s trying to
> > say.  I tend to think the tools are going to do a better job of figuring
> > out the right charset labeling than your typical document author.
>
> Really?  Neither of the above-mentioned tools will do the 'right'
> thing with my XHTML by default, for example.
>

Ah, I got it finally; as the other thread said, what this is *really*
talking about is configuring your web server and so on.  So this is OK,
except that I think the word “author” is misleading since document authors
shouldn’t be expected to understand Unicode encodings or webserver
considerations.   So maybe something like:

XML MIME producers are RECOMMENDED to provide means to control what value,
if any, is given to charset parameters for XML MIME entities, for example
by enabling Web server configuration of filename-to-Content-Type-header
mappings on a file-by-file or suffix basis.
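
As a rough illustration of what such a control could look like, here is a
small Python sketch. Everything in it is hypothetical: the PER_FILE and
PER_SUFFIX tables, the paths, and the content_type function are made up
for this example and do not correspond to any real server's configuration.
It only shows per-file rules overriding per-suffix rules, with the option
of sending no charset parameter at all.

```python
import posixpath

# Hypothetical configuration. A value of None means "send no charset
# parameter"; per-file entries take priority over per-suffix entries.
PER_FILE = {"/docs/legacy.xml": "iso-8859-1"}
PER_SUFFIX = {".xml": None, ".xhtml": "utf-8"}


def content_type(path, media_type="application/xml"):
    """Build a Content-Type header value for the given URL path."""
    if path in PER_FILE:
        charset = PER_FILE[path]
    else:
        suffix = posixpath.splitext(path)[1]
        charset = PER_SUFFIX.get(suffix)  # None if the suffix is unknown
    if charset is None:
        return media_type
    return f"{media_type}; charset={charset}"


print(content_type("/docs/legacy.xml"))
print(content_type("/feeds/news.xml"))
print(content_type("/site/index.xhtml", "application/xhtml+xml"))
```

The point is that whoever runs the server makes the choice, not the person
who wrote the document.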


> >  What does being “authoritative” mean concretely? Is it the RFC’s
> recommendation that the receiver SHOULD refuse to parse the XML even
> > though it could?  If so, we should say so explicitly.
>
> No -- 'authoritative' means 'answers the question "how to determine
> the encoding with which to attempt to process the entity"'.  So the
> RFC is telling you how to process the entity, which is its job, after
> all.
>

Right, but it feels bizarre and sort of against the spirit of Postel’s law
to remain completely silent about what happens when there are conflicts.  I
suggest you simply say that in the case of conflict, interoperability can
suffer since the observed behavior of receiving software is unpredictable.
This reinforces your central thrust, which is: Don’t do this.


> > Then in the example in 9.8, draft says “all processors will treat the
> > enclosed entity as iso-8859-1 encoded.   That is, the "UTF-8" encoding
> > declaration will be ignored.”  Is this really true in practice?  I suspect
> > not; so perhaps you should say “all processors which conform to this
> > specification will”.
>
> I could make it 'conformant processors', but the whole point of a
> media type specification is to describe the behaviour of conformant
> processors. . .
>

It bothers me that the assertion, as stated, is simply wrong, and I don’t
think RFCs should contain assertions which are empirically false.   It’s
fairly common in the RFCs that I see to note that conformant
implementations will do thus and so.
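
For concreteness, here is a sketch in Python of what a processor that
treats the charset parameter as authoritative would do with the 9.8 case.
It is an illustration under assumptions, not the draft's algorithm: the
function name effective_encoding and the regular expression are mine, byte
order marks are ignored, the encoding declaration is only found in
ASCII-compatible input, and falling back to UTF-8 is a simplification.

```python
import re
from email.message import Message

# Simplified matcher for an encoding declaration at the very start of the
# body. Assumes ASCII-compatible bytes; does not handle UTF-16 or a BOM.
DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')


def effective_encoding(content_type, body):
    """Return (encoding to use, whether label and declaration conflict)."""
    msg = Message()
    msg["Content-Type"] = content_type
    external = msg.get_content_charset()  # lowercased, or None if absent

    match = DECL.match(body)
    internal = match.group(1).decode("ascii").lower() if match else None

    # The charset parameter wins when present; otherwise the declaration.
    chosen = external or internal or "utf-8"
    conflict = (external is not None and internal is not None
                and external != internal)
    return chosen, conflict


body = b'<?xml version="1.0" encoding="UTF-8"?><doc/>'
print(effective_encoding("application/xml; charset=iso-8859-1", body))
print(effective_encoding("application/xml", body))
```

A processor written this way ignores the "UTF-8" declaration in the first
call, but it can still report the conflict, which is the case where
software in the field is likely to disagree.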

