Re: [Tools-discuss] BOMs and rfcmarkup
Henrik Levkowetz <henrik@levkowetz.com> Wed, 20 September 2017 22:08 UTC
To: Adam Roach <adam@nostrum.com>, Brian E Carpenter <brian.e.carpenter@gmail.com>, tools-discuss@ietf.org
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <caea1415-66d8-e98a-3a59-d0a199a39372@levkowetz.com>
Date: Thu, 21 Sep 2017 00:08:37 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/IoO0SMua_S8eJ0CD-91zesiKvSQ>
Subject: Re: [Tools-discuss] BOMs and rfcmarkup
Hi Adam,

On 2017-09-20 23:44, Adam Roach wrote:
> On 9/20/17 1:34 PM, Henrik Levkowetz wrote:
>> Hi Brian,
>>
>> On 2017-09-20 01:47, Brian E Carpenter wrote:
>>> Just a note that https://tools.ietf.org/html/rfc8187 seems
>>> to be fine, iff your browser is set to assume UTF-8. But it
>>> does include the BOM.
>>>
>>> If your browser is set to assume what Firefox calls "Western",
>>> the document starts with  (which is the IS8859 interpretation
>>> of a UTF-8 BOM, expressed in UTF-8, if that doesn't make your head
>>> hurt and if it survives the email system). Naturally, the £, € and
>>> ü do not display correctly.
>>>
>>> The BOM is in the generated HTML immediately after the <pre>.
>>> However, with Firefox it performs no useful function; all that matters
>>> is the View/Text Encoding setting. Exactly the same applies to Internet
>>> Explorer.
>>>
>>> Thus, at least for these two browsers, including the BOM in the HTML
>>> file seems to be pointless. I'd vote for removing it.
>> I've updated rfcmarkup to strip out BOMs; the link above should now
>> give you a document without a BOM after the <pre>.
>
> Wait. WAIT! No. This is the wrong answer. If you are in a tool that
> makes the BOM visible at all, the problem isn't the presence of the BOM
> (which was well-defined rendering semantics in UTF8 that basically say
> "this shouldn't be visible in any way at all"); the problem is that you
> are loading a UTF-8 document and treating it as _some_ _other_ _encoding_.

I'm sorry, but I disagree (at least in one sense). Earlier, rfcmarkup
left a BOM in the middle of the HTML, which I believe is wrong. You
could say that this is the result of not reading a UTF-8 file with a
BOM correctly, and I would agree. So now I'm reading the content as
UTF-8, but not keeping the BOM, and not placing it in the middle of an
HTML file (which is served with Content-Type: text/html; charset=UTF-8).
I think that's right.
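For readers following the encoding details, here is a minimal Python sketch of the two behaviours discussed above. It is not rfcmarkup's actual code, which the thread does not show. It shows that the three bytes of a UTF-8 BOM (EF BB BF) come out as "" when read as a single-byte "Western" encoding, and that decoding with Python's standard `utf-8-sig` codec drops one leading BOM so it never ends up after the `<pre>`. It assumes Firefox's "Western" setting corresponds to Windows-1252 (`cp1252`); `read_utf8_text` and `wrap_in_pre` are hypothetical helper names.

```python
import html

# The UTF-8 encoding of U+FEFF (BYTE ORDER MARK); equal to codecs.BOM_UTF8.
BOM_UTF8 = b"\xef\xbb\xbf"


def mis_decoded_bom() -> str:
    """Return what a UTF-8 BOM looks like when its bytes are read as Windows-1252."""
    return BOM_UTF8.decode("cp1252")


def read_utf8_text(raw: bytes) -> str:
    """Decode UTF-8 input; the 'utf-8-sig' codec drops one leading BOM if present."""
    return raw.decode("utf-8-sig")


def wrap_in_pre(text: str) -> str:
    """Hypothetical helper: HTML-escape the text and place it inside a <pre> element."""
    return "<pre>" + html.escape(text) + "</pre>"


if __name__ == "__main__":
    raw = BOM_UTF8 + "Price: £5, €5, Müller".encode("utf-8")
    print(repr(mis_decoded_bom()))        # the three characters Brian describes
    print(repr(raw.decode("utf-8")[:1]))  # plain 'utf-8' keeps the BOM as U+FEFF
    print(wrap_in_pre(read_utf8_text(raw)))
```

Decoding with plain `utf-8` would keep the BOM as a U+FEFF character inside the text, which is how it can end up in the middle of the generated HTML. With `utf-8-sig`, the page can still declare `charset=UTF-8` in its Content-Type header while carrying no BOM at all.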
> It does no good to paper over the presence of a BOM, since that just
> defers the problem to elsewhere (namely, any other non-ASCII codepoints
> in the document).

Nope. I've been handling UTF-8 correctly in rfcmarkup for several
years; it has behaved correctly for draft-ietf-httpbis-rfc5987bis for
every version that has had non-ASCII characters.

> The proper behavior here, and especially for rfcdiff, is to make sure
> everything is using UTF-8. Leaving the BOM in place serves as a useful
> canary in this coal mine, and I would suggest *not* stripping it, at
> least until we're sure the tools *otherwise* handle UTF-8 correctly.
> That is, the BOM provides a good litmus test: if you can see it, you're
> doing it wrong.

My experience is that rfcmarkup has been dealing just fine with UTF-8
in general, but the BOM (which I think is nonsense in UTF-8, but that's
a different discussion) has been visible, which I've now fixed.

Best regards,

	Henrik
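Adam's point that any other non-ASCII code points would still expose a wrong decoding can be checked independently of the BOM. The following Python sketch, with hypothetical function names and the same assumption that "Western" means Windows-1252, reproduces the garbling of the £, € and ü characters Brian mentions and applies a simple round-trip heuristic for spotting UTF-8 text that was read as Windows-1252. It is only a heuristic and can miss cases; it is not something rfcmarkup or rfcdiff is known to use.

```python
def mis_decode(text: str) -> str:
    """Simulate a browser reading UTF-8 bytes as Windows-1252 ("Western")."""
    return text.encode("utf-8").decode("cp1252")


def looks_mis_decoded(text: str) -> bool:
    """Heuristic: True if text round-trips cp1252 -> UTF-8 into different text.

    Genuine non-ASCII text usually fails this round trip, and pure ASCII
    text survives it unchanged, so both of those return False.
    """
    try:
        repaired = text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False
    return repaired != text


if __name__ == "__main__":
    garbled = mis_decode("£, € and ü")
    print(garbled)                          # no BOM involved, still visibly wrong
    print(looks_mis_decoded(garbled))       # True
    print(looks_mis_decoded("£, € and ü"))  # False
```

Because the garbled output appears without any BOM present, a document with non-ASCII content reveals a wrong decoding either way. The BOM only adds a visible marker at the very start of the text.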