Re: [xml2rfc] idnits 2.16.05 finds non-ascii character &mdash;

Carsten Bormann <cabo@tzi.org> Thu, 18 February 2021 21:12 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/mTCly3J4qm7JUC1Y82ossSygtro>


> On 2021-02-18, at 21:57, Roger Price <roger@rogerprice.org> wrote:
> 
> My XML markup includes an &mdash; in the <title>.  

Which is rather surprising to me, as it should not be possible according to the currently prevailing restrictive interpretation of RFC 7997.

> This looks fine in HTML, but idnits 2.16.05 finds non-ascii utf-8 e28084 in the txt file.

This is intentional as long as we stick to the above restrictive interpretation.
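For anyone who wants to see exactly which characters trip such a check, here is a minimal sketch. It is not idnits' actual logic, and the function name find_non_ascii is made up for illustration; it simply lists every character above U+007F together with its UTF-8 bytes.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, utf8_hex, name) for each non-ASCII character.

    Hypothetical helper for illustration only; not part of idnits or xml2rfc.
    Line and column numbers are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:  # anything outside 7-bit ASCII
                hits.append((
                    line_no,
                    col,
                    ch,
                    ch.encode("utf-8").hex(),
                    unicodedata.name(ch, "UNKNOWN"),
                ))
    return hits


if __name__ == "__main__":
    # "\u2014" is EM DASH, written as an escape to keep this file ASCII-only.
    sample = "Title \u2014 Subtitle\nplain ascii line\n"
    for hit in find_non_ascii(sample):
        print(hit)
```

An EM DASH (U+2014) shows up in such a listing with the UTF-8 bytes e2 80 94.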

> I fixed the problem by adding the line
> 
>  text = text.replace(u'\u2014', u'-')  # Replace &mdash; EM DASH with HYPHEN-MINUS
> 
> to file $PYTHON/site-packages/xml2rfc/writers/text.py immediately before line
> 
>  text = text.replace(u'\u00A0', u' ')
> 
> Could this be made a permanent change?

I hope not.  Em-dashes look OK in plaintext, although they might be harder to distinguish from en-dashes or the neutral hyphen-minus in monospaced fonts.
If we do decide we need a surrogate for em-dashes in monospaced plaintext, that should be ---.
(V2 used -- for its ASCIIification, IIRC.)
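If such a surrogate were ever wanted, it could look roughly like the sketch below. This is an assumption-laden illustration, not xml2rfc's code: asciify_dashes and its em_dash_surrogate parameter are hypothetical names, and only the two replace() calls mirror the lines quoted above.

```python
EM_DASH = "\u2014"  # EM DASH
NBSP = "\u00a0"     # NO-BREAK SPACE


def asciify_dashes(text, em_dash_surrogate="---"):
    """Replace em dashes and no-break spaces with ASCII stand-ins.

    Hypothetical helper, not part of xml2rfc. The default surrogate is
    "---"; pass "--" to get the substitution V2 is recalled to have used.
    """
    text = text.replace(EM_DASH, em_dash_surrogate)
    text = text.replace(NBSP, " ")
    return text


if __name__ == "__main__":
    title = "UPS management\u00a0protocol \u2014 an overview"
    print(asciify_dashes(title))        # three-hyphen surrogate
    print(asciify_dashes(title, "--"))  # two-hyphen variant
```

Using three hyphens keeps the em dash distinguishable from a single hyphen-minus, and from a two-hyphen en-dash stand-in, even in a monospaced font.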

> Perhaps there are other character entities which could profit from such replacement.

There are not too many characters that suffer much from a monospaced font.

Regards, Carsten