[xml2rfc] idnits 2.16.05 finds non-ascii character —

Roger Price <roger@rogerprice.org> Thu, 18 February 2021 20:57 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/AAzL5O0ASFYAhjc6gXUljJaNacM>

My XML markup includes an &mdash; in the <title>.  This looks fine in HTML, but
idnits 2.16.05 reports the non-ASCII UTF-8 sequence e28094 (U+2014 EM DASH) in
the txt file.
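
To see exactly which characters trip the check, a short script along the
following lines can list every non-ASCII character in the text output together
with its code point and UTF-8 bytes.  This is only a sketch: find_non_ascii is
a made-up helper name, not part of xml2rfc or idnits, and the sample string
stands in for the real txt file.

```python
def find_non_ascii(text):
    """Return (line, column, code point, UTF-8 hex) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # Report the code point and the bytes a UTF-8 checker would see.
                hits.append((line_no, col, "U+%04X" % ord(ch),
                             ch.encode("utf-8").hex()))
    return hits


# Stand-in for the generated text; in practice, read the txt file as UTF-8.
sample = "Title \u2014 subtitle\nplain ASCII line\n"
for hit in find_non_ascii(sample):
    print(hit)
```

Each reported tuple gives the 1-based line and column, so the offending
character can be traced back to the element that produced it.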

I fixed the problem by adding the line

   text = text.replace(u'\u2014', u'-')  # Replace &mdash; EM DASH with HYPHEN-MINUS

to file $PYTHON/site-packages/xml2rfc/writers/text.py immediately before line

   text = text.replace(u'\u00A0', u' ')

Could this be made a permanent change?

Perhaps there are other character entities that could benefit from a similar
replacement.
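
A wider replacement could be sketched with a single translation table instead
of one replace() call per character.  Everything here is an assumption for
illustration: the names ASCII_FALLBACKS and to_ascii_punctuation are
hypothetical, the choice of characters and fallbacks is a judgment call, and
nothing below is taken from the xml2rfc source.  It is written for Python 3.

```python
# Hypothetical table of ASCII fallbacks, keyed by Unicode code point.
ASCII_FALLBACKS = {
    0x00A0: " ",    # NO-BREAK SPACE
    0x2010: "-",    # HYPHEN
    0x2011: "-",    # NON-BREAKING HYPHEN
    0x2013: "-",    # EN DASH
    0x2014: "-",    # EM DASH
    0x2018: "'",    # LEFT SINGLE QUOTATION MARK
    0x2019: "'",    # RIGHT SINGLE QUOTATION MARK
    0x201C: '"',    # LEFT DOUBLE QUOTATION MARK
    0x201D: '"',    # RIGHT DOUBLE QUOTATION MARK
    0x2026: "...",  # HORIZONTAL ELLIPSIS
}


def to_ascii_punctuation(text):
    """Replace the listed punctuation with ASCII; leave everything else alone."""
    return text.translate(ASCII_FALLBACKS)


print(to_ascii_punctuation("Title \u2014 \u201cquoted\u201d\u2026"))
```

Characters not in the table pass through unchanged, so legitimate non-ASCII
text such as author names would not be affected by this step.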

Roger