Re: [Tools-discuss] idnits 2.16.02 counts line length using bytes and not characters (but xml2rfc 2.34.0 line-wraps counting chars, not bytes)

Henrik Levkowetz <henrik@levkowetz.com> Mon, 04 November 2019 21:07 UTC

To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>, tools-discuss@ietf.org
References: <87h83jpdp1.fsf@fifthhorseman.net>
From: Henrik Levkowetz <henrik@levkowetz.com>
Message-ID: <b045766b-ef0b-7dbd-cf34-b1f2f332c151@levkowetz.com>
Date: Mon, 4 Nov 2019 22:07:00 +0100
In-Reply-To: <87h83jpdp1.fsf@fifthhorseman.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/JFpq_Xp7iRT35kRGzrQfo6IH6mk>

Hi Daniel,

Agreed.  A rewrite of idnits which will do better in this respect is
in progress.


Best regards,

	Henrik

On 2019-11-04 21:51, Daniel Kahn Gillmor wrote:
> i tried running idnits 2.16.02 on a local copy of an intermediate build
> of draft-autocrypt-lamps-protected-headers (something between -00 and
> -01).
> 
> idnits complained that one of my lines was > 72 chars long.
> 
> It was correct that the line was > 72 *bytes* long, but because the txt
> was using UTF-8 (and not all characters are 7-bit clean), the line itself
> was <= 72 characters.
> 
> the line in question was:
> 
> ----
>    In the diagrams below, "↧" (DOWNWARDS ARROW FROM BAR, U+21A7) is used
> ----
> 
> This .txt file was generated from xml2rfc 2.34.0-1 (as shipped in debian).
> 
> For draft -01 i've worked around it by adjusting the text so that it
> reflows to have both the chars and bytes <= 72, so i'm not blocked on
> the error, but i wanted to document it here (happy to also report it on
> a bugtracker if you prefer, let me know).
> 
> My concern is that if xml2rfc is line-wrapping its txt output based on
> the number of characters, then idnits should *also* be doing its
> line-length-counting on the basis of characters, not the number of
> bytes.
> 
> So i think the two tools should probably be better aligned if we want to
> support non-ASCII characters in the published txt documents.
> 
>         --dkg
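
For reference, the mismatch described in the quoted report can be reproduced with a short Python sketch. This is not idnits or xml2rfc code; the names `char_length`, `byte_length`, and `MAX_WIDTH` are hypothetical and chosen only for illustration. It assumes the quoted line keeps its three leading spaces and is encoded as UTF-8.

```python
# Illustrative sketch only: not taken from idnits or xml2rfc.
# The line quoted in the report, with its three leading spaces.
LINE = '   In the diagrams below, "↧" (DOWNWARDS ARROW FROM BAR, U+21A7) is used'

MAX_WIDTH = 72  # the limit mentioned in the report


def char_length(line: str) -> int:
    """Length in Unicode code points (what a character-based wrapper counts)."""
    return len(line)


def byte_length(line: str) -> int:
    """Length in bytes once the line is encoded as UTF-8."""
    return len(line.encode("utf-8"))


if __name__ == "__main__":
    chars = char_length(LINE)
    octets = byte_length(LINE)
    print(f"characters: {chars} (over limit: {chars > MAX_WIDTH})")
    print(f"bytes:      {octets} (over limit: {octets > MAX_WIDTH})")
```

U+21A7 takes three bytes in UTF-8, so the byte count exceeds the character count by two. A checker that measures bytes therefore flags this line, while a wrapper that measures characters accepts it. Counting code points is itself only an approximation of display width (combining marks and wide characters complicate it), but it matches the character-based wrapping the report attributes to xml2rfc.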