[Tools-discuss] idnits 2.16.02 counts line length using bytes and not characters (but xml2rfc 2.34.0 line-wraps counting chars, not bytes)

Daniel Kahn Gillmor <dkg@fifthhorseman.net> Mon, 04 November 2019 20:52 UTC

From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: tools-discuss@ietf.org
Date: Mon, 04 Nov 2019 15:51:54 -0500
Message-ID: <87h83jpdp1.fsf@fifthhorseman.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-discuss/QxlNoW6gOKPJi3L6zzZQH97xFQs>

i tried running idnits 2.16.02 on a local copy of an intermediate build
of draft-autocrypt-lamps-protected-headers (something between -00 and
-01).

idnits complained that one of my lines was > 72 chars long.

It was correct that the line was > 72 *bytes* long, but because the txt
is UTF-8 encoded and the line contains a non-ASCII character (which
takes more than one byte), the line itself was <= 72 characters.

the line in question was:

----
   In the diagrams below, "↧" (DOWNWARDS ARROW FROM BAR, U+21A7) is used
----
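
To make the mismatch concrete, here is a minimal Python sketch that
measures that line both ways. It only uses the built-in len() and
str.encode(); the variable names are mine, and it is not taken from
either tool.

```python
# The line idnits flagged, copied from the message above
# (including its three leading spaces).
line = '   In the diagrams below, "↧" (DOWNWARDS ARROW FROM BAR, U+21A7) is used'

char_count = len(line)                  # number of Unicode code points
byte_count = len(line.encode("utf-8"))  # number of bytes once UTF-8 encoded

print(char_count, byte_count)
```

U+21A7 takes three bytes in UTF-8, so the byte count comes out two
higher than the character count: 72 characters, but 74 bytes.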

This .txt file was generated from xml2rfc 2.34.0-1 (as shipped in debian).

For draft -01 i've worked around it by rewording the text so that it
reflows with every line <= 72 in both chars and bytes, so i'm not
blocked on the error. But i wanted to document it here (i'm happy to
also report it on a bugtracker if you prefer; just let me know).

My concern is that if xml2rfc is line-wrapping its txt output based on
the number of characters, then idnits should *also* be doing its
line-length-counting on the basis of characters, not the number of
bytes.
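
As a rough illustration of the difference between the two policies,
the sketch below checks the same input under both. The function name
overlong_lines and its parameters are hypothetical, written only for
this example; this is not how idnits or xml2rfc is actually
implemented.

```python
def overlong_lines(data: bytes, limit: int = 72, count_bytes: bool = False):
    """Return the 1-based numbers of lines longer than `limit`.

    Hypothetical helper for illustration only; not idnits code.
    With count_bytes=True each line is measured in raw bytes,
    otherwise it is decoded as UTF-8 and measured in code points.
    """
    flagged = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        raw = raw.rstrip(b"\r")  # tolerate CRLF line endings
        length = len(raw) if count_bytes else len(raw.decode("utf-8"))
        if length > limit:
            flagged.append(number)
    return flagged


sample = (
    '   In the diagrams below, "↧" (DOWNWARDS ARROW FROM BAR, U+21A7) is used\n'
    "A short ASCII line.\n"
).encode("utf-8")

print(overlong_lines(sample, count_bytes=True))   # byte-based check
print(overlong_lines(sample, count_bytes=False))  # character-based check
```

The byte-based check flags line 1, while the character-based check
flags nothing. Counting code points is itself only an approximation
of displayed width (combining or double-width characters would need
more care), but it would at least match a wrapper that counts
characters.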

So i think the two tools should probably be better aligned if we want to
support non-ASCII characters in the published txt documents.

        --dkg