Re: Looking for a grammar/spelling tool for XML I-D

Lars Eggert <lars@eggert.org> Thu, 20 August 2020 14:11 UTC

From: Lars Eggert <lars@eggert.org>
Message-Id: <728E56D6-62FA-4768-B183-10877319FD1E@eggert.org>
Content-Type: multipart/signed; boundary="Apple-Mail=_27B87518-E94E-42EE-B54C-32271ACFF976"; protocol="application/pgp-signature"; micalg=pgp-sha512
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.1\))
Subject: Re: Looking for a grammar/spelling tool for XML I-D
Date: Thu, 20 Aug 2020 17:11:27 +0300
In-Reply-To: <E992727E-9B27-494B-A210-6E9E44966DE8@eggert.org>
Cc: "ietf@ietf.org" <ietf@ietf.org>
To: "Eric Vyncke (evyncke)" <evyncke=40cisco.com@dmarc.ietf.org>
References: <27EC28EB-E58F-48EA-ABE0-99E0DF709847@cisco.com> <E992727E-9B27-494B-A210-6E9E44966DE8@eggert.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/BsB197OC7vRwBCvaPGRrlotckPI>

On 2020-8-20, at 16:58, Lars Eggert <lars@eggert.org> wrote:
> On 2020-8-20, at 12:38, Eric Vyncke (evyncke) <evyncke=40cisco.com@dmarc.ietf.org> wrote:
>> So, we are looking forward for any tools on-line/off-line being able to do this.
> 
> https://github.com/codespell-project/codespell works on md, xml and txt (and more).

Ah, sorry. You wanted a grammar checker.

https://eggert.org/software/idreview has some scripting around languagetool that boils down to something like this:

rfcstrip "$id" | \
sed -e 's/^[ ]\{1,\}//g; s/[ ]\{2,\}/ /g; s/^o /\* /' | \
languagetool -l en-US -d WHITESPACE_RULE,EN_QUOTES,\
COMMA_PARENTHESIS_WHITESPACE,UPPERCASE_SENTENCE_START,\
THREE_NN,DOUBLE_PUNCTUATION,\
WORD_CONTAINS_UNDERSCORE,COPYRIGHT,\
DASH_RULE,PLUS_MINUS,MULTIPLICATION_SIGN,ARROWS
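
To see what the sed step does on its own, here is a small sketch. The normalize function name and the sample input are made up for illustration; only the sed expression is taken from the pipeline above. It strips leading indentation, collapses runs of spaces, and turns the "o" bullets of text-format I-Ds into "*":

```shell
# Hypothetical wrapper around the sed expression from the pipeline above.
normalize() {
    sed -e 's/^[ ]\{1,\}//g; s/[ ]\{2,\}/ /g; s/^o /\* /'
}

# Made-up sample input resembling indented I-D text with an "o" bullet.
printf '   o  First   item\n   Plain  text\n' | normalize
```

The two sample lines should come out as "* First item" and "Plain text", which is roughly the shape of the text handed to languagetool.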

There are still a bunch of false positives, often due to ASCII diagrams, etc.

Lars