Re: Looking for a grammar/spelling tool for XML I-D

Lars Eggert <lars@eggert.org> Thu, 20 August 2020 14:15 UTC

From: Lars Eggert <lars@eggert.org>
Message-Id: <EE67D61F-8B9E-4AC3-93B7-7EB3C31DDFA8@eggert.org>
Subject: Re: Looking for a grammar/spelling tool for XML I-D
Date: Thu, 20 Aug 2020 17:15:42 +0300
In-Reply-To: <728E56D6-62FA-4768-B183-10877319FD1E@eggert.org>
Cc: "ietf@ietf.org" <ietf@ietf.org>
To: "Eric Vyncke (evyncke)" <evyncke=40cisco.com@dmarc.ietf.org>
References: <27EC28EB-E58F-48EA-ABE0-99E0DF709847@cisco.com> <E992727E-9B27-494B-A210-6E9E44966DE8@eggert.org> <728E56D6-62FA-4768-B183-10877319FD1E@eggert.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/EcqoJgPG5aHC0InW0zntuuI9t4s>

On 2020-8-20, at 17:11, Lars Eggert <lars@eggert.org> wrote:
> On 2020-8-20, at 16:58, Lars Eggert <lars@eggert.org> wrote:
>> On 2020-8-20, at 12:38, Eric Vyncke (evyncke) <evyncke=40cisco.com@dmarc.ietf.org> wrote:
>>> So, we are looking for any tools, online or offline, that are able to do this.
>> 
>> https://github.com/codespell-project/codespell works on md, xml and txt (and more).
> 
> Ah, sorry. You wanted a grammar checker.
> 
> https://eggert.org/software/idreview has some scripting around languagetool that boils down to something like this:
> 
> rfcstrip "$id" | \
> sed -e 's/^[ ]\{1,\}//g; s/[ ]\{2,\}/ /g; s/^o /\* /' | \
> languagetool -l en-US -d WHITESPACE_RULE,EN_QUOTES,\
> COMMA_PARENTHESIS_WHITESPACE,UPPERCASE_SENTENCE_START,\
> THREE_NN,DOUBLE_PUNCTUATION,\
> WORD_CONTAINS_UNDERSCORE,COPYRIGHT,\
> DASH_RULE,PLUS_MINUS,MULTIPLICATION_SIGN,ARROWS

And if you want to work on the XML directly, you can pass languagetool the --xmlfilter option and give it the XML file instead of the rfcstripped/sedded text version.
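
For what it's worth, the sed step in the quoted pipeline only normalizes whitespace and list bullets before the text reaches the checker. Here is a minimal sketch of just that step, runnable without rfcstrip or languagetool. The function name normalize_id_text and the sample input are made up for illustration, and the asterisk is written unescaped in the replacement, which should behave the same:

```shell
#!/bin/sh
# Hypothetical helper: applies the same three substitutions as the
# sed call in the quoted pipeline.
#   1. strip leading spaces from every line
#   2. collapse runs of two or more spaces into one
#   3. turn the "o " bullets used in text I-Ds into "* " bullets
normalize_id_text() {
  sed -e 's/^[ ]\{1,\}//g; s/[ ]\{2,\}/ /g; s/^o /* /'
}

# Made-up sample resembling an indented list from a text I-D.
printf '   o  First  item\n   Second   line\n' | normalize_id_text
# prints:
# * First item
# Second line
```

Checking the output of this step by eye on a real draft is a cheap way to see what languagetool will actually be fed.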

Lars

PS: I'll stop with the follow-ons now.