Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

Julian Reschke <julian.reschke@gmx.de> Sat, 27 February 2021 16:35 UTC

To: xml2rfc@ietf.org
References: <87wnuucjra.fsf@fifthhorseman.net> <20210227160926.GA30153@localhost>
From: Julian Reschke <julian.reschke@gmx.de>
Message-ID: <3494e8d8-e9bd-6c38-61f3-0c31d066a61f@gmx.de>
Date: Sat, 27 Feb 2021 17:35:02 +0100
In-Reply-To: <20210227160926.GA30153@localhost>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/Nbg7D1U5rBsa6xts_VL4ZlQxh-M>
Subject: Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

On 27.02.2021 at 17:09, Nico Williams wrote:
> On Fri, Feb 26, 2021 at 09:44:09PM -0500, Daniel Kahn Gillmor wrote:
>> The toolchain to build draft-ietf-openpgp-crypto-refresh produces XML
>> that contains:
>
> There are only two possible correct answers, but historically it is
> impossible to get agreement on adopting either:

At least three :-).

>   - mark-up sentences, e.g.,
>
>       <sentence>D. K. G. wrote a post.</sentence>
>       <sentence>This follow-up might be controversial.</sentence>

I'd be surprised if people were willing to do this.
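
If sentences were marked up explicitly, as in the first option, a renderer would not have to guess at all. A minimal Python sketch, assuming a hypothetical `<sentence>` child element inside `<t>` (xml2rfc defines no such element): the renderer just joins the marked sentences with whatever separator the output format wants, e.g. two spaces for plain text.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <sentence> is NOT part of the xml2rfc vocabulary.
SOURCE = """<t><sentence>D. K. G. wrote a post.</sentence>
<sentence>This follow-up might be controversial.</sentence></t>"""


def render_paragraph(xml_text: str, separator: str = "  ") -> str:
    """Join explicitly marked sentences with the given separator."""
    para = ET.fromstring(xml_text)
    sentences = []
    for s in para.iter("sentence"):
        # Normalize whitespace inside each sentence; the boundary itself
        # comes from the markup, not from inspecting periods.
        sentences.append(" ".join("".join(s.itertext()).split()))
    return separator.join(sentences)


print(render_paragraph(SOURCE))
# D. K. G. wrote a post.  This follow-up might be controversial.
```

The periods after the initials never need to be classified, because the sentence boundary is carried by the element structure.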

>   - follow sentence-ending periods with two spaces (which does not mean
>     the rendered output must also do the same, as it could use a wide
>     space instead), e.g.,
>
>       D. K. G. wrote a post.  This follow-up might be controversial.
>
>     i.e., `.  ` as a sort of mark-up

Tricky, because it would change the whitespace handling inside <t>
(where currently multiple whitespace characters are always equivalent
to a single space).
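
A minimal Python sketch of that conflict, assuming a simplified model of `<t>` whitespace normalization (this is not xml2rfc's actual implementation): once runs of whitespace are collapsed to a single space, the two-space sentence marker is indistinguishable from an ordinary word gap, so the convention could only be honored before normalization.

```python
import re

TEXT = "D. K. G. wrote a post.  This follow-up might be controversial."


def collapse_whitespace(text: str) -> str:
    """Simplified <t> model: any run of whitespace becomes one space."""
    return re.sub(r"\s+", " ", text).strip()


def split_on_double_space(text: str) -> list[str]:
    """Treat '.' followed by two or more spaces as a sentence end."""
    return re.split(r"(?<=\.) {2,}", text)


# Honoring the convention on the raw text keeps the sentence boundary...
print(split_on_double_space(TEXT))
# ['D. K. G. wrote a post.', 'This follow-up might be controversial.']

# ...but after normalization the marker is gone.
print(split_on_double_space(collapse_whitespace(TEXT)))
# ['D. K. G. wrote a post. This follow-up might be controversial.']
```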

> Instead, many developers prefer to code up imperfect heuristics for
> sentence-ending periods.  If you search relevant archives (e.g., this
> list's) you'll find that this is a periodic discussion.

Yes.
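
The kind of heuristic being discussed can be sketched as follows. This is a hypothetical example, not xml2rfc's code: it treats a period followed by whitespace and an uppercase letter as a sentence end, unless the word before the period is a single capital letter (an initial) or appears in a small, hypothetical abbreviation list.

```python
import re

# Hypothetical, deliberately incomplete abbreviation list.
ABBREVIATIONS = {"e.g", "i.e", "etc", "Dr", "Mr", "Ms", "vs"}

# A period, then whitespace, then an uppercase letter: a sentence-end candidate.
CANDIDATE = re.compile(r"\.\s+(?=[A-Z])")


def sentence_ends(text: str) -> list[int]:
    """Return indices of periods the heuristic treats as sentence-ending."""
    ends = []
    for m in CANDIDATE.finditer(text):
        # The word immediately preceding the period.
        before = text[: m.start()].split()
        word = before[-1] if before else ""
        if len(word) == 1 and word.isupper():
            continue  # looks like an initial, e.g. "D."
        if word in ABBREVIATIONS:
            continue  # known abbreviation
        ends.append(m.start())
    return ends
```

It copes with "D. K. G. wrote a post. This follow-up might be controversial." (only index 21 is reported), yet it misses the real sentence end in "See Appendix A. The rest follows." because "A" looks like an initial, which is exactly the kind of imperfection described above.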

> I myself am quite used to always following every sentence-ending period
> with two spaces.  (In smartphone text input boxes that's a huge pain
> because, at least on mine, they turn two spaces typed in quick succession
> into a period and space.)
>
> Don your asbestos suits now.  Flame war incoming.

The third answer is: stop trying to detect sentence-ending periods at
all. Optimizing the plain-text output format really, really is not
important.

Best regards, Julian