Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

Nico Williams <nico@cryptonector.com> Sat, 27 February 2021 16:09 UTC

Date: Sat, 27 Feb 2021 10:09:27 -0600
From: Nico Williams <nico@cryptonector.com>
To: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
Cc: xml2rfc@ietf.org
Message-ID: <20210227160926.GA30153@localhost>
References: <87wnuucjra.fsf@fifthhorseman.net>
In-Reply-To: <87wnuucjra.fsf@fifthhorseman.net>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/UkfU9E0k8JQzQCjTjY3SkckDmIs>
Subject: Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

On Fri, Feb 26, 2021 at 09:44:09PM -0500, Daniel Kahn Gillmor wrote:
> The toolchain to build draft-ietf-openpgp-crypto-refresh produces XML
> that contains:

There are only two possible correct answers, but historically it is
impossible to get agreement on adopting either:

 - mark-up sentences, e.g.,

     <sentence>D. K. G. wrote a post.</sentence>
     <sentence>This follow-up might be controversial.</sentence>

 - follow sentence-ending periods with two spaces (which does not mean
   that the rendered output must also do the same, as it could use a
   wide space instead), e.g.,

     D. K. G. wrote a post.  This follow-up might be controversial.

   i.e., `.  ` as a sort of mark-up
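
To make the second option concrete, here is a minimal sketch in Python
of what a consumer of that convention could do.  This is only an
illustration, not what xml2rfc does: `split_sentences` is a hypothetical
helper, and it assumes the input is a single unwrapped line where a
sentence boundary is always a terminator followed by two or more spaces.

```python
import re

# Assumption: a sentence boundary is a '.', '!' or '?' followed by two or
# more spaces.  A single space after a period never ends a sentence.
_BOUNDARY = re.compile(r"(?<=[.!?]) {2,}")


def split_sentences(text: str) -> list[str]:
    """Split text on the two-space convention (hypothetical helper)."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


text = "D. K. G. wrote a post.  This follow-up might be controversial."
print(split_sentences(text))
# ['D. K. G. wrote a post.', 'This follow-up might be controversial.']
```

Note that this sketch does not handle a sentence that ends at a line
break, where there are no trailing spaces to look at; real input would
need a rule for that case as well.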

Instead, many developers prefer to code up imperfect heuristics for
sentence-ending periods.  If you search relevant archives (e.g., this
list's) you'll find that this is a periodic discussion.
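
As a hedged illustration of why such heuristics are imperfect, here is
about the simplest one possible, sketched in Python.  `naive_split` is a
made-up name and does not correspond to any tool's actual code; it just
treats every period followed by whitespace as the end of a sentence.

```python
import re


def naive_split(text: str) -> list[str]:
    """Hypothetical heuristic: any period followed by whitespace ends a sentence."""
    return [s for s in re.split(r"(?<=\.)\s+", text.strip()) if s]


text = "D. K. G. wrote a post.  This follow-up might be controversial."
print(naive_split(text))
# ['D.', 'K.', 'G.', 'wrote a post.', 'This follow-up might be controversial.']
```

The initials are each split off as their own "sentence".  Adding
special cases for initials and abbreviations helps, but every such rule
is another guess about what the author meant.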

I myself am quite used to always following every sentence-ending period
with two spaces.  (In smartphone text input boxes that's a huge pain
because, at least on mine, they turn two spaces typed in quick
succession into a period and space.)

Don your asbestos suits now.  Flame war incoming.

Nico
--