Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

Nico Williams <nico@cryptonector.com> Sat, 27 February 2021 17:30 UTC

Date: Sat, 27 Feb 2021 11:30:04 -0600
From: Nico Williams <nico@cryptonector.com>
To: Julian Reschke <julian.reschke@gmx.de>
Cc: xml2rfc@ietf.org
Message-ID: <20210227173003.GB30153@localhost>
References: <87wnuucjra.fsf@fifthhorseman.net> <20210227160926.GA30153@localhost> <3494e8d8-e9bd-6c38-61f3-0c31d066a61f@gmx.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <3494e8d8-e9bd-6c38-61f3-0c31d066a61f@gmx.de>
User-Agent: Mutt/1.9.4 (2018-02-28)
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/uPvWBhXjO70HUYyY0NJWnp0diB8>
Subject: Re: [xml2rfc] assuming that period (.) ends a sentence is sometimes wrong

On Sat, Feb 27, 2021 at 05:35:02PM +0100, Julian Reschke wrote:
> Am 27.02.2021 um 17:09 schrieb Nico Williams:
> > On Fri, Feb 26, 2021 at 09:44:09PM -0500, Daniel Kahn Gillmor wrote:
> > > The toolchain to build draft-ietf-openpgp-crypto-refresh produces XML
> > > that contains:
> > 
> > There are only two possible correct answers, but historically it is
> > impossible to get agreement on adopting either:
> 
> At least three :-).

Your third solution is not a solution.  It says to stop trying to have
rendered text distinguish sentence-ending periods from other periods.
That is just unacceptable.

Ah, there is a third solution, duh: use a Unicode wide space after
sentence-ending periods (&emsp;).  It... might be just slightly less
unpleasant than <sentence>:

  Here's an example.&emsp; And here's the next sentence.&emsp; I had to
  follow &amp;emsp; with a space because running on two sentences is
  annoying.
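
A minimal sketch of how a tool could recover sentence boundaries from that
convention, assuming the wide space is U+2003 EM SPACE and that it reaches
the tool as a literal character.  The sample writes it as the numeric
reference &#x2003; because XML itself predefines only five named entities,
so &emsp; would first need a declaration.  The function name
split_sentences is hypothetical:

```python
import xml.etree.ElementTree as ET

EM_SPACE = "\u2003"  # U+2003 EM SPACE, the character that &emsp; names in HTML


def split_sentences(t_xml: str) -> list[str]:
    """Split the text of a single <t> element on em spaces.

    Assumption: the author put an em space after every sentence-ending
    period, and nowhere else.
    """
    text = ET.fromstring(t_xml).text or ""
    # Strip the ordinary space that may follow the em space in the source.
    parts = [part.strip() for part in text.split(EM_SPACE)]
    return [part for part in parts if part]


sample = (
    "<t>D. K. G. wrote a post.&#x2003; "
    "This follow-up might be controversial.</t>"
)
print(split_sentences(sample))
```

The periods in "D. K. G." are followed by ordinary spaces, so under this
assumption they are not treated as sentence ends.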

> >   - mark-up sentences, e.g.,
> > 
> >       <sentence>D. K. G. wrote a post.</sentence>
> >       <sentence>This follow-up might be controversial.</sentence>
> 
> I'd be surprised if people would be willing to do this.

Right, no one would or should want that.

> >   - follow sentence-ending periods with two spaces (which does not mean
> >     the rendered output must also do the same, as it could use a wide
> >     space instead), e.g.,
> > 
> >       D. K. G. wrote a post.  This follow-up might be controversial.
> > 
> >     i.e., `.  ` as a sort of mark-up
> 
> Tricky, because it would change the whitespace handling inside <t>
> (where currently multiple white space characters are always equivalent
> to a single space).

Is that XML or xml2rfc forcing that?
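
A quick way to check the XML half of that question: as far as I know, an
XML parser passes whitespace in element content through to the application
unchanged, so any collapsing would happen in the processing tool.  The
sketch below uses Python's standard ElementTree.  The render() function is
hypothetical and only illustrates treating `.  ` as mark-up before
collapsing the remaining whitespace; it is not how xml2rfc works:

```python
import re
import xml.etree.ElementTree as ET

src = "<t>D. K. G. wrote a post.  This follow-up might be controversial.</t>"

# The parser hands back the text with both spaces intact.
text = ET.fromstring(src).text

# Hypothetical convention: a period followed by two or more spaces ends a
# sentence.  A period followed by a line break is not handled here.
SENTENCE_END = re.compile(r"\.  +")
# Only ASCII whitespace is collapsed, so the em space inserted below survives.
ASCII_WHITESPACE = re.compile(r"[ \t\r\n]+")


def render(raw: str) -> str:
    """Mark sentence ends with an em space, then collapse other whitespace."""
    marked = SENTENCE_END.sub(".\u2003", raw)
    return ASCII_WHITESPACE.sub(" ", marked)


print(repr(text))
print(repr(render(text)))
```

The order matters: the sentence-end pass has to run before whitespace is
collapsed, otherwise the two-space signal is already gone.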

> > Instead many developers prefer to code up imperfect heuristics for
> > sentence ending periods.  If you search relevant archives (e.g., this
> > list's) you'll find that this is a periodic discussion.
> 
> Yes.

Yuck.

> > Don your asbestos suits now.  Flame war incoming.
> 
> The third answer is: stop trying to. Optimizing the plain text output
> format really really is not important.

Not acceptable.

Nico
--