Re: [Rfc-markdown] [rfc-i] The <tt> train wreck

Carsten Bormann <cabo@tzi.org> Mon, 16 August 2021 09:09 UTC

From: Carsten Bormann <cabo@tzi.org>
Date: Mon, 16 Aug 2021 11:09:03 +0200
Cc: rfc-interest@rfc-editor.org, rfc-markdown@ietf.org
To: Martin Thomson <mt@lowentropy.net>
Subject: Re: [Rfc-markdown] [rfc-i] The <tt> train wreck

On 2021-08-16, at 03:11, Martin Thomson <mt@lowentropy.net> wrote:
> 
> It seems like overloading this with three levels of semantics is the original sin.

Right.  When the font change was split out from a single style attribute into separate elements (<em>, <strong>, <tt>), the other semantics could have been split out as well.  (There are some 64 or so combinations…)

> decorations (italic, bold, monospace), quoting (_, *, "), line breaking

You are using different terms than I did, and decoration is easily confused with decorator, so let me propose yet another set of terms:

font-change (italic, bold, monospace)
txt-fallback (_, *, “)
no-breaking syntax (break-on-space, no-break? See below.)
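To make "independent axes" concrete, here is a minimal Python sketch of a text renderer in which each of the three properties is a separate field. The Span model, its field names and the DEFAULT_FALLBACK table are hypothetical (not an xml2rfc API); the defaults simply mirror today's _, * and " fallbacks.

```python
from dataclasses import dataclass
from typing import Optional

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

# Hypothetical defaults, mirroring today's text fallbacks for <em>, <strong>, <tt>.
DEFAULT_FALLBACK = {"em": "_", "strong": "*", "tt": '"'}


@dataclass(frozen=True)
class Span:
    """A marked-up span; each field is one independent axis (hypothetical model)."""
    text: str
    font: str                       # font-change: "em", "strong" or "tt"
    fallback: Optional[str] = None  # txt-fallback: None = element default, "" = none
    no_break: bool = False          # no-breaking: keep the span on one line


def to_text(span: Span) -> str:
    """Render a span for the plain-text output.

    font-change has no visible effect in plain text; here it only selects
    the default fallback. txt-fallback and no-breaking are applied
    separately, so any combination of the three can be expressed.
    """
    body = span.text.replace(" ", NBSP) if span.no_break else span.text
    delim = DEFAULT_FALLBACK[span.font] if span.fallback is None else span.fallback
    return delim + body + delim
```

For example, Span("false", "tt", fallback="") renders as plain false, while Span("BCP 14", "em", no_break=True) keeps the default underscores and protects the space.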

> From my perspective, it would be good to control each independently.  With tags.  I don't care if that is different tags, attributes on a single tag, or some combination of that with some global flags to control it.  (Global flags => stylesheet?)

The spanx markup was meant to delimit a span of text for a style change.  Having txt-fallback for that without the actual style change doesn’t make a lot of sense to me; this makes an attribute on the span that controls txt-fallback look like the right approach.

No-breaking is actually useful without visual delimitation, so maybe this should be considered separately.

To me it does not make sense to remove the txt-fallback default=on from <tt> but keep it in <em> and <strong>.  I think we need to design a way to make this work for all three.  (Note that the default fallbacks for <em> and <strong> don’t always work either, so it would be good to be able to select a different one, as is also needed for <tt>.  Compare the use of >false< and >true< in https://www.rfc-editor.org/rfc/rfc8949.html#name-diagnostic-notation: we used that sick notation because the default fallback for <tt> was wrong (»false« and »“false”« are two different things in CBOR diagnostic notation), but there was no way to specify a different fallback, so we opted to always have the fallback characters in the text, and then we were limited to ASCII.  Ouch.)
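The missing piece for the RFC 8949 case is a per-element override. A sketch, assuming a hypothetical fallback attribute whose value is empty (no fallback characters), one character (used on both sides), or two characters (distinct opening and closing characters, which also lifts the ASCII limitation):

```python
from typing import Optional, Tuple

# Hypothetical per-element defaults as (open, close) pairs.
DEFAULTS = {"em": ("_", "_"), "strong": ("*", "*"), "tt": ('"', '"')}


def parse_fallback(value: Optional[str], element: str) -> Tuple[str, str]:
    """Interpret a hypothetical fallback="..." attribute value.

    None      -> the element's default pair
    ""        -> no fallback characters at all
    one char  -> the same character before and after
    two chars -> distinct opening and closing characters, e.g. "»«"
    """
    if value is None:
        return DEFAULTS[element]
    if value == "":
        return ("", "")
    if len(value) == 1:
        return (value, value)
    if len(value) == 2:
        return (value[0], value[1])
    raise ValueError(f"unsupported fallback value: {value!r}")


def render(element: str, text: str, fallback: Optional[str] = None) -> str:
    """Plain-text rendering of one <em>/<strong>/<tt> span."""
    open_, close = parse_fallback(fallback, element)
    return open_ + text + close
```

With the default, render("tt", "false") yields "false" in double quotes, which reads as a CBOR text string; fallback="" yields the bare false, and fallback="»«" yields »false«.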

Global flags create all kinds of problems and are best avoided.

<aside markdown="1">
(CCing rfc-markdown again:) I shudder to think about how to indicate the fallback preference in the markdown.  Maybe this can be made almost palatable with predefined ALDs (kramdown attribute list definitions), as in

`foo`
`bar`{:nf}

(Where nf is an abbreviation for “no fallback”.)
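A rough sketch of how a converter might expand such a predefined ALD. The nf name comes from above; the fallback="" XML attribute it maps to is hypothetical, and the regex only handles simple single-backtick spans, not full kramdown syntax:

```python
import re

# Hypothetical ALDs that the tool would predefine; "nf" = no fallback.
PREDEFINED_ALDS = {"nf": {"fallback": ""}}

# A single-backtick code span, optionally followed by an IAL reference like {:nf}.
CODE_SPAN = re.compile(r"`([^`]+)`(?:\{:([A-Za-z][\w-]*)\})?")


def xml_escape(text: str) -> str:
    return (text.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace('"', "&quot;"))


def code_spans_to_xml(md: str) -> str:
    """Turn `foo` into <tt>foo</tt> and `bar`{:nf} into <tt fallback="">bar</tt>."""
    def repl(m):
        # Unknown ALD names are silently ignored in this sketch.
        attrs = PREDEFINED_ALDS.get(m.group(2) or "", {})
        attr_text = "".join(f' {k}="{xml_escape(v)}"' for k, v in attrs.items())
        return f"<tt{attr_text}>{xml_escape(m.group(1))}</tt>"
    return CODE_SPAN.sub(repl, md)
```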

We could also invent some new syntax, of course (and we don’t need to limit the markdown input charset to ASCII for a more readable version of the above: »bar« maybe?).

Global flags are somewhat more excusable for a keyboarding syntax, but there would still need to be a way to compose text from different sources.

Note that the question of which attribute values are the default in the markdown is entirely orthogonal to the question of which are the defaults in the XML; when in doubt, I prefer to keep backwards compatibility (which would mean the default should be fallback=_/*/" for <em>, <strong>, <tt>).
</aside>

> Regarding non-breaking options:
> 
> I personally find the current reliance on &nbsp; &nbhy; (and worse, &zwnj; of which we have one in RFC 9000) to be problematic.  

I find a zero-width space (U+200B) on 0x0100-​0x01ff [look closely after the "-"!] in the table row “CRYPTO-ERROR”, is that what you mean? (zwsp U+200B ≠ zwnj U+200C, and I don’t think we need a lot of ligature control.)

My current view is that *introducing* break points into a span is something the Unicode spaces handle reasonably well.  An editor with a reveal mode does help (I haven’t primed my emacs to deal with U+200B yet, though).
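Outside an editor, a few lines of Python can serve as a reveal mode. A sketch; the character set is an illustrative selection, not an exhaustive list:

```python
import unicodedata

# An illustrative (not exhaustive) set of characters that affect line
# breaking or joining but are invisible, or look like plain spaces/hyphens.
INVISIBLE = {"\u00a0", "\u200b", "\u200c", "\u200d", "\u2011", "\u2060"}


def reveal(text: str) -> str:
    """Replace each such character with a visible <U+XXXX NAME> marker."""
    return "".join(
        f"<U+{ord(ch):04X} {unicodedata.name(ch)}>" if ch in INVISIBLE else ch
        for ch in text
    )
```

Applied to the CRYPTO-ERROR range, this makes the ZERO WIDTH SPACE after the "-" visible; the same set could be stripped from text before searching.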

> It means that the text you copy is not the text you expect which can confuse all but the smartest searcher.  I would prefer to control this with tags.  If nothing else, it would be much more explicit and less error-prone.

So you would prefer 0x0100-<preferentially-break-here/>0x01ff or some such?
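If the break point were markup, each renderer could decide what to emit for it. A sketch using Python's xml.etree.ElementTree, treating preferentially-break-here (from the example above) as a hypothetical element name: HTML or text output could emit a ZWSP, while a copy/search-friendly form could emit nothing.

```python
import xml.etree.ElementTree as ET

ZWSP = "\u200b"                          # U+200B ZERO WIDTH SPACE
BREAK_TAG = "preferentially-break-here"  # hypothetical element name


def flatten(elem: ET.Element, break_char: str = ZWSP) -> str:
    """Text content of elem, with each break element replaced by break_char."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == BREAK_TAG:
            parts.append(break_char)
        else:
            parts.append(flatten(child, break_char))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)
```

For <td>0x0100-<preferentially-break-here/>0x01ff</td>, flatten() gives the ZWSP version, and flatten(td, break_char="") gives the plain 0x0100-0x01ff.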

> With all the effort that went into making BCP 14 not wrap,

(Do you mean the phrase “BCP 14”, which should have an nbsp in it, or do you mean <bcp14>MUST NOT</bcp14>?)

> I note that RFC 9000 wraps between BCP and 14.  Something that RFC 9087 doesn't do - at least for the text rendering (the &nbsp; is in the XML, but not the HTML, which suggests that xml2rfc is bleaching it incorrectly).

The boilerplate also writes “BCP 78” without a no-break space.
Note that RFC 9087 has six occurrences of “AS path”, only one of which is nbsp-protected (but the example paths after three of them are).
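Finding unprotected occurrences of such phrases is easy to automate. A sketch with hypothetical helper names that counts occurrences whose words are separated by ordinary, breakable whitespace and rewrites them with NO-BREAK SPACE:

```python
import re

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE


def breakable_pattern(phrase: str):
    # Only ASCII whitespace counts as breakable: Python's \s would also
    # match U+00A0 and thus treat protected occurrences as unprotected.
    # \b at both ends assumes the phrase starts and ends with a word character.
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"[ \t\r\n]+".join(words) + r"\b")


def count_unprotected(text: str, phrase: str) -> int:
    """Number of occurrences of phrase that could still be broken across lines."""
    return len(breakable_pattern(phrase).findall(text))


def protect(text: str, phrase: str) -> str:
    """Rewrite every breakable occurrence of phrase with NBSP between its words."""
    protected = NBSP.join(phrase.split())
    return breakable_pattern(phrase).sub(lambda m: protected, text)
```

Because XML source is often reflowed, the pattern also catches occurrences split across a source line break.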

Note that there are several aspects of horizontal no-breaking:

— turn blank space into no-break spaces etc.
— don’t allow breaking after characters such as / @ & | - + # % :
(— hyphenation no-breaking, which we don’t need as we don’t do hyphenation - or should we?)
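The first two aspects could be handled by one transform applied to a no-break span. A sketch, assuming U+2060 WORD JOINER (which Unicode line breaking treats as prohibiting a break on either side of it) is inserted after each listed character; for »-« a NON-BREAKING HYPHEN (U+2011) would be an alternative:

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE
WJ = "\u2060"    # U+2060 WORD JOINER: no line break before or after it
BREAK_AFTER = set("/@&|-+#%:")  # characters listed above


def no_break(text: str) -> str:
    """Make text unbreakable: spaces become NBSP, and a WJ follows each
    character after which a renderer might otherwise break the line."""
    out = []
    for i, ch in enumerate(text):
        if ch == " ":
            out.append(NBSP)
            continue
        out.append(ch)
        if ch in BREAK_AFTER and i + 1 < len(text):  # nothing to protect at the end
            out.append(WJ)
    return "".join(out)
```

The cost is the one Martin raises: the WJ characters end up in copied text unless the renderer removes them again.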

Note that one constant source of spurious rfcdiff differences is that lines break differently at »/«.  The default could be to never break on these, but allow a ZWSP (U+200B) to enable breaking (WJ, U+2060, only prevents breaks).  But then we are used to breaking on »-«…
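A toy model of a "never break on »/« by default" policy, assuming an explicit ZERO WIDTH SPACE (U+200B) marks an allowed break. This is a simple greedy wrapper for illustration, not how xml2rfc or rfcdiff actually break lines:

```python
from typing import List

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE: explicit break opportunity


def break_positions(text: str) -> List[int]:
    """Indices where a line may end: after a space or a ZWSP only.
    '/' (and '-') are deliberately not break opportunities in this policy."""
    positions = [i + 1 for i, ch in enumerate(text) if ch in (" ", ZWSP)]
    if not positions or positions[-1] != len(text):
        positions.append(len(text))
    return positions


def wrap(text: str, width: int) -> List[str]:
    """Greedy wrap at break_positions(); an unbreakable run longer than
    width stays on one (overlong) line instead of being split. For
    simplicity, the width check also counts a trailing space/ZWSP."""
    lines, start, last = [], 0, None
    for pos in break_positions(text):
        if pos - start > width and last is not None and last > start:
            lines.append(text[start:last])
            start = last
        last = pos
    lines.append(text[start:])
    return [line.rstrip(" " + ZWSP) for line in lines]
```

Here a path only ever breaks where the author put a ZWSP, so reflowing a paragraph cannot move a break into the middle of it.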

Regards, Carsten