Re: [xml2rfc] [irsg] character sets, was UPDATE regarding <u>
Carsten Bormann <cabo@tzi.org> Sun, 05 March 2023 13:19 UTC
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <20230304190316.05346A51F3D2@ary.qy>
Date: Sun, 05 Mar 2023 14:19:21 +0100
Cc: xml2rfc@ietf.org, rfc-markdown@ietf.org
Message-Id: <5081F069-705D-4707-85EB-DBA11D594D19@tzi.org>
References: <20230304190316.05346A51F3D2@ary.qy>
To: "John R. Levine" <johnl@taugh.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/CM7oMM2MbL4_YnZWZv2cdHn1mmg>
Subject: Re: [xml2rfc] [irsg] character sets, was UPDATE regarding <u>
On 2023-03-04, at 20:03, John Levine <johnl@taugh.com> wrote:
>
> One issue is our policy about where any non-ASCII goes, but a separate
> issue that Carsten has run into is exotic characters beyond the 3000
> or so that are in the fonts we normally use.

Kramdown-rfc 1.6.26 now has an `echars` utility that is more talkative about Unicode blocks and Unicode scripts. Examples below.

I hope this is useful in our quest to rescue these Unicode arcana from being held captive by Unicode super-geeks.

Grüße, Carsten

This is the RFC-to-be that started this discussion, where the RPC happened to pick a character from the Dingbats block:

$ echars rfc/authors/rfc9340.txt
*** Latin-1 Supplement (Latin)
ß: U+00DF 1 LATIN SMALL LETTER SHARP S
á: U+00E1 3 LATIN SMALL LETTER A WITH ACUTE
ä: U+00E4 1 LATIN SMALL LETTER A WITH DIAERESIS
é: U+00E9 5 LATIN SMALL LETTER E WITH ACUTE
ó: U+00F3 1 LATIN SMALL LETTER O WITH ACUTE
ø: U+00F8 1 LATIN SMALL LETTER O WITH STROKE
ü: U+00FC 7 LATIN SMALL LETTER U WITH DIAERESIS
*** Latin Extended-A (Latin)
ć: U+0107 2 LATIN SMALL LETTER C WITH ACUTE
č: U+010D 2 LATIN SMALL LETTER C WITH CARON
ę: U+0119 1 LATIN SMALL LETTER E WITH OGONEK
ł: U+0142 2 LATIN SMALL LETTER L WITH STROKE
š: U+0161 1 LATIN SMALL LETTER S WITH CARON
*** General Punctuation (Common)
–: U+2013 1 EN DASH
*** Dingbats (Common)
➔: U+2794 2 HEAVY WIDE-HEADED RIGHTWARDS ARROW
*** Miscellaneous Mathematical Symbols-A (Common)
⟩: U+27E9 61 MATHEMATICAL RIGHT ANGLE BRACKET
*** Arabic Presentation Forms-B (Common)
﻿: U+FEFF 1 ZERO WIDTH NO-BREAK SPACE

For comparison, an RFC from the pre-v3 times:

$ echars rfc/rfc8265.txt
*** Basic Latin (Common)
"\f": U+000C 25 <control-000C>
*** Latin-1 Supplement
¹: U+00B9 1 SUPERSCRIPT ONE (Common)
ß: U+00DF 3 LATIN SMALL LETTER SHARP S (Latin)
å: U+00E5 1 LATIN SMALL LETTER A WITH RING ABOVE (Latin)
*** Latin Extended-A (Latin)
ſ: U+017F 2 LATIN SMALL LETTER LONG S
*** Greek and Coptic (Greek)
Σ: U+03A3 2 GREEK CAPITAL LETTER SIGMA
π: U+03C0 2 GREEK SMALL LETTER PI
ς: U+03C2 2 GREEK SMALL LETTER FINAL SIGMA
σ: U+03C3 2 GREEK SMALL LETTER SIGMA
*** Ogham (Ogham)
 : U+1680 1 OGHAM SPACE MARK
*** Number Forms (Latin)
Ⅳ: U+2163 6 ROMAN NUMERAL FOUR
*** Mathematical Operators (Common)
∞: U+221E 2 INFINITY
*** Miscellaneous Symbols (Common)
♦: U+2666 1 BLACK DIAMOND SUIT
*** Alphabetic Presentation Forms (Latin)
ﬁ: U+FB01 2 LATIN SMALL LIGATURE FI
*** Arabic Presentation Forms-B (Common)
﻿: U+FEFF 1 ZERO WIDTH NO-BREAK SPACE

You can see the form feeds we used before RFC 8650, and the dreadful BOM was already there too (it is in the Arabic Presentation Forms-B block, in case you didn't know that).
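For readers without kramdown-rfc at hand, the kind of inventory shown above can be approximated with a few lines of Python. This is a minimal sketch, not `echars` itself: it reports every character outside printable ASCII (newline excluded), grouped by Unicode block in code point order, with a count and the character name. Assumptions, all hypothetical: the `BLOCKS` table is abbreviated to only the blocks that appear in the two examples (a real tool would load the full Blocks.txt from the Unicode Character Database), and the script grouping that `echars` also prints is omitted, because Python's standard `unicodedata` module does not expose the Script property. Control characters are shown with `repr()`, so the form feed prints as '\x0c' rather than "\f".

```python
import sys
import unicodedata
from collections import Counter

# Hypothetical, abbreviated block table: only the blocks seen in the
# examples above. A complete tool would parse Unicode's Blocks.txt.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x1680, 0x169F, "Ogham"),
    (0x2000, 0x206F, "General Punctuation"),
    (0x2150, 0x218F, "Number Forms"),
    (0x2200, 0x22FF, "Mathematical Operators"),
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x2700, 0x27BF, "Dingbats"),
    (0x27C0, 0x27EF, "Miscellaneous Mathematical Symbols-A"),
    (0xFB00, 0xFB4F, "Alphabetic Presentation Forms"),
    (0xFE70, 0xFEFF, "Arabic Presentation Forms-B"),
]


def block_of(cp):
    """Return the block name for a code point, if it is in the table."""
    for lo, hi, name in BLOCKS:
        if lo <= cp <= hi:
            return name
    return "(block not in table)"


def is_plain_ascii(ch):
    # Printable ASCII and newline are not reported; everything else is.
    return ch == "\n" or " " <= ch <= "~"


def tally(text):
    """Group non-plain-ASCII characters by block, in code point order."""
    counts = Counter(ch for ch in text if not is_plain_ascii(ch))
    report = {}
    for ch in sorted(counts):
        report.setdefault(block_of(ord(ch)), []).append((ch, counts[ch]))
    return report


def describe(ch):
    """Return ("<char>: U+XXXX", NAME); controls have no name, so use a label."""
    cp = ord(ch)
    name = unicodedata.name(ch, f"<control-{cp:04X}>")
    shown = repr(ch) if unicodedata.category(ch) == "Cc" else ch
    return f"{shown}: U+{cp:04X}", name


def main(path):
    # Plain "utf-8" (not "utf-8-sig") keeps a leading BOM so it gets reported.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    for block, entries in tally(text).items():
        print(f"*** {block}")
        for ch, n in entries:
            label, name = describe(ch)
            print(f"{label} {n} {name}")


if __name__ == "__main__":
    main(sys.argv[1])
```

Saved under a hypothetical name such as `echars_sketch.py`, it would be run as `python echars_sketch.py rfc/rfc8265.txt`. Reading with the plain "utf-8" codec is deliberate: "utf-8-sig" would silently drop the U+FEFF that the examples above flag.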