Re: [rfc-i] entities and unicode
Carsten Bormann <cabo@tzi.org> Fri, 03 December 2021 10:37 UTC
From: Carsten Bormann <cabo@tzi.org>
In-Reply-To: <20211203101748.GA26129@miek.nl>
Date: Fri, 03 Dec 2021 11:37:05 +0100
Cc: RFC Interest <rfc-interest@rfc-editor.org>
Message-Id: <473FCE46-AF5B-4B89-ADF1-B68E908BE59F@tzi.org>
References: <20211203101748.GA26129@miek.nl>
To: Miek Gieben <miek@miek.nl>
Subject: Re: [rfc-i] entities and unicode
On 2021-12-03, at 11:17, Miek Gieben <miek@miek.nl> wrote:
>
> Hello all,
>
> In https://www.rfc-editor.org/materials/FAQ-xml2rfcv3.html it says I need to wrap unicode
> characters in <u> tags (which is already a bit confusing:
> https://github.com/rfc-format/draft-iab-xml2rfc-v3-bis/issues/205).
>
> Due to some other bug, I was testing (html) entities and if I put:
>
> <t>this is some dashes ‑</t>
>
> In the XML, xml2rfc --text renders the "-" (but then the proper unicode one) in the text
> document... which now puzzles me.

Can’t reproduce (xml2rfc 3.11.1).

The v2 support in xml2rfc turns the en-dash into a single hyphen/minus and the em-dash into a double hyphen/minus.
Umlauts turn into language-specific substitutes, e.g., Keränen turns into Keraenen (great for German, ouch for Finnish).

But if I put these characters into a v3 source, they are all converted into a single hyphen/minus during rendering.
(Umlauts turn into beautiful ä in the plaintext and HTML [the latter with & in the source]; Keränen — I’d rather have xml2rfc turn into green smoke here.)

> Is <u> really needed? Or are entities not allowed? Or something else that I'm not seeing?

There is also <contact>, which the grammar unfortunately breaks in most contexts, but does work in plain paragraphs.
https://mailarchive.ietf.org/arch/msg/rfc-markdown/7spMIPZdz6S8_NbZZ2WcSB-ScPY

Grüße, Carsten

_______________________________________________
rfc-interest mailing list
rfc-interest@rfc-editor.org
https://www.rfc-editor.org/mailman/listinfo/rfc-interest
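A minimal Python sketch of the two effects discussed in this message, offered as an illustration only. It assumes nothing about xml2rfc's internals: the table V2_STYLE_FALLBACK and the helpers paragraph_text and to_ascii_v2_style are hypothetical names, and the table is modelled on the v2 behaviour described above (en dash to "-", em dash to "--", German-style umlaut transliteration), not copied from xml2rfc. The first part relies only on standard XML behaviour: the parser resolves numeric character references, so &#228; and a literal ä reach any renderer as the same code point.

```python
import xml.etree.ElementTree as ET

# Hypothetical ASCII fallback table modelled on the v2 behaviour described
# in the message (en dash -> "-", em dash -> "--", German-style umlauts).
# This is NOT xml2rfc's actual substitution table.
V2_STYLE_FALLBACK = {
    "\u2013": "-",   # EN DASH
    "\u2014": "--",  # EM DASH
    "\u00e4": "ae",  # LATIN SMALL LETTER A WITH DIAERESIS
    "\u00f6": "oe",  # LATIN SMALL LETTER O WITH DIAERESIS
    "\u00fc": "ue",  # LATIN SMALL LETTER U WITH DIAERESIS
}


def paragraph_text(xml_source: str) -> str:
    """Parse a <t> element; the XML parser resolves numeric character references."""
    return ET.fromstring(xml_source).text or ""


def to_ascii_v2_style(text: str) -> str:
    """Apply the hypothetical fallback table, leaving other characters unchanged."""
    return "".join(V2_STYLE_FALLBACK.get(ch, ch) for ch in text)


entity_form = paragraph_text("<t>Ker&#228;nen &#8211; &#8212;</t>")
literal_form = paragraph_text("<t>Keränen – —</t>")

# After parsing, the entity spelling and the literal spelling are identical.
assert entity_form == literal_form
print(to_ascii_v2_style(entity_form))  # Keraenen - --
```

Because the parser has already turned the references into characters, any difference in output between an entity and a literal character would have to come from later processing, not from the entity syntax itself. The German-style table also shows why such a one-size-fits-all fallback suits German names and misfits Finnish ones.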
- [rfc-i] entities and unicode Miek Gieben
- Re: [rfc-i] entities and unicode Carsten Bormann
- Re: [rfc-i] entities and unicode Miek Gieben
- Re: [rfc-i] entities and unicode John Levine
- Re: [rfc-i] entities and unicode Carsten Bormann
- Re: [rfc-i] entities and unicode John R Levine
- Re: [rfc-i] entities and unicode Julian Reschke
- Re: [rfc-i] entities and unicode Robert Sparks
- Re: [rfc-i] entities and unicode Carsten Bormann