[xml2rfc] Re: I18n of I-Ds (was Simpler than...)
julian.reschke at gmx.de (Julian Reschke) Wed, 27 August 2008 13:00 UTC
From: "julian.reschke at gmx.de"
Date: Wed, 27 Aug 2008 13:00:51 +0000
Subject: [xml2rfc] Re: I18n of I-Ds (was Simpler than...)
In-Reply-To: <g94asm$6ug$1@ger.gmane.org>
References: <517bf110808251844h1f7442bes8bfb9d7d05500fda@mail.gmail.com><g8vu4u$ude$1@ger.gmane.org><48B3AF1F.4050901@gmx.de> <g925mj$gik$1@ger.gmane.org> <48B507BE.4090305@gmx.de> <g94asm$6ug$1@ger.gmane.org>
Message-ID: <48B5B265.7000705@gmx.de>
X-Date: Wed Aug 27 13:00:51 2008
Frank Ellermann wrote:
> Julian Reschke wrote:
>
> Digression, since I use OE I wonder why subjects
> of replies to my articles sometimes end up with
> Re: Re: (and more, but I trim Re: Re: manually).
>
> Is that some OE bug on my side, or a Thunderbird
> oddity on your side ?  No arguments about "Re:"
> please, the USEFOR WG discussed it for years ;-)

Seems to me it's OE inserting it -- it appeared first in your previous
email:
<http://drakken.dbc.mtview.ca.us/pipermail/xml2rfc/2008-August/003497.html>.

>> What xml2rfc currently does I would call "highly
>> experimental". I really don't see why we have to
>> keep that, unless there's evidence it's widely
>> used. In *that* case, a PI would do, I guess.
>
> Well, no. Clearly xml2rfc is tasked to produce
> US-ASCII text/plain RFC output. For that job it
> has to do something sensible and reproducible with
> non-ASCII (= Unicode after evaluating the document
> charset) input.

It hasn't been doing that for years, and I think what it currently does
was introduced as an experiment.

> Clearly rendering question marks is not desirable
> for the discussed Latin cases. For some languages
> "just strip the diacritics" might be good enough.

Stopping and telling the author to remove the non-ASCII characters is
another choice.

> I didn't look into the unsolicited code for Polish
> in my inbox, but likely it is a similar solution.

Actually no, it produces UTF-8 text output.

> As far as the ASCII rendering is noted in the DTD
> in the form of symbolic character entities we can
> have only one target ASCII representation per
> symbol, remotely in the direction of "mnemonics".

I don't think the DTD has anything to do with it; it just declares the
names. The expansion happens inside xml2rfc.tcl.

> [In theory DTDs allow if-then-else like constructs,
> but this doesn't help me, because I want "Duerst"
> and "Faltstrom" in *one* ASCII document]
>
> Therefore we can have only one "plausible" output
> for ü (etc.)
> in the DTD, either "ue" as is, or
> "u". Whatever we pick, I need a way to get at the
> other form.

As far as I can tell, you're misunderstanding where the mapping occurs.
If it were based on the DTD, it should be the same for TXT and HTML
output, no?

BR, Julian

From dhc2 at dcrocker.net Wed Aug 27 15:01:33 2008
From: dhc2 at dcrocker.net (Dave CROCKER)
Date: Wed Aug 27 14:01:44 2008
Subject: [xml2rfc] Re: Re: I18n of I-Ds (was Simpler than...)
In-Reply-To: <E6DE4195-AEF6-4092-ACA6-47CEC8D7C31B@cisco.com>
References: <517bf110808251844h1f7442bes8bfb9d7d05500fda@mail.gmail.com> <g8vu4u$ude$1@ger.gmane.org> <48B3AF1F.4050901@gmx.de> <g925mj$gik$1@ger.gmane.org> <E6DE4195-AEF6-4092-ACA6-47CEC8D7C31B@cisco.com>
Message-ID: <48B5C0AD.3000609@dcrocker.net>

Patrik Fältström wrote:
> So, "just" adding one "alternative" attribute that is displayed if UTF-8
> encoding is allowed as output would be perfect.

+1

I hadn't been tracking this thread, but had an exchange with Tim Bray
about the topic, and the use of an alt mechanism seemed to fit into the
existing xml2rfc model quite nicely.

I recently tested the alt mechanism for figures, with results that I
like quite a bit. Take a look at the 3 forms of output my email-arch
draft now generates, notably in terms of figures. Specifically look at
the "unreleased" -11-14dc version:

<http://bbiw.net/recent.html#emailarch>

> And for the RFCs, I think having UTF-8 in Names, Examples and other
> non-normative places would be something I personally am prepared to
> argue for.
>
> Normative text should still be ASCII only. Still too many issues with,
> for example, regular expressions, diff tools and other things we use.
+1

d/
--
Dave Crocker
Brandenburg InternetWorking
bbiw.net

From nobody at xyzzy.claranet.de Thu Aug 28 01:17:21 2008
From: nobody at xyzzy.claranet.de (Frank Ellermann)
Date: Wed Aug 27 15:15:29 2008
Subject: [xml2rfc] Re: xml2rfc FAQ available
References: <D852C1A1-3953-4695-B5A7-040A1A8CC7F4@isi.edu>
Message-ID: <g94jl4$8os$1@ger.gmane.org>

Alice Hagens wrote:
> http://xml.resource.org/xml2rfcFAQ.html

Great. As always the demon question, *where* is the text/plain output
for rfcmarkup, or the XML source to create it ?

The audience of my IETF tools page are Lynx users, any other stone-age
browser not supporting CSS, or modern mobile devices with their own
restrictions.

Without a TXT version common tasks such as rfcdiff don't work. Without
XML, folks insisting on PDF in addition to HTML (please don't ask why,
I have not the faintest idea) cannot use Julian's tools.

Julian's tools support lots of other magic, all directly on the XML,
avoiding the lossy TXT step for the various IETF tools.

--- 3.1 bullet 1
s/at the top/DTD subset at the top/ (terminology nit, "at the top" is
IMO too vague).

The note "choose uppercase" is IMHO unnecessary: Whatever folks pick,
it has to match the use in the references. A reference has to be used
in the body (otherwise this triggers a warning for strict).

The entity name is *unrelated* to the fragment identifier determined
by the anchor in the imported bibxml snippet, and that is something
confusing authors: Entity name and fragment id. *appear* to be related
for RFCs.

--- 3.1 bullet 2:
<?rfc include= is an alternative, but I'm not sure if this is a good
idea for uses of Bill's validator, or of the W3C validator.

--- 3.7 question:
For US-ASCII output (my use case) you propose a reference-trick to
bypass an <eref> limitation. I didn't know that this fails for text
output, that's a bug:

For <eref target="xyz">abc</eref> I expect xyz abc or similar, not
only abc.
And in the simple <eref target="123" /> case it works, I get 123 as
text, not an empty string.

--- 3.10
Same issue as for <eref />, something with
<xref target="RFC2119"> section 2 </xref> is not as it should be for
the purposes of rfcmarkup.

--- 3.11
Brilliant, I didn't know format="title".

--- 4.4
Ditto most list details in chapter 4, thanks.

--- 5.1
<artwork> outside of <figure>, are you sure that this is valid XML
based on the DTD ? There are situations when I use the W3C
validator...

--- 6.5
Interesting, I have to test inline="no", it sounds like a plan to use
<cref> in a good way.

--- 1.4
Maybe add self-references to all formats of the FAQ (HTML, TXT, XML)
here. For drafts and RFCs everybody knows where that is, but the FAQ
and the Checklist are no ordinary numbered I-Ds.

The 2006 slides exist also in a human readable format, not only as
proprietary document format:
<http://www.ietf.org/proceedings/06jul/slides/xml2rfc-1/sld1.htm>

Eager to add a rfcmarkup link to the TXT version on my IETF tools page,

Frank
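
Editorial sketch of the ASCII fallback options raised in the I18n
messages above: strip the diacritics, use a per-name override so that
"Duerst" and "Faltstrom" can coexist in one document, or stop and tell
the author. This is not how xml2rfc.tcl does it; the names
GERMAN_STYLE, strip_diacritics and to_ascii are hypothetical and exist
only for this illustration.

```python
import unicodedata

# Hypothetical override table for "ue"-style transliteration.
GERMAN_STYLE = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def strip_diacritics(text):
    """Decompose to NFD and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def to_ascii(text, overrides=None, strict=False):
    """Map text to US-ASCII.

    overrides: optional per-call table consulted before stripping
               diacritics, so each name can pick its own form.
    strict:    if True, raise instead of emitting "?" when no ASCII
               form can be derived (the "stop and tell the author" choice).
    """
    overrides = overrides or {}
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)
        elif ch in overrides:
            out.append(overrides[ch])
        else:
            stripped = strip_diacritics(ch)
            if stripped and all(ord(c) < 128 for c in stripped):
                out.append(stripped)
            elif strict:
                raise ValueError(f"no ASCII mapping for {ch!r}")
            else:
                out.append("?")
    return "".join(out)
```

Calling to_ascii("Dürst", GERMAN_STYLE) and to_ascii("Fältström") in
the same run gives each name its own form, which is the "way to get at
the other form" asked for above. Whether such a choice belongs in an
attribute, a PI, or elsewhere is left open here, as it is in the thread.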
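
Editorial sketch of the <eref> text rendering expected in the 3.7
comment above: an <eref> with content shows both the content and the
target, and an empty <eref> shows the target alone. The output format
"abc <xyz>" is an assumption picked for illustration (the message only
asks for "xyz abc or similar"), and render_eref and render_paragraph
are hypothetical helpers, not xml2rfc code.

```python
import xml.etree.ElementTree as ET


def render_eref(elem):
    """Render one <eref> element as plain text."""
    target = elem.get("target", "")
    # Collapse internal whitespace in the link text.
    label = " ".join((elem.text or "").split())
    if not label:
        # <eref target="123"/> case: show the target itself.
        return target
    # Assumed format: keep the label and append the target.
    return f"{label} <{target}>"


def render_paragraph(xml_source):
    """Flatten a single <t> paragraph to one line of text."""
    root = ET.fromstring(xml_source)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "eref":
            parts.append(render_eref(child))
        else:
            # Other inline elements: keep their text content only.
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    sample = '<t>See <eref target="xyz">abc</eref> and <eref target="123"/>.</t>'
    print(render_paragraph(sample))
```

The point of the sketch is only that the target is never dropped in
text output, so tools that work on the TXT form still see the link.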