[xml2rfc] Re: I18n of I-Ds (was Simpler than...)

julian.reschke at gmx.de (Julian Reschke) Wed, 27 August 2008 13:00 UTC

From: "julian.reschke at gmx.de"
Date: Wed, 27 Aug 2008 13:00:51 +0000
Subject: [xml2rfc] Re: I18n of I-Ds (was Simpler than...)
In-Reply-To: <g94asm$6ug$1@ger.gmane.org>
References: <517bf110808251844h1f7442bes8bfb9d7d05500fda@mail.gmail.com><g8vu4u$ude$1@ger.gmane.org><48B3AF1F.4050901@gmx.de> <g925mj$gik$1@ger.gmane.org> <48B507BE.4090305@gmx.de> <g94asm$6ug$1@ger.gmane.org>
Message-ID: <48B5B265.7000705@gmx.de>

Frank Ellermann wrote:
> Julian Reschke wrote:
> 
> Digression, since I use OE I wonder why subjects
> of replies to my articles sometimes end up with
> Re: Re: (and more, but I trim Re: Re: manually).
> 
> Is that some OE bug on my side, or a Thunderbird
> oddity on your side ?  No arguments about "Re:"
> please, the USEFOR WG discussed it for years ;-)

Seems to me it's OE inserting it -- it appeared first in your previous 
email: 
<http://drakken.dbc.mtview.ca.us/pipermail/xml2rfc/2008-August/003497.html>.

>> What xml2rfc currently does I would call "highly
>> experimental". I really don't see why we have to
>> keep that, unless there's evidence it's widely 
>> used. In *that* case, a PI would do, I guess.
> 
> Well, no.  Clearly xml2rfc is tasked to produce
> US-ASCII text/plain RFC output.  For that job it
> has to do something sensible and reproducible with
> non-ASCII (= Unicode after evaluating the document
> charset) input.

It hasn't been doing that for years, and I think what it currently does 
was introduced as an experiment.

> Clearly rendering question marks is not desirable
> for the discussed Latin cases.  For some languages
> "just strip the diacritics" might be good enough.

Stopping and telling the author to remove the non-ASCII characters is 
another choice.

> I didn't look into the unsolicited code for Polish
> in my inbox, but likely it is a similar solution.

Actually no, it produces UTF-8 text output.

> As far as the ASCII rendering is noted in the DTD
> in the form of symbolic character entities we can
> have only one target ASCII representation per
> symbol, remotely in the direction of "mnemonics".

I don't think the DTD has anything to do with it; it just declares the 
names. The expansion happens inside xml2rfc.tcl.

> [In theory DTDs allow if-then-else like constructs,
>  but this doesn't help me, because I want "Duerst"
>  and "Faltstrom" in *one* ASCII document]
> 
> Therefore we can have only one "plausible" output
> for &uuml; (etc.) in the DTD, either "ue" as is, or
> "u".  Whatever we pick, I need a way to get at the
> other form.

As far as I can tell, you're misunderstanding where the mapping occurs. 
If it were based on the DTD, it would be the same for TXT and HTML 
output, no?
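
To make the two candidate renderings concrete, here is a minimal Python
sketch. It is not xml2rfc's actual Tcl code; the function names and the
GERMAN_STYLE table are hypothetical. It only illustrates that "strip the
diacritics" and the "ue" style expansion are two different mappings, so a
per-name choice has to be made somewhere other than a single entity
definition.

```python
import unicodedata

# Hypothetical expansion table for the "ue" style; not taken from xml2rfc.
GERMAN_STYLE = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
}


def strip_diacritics(text: str) -> str:
    """Decompose to NFD and drop combining marks, e.g. 'ü' -> 'u'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def expand_german(text: str) -> str:
    """Apply the 'ue' style table, then strip any remaining marks."""
    # Compose first so that 'u' + combining diaeresis matches the table.
    composed = unicodedata.normalize("NFC", text)
    expanded = "".join(GERMAN_STYLE.get(ch, ch) for ch in composed)
    return strip_diacritics(expanded)


print(expand_german("Dürst"))         # Duerst
print(strip_diacritics("Fältström"))  # Faltstrom
```

Both forms can coexist in one ASCII document only if the author can say,
per occurrence, which mapping applies.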

BR, Julian
>From dhc2 at dcrocker.net  Wed Aug 27 15:01:33 2008
From: dhc2 at dcrocker.net (Dave CROCKER)
Date: Wed Aug 27 14:01:44 2008
Subject: [xml2rfc] Re: Re: I18n of I-Ds (was Simpler than...)
In-Reply-To: <E6DE4195-AEF6-4092-ACA6-47CEC8D7C31B@cisco.com>
References: 
	<517bf110808251844h1f7442bes8bfb9d7d05500fda@mail.gmail.com><g8vu4u$ude$1@ger.gmane.org>
	<48B3AF1F.4050901@gmx.de> <g925mj$gik$1@ger.gmane.org>
	<E6DE4195-AEF6-4092-ACA6-47CEC8D7C31B@cisco.com>
Message-ID: <48B5C0AD.3000609@dcrocker.net>



Patrik Fältström wrote:
> So, "just" adding one "alternative" attribute that is displayed if UTF-8 
> encoding is allowed as output would be perfect.

+1

I hadn't been tracking this thread, but had an exchange with Tim Bray about the 
topic and the use of an alt mechanism seemed to fit into the existing xml2rfc 
model quite nicely.
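
A rough sketch of how such an alternative could be selected, in Python for 
illustration only. The <author> markup shown, the placement of an 
"alternative" attribute on it, and the allow_utf8 switch are assumptions made 
for this example, not existing xml2rfc vocabulary or options.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: ASCII form in 'fullname', UTF-8 form in 'alternative'.
SOURCE = '<author fullname="Patrik Faltstrom" alternative="Patrik Fältström"/>'


def render_name(element: ET.Element, allow_utf8: bool) -> str:
    """Return the UTF-8 alternative when permitted, else the ASCII form."""
    alternative = element.get("alternative")
    if allow_utf8 and alternative:
        return alternative
    return element.get("fullname", "")


author = ET.fromstring(SOURCE)
print(render_name(author, allow_utf8=False))  # Patrik Faltstrom
print(render_name(author, allow_utf8=True))   # Patrik Fältström
```

The ASCII form stays the default, so output that must remain US-ASCII is 
unaffected unless UTF-8 is explicitly allowed.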

I recently tested the alt mechanism for figures, with results that I like quite 
a bit.

Take a look at the 3 forms of output my email-arch draft now generates, notably 
in terms of figures.  Specifically look at the "unreleased" -11-14dc version:

   <http://bbiw.net/recent.html#emailarch>


> And for the RFCs, I think having UTF-8 in Names, Examples and other 
> non-normative places would be something I personally am prepared to 
> argue for.
> 
> Normative text should still be ASCII only. Still too many issues with 
> for example regular expressions, diff tools and other things we use.

+1

d/
-- 

   Dave Crocker
   Brandenburg InternetWorking
   bbiw.net
>From nobody at xyzzy.claranet.de  Thu Aug 28 01:17:21 2008
From: nobody at xyzzy.claranet.de (Frank Ellermann)
Date: Wed Aug 27 15:15:29 2008
Subject: [xml2rfc] Re: xml2rfc FAQ available
References: <D852C1A1-3953-4695-B5A7-040A1A8CC7F4@isi.edu>
Message-ID: <g94jl4$8os$1@ger.gmane.org>

Alice Hagens wrote:

> http://xml.resource.org/xml2rfcFAQ.html

Great.  As always the demon question, *where* is
the text/plain output for rfcmarkup, or the XML
source to create it ?

The audience of my IETF tools page are users of
Lynx, of any other stone-age browser not supporting
CSS, or of modern mobile devices with their own
restrictions.

Without a TXT version, common tasks such as rfcdiff
don't work.  Without XML, folks insisting on PDF
in addition to HTML (please don't ask why, I have
not the faintest idea) cannot use Julian's tools.

Julian's tools support lots of other magic, all
directly on the XML, avoiding the lossy TXT step
for the various IETF tools.  

---
3.1 bullet 1 s/at the top/DTD subset at the top/
(terminology nit, "at the top" is IMO too vague).

The note "choose uppercase" is IMHO unnecessary:

Whatever folks pick, it has to match the use in
the references.  A reference has to be used in
the body (otherwise this triggers a warning for
strict).  The entity name is *unrelated* to the
fragment identifier determined by the anchor in
the imported bibxml snippet, which is something
that confuses authors:  entity name and fragment
id only *appear* to be related for RFCs.

---
3.1 bullet 2:  <?rfc include= is an alternative,
but I'm not sure if this is a good idea for uses
of Bill's validator, or of the W3C validator.

---
3.7 question:

For US-ASCII output (my use case) you propose a 
reference-trick to bypass an <eref> limitation.
  
I didn't know that this fails for text output,
that's a bug:  For <eref target="xyz">abc</eref>
I expect xyz abc or similar, not only abc.  

And in the simple <eref target="123" /> case it
works, I get 123 as text, not an empty string.
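
What I expect can be sketched in a few lines of
Python.  This is only an illustration of the
desired text rendering, not how xml2rfc does it,
and eref_to_text is a made-up name:

```python
import xml.etree.ElementTree as ET


def eref_to_text(xml_fragment: str) -> str:
    """Render an <eref> as plain text without losing the target."""
    eref = ET.fromstring(xml_fragment)
    target = eref.get("target", "")
    label = (eref.text or "").strip()
    if not label:
        # Empty element: the target itself is the text.
        return target
    # Element with content: keep the target next to the link text.
    return f"{target} {label}"


print(eref_to_text('<eref target="xyz">abc</eref>'))  # xyz abc
print(eref_to_text('<eref target="123" />'))          # 123
```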

---
3.10  Same issue as for <eref />, something with
<xref target="RFC2119"> section 2 </xref> is not
as it should be for the purposes of rfcmarkup.

---
3.11  Brilliant, I didn't know format="title".

---
4.4
Ditto most list details in chapter 4, thanks.

---
5.1  <artwork> outside of <figure>, are you sure
that this is valid XML based on the DTD ?  There
are situations when I use the W3C validator...

---
6.5  Interesting, I have to test inline="no", it
sounds like a plan to use <cref> in a good way.

---
1.4  Maybe add self-references to all formats of
the FAQ (HTML, TXT, XML) here.  For drafts and
RFCs everybody knows where that is, but the FAQ
and the Checklist are no ordinary numbered I-Ds.

The 2006 slides also exist in a human readable
format, not only in a proprietary document format: 
<http://www.ietf.org/proceedings/06jul/slides/xml2rfc-1/sld1.htm>

Eager to add a rfcmarkup link to the TXT version
on my IETF tools page,

 Frank