[xml2rfc] v1.28 released (was v1.27 released)

clive at demon.net (Clive D.W. Feather) Wed, 19 January 2005 01:16 UTC

From: "clive at demon.net"
Date: Wed, 19 Jan 2005 01:16:27 +0000
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <20050118125416.GA20674@localhost.localdomain>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <20050118083227.GJ362@finch-staff-1.thus.net> <20050118125416.GA20674@localhost.localdomain>
Message-ID: <20050119091544.GB42236@finch-staff-1.thus.net>
X-Date: Wed Jan 19 01:16:27 2005

Charles Levert said:
>> Not completely; I downloaded 1.28 and I get a single nul in my output.
>> I attach a fairly short example.
> Can't reproduce it with your example, 1.28, and tcl-8.3.5-96.1 (from
> Fedora Core).

I'm using tcl8.4.1, though I couldn't offhand tell you where I got it from
originally. I built it in October 2003.

> I get the same output with the txt renderer, and with the
> nroff renderer followed by groff (except for the missing newline at the
> very end, but it's always like that), no NUL in either case.  Everything
> looks fine.

The nul is in the text version as well, but not the HTML.

I've done some more playing around, and this is the shortest file I can
construct that reproduces it:

<?xml version='1.0'?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
<?rfc strict='yes'?>
<?rfc compact='no'?>
<?rfc editing='no'?>
<?rfc symrefs='yes'?>
<?rfc sortrefs='yes'?>
<?rfc emoticonic='yes'?>
<?rfc toc='yes'?>
<?rfc tocdepth='9'?>
<rfc ipr="full3667" docName="draft-ietf-nntpext-base-25">
<front>
  <title>Network News Transfer Protocol</title>
  <author initials="C.D.W." surname="Feather" fullname="Clive D.W. Feather">
    <organization>Thus plc</organization><address />
  </author>
  <date year="2005" month="January" day="18" />
  <area>Applications</area><workgroup>NNTP</workgroup>
  <abstract><t>X</t></abstract>
</front>
<middle>
<section anchor="z" title="Extensions">
<t><xref target="x" />.
<xref target="z" /></t>
</section>
<section anchor="x" title="Security Considerations"><t>X</t></section>
</middle>
</rfc>

The nul comes before the output generated by the second xref. The two
references can be to the same place or different ones, and there's no
significance to the fact that one is recursive. But it *is* important to have
the dot-newline just before the second one; a colon won't cut the mustard.
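
For anyone trying to reproduce this, a small script can confirm whether a
generated file contains a NUL and show what surrounds it. This is only a
sketch in Python and not part of xml2rfc; both function names are made up
for illustration.

```python
def find_nul_offsets(data):
    """Return the byte offset of every NUL (0x00) in data."""
    return [i for i, b in enumerate(data) if b == 0]


def show_nul_context(data, width=20):
    """Return (offset, surrounding bytes) for each NUL found in data."""
    return [(i, data[max(0, i - width):i + width + 1])
            for i in find_nul_offsets(data)]
```

Reading the output in binary mode, e.g.
show_nul_context(open("foo.txt", "rb").read()), avoids any decoding step
that might hide or alter the stray byte.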

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc            |                            |
From clive at demon.net  Wed Jan 19 09:19:35 2005
From: clive at demon.net (Clive D.W. Feather)
Date: Wed Jan 19 01:19:46 2005
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <27FC0CDF-69CF-11D9-A059-000A95CA7FAE@dbc.mtview.ca.us>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us>
	<37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us>
	<20050118083227.GJ362@finch-staff-1.thus.net>
	<27FC0CDF-69CF-11D9-A059-000A95CA7FAE@dbc.mtview.ca.us>
Message-ID: <20050119091935.GD42236@finch-staff-1.thus.net>

Marshall Rose said:
>> * Ages ago I asked to be allowed to have more than one email address
>>   within the <author> clause. Is this unreasonable?
> in a word: yes.

Out of interest, why?

One address is my work one. It's the correct one to have on the document,
based on <organization> and so on.

The other is my personal address, and is likely to last long after I leave
this job. Therefore it seems the right thing to include in a document that
will (hopefully) be archived long-term.

I can't be the only person in this situation, surely?

From charles_levert at gna.org  Wed Jan 19 05:54:51 2005
From: charles_levert at gna.org (Charles Levert)
Date: Wed Jan 19 02:55:08 2005
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <20050119091544.GB42236@finch-staff-1.thus.net>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us>
	<37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us>
	<20050118083227.GJ362@finch-staff-1.thus.net>
	<20050118125416.GA20674@localhost.localdomain>
	<20050119091544.GB42236@finch-staff-1.thus.net>
Message-ID: <20050119105451.GA12959@localhost.localdomain>

* On Wednesday 2005-01-19 at 09:15:44 +0000, Clive D.W. Feather wrote:
> Charles Levert said:
> > I get the same output with the txt renderer, and with the
> > nroff renderer followed by groff (except for the missing newline at the
> > very end, but it's always like that), no NUL in either case.  Everything
> > looks fine.
> 
> The nul is in the text version as well, but not the HTML.

This is very surprising because the only place in the xml2rfc.tcl script
where there is something even resembling a NUL is part of the nroff
renderer (and it should generate backslash-NUL in the end).  I'm assuming
you don't mean the text version produced by running nroff on the foo.nr
file, but the foo.txt one produced straight by "xml2rfc.tcl foo.xml".

> I've done some more playing around, and this is the shortest file I can
> construct that reproduces it:

[snip]

> The nul comes before the output generated by the second xref. The two
> references can be to the same place or different ones, and there's no
> significance to the fact that one is recursive. But it *is* important to have
> the dot-newline just before the second one; a colon won't cut the mustard.

I still can't reproduce it with this new example.

However, I did try it with tcl version 8.3.5-96.1 and with a freshly
compiled 8.4.9 (latest stable tcl release) and I get this amazing
discrepancy (files with 2 in the name are produced with 8.4.9):



========================================================================
--- cdwf-shortest.txt	2005-01-19 05:13:36 -0500
+++ cdwf-shortest2.txt	2005-01-19 05:40:28 -0500
@@ -115,7 +115,7 @@
 
 1.  Extensions
 
-   Section 2.  Section 1
+   Section 2. Section 1
 
 
 
--- cdwf-shortest.nr	2005-01-19 05:15:26 -0500
+++ cdwf-shortest2.nr	2005-01-19 05:40:44 -0500
@@ -1,4 +1,4 @@
-.\" automatically generated by xml2rfc v1.28 on 2005-01-19T10:15:26Z
+.\" automatically generated by xml2rfc v1.28 on 2005-01-19T10:40:44Z
 .\" 
 .pl 10.0i
 .po 0
@@ -80,7 +80,7 @@
 1.  Extensions
 .in 3
 
-Section\02.  Section\01
+Section\02. Section\01
 .bp
 .in 4
 .ti 0
========================================================================



I can't explain this right now.  Maybe I'll look into it later.
From clive at demon.net  Wed Jan 19 11:53:50 2005
From: clive at demon.net (Clive D.W. Feather)
Date: Wed Jan 19 03:54:25 2005
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <20050119105451.GA12959@localhost.localdomain>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us>
	<37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us>
	<20050118083227.GJ362@finch-staff-1.thus.net>
	<20050118125416.GA20674@localhost.localdomain>
	<20050119091544.GB42236@finch-staff-1.thus.net>
	<20050119105451.GA12959@localhost.localdomain>
Message-ID: <20050119115349.GO42236@finch-staff-1.thus.net>

Charles Levert said:
>> The nul is in the text version as well, but not the HTML.
> 
> This is very surprising because the only place in the xml2rfc.tcl script
> where there is something even resembling a NUL is part of the nroff
> renderer (and it should generate backslash-NUL in the end).  I'm assuming
> you don't mean the text version produced by running nroff on the foo.nr
> file, but the foo.txt one produced straight by "xml2rfc.tcl foo.xml".

Yes, I do.

> I still can't reproduce it with this new example.
> 
> However, I did try it with tcl version 8.3.5-96.1 and with a freshly
> compiled 8.4.9 (latest stable tcl release) and I get this amazing
> discrepancy (files with 2 in the name are produced with 8.4.9):
[...]

I downloaded all of Tcl 8.4.1 to 8.4.9 from SourceForge. On both the simple
example and my full source file, only 8.4.1 shows the problem; it was
fixed between there and 8.4.2.

I am going to upgrade to 8.4.9, which will solve my issues. I leave it
for anyone interested to investigate further.

From charles_levert at gna.org  Wed Jan 19 20:20:54 2005
From: charles_levert at gna.org (Charles Levert)
Date: Wed Jan 19 17:21:11 2005
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <20050119105451.GA12959@localhost.localdomain>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us>
	<37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us>
	<20050118083227.GJ362@finch-staff-1.thus.net>
	<20050118125416.GA20674@localhost.localdomain>
	<20050119091544.GB42236@finch-staff-1.thus.net>
	<20050119105451.GA12959@localhost.localdomain>
Message-ID: <20050120012054.GA25865@localhost.localdomain>

* On Wednesday 2005-01-19 at 05:54:51 -0500, Charles Levert wrote:
> However, I did try it with tcl version 8.3.5-96.1 and with a freshly
> compiled 8.4.9 (latest stable tcl release) and I get this amazing
> discrepancy (files with 2 in the name are produced with 8.4.9):
> 
> -   Section 2.  Section 1
> +   Section 2. Section 1

I don't know how, but the "set foo" statement in the patch below fixes
that weird bug in tcl.  (Maybe it even solves the NUL bug that was
reported with some tcl versions; I can't test that myself since I was
never able to duplicate it in the first place.)


There was also an xml2rfc.tcl bug in the test to identify abbreviations:
the pattern should have ended in ". ", not " .".
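
To see why the old test could never fire: in two_spaces, $pre is cut so
that it always ends with the ". " that was just found, so a glob pattern
ending in " ." cannot match it.  The sketch below uses Python's
fnmatch.fnmatchcase as a stand-in for Tcl's "string match"; that the two
behave alike for these simple patterns is an assumption, not something
checked against tclsh.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "*[A-Z][A-Z] ."    # as in 1.28: space before the period
FIXED_PATTERN = "*[A-Z][A-Z]. "  # intended: period, then space

# Example of what two_spaces passes to the test: text up to and
# including the ". " it has just located.
pre = "This is an RFC. "

old_hit = fnmatchcase(pre, OLD_PATTERN)      # never matches: pre ends in a space
fixed_hit = fnmatchcase(pre, FIXED_PATTERN)  # matches once the order is fixed
```

Note that the corrected pattern now treats "RFC." as an abbreviation,
which is the false positive discussed further down.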


I also changed the first pattern ({*[A-Z][A-Z]. }) to match a single
(not double) capital letter followed by a period ({*[A-Z]. }) as in the
middle initial of a name.  (The double capital letter case was still
covered by this at this point; but see below.)  If this isn't ok, just
revert it.

I assume the other match ({*[A-Z][a-z][a-z]. }) was meant for things like
"Fig. 5".  I left it there, but authors should really be encouraged to write
"Fig.&nbsp;5" instead, which makes this check unnecessary; the pattern could
then be removed to avoid false positives.

In the end, I got rid of the "y" variable and replaced all this by an
equivalent regexp that also supports closing quotes and parentheses,
as well as some other stuff ("cf.", "vs.", "resp.", and "Jr.", but not
"e.g." and "i.e." which, as the TeXbook correctly points out, should
never be immediately followed by spaces).  I then removed support for
the double capital letter case; abbreviations look like "SNAFU" or
"S.N.A.F.U.", but not "SNAFU.", so this was not only unnecessary but
harmful as sentences often end like "This is an RFC." does.


I also added support in the initial lookup for other punctuation that
normally requires two spaces after it in English typography (and for
closing quotes and parentheses).

(Strangely, this last change makes the "set foo" work-around statement
unnecessary as that tcl bug just disappears, but I left it in there
anyway, just in case.  The tcl bug was probably in "string first", but
I can't tell for sure.)

That in turn prompted me to also put a minimum of two spaces after
the colons in the author's address section.  I also did the same for
headers such as "Expires:" but if there is a fear that some automated
processes that work on published I-Ds or RFCs might no longer recognize
these headers, then just don't apply that part of the patch (the first
three hunks affecting the pass2begin_front function).


Tested with tcl versions 8.3.5-96.1 and 8.4.9 (on Linux).



========================================================================
--- xml2rfc.tcl.orig-1.28	2005-01-17 00:02:24.000000000 -0500
+++ xml2rfc.tcl	2005-01-19 19:51:19.000000000 -0500
@@ -4573,23 +4573,23 @@ proc pass2begin_front {elemX} {
         lappend left $first
 
         if {[string compare $rv(number) ""]} {
-            lappend left "Request for Comments: $rv(number)"
+            lappend left "Request for Comments:  $rv(number)"
 
             set cindex [lsearch0 $categories $rv(category)]
             if {[string compare $rv(seriesNo) ""]} {
                 lappend left \
-                        "[lindex [lindex $categories $cindex] 2]: $rv(seriesNo)"
+                        "[lindex [lindex $categories $cindex] 2]:  $rv(seriesNo)"
             }
 
             if {[string compare $rv(updates) ""]} {
-                lappend left "Updates: $rv(updates)"
+                lappend left "Updates:  $rv(updates)"
             }
             if {[string compare $rv(obsoletes) ""]} {
-                lappend left "Obsoletes: $rv(obsoletes)"
+                lappend left "Obsoletes:  $rv(obsoletes)"
             }
 
             set category [lindex [lindex $categories $cindex] 1]
-            lappend left "Category: $category"
+            lappend left "Category:  $category"
             set status [list [lindex [lindex $categories $cindex] 3]]
         } else {
             if {$options(.STRICT)} {
@@ -4616,10 +4616,10 @@ proc pass2begin_front {elemX} {
             lappend left "Internet-Draft"
 
             if {[string compare $rv(updates) ""]} {
-                lappend left "Updates: $rv(updates) (if approved)"
+                lappend left "Updates:  $rv(updates) (if approved)"
             }
             if {[string compare $rv(obsoletes) ""]} {
-                lappend left "Obsoletes: $rv(obsoletes) (if approved)"
+                lappend left "Obsoletes:  $rv(obsoletes) (if approved)"
             }
 
             if {[catch { set day $dv(day) }]} {
@@ -4630,7 +4630,7 @@ proc pass2begin_front {elemX} {
             set day [string trimleft \
                             [clock format $secs -format "%d" -gmt true] 0]
             set expires [clock format $secs -format "%B $day, %Y" -gmt true]
-            lappend left "Expires: $expires"
+            lappend left "Expires:  $expires"
             set category "Expires $expires"
             if {![string compare $mode html]} {
                 set iindex 1
@@ -7160,7 +7160,7 @@ proc back_txt {authors} {
                 set value [lindex [lindex $contacts \
                                           [lsearch0 $contacts $key]] 1]
                 set value [format %-6s $value:]
-                write_line_txt "   $value [chars_expand [lindex $contact 1]]"
+                write_line_txt "   $value  [chars_expand [lindex $contact 1]]"
             }
         }
     }
@@ -7446,22 +7446,25 @@ proc nbsp_expand_txt {s} {
 proc two_spaces {glop} {
     set post ""
 
+    # Work around a bug in tcl-8.4.9 and possibly others.
+    # Don't ask, it's a mystery anyway.
+    set foo "x$glop"
+
     while {[string length $glop] > 0} {
-        if {[set x [string first ". " $glop]] < 0} {
+        # The double quotes will also match the end of a spanx-verb, which
+        # may not be the end of a sentence.  Impossible to tell apart.  :-(
+        if {![regexp -indices {[.?!](['"]?[])]?|[])]?['"]?) |: } $glop x]} {
             append post $glop
             break
         }
 
-        set pre [string range $glop 0 [expr $x+1]]
-        set glop [string trimleft [string range $glop [expr $x+2] end]]
+        set pre [string range $glop 0 [lindex $x 1]]
+        set glop [string trimleft [string range $glop [expr [lindex $x 1] + 1] end]]
         append post $pre
 
         # Check for likely abbreviation.  Do not insert two spaces in
         # this case.
-        if {![set y [string match {*[A-Z][A-Z] .} $pre]]} {
-            set y [string match {*[A-Z][a-z][a-z] .} $pre]
-        }
-        if {!$y} {
+        if {![regexp {(^|[^A-Za-z])([A-Z]\.(['"]?[])]?|[])]?['"]?)|([A-Z][a-z][a-z]|[Cc]f|vs|resp|Jr)\.) $} $pre]} {
             append post " "
         }
     }
@@ -9797,7 +9800,7 @@ proc back_nr {authors} {
                 set value [lindex [lindex $contacts \
                                           [lsearch0 $contacts $key]] 1]
                 set value [format %-6s $value:]
-                write_line_nr "$value [chars_expand [lindex $contact 1]]"
+                write_line_nr "$value  [chars_expand [lindex $contact 1]]"
             }
         }
     }
========================================================================
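
For readers who want to experiment with the new two_spaces logic outside
Tcl, here is a rough Python port of the patched procedure.  It is a sketch
only: it assumes Python's re module treats these two expressions the same
way Tcl's regexp does, it leaves out the "set foo" work-around (which is
specific to Tcl), and the names CLOSERS, SENTENCE_END and ABBREVIATION are
invented here.

```python
import re

# Optional closing quote and/or closing bracket, in either order.
CLOSERS = r"""(?:['"]?[\])]?|[\])]?['"]?)"""

# Sentence-ending punctuation (plus closers) followed by a space,
# or a colon followed by a space.
SENTENCE_END = re.compile(r"[.?!]" + CLOSERS + r" |: ")

# Likely abbreviation at the end of the text: a lone capital letter
# ("D."), or one of Xxx., cf., Cf., vs., resp., Jr.
ABBREVIATION = re.compile(
    r"(?:^|[^A-Za-z])(?:[A-Z]\." + CLOSERS
    + r"|(?:[A-Z][a-z][a-z]|[Cc]f|vs|resp|Jr)\.) $"
)


def two_spaces(glop):
    """Put two spaces after sentence ends and colons, one after abbreviations."""
    post = []
    while glop:
        match = SENTENCE_END.search(glop)
        if match is None:
            post.append(glop)
            break
        # pre runs up to and including the single space that was matched.
        pre = glop[:match.end()]
        glop = glop[match.end():].lstrip()
        post.append(pre)
        if not ABBREVIATION.search(pre):
            post.append(" ")  # second space
    return "".join(post)
```

With this port, "Section 2. Section 1" gains the second space that went
missing under tcl 8.4.9, while "See Fig. 5 here" and initials such as
"C. D. W. Feather" are left alone.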
From dhc2 at dcrocker.net  Mon Jan 24 11:21:08 2005
From: dhc2 at dcrocker.net (Dave Crocker)
Date: Mon Jan 24 11:21:15 2005
Subject: [xml2rfc] px vs. pt
In-Reply-To: <20050120012054.GA25865@localhost.localdomain>
Message-ID: <200512411218.297236@bbprime>

Folks,

Why are fonts sized by px rather than pt?  

The px-based output comes out pretty darn small.



d/

P.S.  To show my ignorance further: is there an XSLT file that produces classic IETF ASCII text, rather than HTML, PDF, or the like?

--
Dave Crocker
Brandenburg InternetWorking
+1.408.246.8253
dcrocker  a t ...
WE'VE MOVED to:  www.bbiw.net