[xml2rfc] v1.28 released (was v1.27 released)
clive at demon.net (Clive D.W. Feather) Wed, 19 January 2005 01:16 UTC
From: "clive at demon.net"
Date: Wed, 19 Jan 2005 01:16:27 +0000
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <20050118125416.GA20674@localhost.localdomain>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <20050118083227.GJ362@finch-staff-1.thus.net> <20050118125416.GA20674@localhost.localdomain>
Message-ID: <20050119091544.GB42236@finch-staff-1.thus.net>
X-Date: Wed Jan 19 01:16:27 2005
Charles Levert said:
>> Not completely; I downloaded 1.28 and I get a single nul in my output.
>> I attach a fairly short example.
> Can't reproduce it with your example, 1.28, and tcl-8.3.5-96.1 (from
> Fedora Core).

I'm using tcl8.4.1, though I couldn't offhand tell you where I got it from
originally. I built it in October 2003.

> I get the same output with the txt renderer, and with the
> nroff renderer followed by groff (except for the missing newline at the
> very end, but it's always like that), no NUL in either case. Everything
> looks fine.

The nul is in the text version as well, but not the HTML.

I've done some more playing around, and this is the shortest file I can
construct that reproduces it:

<?xml version='1.0'?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>
<?rfc strict='yes'?>
<?rfc compact='no'?>
<?rfc editing='no'?>
<?rfc symrefs='yes'?>
<?rfc sortrefs='yes'?>
<?rfc emoticonic='yes'?>
<?rfc toc='yes'?>
<?rfc tocdepth='9'?>
<rfc ipr="full3667" docName="draft-ietf-nntpext-base-25">
<front>
<title>Network News Transfer Protocol</title>
<author initials="C.D.W." surname="Feather" fullname="Clive D.W. Feather">
<organization>Thus plc</organization><address />
</author>
<date year="2005" month="January" day="18" />
<area>Applications</area><workgroup>NNTP</workgroup>
<abstract><t>X</t></abstract>
</front>
<middle>
<section anchor="z" title="Extensions">
<t><xref target="x" />.
<xref target="z" /></t>
</section>
<section anchor="x" title="Security Considerations"><t>X</t></section>
</middle>
</rfc>

The null comes before the output generated by the second xref. The two
references can be to the same place or different ones and there's no
significance to the fact that one is recursive. But it *is* important to
have the dot-newline just before the second one; a colon won't cut the
mustard.

-- 
Clive D.W. Feather  | Work:  <clive@demon.net>   | Tel:    +44 20 8495 6138
Internet Expert     | Home:  <clive@davros.org>  | Fax:    +44 870 051 9937
Demon Internet      | WWW: http://www.davros.org | Mobile: +44 7973 377646
Thus plc            |                            |

From: clive at demon.net (Clive D.W. Feather)
Date: Wed Jan 19 01:19:46 2005
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <27FC0CDF-69CF-11D9-A059-000A95CA7FAE@dbc.mtview.ca.us>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <20050118083227.GJ362@finch-staff-1.thus.net> <27FC0CDF-69CF-11D9-A059-000A95CA7FAE@dbc.mtview.ca.us>
Message-ID: <20050119091935.GD42236@finch-staff-1.thus.net>

Marshall Rose said:
>> * Ages ago I asked to be allowed to have more than one email address
>> within the <author> clause. Is this unreasonable?
> in a word: yes.

Out of interest, why?

One address is my work one. It's the correct one to have on the document,
based on <organisation> and so on. The other is my personal address, and is
likely to last long after I leave this job. Therefore it seems the right
thing to include in a document that will (hopefully) be archived long-term.

I can't be the only person in this situation, surely?

From: charles_levert at gna.org (Charles Levert)
Date: Wed Jan 19 02:55:08 2005
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <20050119091544.GB42236@finch-staff-1.thus.net>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <20050118083227.GJ362@finch-staff-1.thus.net> <20050118125416.GA20674@localhost.localdomain> <20050119091544.GB42236@finch-staff-1.thus.net>
Message-ID: <20050119105451.GA12959@localhost.localdomain>

* On Wednesday 2005-01-19 at 09:15:44 +0000, Clive D.W. Feather wrote:
> Charles Levert said:
> > I get the same output with the txt renderer, and with the
> > nroff renderer followed by groff (except for the missing newline at the
> > very end, but it's always like that), no NUL in either case. Everything
> > looks fine.
>
> The nul is in the text version as well, but not the HTML.

This is very surprising because the only place in the xml2rfc.tcl script
where there is something even resembling a NUL is part of the nroff
renderer (and it should generate backslash-NUL in the end). I'm assuming
you don't mean the text version produced by running nroff on the foo.nr
file, but the foo.txt one produced straight by "xml2rfc.tcl foo.xml".

> I've done some more playing around, and this is the shortest file I can
> construct that reproduces it:
[snip]
> The null comes before the output generated by the second xref. The two
> references can be to the same place or different ones and there's no
> significance to the fact that one is recursive. But it *is* important to
> have the dot-newline just before the second one; a colon won't cut the
> mustard.

I still can't reproduce it with this new example.
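If you are trying to reproduce the stray NUL, one quick check is to scan the rendered foo.txt in binary mode and list where any NUL characters occur. The following is a minimal sketch. The procedure names find_nuls and file_nuls are hypothetical helpers and are not part of xml2rfc.tcl.

```tcl
# Hypothetical helpers, not part of xml2rfc.tcl: report where NUL
# characters occur in a string or in a rendered file.

# Return the zero-based offsets of every NUL character in $text.
proc find_nuls {text} {
    set offsets {}
    set start 0
    # \0 is Tcl's octal backslash escape for the NUL character.
    while {[set i [string first \0 $text $start]] >= 0} {
        lappend offsets $i
        set start [expr {$i + 1}]
    }
    return $offsets
}

# Read the whole file with no newline or encoding translation, so the
# offsets returned are byte offsets into the file as written.
proc file_nuls {path} {
    set fd [open $path r]
    fconfigure $fd -translation binary
    set data [read $fd]
    close $fd
    return [find_nuls $data]
}
```

For example, `puts [file_nuls foo.txt]` prints an empty line if the file is clean. Otherwise it prints the offsets, which can be compared with the position of the second xref's output.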
However, I did try it with tcl version 8.3.5-96.1 and with a freshly
compiled 8.4.9 (latest stable tcl release) and I get this amazing
discrepancy (files with 2 in the name are produced with 8.4.9):

========================================================================
--- cdwf-shortest.txt 2005-01-19 05:13:36 -0500
+++ cdwf-shortest2.txt 2005-01-19 05:40:28 -0500
@@ -115,7 +115,7 @@
 1. Extensions
- Section 2. Section 1
+ Section 2. Section 1
--- cdwf-shortest.nr 2005-01-19 05:15:26 -0500
+++ cdwf-shortest2.nr 2005-01-19 05:40:44 -0500
@@ -1,4 +1,4 @@
-.\" automatically generated by xml2rfc v1.28 on 2005-01-19T10:15:26Z
+.\" automatically generated by xml2rfc v1.28 on 2005-01-19T10:40:44Z
 .\"
 .pl 10.0i
 .po 0
@@ -80,7 +80,7 @@
 1. Extensions
 .in 3
-Section\02. Section\01
+Section\02. Section\01
 .bp
 .in 4
 .ti 0
========================================================================

I can't explain this right now. Maybe I'll look into it later.

From: clive at demon.net (Clive D.W. Feather)
Date: Wed Jan 19 03:54:25 2005
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <20050119105451.GA12959@localhost.localdomain>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <20050118083227.GJ362@finch-staff-1.thus.net> <20050118125416.GA20674@localhost.localdomain> <20050119091544.GB42236@finch-staff-1.thus.net> <20050119105451.GA12959@localhost.localdomain>
Message-ID: <20050119115349.GO42236@finch-staff-1.thus.net>

Charles Levert said:
>> The nul is in the text version as well, but not the HTML.
>
> This is very surprising because the only place in the xml2rfc.tcl script
> where there is something even resembling a NUL is part of the nroff
> renderer (and it should generate backslash-NUL in the end).
> I'm assuming you don't mean the text version produced by running nroff on
> the foo.nr file, but the foo.txt one produced straight by "xml2rfc.tcl
> foo.xml".

Yes, I do.

> I still can't reproduce it with this new example.
>
> However, I did try it with tcl version 8.3.5-96.1 and with a freshly
> compiled 8.4.9 (latest stable tcl release) and I get this amazing
> discrepancy (files with 2 in the name are produced with 8.4.9):
[...]

I downloaded all of 8.4.1 to 8.4.9 from SourceForge. On both the simple
example and my full source file, only 8.4.1 shows the problem; it was fixed
between there and 8.4.2. I am going to upgrade to 8.4.9, which will solve
my issues.

I leave it for anyone interested to investigate further.

From: charles_levert at gna.org (Charles Levert)
Date: Wed Jan 19 17:21:11 2005
Subject: [xml2rfc] v1.28 released (was v1.27 released)
In-Reply-To: <20050119105451.GA12959@localhost.localdomain>
References: <DDBDE784-67B6-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <37F71350-6846-11D9-B0E2-000A95CA7FAE@dbc.mtview.ca.us> <20050118083227.GJ362@finch-staff-1.thus.net> <20050118125416.GA20674@localhost.localdomain> <20050119091544.GB42236@finch-staff-1.thus.net> <20050119105451.GA12959@localhost.localdomain>
Message-ID: <20050120012054.GA25865@localhost.localdomain>

* On Wednesday 2005-01-19 at 05:54:51 -0500, Charles Levert wrote:
> However, I did try it with tcl version 8.3.5-96.1 and with a freshly
> compiled 8.4.9 (latest stable tcl release) and I get this amazing
> discrepancy (files with 2 in the name are produced with 8.4.9):
>
> - Section 2. Section 1
> + Section 2. Section 1

I don't know how, but the "set foo" statement in the patch below fixes that
weird bug in tcl.
(Maybe it even solves the NUL bug that was reported with some tcl versions;
I can't test that myself since I was never able to duplicate it in the first
place.)

There was also an xml2rfc.tcl bug in the test to identify abbreviations:
the pattern should have ended in ". ", not " .". I also changed the first
pattern ({*[A-Z][A-Z]. }) to match a single (not double) capital letter
followed by a period ({*[A-Z]. }), as in the middle initial of a name. (The
double capital letter case was still covered by this at this point; but see
below.) If this isn't ok, just revert it back.

I assume the other match ({*[A-Z][a-z][a-z]. }) was meant for things like
"Fig. 5". I left it there, but authors should really be encouraged to put a
non-breaking space in "Fig. 5", which makes this check unnecessary; the
pattern should be removed to avoid false positives.

In the end, I got rid of the "y" variable and replaced all this by an
equivalent regexp that also supports closing quotes and parentheses, as
well as some other stuff ("cf.", "vs.", "resp.", and "Jr.", but not "e.g."
and "i.e." which, as the TeXbook correctly points out, should never be
immediately followed by spaces). I then removed support for the double
capital letter case. Abbreviations look like "SNAFU" or "S.N.A.F.U.", but
not "SNAFU.", so this was not only unnecessary but harmful, as sentences
often end like "This is an RFC." does.

I also added support in the initial lookup for other punctuation that
normally requires two spaces after it in English typography (and for
closing quotes and parentheses). (Strangely, this last change makes the
"set foo" work-around statement unnecessary, as that tcl bug just
disappears, but I left it in there anyway, just in case. The tcl bug was
probably in "string first", but I can't tell for sure.)

That in turn prompted me to also put a minimum of two spaces after the
colons in the author's address section.
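The following is a minimal, self-contained sketch of the rule described above. It is built around the two regular expressions used in the two_spaces hunk of the patch below. It is only an illustration and not the patched proc itself. The name two_spaces_sketch is hypothetical, and the sketch omits the "set foo" work-around and everything else xml2rfc does around this step.

```tcl
# Hypothetical sketch, not the xml2rfc.tcl proc: widen the space after
# each sentence end to two spaces, except after likely abbreviations.
proc two_spaces_sketch {text} {
    # A sentence end: . ? or ! (optionally followed by a closing quote
    # and/or a closing bracket or parenthesis) and then a space, or a
    # colon followed by a space.
    set endRE {[.?!](['"]?[])]?|[])]?['"]?) |: }
    # A chunk ending (after a non-letter or the start of the chunk) in a
    # single capital initial ("J. "), a capitalised three-letter word
    # ("Fig. "), or cf./vs./resp./Jr. is treated as an abbreviation.
    set abbrevRE {(^|[^A-Za-z])([A-Z]\.(['"]?[])]?|[])]?['"]?)|([A-Z][a-z][a-z]|[Cc]f|vs|resp|Jr)\.) $}
    set out ""
    while {[string length $text] > 0} {
        if {![regexp -indices $endRE $text match]} {
            # No more sentence ends: copy the rest unchanged.
            append out $text
            break
        }
        # Copy up to and including the matched space, then drop any
        # further leading whitespace from the remainder.
        set last [lindex $match 1]
        set chunk [string range $text 0 $last]
        set text [string trimleft [string range $text [expr {$last + 1}] end]]
        append out $chunk
        # Add the second space only if this was not an abbreviation.
        if {![regexp $abbrevRE $chunk]} {
            append out " "
        }
    }
    return $out
}
```

For example, `two_spaces_sketch "See Fig. 5 for details. This is an RFC. Done"` returns "See Fig. 5 for details.  This is an RFC.  Done". "Fig." keeps a single space, while "RFC." is treated as a sentence end, as described above. Like the patch, the sketch cannot tell a closing double quote at the end of a sentence from one that merely ends a spanx-verb.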
I also did the same for headers such as "Expires:", but if there is a fear
that some automated processes that work on published I-Ds or RFCs might no
longer recognize these headers, then just don't apply that part of the
patch (the first three hunks affecting the pass2begin_front function).

Tested with tcl versions 8.3.5-96.1 and 8.4.9 (on Linux).

========================================================================
--- xml2rfc.tcl.orig-1.28 2005-01-17 00:02:24.000000000 -0500
+++ xml2rfc.tcl 2005-01-19 19:51:19.000000000 -0500
@@ -4573,23 +4573,23 @@ proc pass2begin_front {elemX} {
     lappend left $first
     if {[string compare $rv(number) ""]} {
-        lappend left "Request for Comments: $rv(number)"
+        lappend left "Request for Comments:  $rv(number)"
         set cindex [lsearch0 $categories $rv(category)]
         if {[string compare $rv(seriesNo) ""]} {
             lappend left \
-                "[lindex [lindex $categories $cindex] 2]: $rv(seriesNo)"
+                "[lindex [lindex $categories $cindex] 2]:  $rv(seriesNo)"
         }
         if {[string compare $rv(updates) ""]} {
-            lappend left "Updates: $rv(updates)"
+            lappend left "Updates:  $rv(updates)"
         }
         if {[string compare $rv(obsoletes) ""]} {
-            lappend left "Obsoletes: $rv(obsoletes)"
+            lappend left "Obsoletes:  $rv(obsoletes)"
         }
         set category [lindex [lindex $categories $cindex] 1]
-        lappend left "Category: $category"
+        lappend left "Category:  $category"
         set status [list [lindex [lindex $categories $cindex] 3]]
     } else {
         if {$options(.STRICT)} {
@@ -4616,10 +4616,10 @@ proc pass2begin_front {elemX} {
         lappend left "Internet-Draft"
         if {[string compare $rv(updates) ""]} {
-            lappend left "Updates: $rv(updates) (if approved)"
+            lappend left "Updates:  $rv(updates) (if approved)"
         }
         if {[string compare $rv(obsoletes) ""]} {
-            lappend left "Obsoletes: $rv(obsoletes) (if approved)"
+            lappend left "Obsoletes:  $rv(obsoletes) (if approved)"
         }
         if {[catch { set day $dv(day) }]} {
@@ -4630,7 +4630,7 @@ proc pass2begin_front {elemX} {
             set day [string trimleft \
                          [clock format $secs -format "%d" -gmt true] 0]
         set expires [clock format $secs -format "%B $day, %Y" -gmt true]
-        lappend left "Expires: $expires"
+        lappend left "Expires:  $expires"
         set category "Expires $expires"
         if {![string compare $mode html]} {
             set iindex 1
@@ -7160,7 +7160,7 @@ proc back_txt {authors} {
                 set value [lindex [lindex $contacts \
                                    [lsearch0 $contacts $key]] 1]
                 set value [format %-6s $value:]
-                write_line_txt " $value [chars_expand [lindex $contact 1]]"
+                write_line_txt " $value  [chars_expand [lindex $contact 1]]"
             }
         }
     }
@@ -7446,22 +7446,25 @@ proc nbsp_expand_txt {s} {
 proc two_spaces {glop} {
     set post ""
+    # Work around a bug in tcl-8.4.9 and possibly others.
+    # Don't ask, it's a mystery anyway.
+    set foo "x$glop"
+
     while {[string length $glop] > 0} {
-        if {[set x [string first ". " $glop]] < 0} {
+        # The double quotes will also match the end of a spanx-verb, which
+        # may not be the end of a sentence. Impossible to tell apart. :-(
+        if {![regexp -indices {[.?!](['"]?[])]?|[])]?['"]?) |: } $glop x]} {
             append post $glop
             break
         }
-        set pre [string range $glop 0 [expr $x+1]]
-        set glop [string trimleft [string range $glop [expr $x+2] end]]
+        set pre [string range $glop 0 [lindex $x 1]]
+        set glop [string trimleft [string range $glop [expr [lindex $x 1] + 1] end]]
         append post $pre
         # Check for likely abbreviation. Do not insert two spaces in
         # this case.
-        if {![set y [string match {*[A-Z][A-Z] .} $pre]]} {
-            set y [string match {*[A-Z][a-z][a-z] .} $pre]
-        }
-        if {!$y} {
+        if {![regexp {(^|[^A-Za-z])([A-Z]\.(['"]?[])]?|[])]?['"]?)|([A-Z][a-z][a-z]|[Cc]f|vs|resp|Jr)\.) $} $pre]} {
             append post " "
         }
     }
@@ -9797,7 +9800,7 @@ proc back_nr {authors} {
                 set value [lindex [lindex $contacts \
                                    [lsearch0 $contacts $key]] 1]
                 set value [format %-6s $value:]
-                write_line_nr "$value [chars_expand [lindex $contact 1]]"
+                write_line_nr "$value  [chars_expand [lindex $contact 1]]"
             }
         }
     }
========================================================================

From: dhc2 at dcrocker.net (Dave Crocker)
Date: Mon Jan 24 11:21:15 2005
Subject: [xml2rfc] px vs. pt
In-Reply-To: <20050120012054.GA25865@localhost.localdomain>
Message-ID: <200512411218.297236@bbprime>

Folks,

Why are fonts sized by px rather than pt? The px-based output comes out
pretty darn small.

d/

ps. And to show my ignorance further, is there an xslt file to produce
classic IETF ASCII TEXT, rather than HTML, PDF, or the like?

-- 
Dave Crocker
Brandenburg InternetWorking
+1.408.246.8253
dcrocker a t ...
WE'VE MOVED to: www.bbiw.net