[xml2rfc] Re: Bug in reference author output.

elwynd@nortelnetworks.com (Elwyn Davies) Mon, 11 August 2003 12:19 UTC

From: elwynd@nortelnetworks.com
Date: Mon, 11 Aug 2003 13:19:34 +0100
Subject: [xml2rfc] Re: Bug in reference author output.
Message-ID: <4103264BC8D3D51180B7002048400C4501623662@zhard0jd.europe.nortel.com>

Well, that will teach me to apply logic to typographical matters.  I seem to
have got a bit carried away researching this but there are a couple of
interesting possibilities for community resources embedded lower down.

Having looked into this a bit, I am truly amazed by the number of different
'systems' for formatting citations and references that have been designed
and documented with considerable rigour.  All, it seems, also carry the
caveat that most universities, journals and publishers have a house style.

It appears the RFC Editor (following in the steps of the illustrious Postel)
is using a variant of the Vancouver system, but so far as I can see, the
scheme of distinguishing the last author of multiple authors by using
'<initials> <surname>' is unique.  I also think it is bizarre, and that's
what immediately made me think it was a bug!  The relatively few examples of
multiple authors in the tutorials (e.g.
http://infoskills.port.ac.uk/refcite/refcite.htm,
http://www.wisc.edu/writetest/Handbook/DocChi_WC_book.html) show one of
- '((<firstname> <other initials>)|<initials>) <surname>' for all authors,
- '<surname>(,?) <initials>' for all authors, or
- '<surname>(,?) <initials>' for the first and '<initials> <surname>' for all
  others
with up to six authors spelled out and 'et al' appended for more than six.
The logic for putting the <surname> first for, at least, the first author is
presumably that the Vancouver system calls for references to be in
alphabetical order of first author surname.
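
For illustration only, the three tutorial conventions listed above can be
sketched in a few lines of Python.  The function name, the style labels and
the (initials, surname) tuple layout are all invented for this sketch, and
the comma separator between authors is an assumption (the tutorials differ
on separators):

```python
def format_authors(authors, style, max_listed=6):
    """Format (initials, surname) tuples in one of three tutorial styles.

    style is one of the hypothetical labels:
      "initials-first"  -> '<initials> <surname>' for all authors
      "surname-first"   -> '<surname>, <initials>' for all authors
      "first-inverted"  -> surname first for the first author only
    """
    if not authors:
        return ""

    def initials_first(author):
        return f"{author[0]} {author[1]}"

    def surname_first(author):
        return f"{author[1]}, {author[0]}"

    listed = authors[:max_listed]
    if style == "initials-first":
        parts = [initials_first(a) for a in listed]
    elif style == "surname-first":
        parts = [surname_first(a) for a in listed]
    elif style == "first-inverted":
        parts = [surname_first(listed[0])]
        parts += [initials_first(a) for a in listed[1:]]
    else:
        raise ValueError(f"unknown style: {style}")

    text = ", ".join(parts)
    # More than max_listed authors: spell out the first ones, then 'et al'.
    if len(authors) > max_listed:
        text += ", et al"
    return text


hmac_authors = [("H.", "Krawczyk"), ("M.", "Bellare"), ("R.", "Canetti")]
print(format_authors(hmac_authors, "first-inverted"))
```

None of these three styles produces the form the RFC Editor uses, which is
the point of the complaint.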

The reference House Style for RFCs is enshrined in RFC2223 (by example in
the references section), and is, indeed, exactly as Marshall says - although
the case is built on exactly one example in Reference [5] where 'Li, T.' is
the second author of three.  I doubt that one in a hundred ID/RFC authors
has noticed this since it is not spelt out in the text - we all know how
references are done, don't we;-).  The predecessor of RFC2223 (RFC1543) was
even less helpful, in that it contained the same definition by example but
gave only a single example, with a single author!  I note that there is
currently no requirement for references to be ordered in any particular way
- presumably this was not thought essential given that most RFCs only
contain a handful of references.
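
To make the house style concrete, here is a minimal Python sketch of the
rule as the RFC2104 example quoted below shows it: every author but the last
as '<surname>, <initials>', and the last as '<initials> <surname>' after
'and'.  The name rfc_author_list and the (initials, surname) tuple layout
are illustrative choices, not anything taken from xml2rfc, and the
single-author form is an assumption, since the thread only shows a
three-author case:

```python
def rfc_author_list(authors):
    """Join (initials, surname) tuples in the RFC Editor's observed style."""
    if not authors:
        return ""
    if len(authors) == 1:
        # Assumption: a lone author is written surname first.
        initials, surname = authors[0]
        return f"{surname}, {initials}"

    # All authors except the last: '<surname>, <initials>'.
    inverted = [f"{surname}, {initials}" for initials, surname in authors[:-1]]
    # The last author alone: '<initials> <surname>', introduced by 'and'.
    last_initials, last_surname = authors[-1]
    return ", ".join(inverted) + f" and {last_initials} {last_surname}"


print(rfc_author_list([("H.", "Krawczyk"), ("M.", "Bellare"), ("R.", "Canetti")]))
# prints: Krawczyk, H., Bellare, M. and R. Canetti
```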

Incidentally, the only reason I came up against this is that I have been
writing a tool to reverse-engineer the XML from a basic ASCII draft.  I was
helping to convert an IRTF draft with 43 references in it and found that I
and my co-authors had conspired to format the references in just about every
possible combination of minor deviations from the standard form.  The human
brain is such a forgiving syntax analyser that it not only manages to
extract the correct semantics but doesn't make the slightest complaint about
the random formatting.  I managed to create enough heuristics to get a good
automated result, but it will be interesting to try it on other docs.
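
As a rough idea of the kind of heuristic involved (this is a hypothetical
sketch, not the tool described above), the Python below splits an author
string into (initials, surname) pairs while tolerating both name orders.
The function name and the regular expressions are assumptions made for the
example, and it will certainly trip over surnames such as 'St. John':

```python
import re

# One or more initials such as 'H.', 'J.C.' or 'J. C.'.
INITIALS = re.compile(r"(?:[A-Z]\.\s*)+")
# Initials immediately followed by a surname, e.g. 'R. Canetti'.
INITIALS_THEN_SURNAME = re.compile(r"((?:[A-Z]\.\s*)+)(\S.*)")


def parse_authors(text):
    """Return a list of (initials, surname) tuples from a loose author string."""
    # Treat 'and' and '&' as just another separator.
    text = re.sub(r"\s+(?:and|&)\s+", ", ", text.strip())
    tokens = [t.strip() for t in text.split(",") if t.strip()]

    authors = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        following = tokens[i + 1] if i + 1 < len(tokens) else ""
        combined = INITIALS_THEN_SURNAME.fullmatch(token)
        if INITIALS.fullmatch(token):
            # Stray initials with no surname to attach to: skip them.
            i += 1
        elif combined:
            # '<initials> <surname>' in a single token.
            authors.append((combined.group(1).strip(), combined.group(2)))
            i += 1
        elif INITIALS.fullmatch(following):
            # '<surname>, <initials>' spread over two tokens.
            authors.append((following, token))
            i += 2
        else:
            # A surname with no recognisable initials.
            authors.append(("", token))
            i += 1
    return authors


print(parse_authors("Krawczyk, H., Bellare, M. and R. Canetti"))
```

Normalising every variant to the same tuples first means the output format
can then be chosen independently of whatever the draft's authors typed.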

[BTW - I also note that the policy was not applied consistently even around
the time that RFC2223 was published - RFC refs tend to be in the 'approved'
format, but other refs follow completely different strategies - see RFC2224
[Sandberg] and many of the refs in RFC2227.  The format seems to have crept
into use around RFC1144 - Marshall may even have been responsible for it;-).
I also looked at one RFC I was involved in (2475) and that is also
inconsistent.  Sampling of more recent RFCs reveals greater but by no means
total consistency, ***except for refs to other RFCs***, e.g. ref [2] in
RFC3002, all the non-RFC refs in RFC3208, most of the refs in RFC3309, one
in RFC3312.  Do I detect that the RFC Editor has a handy library of RFC
reference boilerplate?  If so, I would be happy to convert it to XML and
publish it as a useful resource for draft/RFC authors.  Another thought
along these lines - a database of Author description blocks would save a lot
of tedious creation.]
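
A library entry of that sort might be generated along the following lines.
This is only a sketch: the helper name reference_xml is made up, the element
and attribute names follow the <reference> markup of the xml2rfc (RFC2629)
format as I understand it, and the empty <organization/> child is included
on the assumption that the DTD expects one inside <author>:

```python
import xml.etree.ElementTree as ET


def reference_xml(anchor, title, authors, month, year, series_name, series_value):
    """Build an xml2rfc-style <reference> element and return it as a string."""
    ref = ET.Element("reference", anchor=anchor)
    front = ET.SubElement(ref, "front")
    ET.SubElement(front, "title").text = title
    for initials, surname in authors:
        author = ET.SubElement(front, "author", initials=initials, surname=surname)
        # Assumed to be required by the DTD, so emit it empty.
        ET.SubElement(author, "organization")
    ET.SubElement(front, "date", month=month, year=year)
    ET.SubElement(ref, "seriesInfo", name=series_name, value=series_value)
    return ET.tostring(ref, encoding="unicode")


print(reference_xml(
    "RFC2104",
    "HMAC: Keyed-Hashing for Message Authentication",
    [("H.", "Krawczyk"), ("M.", "Bellare"), ("R.", "Canetti")],
    "February", "1997", "RFC", "2104",
))
```

Because the authors are stored as separate initials and surname attributes,
the formatter, not the draft author, decides which order they are printed in.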

This would all be moot if we were able to persuade everybody to use the
xml2rfc XML format, but it would still be a good idea to spell out the
intended author format in any future revision of RFC2223 - it might make a
bit less work for the RFC Editor.  It would also help to provide examples of
references to documents that are not RFCs and to codify what is supposed to
happen with URIs now that these are commonplace.

Regards,
Elwyn



> -----Original Message-----
> From: Scott W Brim [mailto:swb@employees.org]
> Sent: 10 August 2003 02:13
> To: John C Klensin
> Cc: Marshall Rose; xml2rfc@lists.xml.resource.org
> Subject: Re: [xml2rfc] Re: Bug in reference author output.
> 
> 
> On Sat, Aug 09, 2003 06:37:06PM -0400, John C Klensin allegedly wrote:
> > 
> > 
> > --On Saturday, 09 August, 2003 15:24 -0700 Marshall Rose 
> > <mrose+internet.xml2rfc@dbc.mtview.ca.us> wrote:
> > 
> > >> Actually, if Elwyn correctly reported this, it may be a
> > >> problem.  The "official RFC style" is
> > >>
> > >> 	<surname>, <initials> For the first-listed author, and
> > >> 	then <initials> <surname> for all of the others.
> > >
> > > hmmm... here are all the references from rfc3576 with more
> > > than two listed authors:
> > >
> > >    [RFC2104]      Krawczyk, H., Bellare, M. and R. Canetti,
> > >                   "HMAC: Keyed-Hashing for Message
> > >                   Authentication", RFC 2104, February 1997.
> > >...
> > > i think your interpretation of the rule is wrong (or, if not,
> > > then the rfc editor is interpreting it differently).
> > 
> > I stand corrected.  This form is, IMO, bizarre (unlike what I 
> > thought the rule was, which I can rationalize).  But it is 
> > clearly the form the RFC Editor has been using (and correcting 
> > my documents to over the years without my noticing).
> 
> My age-old knowledge of correct citation format was what John originally
> said -- only the first author has surname first.  The above format is
> bizarre to me as well.  What's the point?  (I wonder what the new
> Chicago Manual of Style says.)  I think xml2rfc should do the right
> thing (i.e. what we say), and we'll fix what RFC Editor says.
> 
> ..swb
