Re: Documents with no authors

John C Klensin <john-ietf@jck.com> Fri, 23 November 2018 17:25 UTC

Date: Fri, 23 Nov 2018 12:25:41 -0500
From: John C Klensin <john-ietf@jck.com>
To: Doug Royer <douglasroyer@gmail.com>, ietf@ietf.org
Subject: Re: Documents with no authors
Message-ID: <95327D9B3B548C86FBB67CE6@PSB>
In-Reply-To: <817cb7db-c095-75cf-3450-ddc9c3372784@gmail.com>
References: <7f831a6a-e2d0-cb88-1d2a-dfcdab921307@gmail.com> <817cb7db-c095-75cf-3450-ddc9c3372784@gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf/bZWXx1ZQV5eJDBM99Mg65axAEqg>


--On Friday, November 23, 2018 07:57 -0700 Doug Royer
<douglasroyer@gmail.com> wrote:

> On 11/23/18 3:29 AM, Stewart Bryant wrote:
>> https://datatracker.ietf.org/stats/document/authors/
>> 
>> Why do such a high proportion of our documents (for example
>> 1929 RFCs)  have no authors?
> 
> Well, RFC-1929 does have an author. So I am guessing the
> automated tools can not (or did not) parse the older text only
> documents.

A different guess would be that whatever tool/algorithm
produces this graph counts a document with "only" an editor as
having no author.  If that is the way things are counted, this
sort of statistic would not be surprising.  Indeed, if documents
that came out of a WG and that were ultimately compendiums of
input from many WG participants were identified as having an
editor rather than an author or a handful of authors, I'd expect
16.79% to be somewhat low and hope 25.21% (of RFCs only) would
be low too.
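To make that guess concrete, here is a minimal, purely hypothetical sketch of how such a count could go wrong.  It assumes each document carries a list of (person, role) pairs with roles "author" or "editor"; the Document class, the author_count() helper, and the sample document names are invented for illustration and are not the datatracker's actual data model or code.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    name: str
    # Hypothetical: each contributor is a (person, role) pair,
    # with role assumed to be either "author" or "editor".
    contributors: list[tuple[str, str]] = field(default_factory=list)


def author_count(doc: Document, count_editors: bool) -> int:
    """Count contributors, optionally treating editors as authors."""
    roles = {"author", "editor"} if count_editors else {"author"}
    return sum(1 for _, role in doc.contributors if role in roles)


# Invented sample documents, not real RFCs.
docs = [
    Document("doc-a", [("A. Person", "author")]),
    Document("doc-b", [("E. Ditor", "editor")]),  # WG compendium, editor only
    Document("doc-c", [("X. One", "author"), ("Y. Two", "author")]),
]

for count_editors in (False, True):
    zero = [d.name for d in docs if author_count(d, count_editors) == 0]
    print(f"count_editors={count_editors}: zero-author docs = {zero}")
```

With editors excluded, the editor-only document shows up as having "0 authors"; with editors included, it does not.  Whether the real tool behaves like the first case is exactly the open question.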

Doug, if the problem were "text only", then one would expect a
much larger number.  If you intended "XML available" then that
wasn't defined until RFC 2629 and, IIRC, the RFC Editor didn't
start accepting the XML files, much less archiving them, until
much later.  If it were "XML or nroff", I don't know -- it might
depend on whether the documents that were submitted/archived on
paper and then scanned and converted passed through an nroff
page.   More important to this little detective job, if one adds
up the numbers in the right column of the "RFCs" tab, one ends
up with 8311, a fair approximation to the largest RFC number as
of this morning (8521), and a closer one if the number "not
issued" (79) is subtracted (8442).  Could the gap (about 210
against 8521, or about 130 against 8442) be documents that have
been issued numbers but are still in the publication queue?  I
don't know, but, given that the highest issued numbers are 8496,
8505, and 8521, it doesn't seem entirely implausible.
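For anyone who wants to redo the arithmetic, the figures quoted above (as of 23 November 2018) can be checked in a few lines.  The variable names are just labels for the numbers in this message; the only assumption is that every number up to 8521, other than the 79 "not issued" ones, has been assigned to an RFC that is either published or in the queue.

```python
# Figures as quoted in the message (23 Nov 2018).
counted_on_page = 8311     # sum of the right-hand column of the "RFCs" tab
largest_rfc_number = 8521  # highest RFC number issued that morning
not_issued = 79            # RFC numbers listed as "not issued"

# Numbers actually assigned, assuming every other number up to the max was issued.
expected = largest_rfc_number - not_issued

gap_raw = largest_rfc_number - counted_on_page  # gap before the adjustment
gap_adjusted = expected - counted_on_page       # gap after removing "not issued"

print(expected, gap_raw, gap_adjusted)  # prints: 8442 210 131
```

So the unexplained difference is about 210 measured against the highest number, and about 130 once the "not issued" numbers are taken out; either is a plausible size for a publication queue.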

Similar comments would apply to I-Ds: as far as I know, it has
never been possible to post one without an identifiable author
or editor.  There are definitely a few pseudonyms but those are
still authors for the purpose of this type of count.  Possibly
something slipped through the cracks, but I'd expect that number
to be in single digits.

Moreover, counting an RFC as having "no author" when it was
really "not parsed" would be seriously irresponsible and I would
not expect that of the tools team.  FWIW, I would expect pages
like these to show the date compiled (perhaps there was a lag
between the RFC list or I-D list and the compilation date/time)
and to explain exactly what is reported as "0 authors".

  --your friendly statistical detective