Re: Google Scholar, was How to pay $47 for a copy of RFC 793

"John Levine" <> Tue, 10 May 2011 15:35 UTC

Date: 10 May 2011 15:28:51 -0000
Message-ID: <20110510152851.40727.qmail@joyce.lan>
From: "John Levine" <>
Subject: Re: Google Scholar, was How to pay $47 for a copy of RFC 793
In-Reply-To: <>

>In the case of Google Scholar, I found the guidelines to be a bit 
>but not something that would be hard for the RFC publisher to set up in 
>a few hours based on the PDF form of the RFCs and the rfc-index.xml file.
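For what it's worth, pulling the bibliographic fields out of rfc-index.xml is the easy part. Here is a minimal Python sketch. It assumes the index uses `rfc-entry` elements with `doc-id`, `title`, `author/name`, and `date/month` and `date/year` children, and it ignores any XML namespace rather than hard-coding one. The `SAMPLE` fragment is a hand-made stand-in, not an excerpt of the real file, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so lookups work with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """First direct child with the given local name, or None."""
    for c in elem:
        if local_name(c.tag) == name:
            return c
    return None


def children(elem, name):
    """All direct children with the given local name."""
    return [c for c in elem if local_name(c.tag) == name]


def parse_rfc_index(xml_text: str) -> list:
    """Return one dict per rfc-entry (assumed element layout; see lead-in)."""
    root = ET.fromstring(xml_text)
    records = []
    for entry in root:
        # The real index also has BCP/FYI/STD entries; skip anything else.
        if local_name(entry.tag) != "rfc-entry":
            continue
        date = child(entry, "date")
        month = child(date, "month") if date is not None else None
        year = child(date, "year") if date is not None else None
        records.append({
            "doc_id": child(entry, "doc-id").text,
            "title": child(entry, "title").text,
            "authors": [child(a, "name").text for a in children(entry, "author")],
            "month": month.text if month is not None else None,
            "year": year.text if year is not None else None,
        })
    return records


# Hand-made sample in the assumed format (not copied from the real file).
SAMPLE = """<rfc-index xmlns="http://www.rfc-editor.org/rfc-index">
  <rfc-entry>
    <doc-id>RFC0793</doc-id>
    <title>Transmission Control Protocol</title>
    <author><name>J. Postel</name></author>
    <date><month>September</month><year>1981</year></date>
  </rfc-entry>
</rfc-index>"""

if __name__ == "__main__":
    for rec in parse_rfc_index(SAMPLE):
        print(rec)
```

Matching on local names keeps the sketch working whether or not the file declares a default namespace.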

Actually, now that I look at their guidelines, I'm sort of surprised
that the RFCs aren't in Scholar.  They say they'll index HTML versions
of documents as long as each page has meta tags giving the title,
author, and other bibliographic information, and has a references
section the crawler can follow to cross-link other documents. The HTML versions in look to me like they have the right tags.  The
problem may be that the meta tags are missing some minor item, or that
Scholar can't recognize the references sections (which should just
need a little tweaking of the HTML), or maybe that there isn't a TOC
page that lets it recognize all the RFCs as a collection.
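As a rough illustration of the kind of meta tags involved, here is a Python sketch that renders them for one RFC. The tag names (`citation_title`, `citation_author`, `citation_publication_date`, `citation_technical_report_number`, `citation_technical_report_institution`) are my assumption, based on the Highwire Press style that Scholar's inclusion guidelines describe; check the current guidelines before relying on them. The function name and its parameters are hypothetical.

```python
from html import escape


def scholar_meta_tags(title, authors, year, report_number, institution=None):
    """Render <meta> lines for one document.

    Tag names are ASSUMED from Google Scholar's Highwire-style guidelines;
    verify them against the current documentation before use.
    """
    pairs = [("citation_title", title)]
    # One citation_author tag per author rather than a joined list.
    pairs += [("citation_author", a) for a in authors]
    pairs.append(("citation_publication_date", str(year)))
    if institution is not None:
        pairs.append(("citation_technical_report_institution", institution))
    pairs.append(("citation_technical_report_number", report_number))
    # Escape values so quotes or ampersands in titles can't break the HTML.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in pairs
    )


if __name__ == "__main__":
    print(scholar_meta_tags("Transmission Control Protocol",
                            ["J. Postel"], 1981, "RFC 793"))
```

As I read the guidelines, multiple authors get separate `citation_author` tags, which is why the sketch emits one per name instead of a comma-joined string.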

Whatever it is, it doesn't look like it'd be hard for someone with
sufficient spare time to fix.