Re: Google Scholar, was How to pay $47 for a copy of RFC 793

Harald Alvestrand <harald@alvestrand.no> Tue, 10 May 2011 15:41 UTC

Message-ID: <4DC95CBE.60304@alvestrand.no>
Date: Tue, 10 May 2011 17:41:50 +0200
From: Harald Alvestrand <harald@alvestrand.no>
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.17) Gecko/20110424 Thunderbird/3.1.10
MIME-Version: 1.0
To: John Levine <johnl@iecc.com>
Subject: Re: Google Scholar, was How to pay $47 for a copy of RFC 793
References: <20110510152851.40727.qmail@joyce.lan>
In-Reply-To: <20110510152851.40727.qmail@joyce.lan>
Content-Type: text/plain; charset="UTF-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Cc: ietf@ietf.org

On 05/10/11 17:28, John Levine wrote:
>> In the case of Google Scholar, I found the guidelines to be a bit
>> intimidating:
>>
>> http://scholar.google.com/intl/en/scholar/inclusion.html
>>
>> but not something that would be hard for the RFC publisher to set up in
>> a few hours based on the PDF form of the RFCs and the rfc-index.xml file.
> Actually, now that I look at their guidelines, I'm sort of surprised
> that they're not in Scholar.  They say they'll index HTML versions of
> documents so long as they have meta tags that have the title, author,
> and other bibliographic info and it has references it can crawl to do
> cross links to other documents. The HTML versions in
> tools.ietf.org/html look to me like they have the right tags.  The
> problem may be that the meta tags are missing some minor item, that it
> can't recognize the references sections, which should be a matter of
> tweaking the HTML a little bit, or maybe that there isn't a TOC page
> that lets it recognize all the RFCs as a collection.
>
> Whatever it is, it doesn't look like it'd be hard for someone with
> sufficient spare time to fix.
For some reason, Scholar has indexed 151 docs from tools.ietf.org and
then stopped.

http://scholar.google.com/scholar?hl=en&q=site%3Atools.ietf.org&btnG=Search&as_sdt=0%2C5&as_ylo=&as_vis=0 

> R's,
> John
>
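
One way to test John's guess that the meta tags are "missing some minor item" is to list the bibliographic meta tags a page carries and report which expected ones are absent. The following is a minimal sketch, not anything the RFC publisher or Scholar actually runs. The tag names in REQUIRED are an assumption recalled from the Scholar inclusion guidelines and should be checked against that page; the helper names and the sample HTML are hypothetical and are not taken from tools.ietf.org.

```python
from html.parser import HTMLParser

# Assumed tag names (Highwire Press style, as recalled from the Scholar
# inclusion guidelines); verify against the guidelines before relying on them.
REQUIRED = ("citation_title", "citation_author", "citation_publication_date")


class MetaCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        fields = dict(attrs)
        name = fields.get("name")
        content = fields.get("content")
        if name and content:
            # A page may repeat a tag (e.g. several authors), so keep a list.
            self.meta.setdefault(name.lower(), []).append(content)


def missing_scholar_tags(html_text):
    """Returns the REQUIRED tag names that the page does not provide."""
    parser = MetaCollector()
    parser.feed(html_text)
    return [tag for tag in REQUIRED if tag not in parser.meta]


# Hypothetical page head for illustration only.
sample = """<html><head>
<meta name="citation_title" content="Transmission Control Protocol">
<meta name="citation_author" content="Postel, J.">
</head><body></body></html>"""

print(missing_scholar_tags(sample))  # ['citation_publication_date']
```

Running this over a saved copy of one of the HTML RFCs would show quickly whether the meta tags are the problem, or whether the references section or a missing TOC page is the more likely cause.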