[Tools-discuss] Data sources / alternatives to screen scraping
<Pasi.Eronen@nokia.com> Mon, 15 February 2010 08:01 UTC
Hi all,

As Russ announced on the ietf-announce list a month ago, the upcoming datatracker user interface changes will break scripts that do "screen-scraping", that is, scripts that parse the HTML pages to extract information. Much of the information is also available in forms that are intended to be parseable by scripts. However, those data sources aren't exactly well documented, and it's not always easy to tell which of them are authoritative primary data and which are derived.

I've now updated the data source documentation on the Tools wiki (http://trac.tools.ietf.org/group/tools/trac/wiki/DataSources). Here's a quick summary of the most important primary data sources:

---------

Status of all internet-drafts (tab-separated text; generated from the database by a cron job once a day):
http://www.ietf.org/id/all_id.txt

Title/authors/abstract/date for active internet-drafts (text; generated from the database by cron jobs once a day. Although these text files were probably originally intended mainly for human consumption, their format has been very stable over the years, and as they're currently used by a number of tools, I would expect this to continue):
http://www.ietf.org/id/1id-index.txt
http://www.ietf.org/id/1id-abstracts.txt

All RFCs (XML/text; note that the XML has much more information than the text version):
http://www.rfc-editor.org/rfc/rfc-index.xml
http://www.rfc-editor.org/rfc/rfc-index.txt

Documents that are currently in IETF last call (Atom feed; generated from the database on the fly):
http://datatracker.ietf.org/feed/last-call/

Documents that are on the agenda of upcoming IESG telechats (tab-separated text; generated from the database on the fly):
http://datatracker.ietf.org/iesg/agenda/documents.txt

Documents in the RFC Editor queue (XML):
http://www.rfc-editor.org/queue2.xml

Detailed datatracker history for a particular draft (Atom feed; generated from the database on the fly):
http://datatracker.ietf.org/feed/comments/draft-ietf-msec-newtype-keyid/

Information about active WGs, chairs, mailing lists, charters, etc. (text; generated from the database on the fly):
http://datatracker.ietf.org/wg/summary.txt
http://datatracker.ietf.org/wg/1wg-charters.txt

IPR disclosures by draft (tab-separated text; generated from the database on the fly):
http://datatracker.ietf.org/ipr/by-draft/

---------

The upcoming datatracker UI changes will affect only HTML pages, not any of the text/Atom/etc. URLs listed above.

Best regards,
Pasi
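Because the point of the message is to move scripts off HTML scraping, here is a minimal Python sketch of reading two of the machine-oriented formats listed above: the tab-separated files (all_id.txt, documents.txt, ipr/by-draft) and the Atom feeds (last-call, comments). It is a sketch under stated assumptions and not an official client. It assumes one record per line with fields separated by single tabs, and it assumes the feeds use the standard Atom 1.0 namespace. The helper names (`fetch_text`, `parse_tsv`, `rows_with_field`, `atom_entries`, `print_last_call`) are hypothetical. The meaning of each tab-separated column is not stated in the message, so check it against the Tools wiki page before filtering on a particular field.

```python
"""Sketch: consuming IETF text/Atom data sources instead of scraping HTML.

Assumptions (not from the original message): tab-separated files hold one
record per line with single-tab separators; Atom feeds use the Atom 1.0
namespace. All helper names here are hypothetical.
"""
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"


def fetch_text(url: str, encoding: str = "utf-8") -> str:
    """Download a data source and decode it, replacing undecodable bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode(encoding, errors="replace")


def parse_tsv(text: str) -> list[list[str]]:
    """Split tab-separated text into rows of fields, skipping blank lines."""
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def rows_with_field(rows: list[list[str]], column: int, value: str) -> list[list[str]]:
    """Keep rows whose field at `column` equals `value`.

    The column index is chosen by the caller; which column holds what is an
    assumption to verify against the data source documentation.
    """
    return [row for row in rows if len(row) > column and row[column] == value]


def atom_entries(xml_text: str) -> list[dict[str, str]]:
    """Return the title and updated timestamp of each <entry> in an Atom feed."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall(f"{ATOM_NS}entry"):
        entries.append({
            "title": (entry.findtext(f"{ATOM_NS}title") or "").strip(),
            "updated": (entry.findtext(f"{ATOM_NS}updated") or "").strip(),
        })
    return entries


def print_last_call(url: str = "http://datatracker.ietf.org/feed/last-call/") -> None:
    """Print the documents currently in IETF last call (needs network access)."""
    for entry in atom_entries(fetch_text(url)):
        print(entry["updated"], entry["title"])
```

Parsing and fetching are kept separate so that the parsers can be tested on saved copies of the files, and so that a script can cache the once-a-day files (all_id.txt, 1id-index.txt) instead of downloading them repeatedly. For rfc-index.xml and queue2.xml, the same ElementTree approach should work once you have confirmed the actual element names and namespace in the files.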