[certid] fyi: Ivan Ristic / Qualys SSL Labs release raw data from the Internet SSL survey

=JeffH <Jeff.Hodges@KingsMountain.com> Fri, 15 October 2010 19:10 UTC

October 05, 2010
Qualys SSL Labs releases raw data from the Internet SSL survey
<http://blog.ivanristic.com/2010/10/ssl-labs-releases-raw-data-from-the-internet-ssl-survey.html>

About two months ago, Qualys SSL Labs published the results of an Internet-wide 
SSL survey. We said that we would make the raw data available, and today we are 
following up on that promise. (By the way, we realize that two months is a long 
time, but we couldn't complete the process faster on this occasion. We hope to 
make future releases pretty much as soon as we obtain the data. As you may 
remember, our plan is to make our survey a quarterly event from 2011.)

The raw data contains the SSL assessment results of about 850,000 domain names 
(out of about 120M we inspected). The main file (120 MB compressed, 800 MB 
uncompressed; previously 1.2 GB and 3.5 GB) is a dump of our PostgreSQL 
database in CSV format. We 
include in the download a simple PHP script that iterates through all the rows, 
which means that you can consume the data directly. Alternatively, you can put 
the data back into the database and use SQL to run ad-hoc queries (we provide 
the schema along with the import instructions).
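
As a rough illustration of consuming the CSV dump directly, here is a minimal 
Python sketch (the post itself ships a PHP script, which is not reproduced 
here). The file name "ssl_survey.csv" and the column name "grade" are 
hypothetical, and the sketch assumes the first row of the file holds the column 
names, which the post does not confirm.

```python
import csv
from collections import Counter


def count_by_column(csv_path, column):
    """Stream a CSV file row by row and tally the values found in one column."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # DictReader takes the column names from the first row. That is an
        # assumption about the dump; pass fieldnames=... if it has no header.
        for row in csv.DictReader(handle):
            counts[row[column]] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical file and column names, for illustration only.
    for value, total in count_by_column("ssl_survey.csv", "grade").most_common():
        print(value, total)
```

Because the rows are read one at a time, the whole file never has to fit in 
memory, which matters for a dump of this size.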

The database schema contains 63 fields that generally parallel the information 
you would obtain from the SSL Labs online test. The complete original 
certificate chain is included, which is handy if you want to look into the 
aspects we didn't. We chose not to release certain sensitive data: the 
information on the low entropy private keys, renegotiation support, and HTTP 
server signatures was removed.

This is what you need to do to obtain the data:

    1. First, make sure that our terms and conditions are acceptable to you. At 
the core, we use the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 
Unported licence, but there are a few additional requirements. For example, we 
ask the obvious -- that you don't use the data for illegal activities. The 
other requirements are just common sense. (Please do read the entire file, 
however.)
    2. Second, send us an email (username "ivanr"; domain name 
"webkreator.com"), introduce yourself, and tell us how you intend to use the 
data. We will then send you the download instructions. We need this second 
step to get an idea of whether and how the data is used.

Update: We are removing the certificate chain data from the database until we 
confirm that we are legally allowed to redistribute it. If you need such data 
in the meantime, retrieve it directly from the servers.
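
For readers who want to fetch certificates themselves, here is a minimal Python 
sketch using the standard library's ssl module. Note the limitation: 
ssl.get_server_certificate returns only the server's own (leaf) certificate, 
not the full chain, so this is a partial substitute at best. The host name in 
the usage section is a placeholder, and the function names are made up for 
this example.

```python
import hashlib
import ssl


def fetch_leaf_certificate(host, port=443):
    """Return the server's own certificate as a PEM string (needs network access)."""
    # No validation is done here; the call only retrieves the certificate.
    return ssl.get_server_certificate((host, port))


def sha256_fingerprint(pem_cert):
    """Return the hex SHA-256 digest of the DER encoding of a PEM certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()


if __name__ == "__main__":
    # Placeholder host name, for illustration only.
    pem = fetch_leaf_certificate("www.example.com")
    print(sha256_fingerprint(pem))
```

Fingerprinting the DER bytes rather than the PEM text gives the same value 
regardless of line wrapping or whitespace in the PEM encoding.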

Posted by Ivan Ristić at 12:29:04 in SSL