Re: [certid] SSL Labs Survey Data

=JeffH <Jeff.Hodges@KingsMountain.com> Fri, 05 November 2010 16:10 UTC

Message-ID: <4CD42C99.3080505@KingsMountain.com>
Date: Fri, 05 Nov 2010 09:11:05 -0700
From: =JeffH <Jeff.Hodges@KingsMountain.com>
To: IETF cert-based identity <certid@ietf.org>
Subject: Re: [certid] SSL Labs Survey Data

Ivan replied..
 >
 > JeffH wrote:
 >>
 >> That explanation hints that most all the certs represented in the dataset
 >> would be "valid" certs. However, there's ~150k more entries in the dbase
 >> than the ~720K valid certs he observed. Though, there's ~150k apparently
 >> "self-signed" certs in the dbase, so perhaps that's what's filling out the
 >> dbase.
 >
 > The term "potentially valid" would be more accurate. The purpose of the
 > survey was to investigate how is an average SSL server configured and for
 > that we wanted to look at those servers that someone at least tried to
 > configure properly. There are so many invalid certificates out there, so
 > taking the configuration of all SSL servers would pollute the data.
 >
 > I defined "potentially valid" as residing on a domain name that matches the
 > certificate. Trust was not a factor, and that's why there are self-signed
 > certificates in the database. In addition, there's only one certificate per
 > domain name and IP address.
 >
 > The 720K certificates were obtained from the 119M data set of domain name
 > registrations. The additional 150K were obtained by looking at the Alexa's
 > top 1M sites, as well as by data mining web site names from the certificates
 > we obtained. The fact that there's about 150K self-signed certificates is a
 > coincidence.

That's helpful, thanks.

It'd be great if you could include that explanation (and/or post it on the web) 
along with the info from..

<http://blog.ivanristic.com/2010/07/ssl-server-survey-so-whats-with-the-22m-invalid-certificates-claim.html>

<http://blog.ivanristic.com/2010/07/ssl-server-survey-what-data-are-we-collecting.html>

..in a file in the data distro -- it'd help folks to better make use of it.

thanks again, having this data available is quite useful.

=JeffH

ps: it'd also be good to explain things such as the subjectCommonName dbase
column, which comprises all the CN values found in the Subject, concatenated
with spaces as separators.
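
To illustrate why that column deserves a note, here is a minimal sketch of the
concatenation as described above. The function names and sample CN values are
hypothetical and are not taken from the survey's tooling; the only assumption
drawn from the thread is that the CN values are joined with single spaces.

```python
def join_common_names(cn_values):
    """Join CN values with single spaces, as the column is described (assumed)."""
    return " ".join(cn_values)


def split_common_names(column_value):
    """Naive inverse: split on whitespace.

    This is lossy whenever a CN value itself contains a space.
    """
    return column_value.split()


# Hypothetical sample values, for illustration only.
column = join_common_names(["www.example.com", "example.com"])
print(column)                      # www.example.com example.com
print(split_common_names(column))  # ['www.example.com', 'example.com']

# A CN containing a space cannot be recovered from the joined form.
print(split_common_names(join_common_names(["Example Inc", "www.example.com"])))
# ['Example', 'Inc', 'www.example.com']
```

The last case shows why a consumer of the data would want the convention
documented: splitting on spaces recovers the original values only when no CN
contains a space.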