Re: [certid] SSL Labs Survey Data
=JeffH <Jeff.Hodges@KingsMountain.com> Fri, 05 November 2010 16:10 UTC
Message-ID: <4CD42C99.3080505@KingsMountain.com>
Date: Fri, 05 Nov 2010 09:11:05 -0700
From: =JeffH <Jeff.Hodges@KingsMountain.com>
To: IETF cert-based identity <certid@ietf.org>
Subject: Re: [certid] SSL Labs Survey Data

Ivan replied..
>
> JeffH wrote:
>>
>> That explanation hints that most all the certs represented in the dataset
>> would be "valid" certs. However, there's ~150k more entries in the dbase
>> than the ~720K valid certs he observed. Though, there's ~150k apparently
>> "self-signed" certs in the dbase, so perhaps that's what's filling out the
>> dbase.
>
> The term "potentially valid" would be more accurate. The purpose of the
> survey was to investigate how is an average SSL server configured and for
> that we wanted to look at those servers that someone at least tried to
> configure properly. There are so many invalid certificates out there, so
> taking the configuration of all SSL servers would pollute the data.
>
> I defined "potentially valid" as residing on a domain name that matches the
> certificate. Trust was not a factor, and that's why there are self-signed
> certificates in the database. In addition, there's only one certificate per
> domain name and IP address.
>
> The 720K certificates were obtained from the 119M data set of domain name
> registrations. The additional 150K were obtained by looking at the Alexa's
> top 1M sites, as well as by data mining web site names from the certificates
> we obtained. The fact that there's about 150K self-signed certificates is a
> coincidence.

That's helpful, thanks.

It'd be great if you could include that explanation (and/or post it on the
web) along with the info from..

<http://blog.ivanristic.com/2010/07/ssl-server-survey-so-whats-with-the-22m-invalid-certificates-claim.html>

<http://blog.ivanristic.com/2010/07/ssl-server-survey-what-data-are-we-collecting.html>

..in a file in the data distro -- it'd help folks to better make use of it.

thanks again, having this data available is quite useful.

=JeffH

ps: also it'd be good to explain things such as the subjectCommonName dbase
column, which comprises all the CN values found in the Subject, concatenated
with spaces as separators.
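
pps: for anyone consuming the data, here is a rough Python sketch of how one
might unpack the subjectCommonName column and approximate the "potentially
valid" test described above. It is only an illustration under assumptions:
the function names are hypothetical, the survey's real matching logic is not
documented in this thread, and the wildcard rule used here (a leading "*."
matches exactly one label) is my guess at a reasonable convention, not
something Ivan stated.

```python
def split_common_names(column_value):
    """Split a space-separated subjectCommonName value into individual CNs."""
    # str.split() with no argument splits on runs of whitespace and
    # returns [] for an empty string.
    return column_value.split()


def cn_matches(cn, hostname):
    """Case-insensitive comparison of one CN against a hostname.

    Assumption: a CN of the form "*.example.com" matches exactly one
    extra left-most label (www.example.com, but not a.b.example.com
    and not example.com itself).
    """
    cn = cn.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if cn.startswith("*."):
        head, sep, rest = hostname.partition(".")
        return bool(head) and bool(sep) and rest == cn[2:]
    return cn == hostname


def potentially_valid(column_value, hostname):
    """True if any CN in the column matches the hostname (hypothetical helper)."""
    return any(cn_matches(cn, hostname)
               for cn in split_common_names(column_value))


if __name__ == "__main__":
    print(split_common_names("www.example.com example.com"))
    print(potentially_valid("*.example.com example.org", "www.example.com"))
```

One caveat with the column format itself: a CN that contains a space (e.g. an
organization name placed in the CN field) cannot be told apart from two
separate CNs once the values are space-concatenated, so the split above is
lossy for such entries.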