Re: [certid] weird CN-IDs (subjectCommonName) in SSL Labs Survey Data

=JeffH <Jeff.Hodges@KingsMountain.com> Tue, 19 October 2010 17:05 UTC

Message-ID: <4CBDD03A.4060406@KingsMountain.com>
Date: Tue, 19 Oct 2010 10:07:06 -0700
From: =JeffH <Jeff.Hodges@KingsMountain.com>
User-Agent: Thunderbird 2.0.0.24 (X11/20100411)
MIME-Version: 1.0
To: IETF cert-based identity <certid@ietf.org>, Peter Gutmann <pgut001@cs.auckland.ac.nz>
Subject: Re: [certid] weird CN-IDs (subjectCommonName) in SSL Labs Survey Data

 > There sure are some bizarre things in CNs... is there any chance you could
 > implement what Marcus Ranum calls "artificial stupidity" in your anomaly-
 > detection, create a filter that accepts all standard DN component styles and
 > then kick out certs that don't pass the filter?

that's essentially what I did yesterday tho I refined the regex a bit this 
morning. Here's the (perl) regex..

   /[Cc][Nn]=(?!([\-\w\*]+\.)+[\-\w]+)/

..which appears to also work fine in NEdit. What it does (my intent anyway, I 
am not an awesome regex master) is recognize any CN-ID pattern that is /not/ 
syntactically a dot-separated LDH (letter digit hyphen) DNS domain name.
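
A rough sketch of how the pattern behaves, re-expressed with Python's re module 
rather than Perl (the lookahead syntax is the same). The sample subject strings 
and the is_anomalous() helper are made up for illustration and are not taken 
from the survey data; re.ASCII is an assumption to keep \w to [A-Za-z0-9_].

```python
import re

# Same pattern as the Perl regex above; re.ASCII keeps \w to [A-Za-z0-9_].
NOT_DNS_CN = re.compile(r"[Cc][Nn]=(?!([\-\w\*]+\.)+[\-\w]+)", re.ASCII)


def is_anomalous(subject: str) -> bool:
    """True if some CN= in the subject is not followed by a dotted name."""
    return NOT_DNS_CN.search(subject) is not None


# Hypothetical subject strings, invented for illustration only.
samples = [
    "CN=www.example.com, O=Example Inc",  # dotted name: not flagged
    "CN=*.example.com",                   # wildcard label: not flagged
    "CN=localhost",                       # no dot: flagged
    "CN=John Smith, O=Example",           # free text: flagged
    "CN=www.example.com/junk",            # only the prefix is checked: not flagged
    "CN=my_host.example.com",             # \w admits "_": not flagged
]

for s in samples:
    print(is_anomalous(s), s)
```

The last two samples show where the pattern is looser than strict LDH: the 
lookahead only tests the start of the value, and \w also accepts underscore.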

Employed thus..

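use strict;
use warnings;

# Print (preceded by a blank line) every input line containing a CN=
# that is not followed by a dot-separated name.
while (<>) {
   if ( /[Cc][Nn]=(?!([\-\w\*]+\.)+[\-\w]+)/ ) {
     print "\n", $_;
   }
}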

I think it does what you suggest below. What I fed into the above loop is a txt 
file output by querying the raw database table for all contacted domains and 
returned cert subject values (one pair per line; it would of course also work 
on a file with just subjects in it).

 >  In other words instead of
 > trying to create a regex to detect all the bizarre things that turn up in
 > there, create one to pass normal DNs and treat everything that doesn't pass as
 > an anomaly?

yes, done.

So there appear to (actually) be 433 such subject DNs. I'd mis-counted 
yesterday (haste makes waste); AFAICT today, 622 is an incorrect count.

Note that these subject DNs may /also/ contain syntactically correct CN-ID 
values (many do, some do not).
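
To illustrate that last point, here's a minimal sketch (Python again, not what 
was actually run) that pulls out every CN value in a subject and sorts each one 
into "looks like a dotted name" or "other". The classify_cns() helper, the 
naive comma/slash splitting, and the sample subject are all assumptions for 
illustration; the real subject format in the data file may differ.

```python
import re

# Captures the text after each CN= up to the next comma or slash.
# Assumption: naive splitting; escaped separators in DN values are not handled.
CN_VALUE = re.compile(r"[Cc][Nn]=([^,/]*)")

# Dotted-name test mirroring the lookahead in the Perl regex (prefix match only).
DOTTED_NAME = re.compile(r"([\-\w\*]+\.)+[\-\w]+", re.ASCII)


def classify_cns(subject):
    """Return (dns_like, other) lists of the CN values found in subject."""
    dns_like, other = [], []
    for value in CN_VALUE.findall(subject):
        value = value.strip()
        if DOTTED_NAME.match(value):
            dns_like.append(value)
        else:
            other.append(value)
    return dns_like, other


# Hypothetical subject with one DNS-like CN and one free-text CN.
subject = "CN=www.example.com, CN=Example Web Server, O=Example"
print(classify_cns(subject))
```

A subject like this one gets flagged by the filter because of the second CN, 
even though the first CN is a perfectly good domain name.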


 > It'd be interesting to see what sort of stuff is floating around
 > out there...

of course. Note that the data is nominally available upon request, as Ivan has 
blogged and as I posted here earlier..

[certid] fyi: Ivan Ristic / Qualsys SSL Labs release raw data from the Internet 
SSL survey
http://www.ietf.org/mail-archive/web/certid/current/msg00484.html

There's a shrink-wrap EULA that Ivan wants folks to agree to before obtaining 
the data and I haven't groveled thru it enough to tell whether it's legit to 
just post all 433 results (or any other full result set) to a public list & 
archive such as this.

Also note that he/Qualys may re-run the survey periodically. This is also the 
intention of the EFF folks and their "TLS/SSL observatory" 
<https://www.eff.org/observatory> (tho they have yet to publicly release their 
data). Note also that the two surveys' data collection methodologies differ -- 
EFF folks say they "NMAPed Internet for hosts listening on tcp 443", whereas 
Ivan queried publicly registered domain names in a (large) subset of all TLDs.

HTH,

=JeffH