Re: [sip-ops] [dispatch] SIP-CLF: Results on ASCII vs. binary representation

Adam Roach <adam@nostrum.com> Thu, 30 April 2009 21:44 UTC

Date: Thu, 30 Apr 2009 16:46:00 -0500
From: Adam Roach <adam@nostrum.com>
To: Simon Perreault <simon.perreault@viagenie.ca>
Cc: sip-ops@ietf.org, dispatch@ietf.org
Subject: Re: [sip-ops] [dispatch] SIP-CLF: Results on ASCII vs. binary representation

Simon Perreault wrote:

> And I'm surprised nobody mentioned already that C would be MUCH faster than Perl.
>
> I think that speed is not an issue in text vs binary.

Okay, so maybe it's not really fair to write the binary decoder in C and 
the text decoder in Perl. Let's do an apples-to-apples comparison.

As we've already established, Vijay's C program (with my optimizations) 
takes 6.17 seconds to find the Call-ID in a binary file on my system.

His Perl script takes 20.154 seconds on my system to find it in a text 
log file (with your optimizations, plus an 'o' flag on the regex so the 
pattern is compiled only once). This will grow in complexity (and, 
presumably, time) once we add an escaping mechanism to the log format, 
but I'm happy to ignore that for now.
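For anyone who hasn't looked at the text-format script, here is a rough 
sketch of what a line-oriented Perl scan using the 'o' flag looks like. 
It is NOT Vijay's script: the one-record-per-line, tab-separated layout 
and the sub name find_in_text_log are assumptions for illustration only.

```perl
use strict;
use warnings;

# HYPOTHETICAL sketch, not Vijay's script: assumes one record per line,
# fields separated by tabs, and the Call-ID appearing as a whole field.
sub find_in_text_log {
    my ($fh, $search) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        # /o: the interpolated pattern is compiled once, on first use, and
        # then frozen for the life of the process.
        # \Q...\E quotes any regex metacharacters in the Call-ID.
        return $line if $line =~ /(?:^|\t)\Q$search\E(?:\t|\z)/o;
    }
    return;    # not found
}
```

Because /o freezes the pattern, this is only correct for a one-shot 
search (one Call-ID per run, as in the benchmark); calling the sub again 
with a different Call-ID would silently reuse the first pattern. An 
escaping mechanism would add an unescaping step on top of this match.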

The Perl script below, which reads the binary format, takes 2.359 
seconds on my system to find the Call-ID of interest. That's... ummm... 
better than twice as fast as the C program, and more than eight times as 
fast as the Perl script for the text format.
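The speedups are just ratios of the timings quoted in this message; a 
trivial snippet to reproduce them (the hash keys are labels I made up):

```perl
use strict;
use warnings;

# Timings quoted in this message, in seconds, on the same system.
my %t = (
    c_binary    => 6.17,     # Vijay's C program, binary log
    perl_text   => 20.154,   # Perl script, text log
    perl_binary => 2.359,    # Perl script below, binary log
);

printf "vs. C binary decoder:  %.2fx\n", $t{c_binary}  / $t{perl_binary};
printf "vs. Perl text decoder: %.2fx\n", $t{perl_text} / $t{perl_binary};
```

This prints 2.62x and 8.54x respectively.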

So, for some reason, I *do* think speed is an issue in this particular 
discussion.


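#!/usr/bin/perl
use strict;
use warnings;

my $logfile = "sip-clf.bin";
my $search  = shift or die "Usage: $0 [call-id]\n";
open(my $log, '<', $logfile) or die "Could not open log file: $!\n";
binmode($log);    # raw bytes: no newline or encoding translation

my $buffer;
while (read($log, $buffer, 4) > 0)
{
   # Low 15 bits of the 4-byte big-endian header are taken as the record
   # length, which includes the 4 header bytes themselves
   my $rec_len = unpack('N', $buffer) & 0x7FFF;
   read($log, $buffer, $rec_len - 4) or die $!;
   # Skip 48 bytes past the header, then unpack the Call-ID offset and
   # length (two 16-bit big-endian values)
   my ($cid_ptr, $cid_len) = unpack('x48n2', $buffer);
   # The offset counts from the record start; $buffer starts 4 bytes in
   my $cid = substr($buffer, $cid_ptr - 4, $cid_len);
   if ($cid eq $search)
   {
     print "Call-ID: $cid *** FOUND!!\n";
     exit;
   }
}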

(For what it's worth, I'm using perl 5.8.8)
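To try the script without a real log, here is a sketch that writes a 
file in the layout the script's unpack calls imply: a 4-byte big-endian 
length covering the whole record, 48 bytes the script skips, then a 
16-bit Call-ID offset (counted from the record start) and a 16-bit 
Call-ID length. That layout is inferred from the code above, not taken 
from the draft; the zero-filled bytes stand in for whatever fields real 
records carry, and make_record and the example Call-IDs are hypothetical.

```perl
use strict;
use warnings;

# HYPOTHETICAL helper: builds one record in the layout the search script
# expects. Bytes 4..51 are zero-filled placeholders (an assumption).
sub make_record {
    my ($call_id) = @_;
    my $cid_ptr = 56;    # Call-ID starts right after the 56-byte fixed part
    my $rec_len = $cid_ptr + length($call_id);
    die "record too long\n" if $rec_len > 0x7FFF;    # must fit the 15-bit mask
    return pack('N x48 n n a*', $rec_len, $cid_ptr, length($call_id), $call_id);
}

# Write 1000 records with made-up Call-IDs to sip-clf.bin.
open(my $out, '>', 'sip-clf.bin') or die "Could not create sip-clf.bin: $!\n";
binmode($out);
print $out make_record("call-$_\@example.com") for 1 .. 1000;
close($out) or die "close failed: $!\n";
```

Running the search script with call-500@example.com as its argument 
should then print the FOUND line.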

/a