Re: [sip-ops] [dispatch] [Sipping] SIP-CLF: Results on ASCII vs. binaryrepresentation

"Mary Barnes" <mary.barnes@nortel.com> Wed, 29 April 2009 19:41 UTC

Return-Path: <mary.barnes@nortel.com>
X-Original-To: sip-ops@core3.amsl.com
Delivered-To: sip-ops@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id E4F1C3A6DB7; Wed, 29 Apr 2009 12:41:17 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.475
X-Spam-Level:
X-Spam-Status: No, score=-6.475 tagged_above=-999 required=5 tests=[AWL=0.124, BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id Iqh6Ht3770MK; Wed, 29 Apr 2009 12:41:16 -0700 (PDT)
Received: from zrtps0kp.nortel.com (zrtps0kp.nortel.com [47.140.192.56]) by core3.amsl.com (Postfix) with ESMTP id 8BC6A3A6A4D; Wed, 29 Apr 2009 12:40:39 -0700 (PDT)
Received: from zrc2hxm0.corp.nortel.com (zrc2hxm0.corp.nortel.com [47.103.123.71]) by zrtps0kp.nortel.com (Switch-2.2.6/Switch-2.2.0) with ESMTP id n3TJfw403677; Wed, 29 Apr 2009 19:41:58 GMT
X-MimeOLE: Produced By Microsoft Exchange V6.5
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Date: Wed, 29 Apr 2009 14:44:43 -0500
Message-ID: <1ECE0EB50388174790F9694F77522CCF1DBA1142@zrc2hxm0.corp.nortel.com>
In-Reply-To: <0D5F89FAC29E2C41B98A6A762007F5D001D4B064@GBNTHT12009MSX.gb002.siemens.net>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: [dispatch] [Sipping] SIP-CLF: Results on ASCII vs. binaryrepresentation
Thread-Index: AcnI9br/nWx/gqT5RQOLEzQ9F4PdmgACpchgAACaquA=
References: <49F864E8.20005@alcatel-lucent.com> <0D5F89FAC29E2C41B98A6A762007F5D001D4B064@GBNTHT12009MSX.gb002.siemens.net>
From: "Mary Barnes" <mary.barnes@nortel.com>
To: "Elwell, John" <john.elwell@siemens-enterprise.com>, "Vijay K. Gurbani" <vkg@alcatel-lucent.com>, <sip-ops@ietf.org>, <dispatch@ietf.org>
Subject: Re: [sip-ops] [dispatch] [Sipping] SIP-CLF: Results on ASCII vs. binaryrepresentation
X-BeenThere: sip-ops@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
List-Id: SIP Operations <sip-ops.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/sip-ops>, <mailto:sip-ops-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/sip-ops>
List-Post: <mailto:sip-ops@ietf.org>
List-Help: <mailto:sip-ops-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/sip-ops>, <mailto:sip-ops-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 29 Apr 2009 19:41:18 -0000

I posted a note on SIPPING WG mailing list suggesting that this
discussion should occur on DISPATCH WG mailing list - sorry I should
have cross-posted that one posting.  

Mary. 

-----Original Message-----
From: dispatch-bounces@ietf.org [mailto:dispatch-bounces@ietf.org] On
Behalf Of Elwell, John
Sent: Wednesday, April 29, 2009 2:35 PM
To: Vijay K. Gurbani; sip-ops@ietf.org; dispatch@ietf.org
Subject: Re: [dispatch] [Sipping] SIP-CLF: Results on ASCII vs.
binaryrepresentation

Vijay,

Useful figures, which do indeed suggest generating the initial log in
ASCII. The question is whether we need to standardise a binary format
too, or whether any off-line conversion for the purpose of optimising
searches can be done in a non-standardised manner. In other words, if it
is acceptable always to pass around the log in ASCII format, any system
using it can perform its own optimisation to suit its own purposes.

John

p.s. which list - DISPATCH or SIP-OPS?
 

> -----Original Message-----
> From: sipping-bounces@ietf.org
> [mailto:sipping-bounces@ietf.org] On Behalf Of Vijay K. Gurbani
> Sent: 29 April 2009 15:32
> To: sip-ops@ietf.org; dispatch@ietf.org
> Subject: [Sipping] SIP-CLF: Results on ASCII vs. binary representation
> 
> [Bcc'd to sipping]
> 
> Hello:
> 
> During the SF IETF, the SIP CLF work [2] garnered support and 
> attention; the minutes of the ad-hoc are archived in [1].
> 
> While there was near universal support for having a common log format,

> there was a lot of discussion about whether the format should be text 
> or binary, the argument for binary being that it should be much faster

> to search.  An option for text generation is in [2] and an option for 
> binary generation is in [3].
> 
> We realized the question is not "binary vs. text?" but "should we 
> optimize for log generation vs. optimize for log processing?"  To that

> extent, this email is to socialize the performance results we have 
> obtained for generating both binary and ASCII formats, including a 
> simulation of a worst-case analysis by retrieving the last record from

> large binary and ASCII files.
> 
> To get these results, we generated 1 Million SIP CLF entries into an 
> ASCII file and the same 1 Million into a binary file.
> The ASCII file followed the convention of [2] and the binary file that

> of [3].  The last entry in these files was a SIP request with a 
> special Call-ID.  We measured the time it took to search for the 
> special Call-ID in both the binary and ASCII files.  Here are the 
> results, followed by some discussion; the source code to the programs 
> that generated these results is also available (see [4].)
> 
> Total records in binary and ASCII CLF file: 1 Million File size:
>    Binary: 300,999,984 bytes
>    ASCII:  258,999,984 bytes
> 
> Time taken to generate the CLF file with 1 Million records:
>    Binary CLF: 138.60s
>    ASCII CLF:    7.26s
> 
>    This is a difference of almost 20x in favor of the ASCII CLF.
> 
> Time taken to seek to the last record of the CLF file:
>    Binary CLF:   3.08s
>    ASCII CLF:   16.55s (using perl v5.6.1)
>                 42.92s (using perl v.5.8 and v5.10)
> 
>    The ASCII CLF seek is five times slower using perl v5.6.1,
>    and 13x slower using perl v5.8 and v5.10.  It looks like later
>    versions of perl may have inadvertently made the regex compiler
>    less optimized.  We don't know why.
> 
>    The above data is from experiments ran on an Intel dual-core
>    (T2500 @ 2.00 GHz) IBM T60 laptop running Linux 2.6.27 with 1
>    GByte of memory.
>    We also ran the programs on a more powerful machine: Intel
>    dual-core (X6800 @ 2.93GHz) machines with 8GB RAM and a
>    Linux 2.6.24 kernel.  The results scaled accordingly.
> 
> Clearly, the biggest difference in the above data is the time taken to

> produce the CLF file.  ASCII is a lightweight approach since the SIP 
> entity producing the ASCII CLF file already has the SIP message in 
> text form.  It is then just a matter of writing the fields out on 
> disk.  With the binary form, the SIP entity producing the binary CLF 
> file has to calculate offsets, which takes a non-negligible amount of 
> time.  Since the entity producing the SIP CLF log file should not be 
> over- burdened with the act of producing it, we feel that ASCII CLF 
> generation is the only choice here (i.e., we should optimize for log 
> generation.)  Otherwise, the SIP entity producing the binary CLF file 
> will spend an inordinate time in calculating offsets, creating a table

> of contents, etc. to the detriment of providing the service it is 
> supposed to.
> 
> That said, it is also clear that the the worst-case search for a 
> record is at five to 13x slower when using ASCII.  But, because 
> searching is done offline, we feel that this sub-optimality can well 
> be tolerated.  We also feel that there is value in specifying a binary

> format because it allows for SIP operators who want to do such 
> searches to convert their ASCII files to binary for optimized 
> traversal and other such uses.  A binary format  must be defined so 
> that offline processes can convert the captured ASCII data to binary 
> format for optimized traversal.
> 
> Comments and discussions on these results are welcome.  If you find 
> any errors in the programs used to generate these results, please do 
> let us know.
> 
> [1] Thread "[Sipping] Meeting Minutes: Ad-hoc Common Log Format
>   meeting," IETF SIPPING WG, March 27, 2009.  Archived at:
>   http://www.ietf.org/mail-archive/web/sipping/current/msg17199.html
> 
> [2] V. Gurbani, E. Burger, T. Anjali, H. Abdelnur and O. Festor,
>   "The Common Log File (CLF) format for the Session Initiation
>   Protocol (SIP)," IETF Internet-Draft, work in progress, March 9,
>   2009.  Archived at:
>   http://tools.ietf.org/html/draft-gurbani-sipping-clf-01
> 
> [3] A. Roach, "Binary Syntax for SIP Common Log Format," IETF
>   Internet-Draft, work in progress, March 25, 2009.  Archived at:
>   http://tools.ietf.org/html/draft-roach-sipping-clf-syntax-00.
> 
> [4] Source code available at the following URLs; please see
>   comment block in clf-write.c on how to generate ASCII and
>   binary CLF files.
>   http://ect.bell-labs.com/who/vkg/IETF/sip-clf/write-clf.c
>   http://ect.bell-labs.com/who/vkg/IETF/sip-clf/clf.h
>   http://ect.bell-labs.com/who/vkg/IETF/sip-clf/read-clf-record.c
>   http://ect.bell-labs.com/who/vkg/IETF/sip-clf/read-clf-record.pl
>   http://ect.bell-labs.com/who/vkg/IETF/sip-clf/Makefile
> 
> Thanks,
> 
> - vijay
> --
> Vijay K. Gurbani, Bell Laboratories, Alcatel-Lucent 1960 Lucent Lane, 
> Rm. 9C-533, Naperville, Illinois 60566 (USA)
> Email: vkg@{alcatel-lucent.com,bell-labs.com,acm.org}
> Web:   http://ect.bell-labs.com/who/vkg/
> _______________________________________________
> Sipping mailing list  https://www.ietf.org/mailman/listinfo/sipping
> This list is for NEW development of the application of SIP Use 
> sip-implementors@cs.columbia.edu for questions on current sip Use 
> sip@ietf.org for new developments of core SIP
> 
_______________________________________________
dispatch mailing list
dispatch@ietf.org
https://www.ietf.org/mailman/listinfo/dispatch