Re: [sip-ops] [dispatch] SIP-CLF: Results on ASCII vs. binary representation

"Vijay K. Gurbani" <> Wed, 29 April 2009 18:11 UTC

Date: Wed, 29 Apr 2009 13:12:28 -0500
From: "Vijay K. Gurbani" <>
Organization: Bell Labs Security Technology Research Group
To: Theo Zourzouvillys <>

Theo Zourzouvillys wrote:
> actually, your test program is *grossly* skewed in favour of the ASCII
> implementation.  If you modify it slightly to behave in a way i'd
> expect any developer to, you get (avg 5 runs on a crappy dell vostro
> desktop):
>  Binary CLF:   0m6.947s
>  ASCII CLF:    0m7.004s

Theo: True, using scatter-gather (s-g) writes you can optimize the
binary I/O.

By the same token, I can optimize the ASCII writes a bit using
s-g writes; for instance, I was able to bring the average down
by 1.02s for ASCII CLF using s-g writes.

But we intentionally stayed away from s-g writes for the
following four reasons:

1) On some systems, the value of IOV_MAX is set to a low number.
   For example, in Solaris 8 the value of IOV_MAX is set to 16,
   forcing you to do multiple s-g I/O calls (thereby negating
   some of the optimization effects).

2) Some systems have a maximum ceiling on how many bytes can
   be transferred in one writev(), i.e., the sum of all iov_len
   members of the iov array should be less than a certain
   system-defined maximum.

     [Note: I seem to remember that a few years ago, the
     Apache lists were full of problems related to 1 and 2.]

3) Portability: I was not sure how portable the writev() system
   call would be on all kinds of operating environments.  Since
   the SIP CLF is designed for all SIP entities, if a (real-time)
   operating system does not have the writev() system call, one
   is forced to use the non-optimized method anyway.  Furthermore,
   in an RT OS, the limits for 1 and 2 will be much lower, if
   writev() is provided at all.

4) We did not necessarily want to make any assumptions about
   how implementations have created data structures to hold
   the results of parsing (i.e., some implementations may very
   well use structs to store the text and length for each
   SIP token, while others may simply store the text and
   compute the length when needed, etc.).  For the sake of
   demonstration, our program implements the first option
   (i.e., uses structs to store the text and length), which
   is actually more conducive to the s-g approach, but by no
   means is this the only way to design your data structures.
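
To make the two designs in (4) concrete, here is a rough sketch.
The type and function names (tok_len, tok_str, iov_from_len,
iov_from_str) are hypothetical and are not taken from our test
program.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

/* Design A (hypothetical): the parser stores text and length. */
struct tok_len { const char *text; size_t len; };

/* Design B (hypothetical): the parser stores only a C string. */
struct tok_str { const char *text; };

/* Design A maps directly onto one iovec entry. */
void iov_from_len(struct iovec *v, const struct tok_len *t)
{
    v->iov_base = (void *)t->text;
    v->iov_len  = t->len;
}

/* Design B has to scan the string with strlen() on every write. */
void iov_from_str(struct iovec *v, const struct tok_str *t)
{
    assert(t->text != NULL);
    v->iov_base = (void *)t->text;
    v->iov_len  = strlen(t->text);
}
```

With design A the iov array can be filled without touching the
token text; with design B each token costs an extra pass over its
bytes before the write.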

(1) is a real concern because, as you can well imagine, URIs,
once parsed, can be composed of many different objects (or
structs in C).  As such, representing a composed URI in an iov
array will require multiple entries.
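
As an illustration of what (1) forces on an implementation, the
sketch below splits a long iov array into several writev() calls.
The function names are hypothetical; the fallback of 16 is the
POSIX minimum (_XOPEN_IOV_MAX), used here as an assumption when
sysconf() cannot report a limit.  The sketch treats a short write
as an error and does nothing about the byte ceiling in (2).

```c
#include <assert.h>
#include <limits.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Per-call iovec limit as reported by the system, or 16 if unknown. */
int iov_limit(void)
{
    long n = sysconf(_SC_IOV_MAX);
    return (n > 0 && n < INT_MAX) ? (int)n : 16;
}

/*
 * Write all iovcnt entries using at most `limit` entries per
 * writev() call.  Returns the number of writev() calls made,
 * or -1 on error or short write.
 */
int writev_chunked(int fd, const struct iovec *iov, int iovcnt, int limit)
{
    int calls = 0;

    assert(limit > 0);
    while (iovcnt > 0) {
        int n = iovcnt < limit ? iovcnt : limit;
        size_t want = 0;

        for (int i = 0; i < n; i++)
            want += iov[i].iov_len;

        ssize_t got = writev(fd, iov, n);
        if (got < 0 || (size_t)got != want)
            return -1;

        iov    += n;
        iovcnt -= n;
        calls++;
    }
    return calls;
}
```

With a limit of 16, a record that needs 40 iov entries takes three
system calls instead of one, which is where part of the s-g gain
goes.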

Hence, we wanted to use the lowest common denominator to do the
measurements -- in our initial performance data, there is no
optimization for either the binary CLF case or the ASCII CLF case.

> note that i wrote it in all of about 120 seconds, so there may be some
> errors in the output format, but my point stands :-)

There appear to be some, since I cannot read the last record; but I
have not had the chance to look at the output format from your
program in any detail.


- vijay
Vijay K. Gurbani, Bell Laboratories, Alcatel-Lucent
1960 Lucent Lane, Rm. 9C-533, Naperville, Illinois 60566 (USA)
Email: vkg@{,,}