Re: [sip-ops] [dispatch] SIP-CLF: Extensibility considerations (was Results on ASCII vs. binary representation)

"Vijay K. Gurbani" <> Thu, 30 April 2009 21:11 UTC

Date: Thu, 30 Apr 2009 16:12:14 -0500
From: "Vijay K. Gurbani" <>
Organization: Bell Labs Security Technology Research Group
To: Adam Roach <>

Adam Roach wrote:
> Even knowing the list of purposes, if we go with a text format
> similar to what is proposed, we are going to be forced to nail down
> the complete set of log record fields now, with little hope for
> backwards-compatible extensibility in the future. Admittedly, we
> *could* go to a tagged text format (e.g. where the fields are
> explicitly labeled instead of being inferred by position) to address
> this shortcoming, but that's not what's being proposed at the moment.

The tagged format will add further latency to an ASCII format,
so I did not include it.  In the best case, I am looking for an
ASCII format that is amenable to taking a line and using a
regexp to break it down to its constituent fields.
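To illustrate the "one line, one regexp" idea, here is a minimal
sketch in Python.  The tab-separated layout and the field names are
assumptions made up for this example; they are not the layout in the
draft.

```python
import re

# Hypothetical positional layout (NOT taken from the draft):
# timestamp, direction, method, call-id, from-tag, to-tag, status
RECORD_RE = re.compile(
    r"^(?P<timestamp>\d+\.\d+)\t"
    r"(?P<direction>[sr])\t"        # s = sent, r = received (assumed)
    r"(?P<method>[A-Z]+)\t"
    r"(?P<call_id>\S+)\t"
    r"(?P<from_tag>\S+)\t"
    r"(?P<to_tag>\S*)\t"            # may be empty on an initial request
    r"(?P<status>\d{3}|-)$"         # "-" when the record is a request
)


def parse_record(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = RECORD_RE.match(line.rstrip("\n"))
    return m.groupdict() if m else None


sample = "1241125934.120\tr\tINVITE\tabc123@host.example\ta1b2\t\t-"
fields = parse_record(sample)
```

A positional format like this keeps the parser to a single pattern,
at the cost that adding a field means changing the pattern.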

> So, if we go down the path proposed in draft-gurbani-..., we're
> strapped with coming up with the perfect set of future-proof fields
> that encompasses everything anyone will ever need in the log file,
> while (at the same time) not including extraneous information that
> people don't want to bother generating and/or parsing.

I don't think we are constrained to a perfect set of future-proof
fields.  Rather, I think that there should be a list of fields
that are mandatory -- and these fields should be the ones that
allow for dialog and transaction identification and correlating
the various forked branches to a parent transaction, etc.

Regarding people not wanting to be bothered with generating
and/or parsing extra fields: even in the currently
defined ASCII format, there is a delimiter after which the
rest of the fields are optional.  An off-the-shelf open source
CLF parser can parse all the mandatory ones and just
disregard the optional ones; no harm done.  Of course, since
it will most probably be written in an interpreted language
of some sort (and I am thinking of perl/python here), it could
be extended more easily to parse these fields.
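
A sketch of such a parser, assuming a hypothetical "|" marker between
the mandatory and optional parts and made-up field names (neither is
taken from the draft):

```python
# Hypothetical mandatory field names, for illustration only.
MANDATORY = ("timestamp", "method", "call_id", "branch")
DELIM = "|"  # assumed marker after which all fields are optional


def parse_line(line, want_optional=False):
    """Parse the mandatory fields; keep optional ones only on request."""
    head, _, tail = line.rstrip("\n").partition(DELIM)
    values = head.split()
    if len(values) != len(MANDATORY):
        raise ValueError("malformed mandatory part: %r" % head)
    record = dict(zip(MANDATORY, values))
    if want_optional:
        # A tool that cares about the extra fields can look at them;
        # everyone else never touches this part of the line.
        record["optional"] = tail.split()
    return record


sample = ("1241125934.120 INVITE abc123@host.example z9hG4bK776 "
          "| contact=sip:a@b.example ua=foo")
basic = parse_line(sample)
full = parse_line(sample, want_optional=True)
```

The default call ignores everything after the marker, so a log
producer can append new optional fields without breaking it.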

Remember that we are not the consumers of the log file; rather,
it is the people who will be feeding SIP servers.  And given
that constituency, I think they would prefer to write tools
that operate on ASCII.

> The binary format I've proposed allows for exclusion of information
> the logging node doesn't consider relevant, as well as inclusion of 
> information that we don't define at this time. For me, that's almost
> as big a win as the efficiency in searching a file for records of
> interest.

Given the updated results and all, I will stop arguing on the
grounds of efficiency.  Reading binary CLF was always more
efficient, and so is producing the binary CLF.  But if the
only CLF format we define is a binary CLF, then I want to be
clear that we understand that we are making a tacit decision
to force all implementations to deal with digesting their SIP
message in this new format for logging purposes.  There will be
some SIP servers that already digest the incoming SIP message
and turn it into binary (complete with a ToC) for efficient
memory copying, etc.  To these servers, producing a binary CLF
will be a low impact activity.  But, there are and will be
SIP servers that do not carry an internal binary representation
of the SIP message.  We will, in essence, force these servers
to do so just to produce binary CLF.  And that is a big tradeoff.

A binary CLF can always be produced from an ASCII one using
offline transformations.  It is just that producing an ASCII
CLF is low-impact since the messages that enter and exit
a SIP server are ASCII to begin with.
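
The offline transformation mentioned above can be sketched as
follows.  The binary layout here (a field count followed by
length-prefixed fields) is a hypothetical stand-in chosen for
brevity; it is not the binary format proposed on the list.

```python
import struct


def ascii_to_binary(line, sep="\t"):
    """Encode one ASCII record as a uint16 field count, then for each
    field a uint16 length followed by the raw bytes (assumed layout)."""
    fields = [f.encode("ascii") for f in line.rstrip("\n").split(sep)]
    out = [struct.pack("!H", len(fields))]
    for f in fields:
        out.append(struct.pack("!H", len(f)) + f)
    return b"".join(out)


def binary_to_fields(blob):
    """Decode a record produced by ascii_to_binary back into strings."""
    (count,) = struct.unpack_from("!H", blob, 0)
    offset = 2
    fields = []
    for _ in range(count):
        (length,) = struct.unpack_from("!H", blob, offset)
        offset += 2
        fields.append(blob[offset:offset + length].decode("ascii"))
        offset += length
    return fields


sample = "1241125934.120\tINVITE\tabc123@host.example"
blob = ascii_to_binary(sample)
roundtrip = binary_to_fields(blob)
```

A converter like this can run as a batch job over rotated log
files, so the server itself only ever writes ASCII.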

Before going down the path of mandating a binary-only option,
I would at the very least like us to understand the tradeoffs
of the decision and keep in mind who the ultimate consumers
of the log file are.


- vijay
Vijay K. Gurbani, Bell Laboratories, Alcatel-Lucent
1960 Lucent Lane, Rm. 9C-533, Naperville, Illinois 60566 (USA)