Re: [sip-clf] WGLC: SIPCLF Problem Statement(draft-gurbani-sipclf-problem-statement-01)

"Vijay K. Gurbani" <vkg@alcatel-lucent.com> Tue, 02 February 2010 20:58 UTC

David Harrington wrote:
> I think the problem statement should be adopted as a WG item. I agree
> with Cullen that it is a good starting place for a WG problem 
> statement doc.

David: Thank you for your comments.

> I definitely do NOT agree this is ready for WGLC.

IMHO, the differences are not great enough to hold up WGLC.
More inline; let's see if we can iron them out.

> There are multiple use cases that lead to different requirements.
> Some people want to correlate flows of conversation; others only want
> to pay attention to the data dumped by one server. I think there are
> lots of conflicting requirements.

At the highest level there are two requirements:

1) Correlate transaction information produced by a single
  SIP entity.

  The correlation here is to figure out which responses
  correspond to which requests, and furthermore to determine
  the number of branches that an incoming INVITE was
  forked to.

  This correlation will also allow one to recreate SIP dialog
  information.  (For those on the list not steeped in SIP lore,
  a SIP dialog is an end-to-end relationship between two peers.
  The Call-ID, together with the From and To tags of a SIP
  message, constitutes the dialog identifier.)

  This sort of correlation is what I had in mind when I initially
  started the work.  This correlation is done entirely on the
  CLF file produced by one SIP entity.

2) Correlate SIP calls across multiple SIP service providers.

  As the work progressed and was socialized, some list
  participants felt that it would be advantageous to trace a
  SIP call end-to-end, i.e., across different SIP domains.

  Because we are concerned with a SIP call across different
  operating domains, this type of correlation requires some
  more work, specifically a unique session ID that survives
  in signaling as the session request leaves one operating
  domain and enters another.

  Furthermore, this type of correlation cannot be done by using
  only the CLF file produced by one entity, since that entity
  would only have information about the session in its own domain.
  One would need to correlate the logs of one domain with those
  of the other.  Clearly, this type of correlation is best treated
  by implicitly supporting it but explicitly leaving the exact
  mechanism for doing so outside of the draft.

  Thus, coming up with a unique session identifier is out of scope
  for the SIPCLF working group.  However, using such an identifier,
  if one exists now or later, is perfectly within scope and can
  be supported by keeping the logging format such that other
  fields can be added later.
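
To make the two kinds of correlation concrete, here is a rough
sketch in Python.  It is only an illustration under my own
assumptions: the Record fields, the "extra" dictionary for fields
added later, and the "session_id" key are all hypothetical and are
not taken from the draft.  Requests and responses are paired on the
topmost Via branch plus the CSeq method, and dialogs are identified
by Call-ID, From tag and To tag.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Record:
    """One hypothetical log line; the field names are illustrative only."""
    direction: str            # "recv" or "sent", as seen by the logging entity
    method: str               # method taken from the CSeq header field
    status: Optional[int]     # None for a request, status code for a response
    call_id: str
    cseq: int
    branch: str               # branch parameter of the topmost Via
    from_tag: str
    to_tag: str = ""
    extra: dict = field(default_factory=dict)  # optional fields added later


def match_responses(log):
    """Map each request's (branch, method) key to the status codes that answer it."""
    answers = defaultdict(list)
    for rec in log:
        if rec.status is not None:
            answers[(rec.branch, rec.method)].append(rec.status)
    return {(rec.branch, rec.method): answers[(rec.branch, rec.method)]
            for rec in log if rec.status is None}


def fork_count(log, call_id, cseq):
    """Count the distinct branches on which this entity sent a given INVITE."""
    return len({rec.branch for rec in log
                if rec.direction == "sent" and rec.status is None
                and rec.method == "INVITE"
                and rec.call_id == call_id and rec.cseq == cseq})


def dialog_ids(log):
    """Collect (Call-ID, From tag, To tag) triples from dialog-creating responses."""
    return {(rec.call_id, rec.from_tag, rec.to_tag) for rec in log
            if rec.method == "INVITE" and rec.status is not None
            and 100 < rec.status < 300 and rec.to_tag}


def shared_sessions(log_a, log_b, key="session_id"):
    """Return values of an optional field that appear in both domains' logs."""
    seen_a = {rec.extra[key] for rec in log_a if key in rec.extra}
    seen_b = {rec.extra[key] for rec in log_b if key in rec.extra}
    return seen_a & seen_b


# A proxy's view of one INVITE forked to two branches (made-up values).
sample_log = [
    Record("recv", "INVITE", None, "c1", 1, "up", "a"),
    Record("sent", "INVITE", None, "c1", 1, "b1", "a"),
    Record("sent", "INVITE", None, "c1", 1, "b2", "a"),
    Record("recv", "INVITE", 180, "c1", 1, "b1", "a", "t1"),
    Record("recv", "INVITE", 200, "c1", 1, "b2", "a", "t2"),
    Record("sent", "INVITE", 200, "c1", 1, "up", "a", "t2"),
]
```

The first three functions need only the log of a single entity,
which is case (1).  The last one needs the logs of both domains and
does nothing useful unless some session identifier has been logged
as an optional field, which is case (2).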

In either of these cases, I do not think it is fair to say
that use cases lead to different requirements.  As far as I
can see, requirements for both use cases are supported.

> This problem statement document lacks any analysis of existing 
> protocols that might address the different problems people seek to
> address.

I strongly disagree.  Version -00 had a section entitled
"Relationship to other protocols" (see Section 7,
http://tools.ietf.org/html/draft-gurbani-sipclf-problem-statement-00#section-7).

In that section we discussed syslog, IDMEF, IPFIX and PCAP.  In
Hiroshima during the meeting we decided that syslog was
not a good fit (simply because of the number of messages
that would need to be logged).  I believe that syslog is
a good solution if one wants to emit messages based
on severity, etc.  In SIP CLF, the intent is to log the
summary of *every* SIP message.  A SIP server can still
use syslog to send out critical messages for the NOC, but
using it as a substitute for CLF is probably overkill.

No one spoke up for IDMEF, and the proponent of PCAP
decided not to pursue it.  That left IPFIX, which has a
dedicated block of people looking at it.  Thus in version
-01, I did not put in a section analyzing other protocols
since it appeared to me that we had reached a decision on
what to do with syslog, IDMEF and IPFIX.

> Everything is just munged together into one problem statement, as if 
> everybody was in agreement about a single problem to be solved, and 
> the single solution that should solve all those problems.

My attempt at a problem statement is given in Section 3 of
the sipclf draft 
(http://tools.ietf.org/html/draft-gurbani-sipclf-problem-statement-01).
If this gives you the feeling that everything is munged together,
I will be happy to work with you on any text you can contribute
that makes this more transparent.

> I am very concerned about developing a logging standard for one
> single protocol. Following this approach could easily lead to
> standards for a MPLS CLF and a PCE CLF and a XYZ-protocol CLF.

MPLS and PCE are lower-level routing protocols.  SIP is an
application-layer protocol.  I am unsure whether we can use the
same hammer across the entire IP stack.  The logging behavior
and requirements for MPLS and PCE are probably quite different
from those for SIP.

> I disagree with the following analysis in the document: It can be
> argued that a good part of the success of Apache has been its CLF
> because it allowed third parties to produce tools that analyzed the
> data and generated traffic reports and trends.  The Apache CLF has
> been so successful that not only did it become the de-facto standard
> in producing logging data for web servers, but also many commercial 
> web servers can be configured to produce logs in this format.
[...]
> I think claiming Apache was successful because it supported a common 
> log format would be akin to claiming that UNIX was successful because
> it supported a common logging format (syslog).

Fair enough.  Please allow me to refine my statement: I do
not claim that Apache was successful because of CLF, but rather
that the CLF model espoused by Apache has been a successful one,
and one that should be emulated, not shunned.

> This WG needs to really understand the **multiple** problems that 
> people are trying to solve, and the different requirements these 
> different use cases place on the solution space.

I believe that Section 3 of the SIPCLF draft lays out the
problem statement, and I don't think there are multiple
problems being stated there.  Again, I am happy to refine
it based on your input.  However, I would request that you
send me specific text that we can argue over and wordsmith.

> The abstract talks about problems such as mining the log files to
> produce reports and trends, training anomaly detection systems and
> feeding events into a security event management system.

But those are not problems.  They are the benefits of having a
CLF.

> I think the section on "Alternative approaches to SIP CLF" is
> woefully inadquate.

During the Hiroshima IETF, we spent a lot of time talking about
CDR and how we should distinguish this work from CDR, and
specifically about why existing solutions like CDR and Wireshark
are inadequate.  That is exactly what the section now contains.
If there are other reasons why you feel this section is
inadequate, please let me know.

> Existing IETF standards, such as syslog, ipfix, and SNMP are already
> being used to address some of the problems WG contributors have
> stated they want to address. [...] Yet this document does not even
> **mention** these existing IETF standards as possible alternative
> approaches.

I respectfully disagree.  We ruled out syslog and IDMEF after
version -00 and during the Hiroshima meeting.  An IPFIX-based
study is under way.  The current draft (-01) covers the
two remaining (quasi-)standards we talked about in Hiroshima:
CDRs and the Wireshark format.  I think we have looked at both
IETF and non-IETF protocols.

Thanks,

- vijay
-- 
Vijay K. Gurbani, Bell Laboratories, Alcatel-Lucent
1960 Lucent Lane, Rm. 9C-533, Naperville, Illinois 60566 (USA)
Email: vkg@{alcatel-lucent.com,bell-labs.com,acm.org}
Web:   http://ect.bell-labs.com/who/vkg/