Re: [Syslog] draft-cloud-log-00 / CEE - why not IPFIX?

Jeroen Massar <jeroen@unfix.org> Wed, 16 February 2011 10:55 UTC

Message-ID: <4D5BAD69.2060608@unfix.org>
Date: Wed, 16 Feb 2011 11:56:41 +0100
Organization: Unfix
To: Rainer Gerhards <rgerhards@hq.adiscon.com>
References: <4D5A60C8.3090000@unfix.org><93ED0A84F9A1D74FA65021D940AA588405446C41F9@IMCMBX3.MITRE.ORG> <4D5BA85B.7040007@unfix.org> <9B6E2A8877C38245BFB15CC491A11DA71DDC71@GRFEXC.intern.adiscon.com>
In-Reply-To: <9B6E2A8877C38245BFB15CC491A11DA71DDC71@GRFEXC.intern.adiscon.com>
Cc: Sam Johnston <sj@google.com>, cee@mitre.org, syslog@ietf.org

On 2011-02-16 11:39, Rainer Gerhards wrote:
> The SIP CLF WG has just recently rejected IPFIX for being binary and
> has chosen indexed ASCII for its format instead. Their reasoning (after a
> long struggle) is probably instructive:
> 
> http://www.ietf.org/mail-archive/web/sip-clf/current/msg00364.html
> 
> I don't think that IPFIX is a good solution *in the syslog context*. It is
> very far from what people expect. Other than that, I'd probably need to
> re-iterate the arguments made on the SIP CLF mailing list, so it probably is
> better to refer to their archive ;)

Why would they expect anything about the *DATA* format of a protocol?

Note that the whole problem that IPFIX (or any other structured data
format, for that matter) 'solves' is that otherwise one has to write a
parser for every single log file format out there. Doing this at the
meter tends to be cheaper than doing it at the aggregation point,
because the parsing work can be distributed across many meters. (Then
again, sFlow, for example, does it exactly the other way around: it just
pushes packets and lets the collector do the hard parsing. But there we
are talking about sampled flows, so you will miss events, and that is
not a trade-off you can afford to make at the meter if you are looking
at, say, break-in attempts or failures ;)
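To put a rough number on the sFlow point: assuming independent random
1-in-N packet sampling (N is just an illustrative sampling rate here), a
single-packet event is missed, and an event spread over k packets goes
completely unseen, with these probabilities:

```latex
P_{\text{miss}}(1) = 1 - \frac{1}{N}, \qquad
P_{\text{miss}}(k) = \left(1 - \frac{1}{N}\right)^{k}
```

With N = 1000 and k = 10 that is 0.999^10 ≈ 0.990, i.e. roughly 99% of
such short attempts would never reach the collector at all.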

I think the pro-ASCII side of the ASCII-versus-binary argument comes
primarily from organizations that already process large amounts of
variable-length ASCII string data and do not really care about a few
extra bits or a bit more processing overhead, as they already have large
global clusters of hosts doing that work. Their programming languages
also tend to be scripting languages, which make it harder or less
efficient to work on binary data but work great with ASCII-like data.
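To make the "few extra bits" concrete, here is a small Python sketch
comparing one IPv4 address plus a 32-bit counter encoded as
space-separated text versus fixed-width binary. The two sample values
are arbitrary and only meant as an illustration.

```python
import ipaddress
import struct


def ascii_vs_binary(ip: str, counter: int) -> tuple[int, int]:
    """Size in bytes of one (IPv4 address, 32-bit counter) pair, first as
    space-separated ASCII text, then as fixed-width network-order binary."""
    text = f"{ip} {counter}".encode("ascii")
    binary = ipaddress.IPv4Address(ip).packed + struct.pack("!I", counter)
    return len(text), len(binary)


print(ascii_vs_binary("192.168.100.200", 4294967295))  # (26, 8)
print(ascii_vs_binary("10.0.0.1", 7))                  # (10, 8)
```

The text form also has to be scanned for delimiters and converted back
into integers before it can be used, which is where the extra processing
overhead comes from.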

Nevertheless, I have a generic log-line parser that simply converts
syslog and other log file formats into IPFIX. The problem with the whole
ASCII approach, though, is that one has to teach the parser which field
is which and, in the case of for instance the Apache CLF, the odd
delimiters it uses. These are all special cases, something one would
really like to avoid if one wants to keep things fast.
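As an illustration of those special cases, here is a sketch of an Apache
Common Log Format parser in Python: the timestamp is wrapped in square
brackets, the request line in double quotes, and a response size of zero
is written as "-". The regex and the field names are illustrative
choices of mine, not part of any existing tool, and it deliberately
ignores escaped quotes inside the request line.

```python
import re

# Apache Common Log Format: host ident authuser [date] "request" status bytes.
# Simplified: assumes the request line contains no escaped double quotes.
CLF = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_clf(line: str):
    """Split one CLF line into named fields, or return None if it does not match."""
    m = CLF.match(line.rstrip("\n"))
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # CLF writes "-" instead of 0 when no body bytes were sent
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec


line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
print(parse_clf(line))
```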

My model partially solves that, as I only have to do the special-casing
at the edge, where the log file gets converted into IPFIX. As those
converters are considered 'meters', I just deploy more and more of them,
while the collector side can generally stay a single box, and otherwise
the data is easy to distribute amongst several collectors.
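A minimal sketch of what such an edge 'meter' might emit once a log line
has been parsed, in Python, following the IPFIX message layout of RFC
5101 (since obsoleted by RFC 7011): a 16-byte message header, a Template
Set (Set ID 2) describing the record, then a Data Set whose Set ID equals
the Template ID. Mapping the client address to sourceIPv4Address (IE 8)
and the response size to octetDeltaCount (IE 1), the Template ID 256 and
the function names are all hypothetical choices; this is not the
converter described above.

```python
import ipaddress
import struct
import time

IPFIX_VERSION = 10   # version number in every IPFIX message header
TEMPLATE_SET_ID = 2  # Set ID reserved for Template Sets
TEMPLATE_ID = 256    # hypothetical choice; data Template IDs start at 256

# (Information Element ID, field length in octets). Assumed mapping:
# client address -> sourceIPv4Address (8), response size -> octetDeltaCount (1).
FIELDS = [(8, 4), (1, 8)]


def _set(set_id: int, body: bytes) -> bytes:
    """Prefix a set body with its 4-byte Set Header (Set ID, total length)."""
    return struct.pack("!HH", set_id, 4 + len(body)) + body


def encode_message(records, seq=0, domain_id=0, export_time=None) -> bytes:
    """Encode (ipv4_string, byte_count) pairs as one IPFIX message holding
    a Template Set followed by the matching Data Set."""
    if export_time is None:
        export_time = int(time.time())
    template = struct.pack("!HH", TEMPLATE_ID, len(FIELDS))
    for ie_id, length in FIELDS:
        template += struct.pack("!HH", ie_id, length)
    data = b"".join(
        ipaddress.IPv4Address(ip).packed + struct.pack("!Q", count)
        for ip, count in records
    )
    sets = _set(TEMPLATE_SET_ID, template) + _set(TEMPLATE_ID, data)
    # Message header: version, total length, export time, sequence, domain
    header = struct.pack("!HHIII", IPFIX_VERSION, 16 + len(sets),
                         export_time, seq, domain_id)
    return header + sets


msg = encode_message([("127.0.0.1", 2326)], export_time=0)
print(len(msg))  # 48
```

Re-sending the template in every message keeps the sketch stateless; a
real exporter would normally send templates once per session and refresh
them periodically.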

And of course, the conversion works the other way too: it can emit
reformatted 'ASCII' again if needed.
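Going the other way can be sketched as a small Python decoder that
learns templates from a Template Set and renders each Data Record back
into a line of text. It is a hypothetical sketch under stated
assumptions: it only knows templates carried earlier in the same
message, it does not handle enterprise-specific or variable-length
fields, and the rendering assumes the record holds IE 8
(sourceIPv4Address) and IE 1 (octetDeltaCount).

```python
import ipaddress
import struct


def decode_message(msg: bytes):
    """Yield each Data Record of one IPFIX message as {IE ID: raw bytes}.

    Assumptions: only templates sent earlier in the same message are known;
    enterprise-specific and variable-length fields are not handled.
    """
    version, length = struct.unpack("!HH", msg[:4])
    if version != 10:
        raise ValueError("not an IPFIX message")
    templates = {}
    offset = 16  # skip the 16-byte message header
    while offset + 4 <= length:
        set_id, set_len = struct.unpack("!HH", msg[offset:offset + 4])
        if set_len < 4:
            break  # malformed set; stop rather than loop forever
        body = msg[offset + 4:offset + set_len]
        if set_id == 2:  # Template Set: learn the field layouts
            pos = 0
            while pos + 4 <= len(body):
                tid, count = struct.unpack("!HH", body[pos:pos + 4])
                pos += 4
                fields = [struct.unpack("!HH", body[pos + 4 * i:pos + 4 * i + 4])
                          for i in range(count)]
                pos += 4 * count
                templates[tid] = fields
        elif set_id >= 256 and set_id in templates:  # Data Set
            fields = templates[set_id]
            rec_len = sum(flen for _, flen in fields)
            pos = 0
            while rec_len and pos + rec_len <= len(body):
                rec = {}
                for ie_id, flen in fields:
                    rec[ie_id] = body[pos:pos + flen]
                    pos += flen
                yield rec
        offset += set_len


def to_ascii(rec) -> str:
    """Render a record as 'address bytes' using IE 8 and IE 1."""
    return f"{ipaddress.IPv4Address(rec[8])} {int.from_bytes(rec[1], 'big')}"


# Hand-built sample: template 256 = (IE 8, 4 octets), (IE 1, 8 octets), one record.
_tmpl = struct.pack("!HHHHHH", 256, 2, 8, 4, 1, 8)
_data = bytes([127, 0, 0, 1]) + struct.pack("!Q", 2326)
_sets = (struct.pack("!HH", 2, 4 + len(_tmpl)) + _tmpl
         + struct.pack("!HH", 256, 4 + len(_data)) + _data)
SAMPLE = struct.pack("!HHIII", 10, 16 + len(_sets), 0, 0, 0) + _sets

for record in decode_message(SAMPLE):
    print(to_ascii(record))  # prints "127.0.0.1 2326"
```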

Greets,
 Jeroen

 (who finds it funny to see ASCII btw, as there is this thing called
  UTF-8 that makes it possible to express things in all languages of
  the world. I guess those people have to live with punycode etc...)