[sip-clf] A syslog approach to sip logging

"David B Harrington" <dbharrington@comcast.net> Tue, 02 February 2010 18:52 UTC

Hi,

Operators often use syslog to carry Apache CLF log data. In practice,
syslog serves primarily as a tunnel for the Apache CLF format. This
seems to be attractive for operators because of the infrastructure and
syslog knowledge they already have.

I suggest it makes sense to develop a sipclf format that can be
carried within an IETF syslog message. In general, that means
information in an ASCII format dumped into a local file that can be
parsed with tools like grep, and that can later be secured, filtered,
transported, aggregated, and correlated using existing infrastructure.


Multiple use cases raised by contributors in this WG lead to different
requirements. Some want to pay attention to the data dumped by one
server; some want to follow traffic flows through the network; some
want to filter on standardized fields; some want to aggregate and
correlate log information. 

It is not enough to figure out how to dump data on a single system;
that data will need to be compatible with the infrastructure used to
provide secure transport, filtering, correlation, etc. Operators
already have infrastructures designed for long-term archiving of
(potentially enormous) volumes of log information, and for correlating
log records, including data mining.

Syslog is already widely deployed and well understood by operators,
and the IETF syslog WG has standardized many aspects of security and
transport, such as (D)TLS-secured transport, support for large
messages, optional digitally signed logging for law enforcement and
for message stream integrity checking, etc.

IETF syslog standardizes a number of parameters useful for
correlation, such as facility (specific applications), severity
classification, timestamp, hostname, the name of the application
sending the message (often syslogd), process ID, and message ID, all
of which are in the syslog header in ASCII format. These were designed
to be compatible with ITU logging standards, and the ALARM-MIB, to
provide easier correlation of events across different event reporting
mechanisms. Also to improve correlation, work has been done to
translate syslog messages into SNMP traps, and SNMP traps into syslog
messages.
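
As a rough illustration of the header fields listed above, here is a
minimal Python sketch that assembles an RFC 5424 style header. The
function name and the sample values are my own assumptions for
illustration; only the field order and the PRI calculation (facility
times 8 plus severity) come from the syslog standard.

```python
def syslog_header(facility, severity, timestamp, hostname,
                  app_name, procid="-", msgid="-"):
    """Build the header part of an RFC 5424 style syslog message.

    A "-" stands for a field whose value is not known (the NILVALUE).
    """
    if not 0 <= facility <= 23:
        raise ValueError("facility must be in 0..23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be in 0..7")
    pri = facility * 8 + severity      # PRI combines facility and severity
    version = 1                        # protocol version used by RFC 5424
    return f"<{pri}>{version} {timestamp} {hostname} {app_name} {procid} {msgid}"


# Hypothetical SIP proxy logging at facility local4 (20), severity notice (5).
print(syslog_header(20, 5, "2010-02-02T18:52:26Z",
                    "proxy1.example.com", "sipproxy", "1234", "-"))
# <165>1 2010-02-02T18:52:26Z proxy1.example.com sipproxy 1234 -
```

Because every field is plain text separated by single spaces, the
header stays greppable while still being easy for a collector to
split and index.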

The IETF syslog standard also provides structured data elements. SDEs
are designed to supplement the human-readable text with
application-parseable data fields (also encoded in 7-bit ASCII), which
makes it easier for applications, such as security management systems,
to extract and correlate the data across vendor implementations, and
across nodes in a network.

The IETF syslog standard already defines some SDEs that would likely
be useful for the problems SIP CLF is trying to resolve, like
precisely tracking sequence inside a networked system: a
high-precision timestamp, the quality of the time source for a given
system, time zone accuracy, whether a node is synced with a network
time source, the origin of a log entry (useful after aggregation and
relay), the IP address at time of logging, an enterprise identifier,
the software that generated the message (i.e., the application that
asked syslogd to send the message), the software version, a sequence
number to provide sequence as seen by a single node, the sysUpTime of
a co-resident SNMP system, and the language used within the
human-readable MSG. The IETF syslog standard also contains some
recommendations on intelligently dropping messages if the log volume
becomes overwhelming.
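
To make the predefined SDEs a little more concrete, the sketch below
formats the three SD-IDs that RFC 5424 registers (timeQuality, origin,
and meta) as text. The helper names and all sample values are
assumptions of mine for illustration; the SD-ID and parameter names,
and the rule that '"', '\' and ']' are backslash-escaped inside a
value, are taken from the standard.

```python
def escape_param_value(value):
    """Escape the three characters RFC 5424 requires inside a PARAM-VALUE."""
    # Backslash must be handled first so later escapes are not doubled.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def format_sd_element(sd_id, params):
    """Render one structured data element as [SD-ID name="value" ...]."""
    body = "".join(
        f' {name}="{escape_param_value(str(value))}"'
        for name, value in params.items()
    )
    return f"[{sd_id}{body}]"


# Sample values are made up; the SD-IDs and parameter names are standard.
structured_data = "".join([
    format_sd_element("timeQuality", {"tzKnown": 1, "isSynced": 1,
                                      "syncAccuracy": 60000}),
    format_sd_element("origin", {"ip": "192.0.2.1", "enterpriseId": "32473",
                                 "software": "sipproxy", "swVersion": "1.0"}),
    format_sd_element("meta", {"sequenceId": 1, "sysUpTime": 37,
                               "language": "en"}),
])
print(structured_data)
```

Multiple elements are simply concatenated with no separator, so a
relay can append its own element without rewriting the ones already
present.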

The syslog WG deliberately did not standardize the content of the
human-readable message field. The WG standardized the header, and has
provided SDEs to standardize certain aspects of the information where
consensus can be reached. Having both (potentially non-standardized)
human-readable data, and standardized human-and-machine-readable
structured data in the same message addresses a wide range of use
cases, and gives the human more information to work with to interpret
an event. 

I propose that the WG reach consensus on specific fields of data that
would be good to standardize, such as those defined in the problem
statement doc, and define them as syslog SDEs (which, remember, are
text fields so they would be greppable and printable and diff-able and
human-readable). Structured data elements would better support
application-parsing of the data, such as for training IDS/IPS anomaly
engines.
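
For example, a SIP CLF element might be parsed by an application along
the following lines. This is only a sketch: the SD-ID "sipclf@32473"
and the parameter names (method, status, callId) are hypothetical
placeholders, not fields this WG has agreed on, and 32473 is the
enterprise number reserved for documentation examples.

```python
import re

# name="value", where value may contain backslash-escaped characters.
PARAM_RE = re.compile(r'([^\s="\]]+)="((?:[^"\\\]]|\\.)*)"')


def parse_sd_element(element):
    """Parse one '[SD-ID name="value" ...]' element into (sd_id, params)."""
    if not (element.startswith("[") and element.endswith("]")):
        raise ValueError("not a structured data element")
    sd_id, _, rest = element[1:-1].partition(" ")
    params = {}
    for name, raw in PARAM_RE.findall(rest):
        # Undo the escaping of '"', '\' and ']' inside the value.
        params[name] = re.sub(r'\\(["\\\]])', r'\1', raw)
    return sd_id, params


# Hypothetical SIP CLF element; the field names are illustrative only.
element = '[sipclf@32473 method="INVITE" status="200" callId="a84b4c76e66710"]'
sd_id, fields = parse_sd_element(element)
print(sd_id, fields["method"], fields["status"])
# sipclf@32473 INVITE 200
```

The same line remains usable with grep (for instance, searching for
status="200"), so the structured form does not take anything away
from operators who prefer text tools.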

There are only a few restrictions placed on the content of a MSG field
in a syslog message. According to the problem-statement document,
there already exist a number of proprietary SIP CLF formats. Well, if
those are in a format that can fit within the MSG field within an IETF
standard syslog message, then that proprietary data can also be
carried in the syslog message. Any vendor-specific log-parsing tools
would continue to work with the extracted MSG field, and they could be
supplemented by tools that can parse the standardized SDE information.
The IETF syslog standard also supports vendor-specific SDEs for
extensibility of structured data.
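
A minimal sketch of that split, assuming a well-formed RFC 5424 style
line: the header and structured data go to tools that understand the
standardized fields, and the untouched MSG goes to the existing
vendor-specific parser. The function name, the "sipclf@32473" SD-ID,
and the sample vendor text are hypothetical.

```python
def split_syslog_message(line):
    """Split a syslog line into (header_fields, structured_data, msg)."""
    parts = line.split(" ", 6)
    if len(parts) < 7:
        raise ValueError("too few header fields")
    header_fields, rest = parts[:6], parts[6]

    if rest == "-" or rest.startswith("- "):
        sd, msg = "-", rest[1:]          # "-" means no structured data
    elif rest.startswith("["):
        i, in_value = 0, False
        while i < len(rest):
            ch = rest[i]
            if in_value and ch == "\\":
                i += 2                   # skip the escaped character
                continue
            if ch == '"':
                in_value = not in_value  # entering or leaving a value
            elif ch == "]" and not in_value and rest[i + 1:i + 2] != "[":
                break                    # closing bracket of the last element
            i += 1
        else:
            raise ValueError("unterminated structured data")
        sd, msg = rest[:i + 1], rest[i + 1:]
    else:
        raise ValueError("malformed structured data")

    if msg.startswith(" "):
        msg = msg[1:]                    # drop the separator before MSG
    return header_fields, sd, msg


line = ('<165>1 2010-02-02T18:52:26Z proxy1.example.com sipproxy 1234 - '
        '[sipclf@32473 method="INVITE" status="200"] '
        'vendor-format INVITE sip:bob@example.com 200')
header_fields, sd, msg = split_syslog_message(line)
print(msg)   # this part would be handed to the vendor's own tool
```

The scanner tracks quoted values so that a ']' inside a parameter
value, or inside the free-form MSG, does not end the structured data
early.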

In a similar manner to the dual stack approach for IPv4/IPv6
transition, implementers could choose to drop specific fields from
their proprietary formats as consensus on useful SDEs is reached, and
their tools are adapted to use the standardized header and SDE
information.

This approach would fit the WG goal of constraining its focus to the
"useful information", and would avoid the need to reinvent solutions
such as a data modeling language, character sets, delimiters, secure
transport, log integrity checking, log filtering, log aggregation and
correlation issues, and so on.

I do not see much benefit in designing a whole new ASCII file format
that no existing tools support (except generic text-handling tools
like grep), and that operators would need to learn in addition to the
semantics in the information model.

I recommend the WG focus on specific and actually existing problem
cases, and build the semantic "information model" incrementally. Then
use the existing standard syslog format to provide an example data
model, which inherits the benefits of an existing widely-deployed
infrastructure for logging.

David Harrington
dbharrington@comcast.net
ietfdbh@comcast.net
dharrington@huawei.com