[sip-clf] A syslog approach to sip logging
"David B Harrington" <dbharrington@comcast.net> Tue, 02 February 2010 18:52 UTC
From: David B Harrington <dbharrington@comcast.net>
To: 'SIP-CLF Mailing List' <sip-clf@ietf.org>
Date: Tue, 02 Feb 2010 13:53:00 -0500
Message-ID: <013201caa438$f19aac50$0600a8c0@china.huawei.com>
Subject: [sip-clf] A syslog approach to sip logging
Hi,

Operators often use syslog to carry Apache CLF log data. In that role, syslog serves mainly as a tunnel for the Apache CLF format. This is attractive to operators because the infrastructure and the syslog knowledge already exist.

I suggest it makes sense to develop a sipclf format that can be carried within an IETF syslog message. In general, that means information in an ASCII format, dumped into a local file that can be parsed with tools like grep, and that can later be secured, filtered, transported, aggregated, and correlated using existing infrastructure.

Multiple use cases raised by contributors in this WG lead to different requirements. Some want to pay attention to the data dumped by one server; some want to follow traffic flows through the network; some want to filter on standardized fields; some want to aggregate and correlate log information. It is not enough to figure out how to dump data on a single system; that data will need to be compatible with the infrastructure used to provide secure transport, filtering, correlation, etc.

Operators already have infrastructures designed for long-term archiving of (potentially enormous) volumes of logging information, with the goal of correlating log records, including data mining. Syslog is already widely deployed and well understood by operators, and the IETF syslog WG has standardized many aspects of security and transport, such as (D)TLS-secured transport, support for large messages, and optional digitally signed logging for law enforcement and for message stream integrity checking.

IETF syslog standardizes a number of parameters useful for correlation, such as facility (specific applications), severity classification, timestamp, hostname, the name of the application sending the message (often syslogd), process ID, and message ID, all carried in the syslog header in ASCII format.
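To make the header fields concrete, here is a minimal Python sketch of how a SIP element might assemble an RFC 5424 style header. The hostname, application name, process ID, and the use of the SIP method as the message ID are assumptions chosen for illustration; nothing here has been agreed by the WG.

```python
from datetime import datetime, timezone

# Facility and severity codes as numbered in RFC 5424 (subset).
FACILITY_LOCAL0 = 16
SEVERITY_INFO = 6


def syslog_header(facility, severity, timestamp, hostname, app_name, procid, msgid):
    """Return '<PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID'."""
    pri = facility * 8 + severity  # PRI encodes facility and severity together
    # UTC timestamp with microsecond precision and a trailing "Z".
    ts = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f") + "Z"
    return f"<{pri}>1 {ts} {hostname} {app_name} {procid} {msgid}"


# Hypothetical values: a proxy named "sipproxy" logging an INVITE.
header = syslog_header(
    FACILITY_LOCAL0,
    SEVERITY_INFO,
    datetime(2010, 2, 2, 18, 52, 26, 123456, tzinfo=timezone.utc),
    "proxy1.example.com",
    "sipproxy",
    "4321",
    "INVITE",
)
print(header)
# <134>1 2010-02-02T18:52:26.123456Z proxy1.example.com sipproxy 4321 INVITE
```

Every field is plain ASCII separated by single spaces, so the header stays greppable while remaining easy for a collector to split.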
These parameters were designed to be compatible with ITU logging standards and the ALARM-MIB, to provide easier correlation of events across different event reporting mechanisms. Also to improve correlation, work has been done to translate syslog messages into SNMP traps, and SNMP traps into syslog messages.

The IETF syslog standard also provides structured data elements (SDEs). SDEs are designed to supplement the human-readable text with application-parseable data fields (also encoded in 7-bit ASCII), which makes it easier for applications, such as security management systems, to extract and correlate the data across vendor implementations and across nodes in a network.

The IETF syslog standard already defines some SDEs that would likely be useful for the problems SIP CLF is trying to resolve, like precisely tracking sequence inside a networked system: a high-precision timestamp, the quality of the time source for a given system, time zone accuracy, whether a node is synchronized with a network time source, the origin of a log entry (useful after aggregation and relay), the IP address at time of logging, an enterprise identifier, the software that generated the message (i.e., the application that asked syslogd to send the message), the software version, a sequence number to provide sequence as seen by a single node, the sysUpTime of a co-resident SNMP system, and the language used within the human-readable MSG.

The IETF syslog standard also contains some recommendations on intelligently dropping messages if the log volume becomes overwhelming.

The syslog WG deliberately did not standardize the content of the human-readable message field. The WG standardized the header, and has provided SDEs to standardize certain aspects of the information where consensus can be reached.
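As an illustration of the already-defined SDEs, the Python sketch below serializes the three SD-IDs registered by RFC 5424 (timeQuality, origin, meta). The parameter values (address, software name, counters) are made up, and 32473 is the enterprise number reserved for documentation; the helper names are my own.

```python
def escape_param_value(value):
    """Escape the characters RFC 5424 requires escaping in a PARAM-VALUE."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("]", "\\]")


def sd_element(sd_id, params):
    """Serialize one structured data element as [SD-ID name="value" ...]."""
    body = "".join(
        f' {name}="{escape_param_value(str(value))}"' for name, value in params.items()
    )
    return f"[{sd_id}{body}]"


# Illustrative values only; 32473 is the documentation enterprise number.
structured_data = "".join([
    sd_element("timeQuality", {"tzKnown": 1, "isSynced": 1, "syncAccuracy": 60000}),
    sd_element("origin", {"ip": "192.0.2.10", "enterpriseId": "32473",
                          "software": "sipproxy", "swVersion": "1.0"}),
    sd_element("meta", {"sequenceId": 42, "sysUpTime": 123456, "language": "en"}),
])
print(structured_data)
```

The result is a single ASCII string of bracketed elements that sits between the header and the free-form MSG, so it can be read by a person or picked apart by a tool.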
Having both (potentially non-standardized) human-readable data and standardized human-and-machine-readable structured data in the same message addresses a wide range of use cases, and gives the human more information to work with when interpreting an event.

I propose that the WG reach consensus on specific fields of data that would be good to standardize, such as those defined in the problem statement doc, and define them as syslog SDEs (which, remember, are text fields, so they would be greppable, printable, diff-able, and human-readable). Structured data elements would better support application parsing of the data, such as for training IDS/IPS anomaly engines.

There are only a few restrictions placed on the content of the MSG field in a syslog message. According to the problem-statement document, there already exist a number of proprietary SIP CLF formats. Well, if those are in a format that can fit within the MSG field of an IETF standard syslog message, then that proprietary data can also be carried in the syslog message. Any vendor-specific log-parsing tools would continue to work with the extracted MSG field, and they could be supplemented by tools that can parse the standardized SDE information. The IETF syslog standard also supports vendor-specific SDEs for extensibility of structured data.

In a similar manner to the dual-stack approach for the IPv4/IPv6 transition, implementers could choose to drop specific fields from their proprietary formats as consensus on useful SDEs is reached and their tools are adapted to use the standardized header and SDE information.

This approach would fit the WG goal of constraining its focus to the "useful information", without needing to reinvent solutions for a data modeling language, character sets, delimiters, secure transport, log integrity checking, log filtering, log aggregation and correlation, and so on.
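The dual approach can be sketched as follows: a hypothetical SDE named sipclf@32473 carries standardized fields, while a vendor's existing log line rides unchanged in MSG. The SD-ID, the field names (method, callId, status), and the MSG text are all assumptions for illustration, and the parser is deliberately simplified rather than a complete RFC 5424 implementation.

```python
import re

# One SD-ELEMENT: [SD-ID name="value" ...], with \" \\ \] escapes in values.
SD_ELEMENT = re.compile(
    r'\[(?P<id>[^ =\]"]+)(?P<params>(?: [^ =\]"]+="(?:\\.|[^"\\\]])*")*)\]'
)
SD_PARAM = re.compile(r' (?P<name>[^ =\]"]+)="(?P<value>(?:\\.|[^"\\\]])*)"')


def split_message(line):
    """Split a line into (header fields, structured data dict, MSG text)."""
    parts = line.split(" ", 6)  # six header fields, then everything else
    header_fields, rest = parts[:6], parts[6]
    elements = {}
    pos = 0
    if rest.startswith("-"):
        pos = 1  # "-" means the message carries no structured data
    else:
        while True:
            match = SD_ELEMENT.match(rest, pos)
            if match is None:
                break
            elements[match.group("id")] = {
                p.group("name"): re.sub(r'\\([\\"\]])', r"\1", p.group("value"))
                for p in SD_PARAM.finditer(match.group("params"))
            }
            pos = match.end()
    return header_fields, elements, rest[pos:].lstrip(" ")


# Hypothetical message: standardized SDE plus a vendor-specific MSG.
line = (
    '<134>1 2010-02-02T18:52:26.123Z proxy1.example.com sipproxy 4321 INVITE '
    '[sipclf@32473 method="INVITE" callId="a84b4c76e66710" status="200"] '
    'INVITE sip:bob@example.com 200 12ms'
)
header, elements, msg = split_message(line)
print(elements["sipclf@32473"]["callId"])  # standardized field for correlation
print(msg)                                 # untouched vendor-specific text
```

A vendor tool would keep reading only msg, while a correlation engine could rely on the header and elements alone, which is the transition path suggested above.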
I do not see much benefit in designing a whole new ASCII file format that no existing tools support (except generic text-handling tools like grep), and that operators would need to learn in addition to the semantics of the information model.

I recommend the WG focus on specific and actually existing problem cases, and build the semantic "information model" incrementally. Then use the existing standard syslog format to provide an example data model, which inherits the benefits of an existing, widely deployed infrastructure for logging.

David Harrington
dbharrington@comcast.net
ietfdbh@comcast.net
dharrington@huawei.com