Re: [sip-clf] A syslog approach to sip logging

Speaking as an individual,

Just as a heads-up, David and I had a private conversation late last week, 
where David told me that he did some investigation after our Hiroshima 
meeting, and discovered (to his surprise) that it's fairly common to use 
SYSLOG for Apache CLF.

I'm very interested in hearing reactions to this note, because it makes me 
much more comfortable with our charter - most of our chartered work is to 
identify fields that need to be logged and figure out a story for 
correlation; once we have identified the fields that should be logged, IFF 
we agree that SYSLOG makes sense, defining SYSLOG structured data elements 
for those fields should be pretty straightforward.

Thanks,

Spencer

> Hi,
>
> Operators often use syslog to carry Apache CLF log data. In such
> deployments, syslog serves primarily as a tunnel for the Apache CLF
> format. This seems to be attractive for operators because of their
> existing infrastructure and syslog knowledge.
> I suggest it makes sense to develop a sipclf format that can be
> carried within an IETF syslog message. In general, that means
> information in an ascii format dumped into a local file that can be
> parsed with tools like grep, and that can later be secured, filtered,
> transported, aggregated, and correlated using existing infrastructure.
>
>
> Multiple use cases raised by contributors in this WG lead to different
> requirements. Some want to pay attention to the data dumped by one
> server; some want to follow traffic flows through the network; some
> want to filter on standardized fields; some want to aggregate and
> correlate log information.
>
> It is not enough to figure out how to dump data on a single system;
> that data will need to be compatible with infrastructure used to
> provide secure transport, filtering, correlation, etc. Operators
> already have infrastructure designed for long-term archiving of
> (potentially enormous) volumes of logging information, with the goal
> of correlating log records, including through data mining.
>
> Syslog is already widely deployed, is well understood by operators,
> and the IETF syslog WG has standardized many aspects of security and
> transport, such as (D)TLS-secured transport, support for large
> messages, optional digitally signed logging for law enforcement and
> for message stream integrity checking, etc.
>
> IETF syslog standardizes a number of parameters useful for
> correlation, such as facility (specific applications), severity
> classification, timestamp, hostname, the name of the application
> sending the message (often syslogd), process ID, and message ID that
> are in the syslog header in ascii format. These were designed to be
> compatible with ITU logging
> standards, and the ALARM-MIB, to provide easier correlation of events
> across different event reporting mechanisms. Also to improve
> correlation, work has been done to translate syslog messages into SNMP
> traps, and SNMP traps into syslog messages.
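
As a rough illustration of the header fields listed above, here is a minimal Python sketch that splits an RFC 5424 style line into those fields. The names `HEADER_RE`, `SyslogHeader`, and `parse_header` are hypothetical, the sample line is modeled on the first example in RFC 5424, and the regex is deliberately looser than the RFC grammar.

```python
import re
from typing import NamedTuple

# PRI and VERSION, then five space-separated header fields ("-" marks a nil value).
HEADER_RE = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app_name>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)",
    re.DOTALL,
)


class SyslogHeader(NamedTuple):
    facility: int
    severity: int
    timestamp: str
    hostname: str
    app_name: str
    procid: str
    msgid: str
    rest: str  # structured data plus optional MSG, left unparsed here


def parse_header(line: str) -> SyslogHeader:
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError("line does not look like an RFC 5424 message")
    # PRI encodes facility * 8 + severity.
    facility, severity = divmod(int(match["pri"]), 8)
    return SyslogHeader(
        facility, severity,
        match["timestamp"], match["hostname"], match["app_name"],
        match["procid"], match["msgid"], match["rest"],
    )


example = ("<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - "
           "'su root' failed for lonvick on /dev/pts/8")
header = parse_header(example)
print(header.facility, header.severity, header.hostname, header.msgid)
```

Because every header field is plain text in a fixed position, a correlation tool can key on hostname, application name, and timestamp without knowing anything about the message body.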
>
> The IETF syslog standard also provides structured data elements. SDEs
> are designed to supplement the human-readable text with
> application-parseable data fields (also encoded in 7-bit ascii), which
> makes it easier for applications, such as security management systems,
> to extract and correlate the data across vendor implementations, and
> across nodes in a network.
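
To make "application-parseable" concrete, the following Python sketch turns the structured data portion of a message into nested dictionaries. It assumes the RFC 5424 layout of `[SD-ID name="value" ...]` elements with backslash escapes for `"`, `\` and `]`. The function name `parse_structured_data` is hypothetical, and the sample string follows the `exampleSDID@32473` example in RFC 5424.

```python
import re
from typing import Dict

NAME = r'[^\s=\]"]+'            # SD-ID or parameter name
VALUE = r'(?:\\.|[^"\\\]])*'    # parameter value, allowing backslash escapes

SD_ELEMENT_RE = re.compile(rf'\[(?P<id>{NAME})(?P<params>(?: {NAME}="{VALUE}")*)\]')
PARAM_RE = re.compile(rf' (?P<name>{NAME})="(?P<value>{VALUE})"')


def unescape(value: str) -> str:
    # Only '"', '\' and ']' are escaped inside a parameter value.
    return re.sub(r'\\(["\\\]])', r"\1", value)


def parse_structured_data(sd: str) -> Dict[str, Dict[str, str]]:
    """Return {SD-ID: {param name: value}} for a structured data string."""
    if sd == "-":  # nil value: no structured data present
        return {}
    elements: Dict[str, Dict[str, str]] = {}
    pos = 0
    while pos < len(sd):
        match = SD_ELEMENT_RE.match(sd, pos)
        if match is None:
            raise ValueError(f"malformed structured data at offset {pos}")
        elements[match["id"]] = {
            p["name"]: unescape(p["value"])
            for p in PARAM_RE.finditer(match["params"])
        }
        pos = match.end()
    return elements


sample = ('[exampleSDID@32473 iut="3" eventSource="Application" eventID="1011"]'
          '[examplePriority@32473 class="high"]')
print(parse_structured_data(sample))
```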
>
> The IETF syslog standard already defines some SDEs that would likely
> be useful for the problems sip clf is trying to resolve, like
> precisely tracking sequence inside a networked system: a
> high-precision timestamp, the quality of the time source for a given
> system, time zone accuracy, whether a node is synched with a network
> time source, the origin of a log entry (useful after aggregation and
> relay), the ip address at time of logging, an enterprise identifier,
> the software that generated the message (i.e., the application that
> asked syslogd to send the message), the software version, a sequence
> number to provide sequence as seen by a single node, the sysUpTime of
> a co-resident SNMP system, and the language used within the
> human-readable MSG. The IETF syslog standard also contains some
> recommendations on intelligently dropping messages if the log volume
> becomes overwhelming.
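
For reference, the already-registered elements described above are `timeQuality`, `origin`, and `meta` in RFC 5424. The Python sketch below shows how a sender might serialize them. The helper names and all parameter values (IP address, software name, counters) are made up for illustration.

```python
import re
from typing import Mapping


def escape_param_value(value: str) -> str:
    # '"', '\' and ']' must be backslash-escaped inside a parameter value.
    return re.sub(r'(["\\\]])', r"\\\1", value)


def format_sd_element(sd_id: str, params: Mapping[str, str]) -> str:
    """Serialize one structured data element as [SD-ID name="value" ...]."""
    body = "".join(
        f' {name}="{escape_param_value(str(value))}"'
        for name, value in params.items()
    )
    return f"[{sd_id}{body}]"


# Illustrative values only; the SD-IDs and parameter names come from RFC 5424.
time_quality = format_sd_element(
    "timeQuality", {"tzKnown": "1", "isSynced": "1", "syncAccuracy": "60000"})
origin = format_sd_element(
    "origin", {"ip": "192.0.2.1", "software": "sipd", "swVersion": "1.0"})
meta = format_sd_element(
    "meta", {"sequenceId": "1", "sysUpTime": "37", "language": "en"})

print(time_quality + origin + meta)
```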
>
> The syslog WG deliberately did not standardize the content of the
> human-readable message field. The WG standardized the header, and has
> provided SDEs to standardize certain aspects of the information where
> consensus can be reached. Having both (potentially non-standardized)
> human-readable data, and standardized human-and-machine-readable
> structured data in the same message addresses a wide range of use
> cases, and gives the human more information to work with to interpret
> an event.
>
> I propose that the WG reach consensus on specific fields of data that
> would be good to standardize, such as those defined in the problem
> statement doc, and define them as syslog SDEs (which, remember, are
> text fields so they would be greppable and printable and diff-able and
> human-readable). Structured data elements would better support
> application-parsing of the data, such as for training IDS/IPS anomaly
> engines.
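
A sketch of what this proposal could look like in practice, in Python. The SD-ID `sip@32473` and its parameters `method`, `callId`, and `status` are purely hypothetical (32473 is the enterprise number reserved for documentation), and the regex assumes parameter values contain no escaped characters. The point is only that a text SDE can be matched with a grep-like pattern and then used to correlate records from different nodes.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Grep-like pattern: find the callId parameter inside the hypothetical SIP element.
CALL_ID_RE = re.compile(r'\[sip@32473 (?:[^\]]* )?callId="([^"]*)"')


def correlate_by_call_id(lines: Iterable[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Group log lines by callId, recording which host emitted each one."""
    by_call: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        match = CALL_ID_RE.search(line)
        if match is None:
            continue  # no SIP element on this line
        hostname = line.split(" ", 3)[2]  # third header field
        by_call[match.group(1)].append((hostname, line))
    return dict(by_call)


lines = [
    '<134>1 2009-11-20T10:00:00.001Z proxy1.example.com sipd 100 - '
    '[sip@32473 method="INVITE" callId="a84b4c76e66710" status="-"] INVITE received',
    '<134>1 2009-11-20T10:00:00.050Z proxy2.example.com sipd 200 - '
    '[sip@32473 method="INVITE" callId="a84b4c76e66710" status="-"] INVITE forwarded',
    '<134>1 2009-11-20T10:00:01.000Z proxy1.example.com sipd 100 - '
    '[sip@32473 callId="zzz999" method="REGISTER"] REGISTER received',
]

for call_id, records in correlate_by_call_id(lines).items():
    print(call_id, [host for host, _ in records])
```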
>
> There are only a few restrictions placed on the content of a MSG field
> in a syslog message. According to the problem-statement document,
> there already exist a number of proprietary sip clf formats. Well, if
> those are in a format that can fit within the MSG field within an IETF
> standard syslog message, then that proprietary data can also be
> carried in the syslog message. Any vendor-specific log-parsing tools
> would continue to work with the extracted MSG field, and they could be
> supplemented by tools that can parse the standardized SDE information.
> The IETF syslog standard also supports vendor-specific SDEs for
> extensibility of structured data.
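
One way to picture the coexistence described above: a small Python shim strips the syslog header and structured data and hands the untouched MSG to an existing vendor-specific parser. This is a sketch assuming RFC 5424 framing. The function names, the `sip@32473` element, and the MSG contents are hypothetical.

```python
from typing import Tuple


def split_sd_and_msg(rest: str) -> Tuple[str, str]:
    """Split 'STRUCTURED-DATA [SP MSG]' into (structured data, MSG)."""
    if rest == "-" or rest.startswith("- "):
        return "-", rest[2:]  # nil structured data
    if not rest.startswith("["):
        raise ValueError("structured data must be '-' or start with '['")
    i = 0
    in_quotes = False
    while i < len(rest):
        ch = rest[i]
        if in_quotes:
            if ch == "\\":
                i += 2  # skip the escaped character
                continue
            if ch == '"':
                in_quotes = False
        elif ch == '"':
            in_quotes = True
        elif ch == "]" and rest[i + 1:i + 2] != "[":
            # Last element closed; skip the separating space before MSG.
            return rest[:i + 1], rest[i + 2:]
        i += 1
    raise ValueError("unterminated structured data")


def extract_msg(line: str) -> str:
    """Return only the free-form MSG part for a legacy log parser."""
    parts = line.split(" ", 6)  # six header tokens, then the remainder
    if len(parts) < 7:
        raise ValueError("line does not look like an RFC 5424 message")
    _sd, msg = split_sd_and_msg(parts[6])
    return msg


line = ('<165>1 2003-10-11T22:14:15.003Z proxy1.example.com sipd 4321 ID47 '
        '[sip@32473 method="INVITE" note="a \\] b"] '
        'INVITE sip:bob@example.com 200')
print(extract_msg(line))
```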
>
> In a similar manner to the dual stack approach for IPv4/IPv6
> transition, implementers could choose to drop specific fields from
> their proprietary formats as consensus on useful SDEs is reached, and
> their tools are adapted to use the standardized header and SDE
> information.
>
> This approach would work with the WG goal to constrain its focus to
> the "useful information", and would avoid the need to reinvent
> solutions such as a data modeling language, character sets,
> delimiters, secure transport, log integrity checking, log filtering,
> log aggregation and correlation issues, and so on.
>
> I do not see much benefit from designing a whole new ascii file format
> that no existing tools support (except generic text-handling tools
> like grep), and that operators would need to learn in addition to the
> semantics in the information model.
>
> I recommend the WG focus on specific and actually existing problem
> cases, and build the semantic "information model" incrementally. Then
> use the existing standard syslog format to provide an example data
> model, which inherits the benefits of an existing widely-deployed
> infrastructure for logging.
>
> David Harrington
> dbharrington@comcast.net
> ietfdbh@comcast.net
> dharrington@huawei.com