Re: [sip-clf] AD review: draft-ietf-sipclf-format-05

Gonzalo Salgueiro <gsalguei@cisco.com> Thu, 09 February 2012 00:25 UTC

Return-Path: <gsalguei@cisco.com>
X-Original-To: sip-clf@ietfa.amsl.com
Delivered-To: sip-clf@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 89CCE11E80B3 for <sip-clf@ietfa.amsl.com>; Wed, 8 Feb 2012 16:25:43 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10.598
X-Spam-Level:
X-Spam-Status: No, score=-10.598 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id uEZNbSPwPdDh for <sip-clf@ietfa.amsl.com>; Wed, 8 Feb 2012 16:25:41 -0800 (PST)
Received: from av-tac-rtp.cisco.com (hen.cisco.com [64.102.19.198]) by ietfa.amsl.com (Postfix) with ESMTP id DCBFF11E80B1 for <sip-clf@ietf.org>; Wed, 8 Feb 2012 16:25:40 -0800 (PST)
X-TACSUNS: Virus Scanned
Received: from chook.cisco.com (localhost.cisco.com [127.0.0.1]) by av-tac-rtp.cisco.com (8.13.8+Sun/8.13.8) with ESMTP id q190Pdfj011430 for <sip-clf@ietf.org>; Wed, 8 Feb 2012 19:25:40 -0500 (EST)
Received: from dhcp-64-102-210-22.cisco.com (dhcp-64-102-210-22.cisco.com [64.102.210.22]) by chook.cisco.com (8.13.8+Sun/8.13.8) with ESMTP id q190PaFb023774; Wed, 8 Feb 2012 19:25:36 -0500 (EST)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: multipart/alternative; boundary=Apple-Mail-15-400101793
From: Gonzalo Salgueiro <gsalguei@cisco.com>
In-Reply-To: <4F21DD3E.7000002@nostrum.com>
Date: Wed, 8 Feb 2012 19:25:36 -0500
Message-Id: <31A5C897-B767-4527-9346-905A80977F35@cisco.com>
References: <4F21DD3E.7000002@nostrum.com>
To: Robert Sparks <rjsparks@nostrum.com>
X-Mailer: Apple Mail (2.1084)
Cc: draft-ietf-sipclf-format@tools.ietf.org, sipclf-chairs@tools.ietf.org, "sip-clf@ietf.org Mailing" <sip-clf@ietf.org>
Subject: Re: [sip-clf] AD review: draft-ietf-sipclf-format-05
X-BeenThere: sip-clf@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: SIP Common Log File format discussion list <sip-clf.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/sip-clf>, <mailto:sip-clf-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/sip-clf>
List-Post: <mailto:sip-clf@ietf.org>
List-Help: <mailto:sip-clf-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/sip-clf>, <mailto:sip-clf-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Feb 2012 00:25:43 -0000

Robert - 

Thanks for your thorough and thoughtful review. Below are some inline comments to your feedback. 

I will release the -06 version of the draft once we have settled on all of the below points you have raised.


On Jan 26, 2012, at 6:09 PM, Robert Sparks wrote:

> AD review: draft-ietf-sipclf-format-05
> 
> Summary: This document has open issues to address before progressing to IETF LC
> 
> Major (mostly in document order)
> 
> 1) The version string claims it is hexadecimal encoded. The version for this
> document is expressed by the character A. Does that mean this is version 10
> (base 10)? Are we limited to versions 10 through 15? (we only have one byte
> and section 6 points out that index lines always start with an alphabetical
> character). This description needs to be clarified. While you're looking at
> it, consider discussing whether a given log file can have records with
> different versions in it.

In Section 4.1, the text says that the Version is "hexadecimal encoded."  
This was inherited from the original indexed-ASCII draft and is confusing. 
I will update it to indicate that it is ASCII encoded so we aren't subject 
to the limit of 6 versions. 

For quickly skipping of index records, the first character in the log
file for index records is an alphabetic character (Section 6).  So,
given this, we felt that characters "A" -> "Z" (i.e., 26 versions)
should be enough.  If the working group feels that we need more
versions, one could think of having the set ["a"-"z","A"-"Z"] as
versions, thus giving us 52 version numbers.

Regarding the same log file having records with different record
versions --- since the version number is per CLF record, it does
allow for having a SIP CLF file contain different records.  However,
our expectation is that this would not happen too often.  We can put
a sentence or two saying the above.

> 2) The description of Timestamp and Fractional Seconds is very confusing.
> What does it mean to be represented in big-endian fashion with a most
> significant octet and to be decimal encoded? It's very far from clear that
> you intend this to be an ascii string representing an integer, and it's
> not clear that you intend that it to be prefix-padded with 0s if necessary.
> Did the group consider using ABNF to describe these fields (or the whole
> log record) to avoid this kind of ambiguity?

Good point. The text needs clarifying. I will remove the portion of the text 
that states "Represented in big-endian fashion with most significant octet 
first from zero starting at the left, or high-order, position.  Decimal encoded." 
from the description of both Timestamp and Fractional Seconds fields. 
I will also supplement the text with the following summary text:

The timestamp is represented in the log file as a UTF-8 string
representing the date and time of the request or response
represented as the number of seconds and milliseconds since the
Unix epoch.  The number of milliseconds MUST be separated by a
"." from the number of seconds.  It is not required to zero-pad
the seconds and milliseconds, although if they are zero padded,
a SIP CLF reader MUST be able to interpret them by discarding
leading zeroes.

> 3) The document should be clear about whether the flags field values are
> extensible, and if so, how extension is expected to be managed. How will
> you handle SIP received over DCCP, Websockets, or something we haven't
> heard of yet?

About the only flag that is expected to be extended is the Transport
Flag (byte 4).  Currently, the format draft defines 3 values for
it (UDP, TCP, SCTP).  If Websockets and DCCP become supported transports
in SIP, then yes, this field will need to be extended.  We can do one
of two things for extension:

 1) Have IANA open up a registry for this field
 2) Use the Options part of the SIPCLF record to encode the transport.
  That is, we define byte 4 -- Transport Flag to contain:

  U = UDP
  T = TCP
  S = SCTP
  O = Other

  Then, when a new transport is supported, a log writer MUST put "O"
  for the Transport Flag and MUST create an optional field using
  Vendor-ID 00000000.  Example: when DCCP becomes supported, it will
  be encoded as follows:

  00@00000000,000F,Transport: DCCP


If we don't think that it is reasonable for new transports to be ascribed a 
value of 'Other' in the Transport Flag field, then the IANA registry is the 
more elegant solution. Let me know your thoughts on a way forward here 
since the draft currently has no IANA dependencies.

> 4) Section 4.2 speaks of placing "the actual values for the mandatory
> fields" into the record. It's not clear what "actual values" means. There
> are other sections of the document that talk about transforming (escaping)
> values before putting them in the log record, including dealing with tabs
> and edge conditions like field that has only a question mark in it. The
> concept of encoding and escaping needs to be much clearer throughout the
> document.

I'll edit that text to clarify actual values vs. escaped values. I'll also look 
through the document and ensure the text describing the escaping 
mechanism(s) is clear.

> 5) The document should more explicitly discuss the decision to replace
> tabs with spaces. That decision was based on there being no known use
> of tabs in SIP messages that means something semantically other than
> whitespace. The discussion should point that out, and note that you
> will not be able to use the logged information to reconstruct a
> signature over the logged fields, and that any future (or proprietary)
> SIP header fields that include tabs with semantic meaning (carrying a
> blob perhaps) will lose this meaning while being logged. It should also
> be made brutally explicit whether you want this substitution to happen
> when you are logging the entire message, or a body (that might happen
> to have tabs in it).

We will add a new paragraph that goes something
like this (the new paragraph is added at the end of S4.2 paragraph
"This data MUST appear ... (0x20) prior to being logged."):

 The decision to replace tabs with spaces was based on there
 being no known use of tabs in SIP messages to convey any other
 meaning than whitespace.  Two consequences of the decision to
 replace tab with a space character are: (a) it will become
 impossible to reconstruct a signature over the logged field that
 matches the signature over fields in the original SIP message,
 and (b) any future SIP header fields that include tabs with a
 different semantic meaning than simply signifying whitespace
 will loose this meaning when logged.  And finally, the tabs to
 spaces substitution MUST occur when logging mandatory fields;
 it SHOULD occur when when logging either the entire message or
 simply a SIP body.

> 6) There are a couple of places where you talk about escaping '-' and '?'
> characters. The document should note that this could produce a logged record
> that is not syntactically parsible with a SIP parser (quoted-pair is only
> used in certain places in SIP messages). There should also be some discussion
> about whether a log reader unescapes anything.

It was never the intent for a SIP CLF to be read by a SIP parser as it would be 
entirely nonsensical to it. The belief is that the reader of the SIP CLF would know 
what to do with it. I'll add a statement to the draft indicating this explicitly.

> 7) Why does Tag -01 restrict message bodies to just the listed types, especially
> with the open ended "miscellaneous text content" type category? If there's a real
> restriction here, it needs to be much more clearly stated. Why are you allowing
> any LWS for the separator between the content type header field value and the
> message body itself? Why isn't this simply a space? It is not clear what
> "byte encoded" means for representing binary bodies - do you mean expanding
> it into a hex representation, a uuencode representation, or soemthing else?

The decision to restrict logging of bodies to the specific types listed in the draft 
was a decision made by the WG [1]. The usage of a LWS separator was specified 
as part of that proposal for logging bodies. Making the separator a space 
instead of LWS is fine, in fact, it is preferable. I'll update the draft to reflect that 
change.

I'll also draw a line in the sand and update the text indicating that "byte encoded" 
in this context means that bodies with binary data MUST be base64 encoded for 
logging (since MIME uses base64 encoding).

> 8) There's no description of Tag-02. The text following the line that starts
> the description of the tag speaks of more general things covering both tag-01
> and tag-02.

I'll expand a bit on the text that says "Log entire SIP message".

> 9) When describing repeating tag-01 or tag-02 multiple times to deal with
> large entities, the document should be clear that the fragments get logged
> in order, and how to reassemble the actual entity. It should also say whether
> it's ok to interleave (for example) tag-01 fragements with other optional fields.

This is a great callout. There is text in the description of Tag=01 that needs to 
be updated to clearly state that if your body or the entire SIP
message is > 4096 bytes, then tough luck. We can't chunk this as it would open 
up a can of worms that we don't want to deal with; like indexing the chunks 
and a reassembly scheme. Better not to go there. I will update the text 
to clearly state this point.

> 10) The description of the Length field in section 4.3 is ambiguous. It says
> "This length does not include the header shown in Figure 5." But figure 5 has
> nothing clearly identified as "the header".

I'll update the description of the Length field to state:

Length Field (4 bytes):  Indicates the length of only the "Value"
field of this optionally logged element (as shown in Figure 5),
hexadecimal encoded. This length corresponds to the length of the
"Value" field only and MUST NOT include any of the other elements
shown in Figure 5.

> 11) Where does the document talk about how to log fields that are line folded
> in the original message? Is it ok for the logged field to contain CRLFs, or is
> the intent that the field be unfolded before logging?

I will update the text to indicate that:

For mandatory fields:

 The mandatory fields MUST NOT contain CRLF when logged.

For optional fields:

If an optional field contains a CRLF as part of its normal 
production rule, the CRLF MUST be escaped by using the 
URI encoded equivalent value of "%0D%0A".

For logging bodies:

All CRLF digraphs in the SIP body MUST be escaped by using the 
URI encoded equivalent value of "%0D%0A"

For logging the entire SIP message:

All instances of CRLF digraphs, whether they appear in the SIP
headers or the body MUST be escaped by using the URI encoded 
equivalent value of "%0D%0A".

> 12) It's not clear if the \r\n used in the examples of logging the SDP body
> are there as text-conventions for showing the example in an RFC, or if you
> intend that CRLFs be replaced by \r\n's when logging. If it's just a convention,
> please make it clear that the actual log record will have CRLFs. If you were
> expecting the CRLFs to be escaped in the log, there needs to be text that
> says that normatively somewhere, and you need to consider escaping the escape
> character.

As per my comments for point #11 above, the \r\n in that example needs to be 
escaped with "%0D%0A". I'll update the examples to match with this new 
text that I will be adding.

> Minor
> 
> There is a maintenance risk in having all the detail of each block duplicated
> in Figure 1. Please consider taking that figure up a level, showing the three
> blocks and how they're separated, and let the other figures show the internal
> detail.

This is something that we have discussed and have not found an elegant solution for. 
It was thought to be very valuable to display the pertinent portion of the complete 
CLF within the section where that was discussed. I'll do my best to ensure that they 
are all copied over properly from Figure 1.

> There are blocks of text in this document that are really laying out the problem.
> Can they be moved into the problem statement/information model/framework document
> or simply deleted?
> For example: "Implementing a logging scheme for SIP is a considerable challenge.",
> "Whilst one may question the value of the From URI [...]", etc.

Ok. I'll scour the document for instances of such text and remove them.

> In section 4.1, you talk of "a single, 52 character hexadecimal encoded string"
> between bytes 8 and 55. There are only 48 octets between bytes 8 and 55
> (inclusive). Please adjust the text to note that you are also talking about
> the pointers to the optional fields or describe the shorter length.

I'll edit this text to be clearer.

> In section 4.2's description of the To tag field you say "Value of the
> tag parameter (if present)" but don't talk about what to do if the tag
> parameter isn't present. In the parallel section on the From tag, you
> don't talk about the potential for messages without From tags. If it's
> the intent to not be able to log messages from legacy 2543 devices, that's
> fine, but the document should call that out as a design decision and it wouldn't
> hurt to provide some guidance on what to do if such a message shows up.

I'll add some text to clarify and simplify this. Basically, if the tag is present in To and From, 
it is logged. If it is not present, a "-" is logged. I'll leave it at that. The intent is not 
to differentiate between messages from legacy rfc2543 devices versus the 
updated rfc3261.  

> There is a lot of descriptive text in 4.2's sections on the server and client
> transaction that is a copy of what's in the problem-statement/information model
> document. It would be better to just point to the description in the other document.

This is something that we debated about quite a bit. In the end we thought it more 
useful for implementers to have the description of all mandatory fields right in the 
format draft.

> 
> Nits
> 
> How is this a "functionally equivalent event logging mechanism"? Suggest replacing
> that sentence with the single word "analogous".

I'll make this change.

> In the introduction, the document claims to define a format that is no more difficult
> to generate, but it's not clear what it's being compared to. No more difficult than
> what?

I'll clarify this.

> Section 4.1's description of Record Length can be misinterpreted. By listing
> a subset of the fields that are included, it leaves open whether any fields
> are excluded. It would be better to say "beginning with the Version octet
> and ending with the terminating line-feed".

I'll make this change.

> It would help to be more explicit that you are not logging the CRLF at the end
> of a given header field value. (Or is it an option for a logger to choose to do
> so?)

I discussed this in one of my earlier responses. This will be stated more explicitly 
that the terminating CRLF of a logged field should not be logged. The only CRLF's 
that should be logged are those of a multi-line body, field, or complete message 
and they should be escaped as will be specified in the draft.

> Please consider scrubbing the message used in example 5 back to the minimal
> number of octets you need to make the example useful. In particular, consider
> removing elements that have not already been standardized. I'm surprised to see
> the literal strings "server-tx" and "client-tx" in the example log message. Was
> that what you really intended?

I'll look to trim down the sample message in Section 5 to a minimal set of fields. 
The use of server-txn and client-txn was intentional and done to parallel the usage in 
the data model draft. If you think a specific value is best, then I'll make one up.

> The code snippet should be delineated as such, following the
> guidance in the TLP (see section 4.b of <http://trustee.ietf.org/docs/IETF-Trust-License-Policy.pdf>

No problem. I'll add the the markers <CODE BEGINS> and <CODE ENDS> around 
the code snippet.


[1] http://www.ietf.org/mail-archive/web/sip-clf/current/msg00423.html

Thanks,

Gonzalo

> 
> 
> _______________________________________________
> sip-clf mailing list
> sip-clf@ietf.org
> https://www.ietf.org/mailman/listinfo/sip-clf
>