Re: [sip-clf] AD review: draft-ietf-sipclf-format-05

Robert Sparks <rjsparks@nostrum.com> Thu, 09 February 2012 15:18 UTC

Return-Path: <rjsparks@nostrum.com>
X-Original-To: sip-clf@ietfa.amsl.com
Delivered-To: sip-clf@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 81D9221F8567 for <sip-clf@ietfa.amsl.com>; Thu, 9 Feb 2012 07:18:48 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.599
X-Spam-Level:
X-Spam-Status: No, score=-102.599 tagged_above=-999 required=5 tests=[BAYES_00=-2.599, HTML_MESSAGE=0.001, SPF_PASS=-0.001, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id fDZiVf7qqoml for <sip-clf@ietfa.amsl.com>; Thu, 9 Feb 2012 07:18:46 -0800 (PST)
Received: from nostrum.com (nostrum-pt.tunnel.tserv2.fmt.ipv6.he.net [IPv6:2001:470:1f03:267::2]) by ietfa.amsl.com (Postfix) with ESMTP id 64DBB21F84E6 for <sip-clf@ietf.org>; Thu, 9 Feb 2012 07:18:45 -0800 (PST)
Received: from unexplicable.local (pool-71-170-125-181.dllstx.fios.verizon.net [71.170.125.181]) (authenticated bits=0) by nostrum.com (8.14.3/8.14.3) with ESMTP id q19FIe9E014179 (version=TLSv1/SSLv3 cipher=DHE-RSA-CAMELLIA256-SHA bits=256 verify=NO); Thu, 9 Feb 2012 09:18:41 -0600 (CST) (envelope-from rjsparks@nostrum.com)
Message-ID: <4F33E3D0.7000605@nostrum.com>
Date: Thu, 09 Feb 2012 09:18:40 -0600
From: Robert Sparks <rjsparks@nostrum.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:9.0) Gecko/20111222 Thunderbird/9.0.1
MIME-Version: 1.0
To: Gonzalo Salgueiro <gsalguei@cisco.com>
References: <4F21DD3E.7000002@nostrum.com> <31A5C897-B767-4527-9346-905A80977F35@cisco.com>
In-Reply-To: <31A5C897-B767-4527-9346-905A80977F35@cisco.com>
Content-Type: multipart/alternative; boundary="------------030102090202080107060901"
Received-SPF: pass (nostrum.com: 71.170.125.181 is authenticated by a trusted mechanism)
Cc: draft-ietf-sipclf-format@tools.ietf.org, sipclf-chairs@tools.ietf.org, "sip-clf@ietf.org Mailing" <sip-clf@ietf.org>
Subject: Re: [sip-clf] AD review: draft-ietf-sipclf-format-05
X-BeenThere: sip-clf@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: SIP Common Log File format discussion list <sip-clf.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/sip-clf>, <mailto:sip-clf-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/sip-clf>
List-Post: <mailto:sip-clf@ietf.org>
List-Help: <mailto:sip-clf-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/sip-clf>, <mailto:sip-clf-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Feb 2012 15:18:48 -0000

Inline, trimming to points with responses

On 2/8/12 6:25 PM, Gonzalo Salgueiro wrote:
> Robert -
>
> Thanks for your thorough and thoughtful review. Below are some inline 
> comments to your feedback.
>
> I will release the -06 version of the draft once we have settled on 
> all of the below points you have raised.
>
>
> On Jan 26, 2012, at 6:09 PM, Robert Sparks wrote:
>
>> AD review: draft-ietf-sipclf-format-05
>>
>> Summary: This document has open issues to address before progressing 
>> to IETF LC
>>
>> Major (mostly in document order)
>>
<snip/>
>
>> 2) The description of Timestamp and Fractional Seconds is very confusing.
>> What does it mean to be represented in big-endian fashion with a most
>> significant octet and to be decimal encoded? It's very far from clear 
>> that
>> you intend this to be an ascii string representing an integer, and it's
>> not clear that you intend that it to be prefix-padded with 0s if 
>> necessary.
>> Did the group consider using ABNF to describe these fields (or the whole
>> log record) to avoid this kind of ambiguity?
>
> Good point. The text needs clarifying. I will remove the portion of 
> the text
> that states "Represented in big-endian fashion with most significant 
> octet
> first from zero starting at the left, or high-order, position. 
>  Decimal encoded."
> from the description of both Timestamp and Fractional Seconds fields.
> I will also supplement the text with the following summary text:
>
> The timestamp is represented in the log file as a UTF-8 string
> representing the date and time of the request or response
> represented as the number of seconds and milliseconds since the
> Unix epoch.  The number of milliseconds MUST be separated by a
> "." from the number of seconds.  It is not required to zero-pad
> the seconds and milliseconds, although if they are zero padded,
> a SIP CLF reader MUST be able to interpret them by discarding
> leading zeroes.
Actually, I was under the impression that the zero padding was
expected. The examples have them consistently - why did that happen?
Either way, group members should say something about what the intent was.

>
>> 3) The document should be clear about whether the flags field values are
>> extensible, and if so, how extension is expected to be managed. How will
>> you handle SIP received over DCCP, Websockets, or something we haven't
>> heard of yet?
>
> About the only flag that is expected to be extended is the Transport
> Flag (byte 4).  Currently, the format draft defines 3 values for
> it (UDP, TCP, SCTP).  If Websockets and DCCP become supported transports
> in SIP, then yes, this field will need to be extended.  We can do one
> of two things for extension:
>
>  1) Have IANA open up a registry for this field
>  2) Use the Options part of the SIPCLF record to encode the transport.
>   That is, we define byte 4 -- Transport Flag to contain:
>
>   U = UDP
>   T = TCP
>   S = SCTP
>   O = Other
>
>   Then, when a new transport is supported, a log writer MUST put "O"
>   for the Transport Flag and MUST create an optional field using
>   Vendor-ID 00000000.  Example: when DCCP becomes supported, it will
>   be encoded as follows:
>
>   00@00000000,000F,Transport: DCCP
>
>
> If we don't think that it is reasonable for new transports to be 
> ascribed a
> value of 'Other' in the Transport Flag field, then the IANA registry 
> is the
> more elegant solution. Let me know your thoughts on a way forward here
> since the draft currently has no IANA dependencies.
That's a choice the group should make - the document just needs to be 
explicit
about what's extensible (for all the fields) and if a field is 
extensible, what the mechanism
looks like. For the transport field, as an individual contributor, I 
prefer the IANA registry approach.

<snip/>
>
>> 5) The document should more explicitly discuss the decision to replace
>> tabs with spaces. That decision was based on there being no known use
>> of tabs in SIP messages that means something semantically other than
>> whitespace. The discussion should point that out, and note that you
>> will not be able to use the logged information to reconstruct a
>> signature over the logged fields, and that any future (or proprietary)
>> SIP header fields that include tabs with semantic meaning (carrying a
>> blob perhaps) will lose this meaning while being logged. It should also
>> be made brutally explicit whether you want this substitution to happen
>> when you are logging the entire message, or a body (that might happen
>> to have tabs in it).
>
> We will add a new paragraph that goes something
> like this (the new paragraph is added at the end of S4.2 paragraph
> "This data MUST appear ... (0x20) prior to being logged."):
>
>  The decision to replace tabs with spaces was based on there
>  being no known use of tabs in SIP messages to convey any other
>  meaning than whitespace.  Two consequences of the decision to
>  replace tab with a space character are: (a) it will become
>  impossible to reconstruct a signature over the logged field that
>  matches the signature over fields in the original SIP message,
>  and (b) any future SIP header fields that include tabs with a
>  different semantic meaning than simply signifying whitespace
>  will loose this meaning when logged.  And finally, the tabs to
>  spaces substitution MUST occur when logging mandatory fields;
>  it SHOULD occur when when logging either the entire message or
>  simply a SIP body.
This leaves out optional fields. It's also not clear why you use a SHOULD
for the requirement over bodies.

You should also make it clear (as part of escaping above) whether any
escaping is done before encoding binary bodies.

>> 6) There are a couple of places where you talk about escaping '-' and '?'
>> characters. The document should note that this could produce a logged 
>> record
>> that is not syntactically parsible with a SIP parser (quoted-pair is only
>> used in certain places in SIP messages). There should also be some 
>> discussion
>> about whether a log reader unescapes anything.
>
> It was never the intent for a SIP CLF to be read by a SIP parser as it 
> would be
> entirely nonsensical to it.
I think it _was_ the intent to be able to reuse parser code.
Several early implementers were handing individual fields to the part of 
their parser that
would handle a framed header field.

> The belief is that the reader of the SIP CLF would know
> what to do with it. I'll add a statement to the draft indicating this 
> explicitly.
It will surprise me if a simple statement will fully address this point 
- be sure to
capture whether the reader unescapes anyththing.
>
>> 7) Why does Tag -01 restrict message bodies to just the listed types, 
>> especially
>> with the open ended "miscellaneous text content" type category? If 
>> there's a real
>> restriction here, it needs to be much more clearly stated. Why are 
>> you allowing
>> any LWS for the separator between the content type header field value 
>> and the
>> message body itself? Why isn't this simply a space? It is not clear what
>> "byte encoded" means for representing binary bodies - do you mean 
>> expanding
>> it into a hex representation, a uuencode representation, or soemthing 
>> else?
>
> The decision to restrict logging of bodies to the specific types 
> listed in the draft
> was a decision made by the WG [1].
The document needs to explain _why_ this particular subset was chosen.
What basis did the group use for that choice? My read of the thread was 
that Vijay
threw an initial list together and that just ended up in the document. 
It's not clear
that the list was intended to be exclusive, just that whatever 
subsequent decisions
that were made allowed at least those types.

WHY are you not allowing images? Why are you not allowing multipart/mime?
(If you ever did actually see an S/MIME protected body, you would not be 
able to log it
the way the document stands now.)
> The usage of a LWS separator was specified
> as part of that proposal for logging bodies. Making the separator a space
> instead of LWS is fine, in fact, it is preferable. I'll update the 
> draft to reflect that
> change.
>
> I'll also draw a line in the sand and update the text indicating that 
> "byte encoded"
> in this context means that bodies with binary data MUST be base64 
> encoded for
> logging (since MIME uses base64 encoding).
Please be careful to be specific in the instructions on when to encode
(for example, would you re-encode a binary format that was already 
base64 encoded?)

<snip/>
>
>
>> 9) When describing repeating tag-01 or tag-02 multiple times to deal with
>> large entities, the document should be clear that the fragments get 
>> logged
>> in order, and how to reassemble the actual entity. It should also say 
>> whether
>> it's ok to interleave (for example) tag-01 fragements with other 
>> optional fields.
>
> This is a great callout. There is text in the description of Tag=01 
> that needs to
> be updated to clearly state that if your body or the entire SIP
> message is > 4096 bytes, then tough luck. We can't chunk this as it 
> would open
> up a can of worms that we don't want to deal with; like indexing the 
> chunks
> and a reassembly scheme. Better not to go there. I will update the text
> to clearly state this point.

Now I'm confused. Why do you have the ability to repeat these tags?

<snip/>
>
>> 11) Where does the document talk about how to log fields that are 
>> line folded
>> in the original message? Is it ok for the logged field to contain 
>> CRLFs, or is
>> the intent that the field be unfolded before logging?
>
> I will update the text to indicate that:
>
> For mandatory fields:
>
>  The mandatory fields MUST NOT contain CRLF when logged.
>
> For optional fields:
>
> If an optional field contains a CRLF as part of its normal
> production rule, the CRLF MUST be escaped by using the
> URI encoded equivalent value of "%0D%0A".
Why are you treating optional fields and mandatory fields differently here?
If there's a good reason, fine, but be sure to include the reason in the 
document.
>
> For logging bodies:
>
> All CRLF digraphs in the SIP body MUST be escaped by using the
> URI encoded equivalent value of "%0D%0A"
>
> For logging the entire SIP message:
>
> All instances of CRLF digraphs, whether they appear in the SIP
> headers or the body MUST be escaped by using the URI encoded
> equivalent value of "%0D%0A".
Even when the body is binary (see previous discussion about escaping)?
The group should weigh in on whether this is what they were expecting for
the usual application/sdp case.

>> 12) It's not clear if the \r\n used in the examples of logging the 
>> SDP body
>> are there as text-conventions for showing the example in an RFC, or 
>> if you
>> intend that CRLFs be replaced by \r\n's when logging. If it's just a 
>> convention,
>> please make it clear that the actual log record will have CRLFs. If 
>> you were
>> expecting the CRLFs to be escaped in the log, there needs to be text that
>> says that normatively somewhere, and you need to consider escaping 
>> the escape
>> character.
>
> As per my comments for point #11 above, the \r\n in that example needs 
> to be
> escaped with "%0D%0A". I'll update the examples to match with this new
> text that I will be adding.
>
>> Minor
>>
>> There is a maintenance risk in having all the detail of each block 
>> duplicated
>> in Figure 1. Please consider taking that figure up a level, showing 
>> the three
>> blocks and how they're separated, and let the other figures show the 
>> internal
>> detail.
>
> This is something that we have discussed and have not found an elegant 
> solution for.
> It was thought to be very valuable to display the pertinent portion of 
> the complete
> CLF within the section where that was discussed. I'll do my best to 
> ensure that they
> are all copied over properly from Figure 1.
This has been a common theme in the IESG reviews of many recent 
documents. The issue
is not the care you take while editing this document. It's the burden 
you put on whoever
ends up with the pen on an update.
>
>
>> There is a lot of descriptive text in 4.2's sections on the server 
>> and client
>> transaction that is a copy of what's in the 
>> problem-statement/information model
>> document. It would be better to just point to the description in the 
>> other document.
>
> This is something that we debated about quite a bit. In the end we 
> thought it more
> useful for implementers to have the description of all mandatory 
> fields right in the
> format draft.
Again, this is something that IESG review has put pressure against in 
many recent
documents. Expect to defend this during that review. At the very least, 
make it clear
which document is authoritative should they conflict.
>
>
>> Please consider scrubbing the message used in example 5 back to the 
>> minimal
>> number of octets you need to make the example useful. In particular, 
>> consider
>> removing elements that have not already been standardized. I'm 
>> surprised to see
>> the literal strings "server-tx" and "client-tx" in the example log 
>> message. Was
>> that what you really intended?
>
> I'll look to trim down the sample message in Section 5 to a minimal 
> set of fields.
> The use of server-txn and client-txn was intentional and done to 
> parallel the usage in
> the data model draft. If you think a specific value is best, then I'll 
> make one up.
It's confusing as is - the example is supposed to be an actual log 
message, not a
pseudo-code description of one isn't it?
>
>>
>

RjS