Re: [sip-clf] AD review: draft-ietf-sipclf-format-05

Gonzalo Salgueiro <gsalguei@cisco.com> Thu, 09 February 2012 22:53 UTC

Return-Path: <gsalguei@cisco.com>
X-Original-To: sip-clf@ietfa.amsl.com
Delivered-To: sip-clf@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id D28FA21E8044 for <sip-clf@ietfa.amsl.com>; Thu, 9 Feb 2012 14:53:49 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -10.434
X-Spam-Level:
X-Spam-Status: No, score=-10.434 tagged_above=-999 required=5 tests=[AWL=0.164, BAYES_00=-2.599, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-8]
Received: from mail.ietf.org ([12.22.58.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id M53NG748G3p1 for <sip-clf@ietfa.amsl.com>; Thu, 9 Feb 2012 14:53:48 -0800 (PST)
Received: from av-tac-rtp.cisco.com (hen.cisco.com [64.102.19.198]) by ietfa.amsl.com (Postfix) with ESMTP id 01FA811E8073 for <sip-clf@ietf.org>; Thu, 9 Feb 2012 14:53:47 -0800 (PST)
X-TACSUNS: Virus Scanned
Received: from chook.cisco.com (localhost.cisco.com [127.0.0.1]) by av-tac-rtp.cisco.com (8.13.8+Sun/8.13.8) with ESMTP id q19Mrlkb017549 for <sip-clf@ietf.org>; Thu, 9 Feb 2012 17:53:47 -0500 (EST)
Received: from dhcp-64-102-210-22.cisco.com (dhcp-64-102-210-22.cisco.com [64.102.210.22]) by chook.cisco.com (8.13.8+Sun/8.13.8) with ESMTP id q19Mrjst029004; Thu, 9 Feb 2012 17:53:45 -0500 (EST)
Mime-Version: 1.0 (Apple Message framework v1084)
Content-Type: multipart/alternative; boundary=Apple-Mail-40-480990469
From: Gonzalo Salgueiro <gsalguei@cisco.com>
In-Reply-To: <4F33E3D0.7000605@nostrum.com>
Date: Thu, 9 Feb 2012 17:53:44 -0500
Message-Id: <02D40CF4-CDB3-4962-A5DE-832DA1E0C457@cisco.com>
References: <4F21DD3E.7000002@nostrum.com> <31A5C897-B767-4527-9346-905A80977F35@cisco.com> <4F33E3D0.7000605@nostrum.com>
To: Robert Sparks <rjsparks@nostrum.com>
X-Mailer: Apple Mail (2.1084)
Cc: draft-ietf-sipclf-format@tools.ietf.org, sipclf-chairs@tools.ietf.org, "sip-clf@ietf.org Mailing" <sip-clf@ietf.org>
Subject: Re: [sip-clf] AD review: draft-ietf-sipclf-format-05
X-BeenThere: sip-clf@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: SIP Common Log File format discussion list <sip-clf.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/sip-clf>, <mailto:sip-clf-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/sip-clf>
List-Post: <mailto:sip-clf@ietf.org>
List-Help: <mailto:sip-clf-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/sip-clf>, <mailto:sip-clf-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 09 Feb 2012 22:53:49 -0000

Hi Robert - 

Thanks for your responsiveness on this. I'll close the loop on just those points 
that Vijay didn't already touch on in his response.

<snip>

>>> 2) The description of Timestamp and Fractional Seconds is very confusing.
>>> What does it mean to be represented in big-endian fashion with a most
>>> significant octet and to be decimal encoded? It's very far from clear that
>>> you intend this to be an ascii string representing an integer, and it's
>>> not clear that you intend that it to be prefix-padded with 0s if necessary.
>>> Did the group consider using ABNF to describe these fields (or the whole
>>> log record) to avoid this kind of ambiguity?
>> 
>> Good point. The text needs clarifying. I will remove the portion of the text 
>> that states "Represented in big-endian fashion with most significant octet 
>> first from zero starting at the left, or high-order, position.  Decimal encoded." 
>> from the description of both Timestamp and Fractional Seconds fields. 
>> I will also supplement the text with the following summary text:
>> 
>> The timestamp is represented in the log file as a UTF-8 string
>> representing the date and time of the request or response
>> represented as the number of seconds and milliseconds since the
>> Unix epoch.  The number of milliseconds MUST be separated by a
>> "." from the number of seconds.  It is not required to zero-pad
>> the seconds and milliseconds, although if they are zero padded,
>> a SIP CLF reader MUST be able to interpret them by discarding
>> leading zeroes.
> Actually, I was under the impression that the zero padding was
> expected. The examples have them consistently - why did that happen?
> Either way, group members should say something about what the intent was.

The examples will be updated to reflect the updated text regarding the 
zero padding.

<snip/> 
<snip>

>>> 7) Why does Tag -01 restrict message bodies to just the listed types, especially
>>> with the open ended "miscellaneous text content" type category? If there's a real
>>> restriction here, it needs to be much more clearly stated. Why are you allowing
>>> any LWS for the separator between the content type header field value and the
>>> message body itself? Why isn't this simply a space? It is not clear what
>>> "byte encoded" means for representing binary bodies - do you mean expanding
>>> it into a hex representation, a uuencode representation, or soemthing else?
>> 
>> The decision to restrict logging of bodies to the specific types listed in the draft 
>> was a decision made by the WG [1].
> The document needs to explain _why_ this particular subset was chosen.
> What basis did the group use for that choice? My read of the thread was that Vijay
> threw an initial list together and that just ended up in the document. It's not clear
> that the list was intended to be exclusive, just that whatever subsequent decisions
> that were made allowed at least those types.
> 
> WHY are you not allowing images? Why are you not allowing multipart/mime?
> (If you ever did actually see an S/MIME protected body, you would not be able to log it
> the way the document stands now.)

We floated some text on the list that contained a list of those body types the WG thought 
MUST be logged.  Nobody responded with an alternative, neither did anyone respond 
against the proposal.  So we went with it.  I'm happy to reopen this discussion again on the 
list, but if no one else answers than we are no better off than before.

I share your concern that we are being needlessly restrictive so I'll put forth a proposal 
to the WG in the next version of the draft that removes the constraint of logging bodies 
of those specified types. The logging mechanism will remain unchanged but without 
body type restrictions.

>> The usage of a LWS separator was specified 
>> as part of that proposal for logging bodies. Making the separator a space 
>> instead of LWS is fine, in fact, it is preferable. I'll update the draft to reflect that 
>> change.
>> 
>> I'll also draw a line in the sand and update the text indicating that "byte encoded" 
>> in this context means that bodies with binary data MUST be base64 encoded for 
>> logging (since MIME uses base64 encoding).
> Please be careful to be specific in the instructions on when to encode
> (for example, would you re-encode a binary format that was already base64 encoded?)

Sounds good. I'll add text to address this.

>>> 9) When describing repeating tag-01 or tag-02 multiple times to deal with
>>> large entities, the document should be clear that the fragments get logged
>>> in order, and how to reassemble the actual entity. It should also say whether
>>> it's ok to interleave (for example) tag-01 fragements with other optional fields.
>> 
>> This is a great callout. There is text in the description of Tag=01 that needs to 
>> be updated to clearly state that if your body or the entire SIP
>> message is > 4096 bytes, then tough luck. We can't chunk this as it would open 
>> up a can of worms that we don't want to deal with; like indexing the chunks 
>> and a reassembly scheme. Better not to go there. I will update the text 
>> to clearly state this point.
> 
> Now I'm confused. Why do you have the ability to repeat these tags?

It *is* currently confusing. The updated text will remove this ability to have multiple 
Tag -01 and -02 entries. The only repeated Tag that will be allowed is -00 ONLY when 
logging an optional field that occurs more than once in a SIP message (e.g.  Contact, 
Route, Record-Route, etc.).

>>> 11) Where does the document talk about how to log fields that are line folded
>>> in the original message? Is it ok for the logged field to contain CRLFs, or is
>>> the intent that the field be unfolded before logging?
>> 
>> I will update the text to indicate that:
>> 
>> For mandatory fields:
>> 
>>  The mandatory fields MUST NOT contain CRLF when logged.
>> 
>> For optional fields:
>> 
>> If an optional field contains a CRLF as part of its normal 
>> production rule, the CRLF MUST be escaped by using the 
>> URI encoded equivalent value of "%0D%0A".
> Why are you treating optional fields and mandatory fields differently here?
> If there's a good reason, fine, but be sure to include the reason in the document.

I'm simply laying out the escape rules for the various CLF log elements.

>> For logging bodies:
>> 
>> All CRLF digraphs in the SIP body MUST be escaped by using the 
>> URI encoded equivalent value of "%0D%0A"
>> 
>> For logging the entire SIP message:
>> 
>> All instances of CRLF digraphs, whether they appear in the SIP
>> headers or the body MUST be escaped by using the URI encoded 
>> equivalent value of "%0D%0A".
> Even when the body is binary (see previous discussion about escaping)?
> The group should weigh in on whether this is what they were expecting for
> the usual application/sdp case.

I think _ANY_ time a CRLF is present in a body it is escaped. If it isn't there in a 
binary body, then there is no need to escape.

>>> Minor
>>> 
>>> There is a maintenance risk in having all the detail of each block duplicated
>>> in Figure 1. Please consider taking that figure up a level, showing the three
>>> blocks and how they're separated, and let the other figures show the internal
>>> detail.
>> 
>> This is something that we have discussed and have not found an elegant solution for. 
>> It was thought to be very valuable to display the pertinent portion of the complete 
>> CLF within the section where that was discussed. I'll do my best to ensure that they 
>> are all copied over properly from Figure 1.
> This has been a common theme in the IESG reviews of many recent documents. The issue
> is not the care you take while editing this document. It's the burden you put on whoever
> ends up with the pen on an update.

I certainly understand. I just don't have a better solution at this time.

>>> There is a lot of descriptive text in 4.2's sections on the server and client
>>> transaction that is a copy of what's in the problem-statement/information model
>>> document. It would be better to just point to the description in the other document.
>> 
>> This is something that we debated about quite a bit. In the end we thought it more 
>> useful for implementers to have the description of all mandatory fields right in the 
>> format draft.
> Again, this is something that IESG review has put pressure against in many recent
> documents. Expect to defend this during that review. At the very least, make it clear
> which document is authoritative should they conflict.

I'll do my best to defend this because I think it is what is best for this document and its 
'one stop shop' readability for implementers. If this turns out to be a battle not worth 
waging with the IESG then I will concede and add a reference.

>>> Please consider scrubbing the message used in example 5 back to the minimal
>>> number of octets you need to make the example useful. In particular, consider
>>> removing elements that have not already been standardized. I'm surprised to see
>>> the literal strings "server-tx" and "client-tx" in the example log message. Was
>>> that what you really intended?
>> 
>> I'll look to trim down the sample message in Section 5 to a minimal set of fields. 
>> The use of server-txn and client-txn was intentional and done to parallel the usage in 
>> the data model draft. If you think a specific value is best, then I'll make one up.
> It's confusing as is - the example is supposed to be an actual log message, not a
> pseudo-code description of one isn't it?

Yes, it should be an actual log message. I'll update as requested.

Thanks,

Gonzalo

>> 
>>> 
>> 
> 
> RjS