Re: [sip-clf] AD review: draft-ietf-sipclf-format-06

Gonzalo Salgueiro <gsalguei@cisco.com> Fri, 08 June 2012 19:02 UTC


Robert - 

I apologize for my delayed response. Thanks for the continued detailed review and feedback. I'll respond to each of your points inline...

On Apr 30, 2012, at 2:57 PM, Robert Sparks wrote:

> Summary: there are still a few remaining clarifications to capture before IETFLC on this document.
> 
> This is an update to the AD review on -05. Thanks for addressing the majority of the concerns in that review. There are a few that remain, and the changes introduced some new ones.
> 

> 1) Thanks for adding the additional description motivating replacing tabs with spaces when creating log entries. The new text says that this replacement SHOULD occur when logging bodies or entire messages. Why is this a SHOULD? When would you not do this replacement?

I thought through this and can't recall any good reason why it isn't mandated for messages and bodies as well. We can change it to a MUST.

> The new text says the decision to do this substitution was based on there being "no known use of tabs in SIP messages". What possible unknown uses are there? It would be better to say "No standardized use of tabs". I also think you mean to scope that claim to SIP header fields. We allow logging arbitrary bodies, and I don't think we have done the research to claim that tabs are meaningless (other than as whitespace) in all possible body types.

I'm fine with your suggested text. I think we can even amplify it a bit to address your comments around scope. I propose something like:

   The decision to replace tabs with spaces was based on there being
   no standardized use of tabs in SIP headers to convey any other
   meaning than whitespace.  Tabs may appear in message bodies, and
   in the event that the bodies are logged, the conversion to space
   may cause problems when reconstructing the body from the corresponding
   log entry.  The two consequences of the decision to replace tabs ...

Does that sound reasonable?
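
To make the consequence described in the proposed text concrete, here is a minimal sketch of the substitution. The function name replace_tabs and the sample strings are hypothetical and do not come from the draft; the sketch only illustrates that, once tabs are replaced, a logged value that originally contained a tab cannot be told apart from one that contained a space.

```python
def replace_tabs(value: str) -> str:
    """Replace every horizontal tab with a single space before logging.

    Hypothetical helper: the draft describes the substitution but does not
    name a function for it.
    """
    return value.replace("\t", " ")


# Illustrative inputs, not taken from any real SIP trace.
header_value = "Subject:\tquarterly\treport"
body_with_tab = "a\tb"
body_with_space = "a b"

logged_header = replace_tabs(header_value)

# After substitution the two bodies are indistinguishable, which is why
# the original body cannot always be reconstructed from the log entry.
bodies_collide = replace_tabs(body_with_tab) == replace_tabs(body_with_space)
```

For header fields this loses nothing, since tabs there carry no standardized meaning beyond whitespace; for bodies the collision above is the reconstruction problem the proposed text warns about.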

> 2) The description of escaping and encoding in Tag=01 is still ambiguous. You say you must base64 encode any binary body. You also say you must escape CRLFs. I suspect you intend for those to be mutually exclusive? What are you expecting the implementer to use to decide if the body is binary or not? We should be making much more precise use of the terms defined in the media type specifications to make this clear (to avoid things like encoding a body that's already encoded). 

Our intent is to be clear that CRLFs are to be escaped for ANY body type. Is your question about the order of operations between escaping CRLFs and base64 encoding a binary body (something like the MIME types application/ISUP and application/QSIG)? I think we may need a larger discussion of how the MIME framework specifies binary bodies. I am not a MIME expert, so I'd appreciate your guidance on the best approach to address your concern. Do we need to discuss this with the MIME folks?
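
To frame the order-of-operations question, here is a sketch of one possible reading, in which the two steps are mutually exclusive. Everything in it is an assumption made for illustration: the function name encode_body_for_log, the caller-supplied is_binary flag (how an implementer decides that is exactly the open question), and the "%0D%0A" escape sequence, which this thread does not spell out.

```python
import base64

# Assumed escape sequence for CRLF; not specified anywhere in this thread.
CRLF_ESCAPE = "%0D%0A"


def encode_body_for_log(body: bytes, is_binary: bool) -> str:
    """Return a single-line representation of a message body.

    Hypothetical helper. The is_binary flag is supplied by the caller
    because the rule for classifying a body is still under discussion.
    """
    if is_binary:
        # base64.b64encode emits no CR or LF, so under this reading no
        # separate CRLF escaping step is needed for binary bodies.
        return base64.b64encode(body).decode("ascii")
    # Textual body: keep it readable and escape only the line breaks.
    text = body.decode("utf-8", errors="replace")
    return text.replace("\r\n", CRLF_ESCAPE)


sdp_like = b"v=0\r\no=- 0 0 IN IP4 192.0.2.1\r\n"
opaque = b"\x01\x00\r\n\xff"

logged_text = encode_body_for_log(sdp_like, is_binary=False)
logged_binary = encode_body_for_log(opaque, is_binary=True)
```

The other reading, escaping CRLFs first and then base64 encoding the result, would also produce a single line but would change the bytes that a decoder recovers, which is part of why the ordering needs to be stated explicitly.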

> I don't remember a response to my question about whether log readers should unescape anything (apologies if I'm just not finding a response as I'm rereading the threads).
> I think from the context of the conversations, the intent was that such a reader would never unescape (since you want to read these with grep and the like). Someone reading a logged body isn't going to be able to tell that the logging system re-encoded a body into base64 unless you're leaving evidence of having done something somewhere else. Are you?

The log reader does not unescape anything. Sure, some information may be lost, for example when a binary body is converted to base64 in order to log it. The reader may not know that the body was originally binary; however, the human user behind the log reader can infer more semantics if need be. The intent of logging has never been to faithfully recreate the bit-exact SIP message that was logged in the first place.

> 3) This sentence needs adjusting: " It should be noted that as a result of the escaping mechanisms used in this document ('-' and '?') a field that would normally be able to parse if it appeared in a SIP header (as opposed to a log file) may not be syntactically parsable by a SIP parser." I suggest this replacement: "It should be noted that any field value that is modified by the escaping mechanisms defined in this document before logging ('-','?', and CRLF) is likely no longer well-formed SIP and will fail when given to such a parser".

I'm perfectly fine with the suggested update. This issue and (2) above have me thinking that maybe we need to explicitly state in the draft that the intent of logging has never been to faithfully recreate the bit-exact SIP message that was logged in the first place. Let me know your thoughts.

> 4) Section 4.2's description of To and From tags still doesn't say what to do when the tags aren't present (or am I missing that discussion)?

Later in that same section 4.2 we state:

An element will not always have an appropriate value to provide for
one of these fields, even when the field is required to appear in the
SIP CLF record. In such circumstances, when a given mandatory field
is not present then that empty field MUST be encoded as a single
horizontal dash ("-").

It was my impression that this would be enough. Do you think we also need to add something to the description of the To and From tags?
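
As a sketch of how the quoted rule would apply to a missing To or From tag, assuming it is applied uniformly to every mandatory field: the helper name encode_mandatory_field and the sample tag values below are hypothetical.

```python
from typing import Optional


def encode_mandatory_field(value: Optional[str]) -> str:
    """Encode a mandatory field, using a single dash when no value exists.

    Hypothetical helper illustrating the quoted Section 4.2 text; it treats
    both a missing value (None) and an empty string as "not present".
    """
    return value if value else "-"


# Illustrative case: an initial request carries a From tag but no To tag yet.
from_tag = encode_mandatory_field("a7f3c2")
to_tag = encode_mandatory_field(None)
```

Under that assumption the To-tag position in the record would simply hold "-" until a tag is available.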

> 5) (Nit: the change of text from 'Internet-Draft' to 'RFC' highlighted a problem for me that was already in -05:
> I suggest this change:)
> OLD:
>    A Base64 encoded version of this log entry (without the changes
>    required to format it for an RFC) is shown below.
> NEW:
>    A Base64 encoded version of this log entry (modified as
>    required to format it for an RFC) is shown below.

I'd typically not have a problem with changing a nit like this. However, the new text seems to indicate that the Base64 encoded version is what has been modified to meet RFC formatting standards, which is just the opposite of what we are stating. We are stating that the Base64 version is unencumbered by the formatting restrictions of an RFC, unlike the non-Base64 version of the example. Do you see my point? Or am I being daft? Perhaps the following is better:

OLD:
   A Base64 encoded version of this log entry (without the changes
   required to format it for an RFC) is shown below.
NEW:
   A Base64 encoded version of this log entry (unaffected by the
   formatting restrictions imposed by an RFC) is shown below.

Does that meet the need?
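
The point about the Base64 version being unaffected by RFC formatting can be illustrated with a short round trip. This is only a sketch: the function names, the 64-character wrap width, and the sample record are all assumptions chosen for illustration, not values from the draft.

```python
import base64


def to_wrapped_base64(record: bytes, width: int = 64) -> str:
    """Base64 encode a log record and wrap it to a fixed line width.

    Hypothetical helper; the width of 64 is an arbitrary choice that fits
    comfortably inside an RFC's line-length limit.
    """
    encoded = base64.b64encode(record).decode("ascii")
    lines = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return "\n".join(lines)


def from_wrapped_base64(block: str) -> bytes:
    """Recover the exact record bytes by removing the line breaks first."""
    return base64.b64decode("".join(block.split()))


# Illustrative record: long, and containing a tab that a text rendering
# in an RFC could not reproduce faithfully.
record = b"example-field-one\texample-field-two\t" + b"x" * 100

block = to_wrapped_base64(record)
recovered = from_wrapped_base64(block)
```

Because the line breaks added for display are not part of the Base64 alphabet, they can be stripped before decoding, so the wrapped block still yields the original record byte for byte, whereas a folded plain-text rendering does not.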

Cheers,

Gonzalo