Re: [sip-clf] AD review: draft-ietf-sipclf-format-06

Gonzalo Salgueiro <gsalguei@cisco.com> Thu, 12 July 2012 21:40 UTC

From: Gonzalo Salgueiro <gsalguei@cisco.com>
In-Reply-To: <4FDA2807.6080802@nostrum.com>
Date: Thu, 12 Jul 2012 17:40:49 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <066F0992-9BFD-4CC2-81FF-0697529E179F@cisco.com>
References: <4F9EE0A4.2000905@nostrum.com> <BABA3C82-D90C-422D-A285-9E2902334573@cisco.com> <4FDA2807.6080802@nostrum.com>
To: Robert Sparks <rjsparks@nostrum.com>
X-Mailer: Apple Mail (2.1278)
Cc: "sip-clf@ietf.org Mailing" <sip-clf@ietf.org>
Subject: Re: [sip-clf] AD review: draft-ietf-sipclf-format-06

Robert - 

My responses to the comments and questions that required follow-up are inline below.


On Jun 14, 2012, at 2:05 PM, Robert Sparks wrote:

> 
> 
> On 6/8/12 2:01 PM, Gonzalo Salgueiro wrote:
>> Robert -
>> 
>> I apologize for my delayed response. Thanks for the continued detailed review and feedback. I'll respond to each of your points inline...
>> 
>> On Apr 30, 2012, at 2:57 PM, Robert Sparks wrote:
>> 
>>> Summary: there are still a few remaining clarifications to capture before IETFLC on this document.
>>> 
>>> This is an update to the AD review on -05. Thanks for addressing the majority of the concerns in that review. There are a few that remain, and the changes introduced some new ones.
>>> 
>>> 1) Thanks for adding the additional description motivating replacing tabs with spaces when creating log entries. The new text says that this replacement SHOULD occur when logging bodies or entire messages. Why is this a SHOULD? When would you not do this replacement?
>> I thought through this and can't recall any good reason why it isn't mandated for messages and bodies as well. We can change it to a MUST.
>> 
>>> The new text says the decision to do this substitution was based on there being "no known use of tabs in SIP messages". What possible unknown uses are there? It would be better to say "No standardized use of tabs". I also think you mean to scope that claim to SIP header fields. We allow logging arbitrary bodies, and I don't think we have done the research to claim that tabs are meaningless (other than as whitespace) in all possible body types.
>> I'm fine with your suggested text. I think we can even amplify it a bit to address your comments around scope. I propose something like:
>> 
>> The decision to replace tabs with spaces was based on there being
>>  no standardized use of tabs in SIP headers to convey any other
>>  meaning than whitespace.  Tabs may appear in message bodies, and
>>  in the event that the bodies are logged, the conversion to space
>>  may cause problems when reconstructing the body from the corresponding
>>  log entry.  The two consequences of the decision to replace tabs ...
>> 
>> Does that sound reasonable?
> Yes
>> 
>>> 2) The description of escaping and encoding in Tag=01 is still ambiguous. You say you must base64 encode any binary body. You also say you must escape CRLFs. I suspect you intend for those to be mutually exclusive? What are you expecting the implementer to use to decide if the body is binary or not? We should be making much more precise use of the terms defined in the media type specifications to make this clear (to avoid things like encoding a body that's already encoded).
>> Our intent is to be clear that CRLFs are to be escaped for ANY body type. Is your question about order of operations in regards to escaping CRLFs and base64 encoding a binary body  (something like MIME types of application/ISUP and application/QSIG)?
> application/jpg. Are you going to escape bits of a compressed picture that just happen to contain the CRLF sequence or not? What part of the text makes that clear?

I think I get what you are hinting at but I need to play it back to you for verification. The current text states:

=====
...Note that binary bodies MUST be base64encoded to render them in the SIP CLF log file.

If an optionally logged SIP message body contains any CRLFs they MUST be escaped by using the URI encoded equivalent value of "%0D%0A".  This escaping mechanism applies to all body  types.
=====

So we don't make any distinction in treatment between the various possible body types, and I don't believe that we should. I think what the document is missing, to make this escaping mechanism clear, is the order of operations. I believe I need to state explicitly that the translation to base64 must occur before the escaping. This would eliminate any ambiguity about whether the octets of a binary body could ever be mistaken for the escaped CRLF sequence %0D%0A.

To your specific point, if a binary body (like an image) is present then it would have to be base64 encoded first, and that base64 character stream could never include the CRLF escape sequence %0D%0A because '%' is not a valid base64 character. Would clarifying the order of operations in the text address the ambiguity in escaping base64 encoded binary bodies?

> Look at 3.1.1.11 in RFC4475 - how would that body be logged?

Looking at the message in Section 3.1.1.11 of RFC4475, the binary body (represented in hex) is:

3082015206092A864886F70D010702A08201433082013F0201013109300706052B0E03021A300B06092A864886F70D010701318201203082011C020101307C3070310B3009060355040613025553311330110603550408130A43616C69666F726E69613111300F0603550407130853616E204A6F7365310E300C060355040A1305736970697431293027060355040B13205369706974205465737420436572746966696361746520417574686F7269747902080195007102330113300706052B0E03021A300D06092A864886F70D01010105000481808EF466F948F0522DD2E5978E9D95AAE9F2FE15A06659716292E8DA2AA8D8350A68CEFFAE3CBD2BFF1675DDD5648E593DD64728F26220F7E941749E330D9A15EDABDB93D10C42102E7B7289D29CC0C9AE2EFBC7C0CFF9172F3B027E4FC027E1546DE4B6AA3ABB3E66CCCB5DD6C64B8383149CB8E6FF182D944FE57B65BC99D005

If I base64 encode this, the encoder wraps the output into six 76-character lines, each terminated by a CRLF. When those CRLFs are escaped according to this draft, the result is:

MIIBUgYJKoZIhvcNAQcCoIIBQzCCAT8CAQExCTAHBgUrDgMCGjALBgkqhkiG9w0BBwExggEgMIIB%0D%0AHAIBATB8MHAxCzAJBgNVBAYTAlVTMRMwEQYDVQQIEwpDYWxpZm9ybmlhMREwDwYDVQQHEwhTYW4g%0D%0ASm9zZTEOMAwGA1UEChMFc2lwaXQxKTAnBgNVBAsTIFNpcGl0IFRlc3QgQ2VydGlmaWNhdGUgQXV0%0D%0AaG9yaXR5AggBlQBxAjMBEzAHBgUrDgMCGjANBgkqhkiG9w0BAQEFAASBgI70ZvlI8FIt0uWXjp2V%0D%0Aquny/hWgZllxYpLo2iqo2DUKaM7/rjy9K/8Wdd3VZI5ZPdZHKPJiIPfpQXSeMw2aFe2r25PRDEIQ%0D%0ALntyidKcwMmuLvvHwM/5Fy87An5PwCfhVG3ktqo6uz5mzMtd1sZLg4MUnLjm/xgtlE/le2W8mdAF%0D%0A
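
The first escaped line above can be reproduced with a short sketch. It takes only the first 57 octets of the hex body quoted above (57 octets encode to exactly 76 base64 characters); the variable names are illustrative, and appending a CRLF by hand stands in for whatever line wrapping the encoder applies.

```python
import base64

# First 57 octets (114 hex digits) of the RFC 4475 body quoted above.
HEX_PREFIX = (
    "3082015206092A864886F70D010702A08201433082013F0201013109300706052B"
    "0E03021A300B06092A864886F70D01070131820120308201"
)

# Step 1: base64 encode the raw octets.
first_line = base64.b64encode(bytes.fromhex(HEX_PREFIX)).decode("ascii")

# Step 2: terminate the line with CRLF, then escape that CRLF for the log.
logged = (first_line + "\r\n").replace("\r\n", "%0D%0A")

print(len(first_line))  # 76
print(logged)
```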

> How would it be different if the 0D0A appeared in the binary body part?

If I insert the two bytes '0D0A' in this binary stream I get:

3082015206092A864886F70D010702A08201433082013F0201013109300706052B0E03021A300B06092A864886F70D010701318201203082011C020101307C3070310B3009060355040613025553311330110603550408130A43616C69666F726E69613111300F0603550407130853616E204A6F7365310E300C060355040A1305736970697431293027060355040B13205369706974205465737420436572746966696361746520417574686F7269747902080195007102330113300706052B0E03021A300D06092A864886F70D01010105000481808EF466F948F0522DD2E5978E9D95AAE9F2FE15A06659716292E8DA2AA8D8350A68CEFFAE3CBD2BFF1675DDD5648E593DD64728F26220F7E941749E330D9A15EDABDB93D10C42102E7B7289D29CC0C9AE2EFBC7C0CFF9172F3B027E4FC027E1546DE4B6AA3ABB3E66CCC0D0AB5DD6C64B8383149CB8E6FF182D944FE57B65BC99D005

Base64 encoding that stream and then escaping the CRLFs yields:

MIIBUgYJKoZIhvcNAQcCoIIBQzCCAT8CAQExCTAHBgUrDgMCGjALBgkqhkiG9w0BBwExggEgMIIB%0D%0AHAIBATB8MHAxCzAJBgNVBAYTAlVTMRMwEQYDVQQIEwpDYWxpZm9ybmlhMREwDwYDVQQHEwhTYW4g%0D%0ASm9zZTEOMAwGA1UEChMFc2lwaXQxKTAnBgNVBAsTIFNpcGl0IFRlc3QgQ2VydGlmaWNhdGUgQXV0%0D%0AaG9yaXR5AggBlQBxAjMBEzAHBgUrDgMCGjANBgkqhkiG9w0BAQEFAASBgI70ZvlI8FIt0uWXjp2V%0D%0Aquny/hWgZllxYpLo2iqo2DUKaM7/rjy9K/8Wdd3VZI5ZPdZHKPJiIPfpQXSeMw2aFe2r25PRDEIQ%0D%0ALntyidKcwMmuLvvHwM/5Fy87An5PwCfhVG3ktqo6uz5mzMDQq13WxkuDgxScuOb/GC2UT+V7ZbyZ%0D%0A0AU=%0D%0A


So the '0D0A' appearing in a binary body would pose no issues so long as we base64 encode first and subsequently escape the CRLFs.

>>  I think we may need a larger discussion to see how the MIME framework specified binary bodies. I am not a MIME expert so I'd appreciate your guidance here on what you think would be the best approach to address your concern. Do we need to discuss with MIME folks?
> Start with the questions above. Creating a few worked examples would help find the places we need to be clear.

The CRLFs in each multipart body that delimit the entity from the header will be escaped appropriately. I think clarifying the order of operations in the text settles the issue, but I would need your confirmation.

>>> I don't remember a response to my question about whether log readers should unescape anything (apologies if I'm just not finding a response as I'm rereading the threads).
>>> I think from the context of the conversations, the intent was that such a reader would never unescape (since you want to read these with grep and the like). Someone reading a logged body isn't going to be able to tell that the logging system re-encoded a body into base64 unless you're leaving evidence of having done something somewhere else. Are you?
>> The log reader does not unescape anything.
> The document should say that.
>> Sure, some information may be lost along the lines of something like converting a binary body to base64 to log it. The reader may not know that the body was originally binary; however, the human user behind the log reader can infer more semantics if need be.  The intent of logging has never been to faithfully recreate the bit-exact SIP message that was logged in the first place.
> The document should say that something using the log can't determine the original encoding.

OK, I'll add something for this.
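
To illustrate why the original encoding cannot be recovered, here is a small sketch. It assumes text bodies are logged with only their CRLFs escaped and binary bodies are base64 encoded first; the function names are hypothetical and the single-line body is chosen to keep the example short.

```python
import base64


def escape_crlf(text: str) -> str:
    """Replace every CRLF with its URI-encoded equivalent."""
    return text.replace("\r\n", "%0D%0A")


def log_text_body(body: str) -> str:
    """Text bodies: escape CRLFs only."""
    return escape_crlf(body)


def log_binary_body(body: bytes) -> str:
    """Binary bodies: base64 encode (one CRLF-terminated line), then escape."""
    return escape_crlf(base64.b64encode(body).decode("ascii") + "\r\n")


# A binary body, and a text body that already looks like base64.
from_binary = log_binary_body(b"hello")
from_text = log_text_body("aGVsbG8=\r\n")

print(from_binary)
print(from_binary == from_text)
```

Both bodies produce the same logged string, so nothing in the log entry itself tells a reader whether the logger performed the base64 step.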

>> 
>>> 3) This sentence needs adjusting: " It should be noted that as a result of the escaping mechanisms used in this document ('-' and '?') a field that would normally be able to parse if it appeared in a SIP header (as opposed to a log file) may not be syntactically parsable by a SIP parser." I suggest this replacement: "It should be noted that any field value that is modified by the escaping mechanisms defined in this document before logging ('-','?', and CRLF) is likely no longer well-formed SIP and will fail when given to such a parser".
>> I'm perfectly fine with the suggested update. This issue and (2) above have me thinking that maybe we need to explicitly state in the draft that the intent of logging has never been to
>> faithfully recreate the bit-exact SIP message that was logged in the first place. Let me know your thoughts.
> Yep. That has ramifications on what these logs can get used for.

Ditto.

>> 
>>> 4) Section 4.2's description of To and From tags still doesn't say what to do when the tags aren't present (or am I missing that discussion)?
>> Later in that same section 4.2 we state:
>> 
>> An element will not always have an appropriate value to provide for
>> one of these fields, even when the field is required to appear in the
>> SIP CLF record. In such circumstances, when a given mandatory field
>> is not present then that empty field MUST be encoded as a single
>> horizontal dash ("-").
>> 
>> It was my impression that would be enough. Do you think we also need to add something in the description of the To and From Tags?
> The turnaround time was long enough that I had to go back to the -05 review for context.
> 
> I still trip over the text in section 4.2 that you quote - the second sentence is really fuzzy.
> Is the field empty or not present? Even in context, the reader has to think about whether
> you mean "mandatory to log" or "mandatory in SIP" while reading "mandatory field" in that
> sentence. It is confusing.

I'll add some clarifying text indicating that "mandatory field" references the mandatory fields described in Section 4.2 of this document.

> This is compounded by the text in the To tag field discussion going out of its way to say
> "(if present)", but the text in the From tag field being silent about it. (That's the context
> that was missing from _this_ email message). That leads to people wondering what you
> meant by mandatory. Possible misinterpretations include "logging the To tag isn't mandatory
> if it's not present".
> 
> Please consider rewriting what you quote above, and treat To and From the same when you talk
> about their tags.

Good point. The descriptions of the To and From tags should be consistent, since the current mismatch could lead to misinterpretation. I'll do that.
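
A minimal sketch of treating both tags the same way when one is absent, using the single-dash rule quoted from Section 4.2 (the helper names are hypothetical and any other escaping the draft requires is left out):

```python
from typing import Optional, Tuple

EMPTY_FIELD = "-"  # value logged when a mandatory field has nothing to record


def encode_mandatory_field(value: Optional[str]) -> str:
    """Return the value to log, or a single dash if there is none."""
    return value if value else EMPTY_FIELD


def tag_fields(to_tag: Optional[str], from_tag: Optional[str]) -> Tuple[str, str]:
    """Apply the same rule to the To tag and the From tag."""
    return encode_mandatory_field(to_tag), encode_mandatory_field(from_tag)


# An initial request typically has a From tag but no To tag yet.
print(tag_fields(None, "a1b2c3"))
```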

>> 
>> 
>>> 5) (Nit: the change of text from 'Internet-Draft' to 'RFC' highlighted a problem for me that was already in -05:
>>> I suggest this change:)
>>> OLD:
>>>    A Base64 encoded version of this log entry (without the changes
>>>    required to format it for an RFC) is shown below.
>>> NEW:
>>>    A Base64 encoded version of this log entry (modified as
>>>    required to format it for an RFC) is shown below.
>> I'd typically not have a problem with changing a nit like this. The new text seems to indicate that the Base64 encoded version is what has been modified to meet RFC formatting standards when that is just the opposite of what we are stating. We are stating that the base64 version is what is unencumbered by the formatting restrictions of an RFC unlike the non-Base64 version of the example. Do you see my point? Or am I being daft? Perhaps the following is better:
>> 
>> OLD:
>>    A Base64 encoded version of this log entry (without the changes
>>    required to format it for an RFC) is shown below.
>> NEW:
>>    A Base64 encoded version of this log entry (unaffected by the formatting restrictions imposed by an RFC) is shown below.
> I see the disconnect now.
> How about this instead:
> 
> A bit-exact version of the actual log entry is provided here, Base64 encoded.

I think this is perfectly fine and will add it.

Thanks,

Gonzalo

>> 
>> Does that meet the need?
>> 
>> Cheers,
>> 
>> Gonzalo
>> 
>> 
>> 
>> 
>