Re: [sip-clf] AD review: draft-ietf-sipclf-format-06

Gonzalo Salgueiro <gsalguei@cisco.com> Fri, 13 July 2012 00:09 UTC

From: Gonzalo Salgueiro <gsalguei@cisco.com>
In-Reply-To: <4FFF5169.6040108@nostrum.com>
Date: Thu, 12 Jul 2012 20:09:32 -0400
Message-Id: <61DC09AD-2C4A-4A8A-B6C7-7314C42C1E52@cisco.com>
References: <4F9EE0A4.2000905@nostrum.com> <BABA3C82-D90C-422D-A285-9E2902334573@cisco.com> <4FDA2807.6080802@nostrum.com> <066F0992-9BFD-4CC2-81FF-0697529E179F@cisco.com> <4FFF5169.6040108@nostrum.com>
To: Robert Sparks <rjsparks@nostrum.com>
Cc: "sip-clf@ietf.org Mailing" <sip-clf@ietf.org>
Subject: Re: [sip-clf] AD review: draft-ietf-sipclf-format-06

On Jul 12, 2012, at 6:36 PM, Robert Sparks wrote:

> Trimming to one point and replying inline. (Your proposals for everything else are fine).
> 
> On 7/12/12 4:40 PM, Gonzalo Salgueiro wrote:
>> 
>>>>> 2) The description of escaping and encoding in Tag=01 is still ambiguous. You say you must base64 encode any binary body. You also say you must escape CRLFs. I suspect you intend for those to be mutually exclusive? What are you expecting the implementer to use to decide if the body is binary or not? We should be making much more precise use of the terms defined in the media type specifications to make this clear (to avoid things like encoding a body that's already encoded).
>>>> Our intent is to be clear that CRLFs are to be escaped for ANY body type. Is your question about order of operations in regards to escaping CRLFs and base64 encoding a binary body  (something like MIME types of application/ISUP and application/QSIG)?
>>> application/jpg. Are you going to escape bits of a compressed picture that just happen to contain the CRLF sequence or not? What part of the text makes that clear?
>> I think I get what you are hinting at but I need to play it back to you for verification. The current text states:
>> 
>> =====
>> ...Note that binary bodies MUST be base64encoded to render them in the SIP CLF log file.
>> 
>> If an optionally logged SIP message body contains any CRLFs they MUST be escaped by using the URI encoded equivalent value of "%0D%0A".  This escaping mechanism applies to all body  types.
>> =====
>> 
>> So we don't make any distinction in treatment between the various possible body types. I don't believe that we should.
> You want to base64 application/jpg, but (generally) not application/sdp (It's possible to have non-ASCII range UTF-8 characters in SDP - would you encode such a body?)

No. I would only encode bodies that are decidedly "binary".
> 
>> I think what the document may be missing to make this escaping mechanism clear is the order of operation. I believe I need to explicitly state that the translation to base64 must occur before the escaping. This would eliminate any ambiguity about the possibility of ever having the escaped CRLF sequence of %0D%0A.
>> 
>> To your specific point, if a binary body (like an image) is present then it would have to be base64 encoded first and that base64 character stream could never include the CRLF escape sequence of %0D%0A because '%' is not a valid base64 character. Would this clarification in the text around order of operation address the ambiguity in escaping base64 encoded binary bodies?
>> 
>> 
> Making it clear that these are performed in that order is good.
> Including an example in the draft that shows the result of applying that ordering will help.

OK. I'll try to come up with a basic example that gets the concept across. Maybe I'll run through the exact one I did in this email thread and reference the message in Section 3.1.1.11 of RFC 4475.
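
As a rough illustration of that ordering (base64 first, CRLF escaping second), here is a minimal Python sketch. The helper name render_body and the is_binary flag are hypothetical and not from the draft; how the caller decides that a body is binary is deliberately left open here.

```python
import base64


def render_body(body: bytes, is_binary: bool) -> str:
    """Render a SIP message body for a log line (illustrative only).

    Step 1: base64 encode the body if the caller considers it binary.
    Step 2: escape every CRLF as %0D%0A, whatever the body type.
    """
    if is_binary:
        # b64encode produces one unwrapped line using only A-Z, a-z,
        # 0-9, '+', '/' and '=', so it can never contain CR, LF or '%'.
        text = base64.b64encode(body).decode("ascii")
    else:
        # Assumption for this sketch: non-binary bodies are valid UTF-8.
        text = body.decode("utf-8")
    # After step 1 this is a no-op for binary bodies, which is the point
    # of fixing the order of operations.
    return text.replace("\r\n", "%0D%0A")
```

With this ordering, a compressed image whose octets happen to include 0x0D 0x0A is logged as plain base64, while an SDP body has each of its line endings replaced by %0D%0A.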
> 
> But I don't think we've discussed the part of my original question that had to do with how an implementation decides whether the type it's looking at is "binary" or not.
> What is the thing to look at that tells you to treat application/jpg different from application/sdp, and that different from an sdp body that contains some thing out of the ascii range? Just looking at the media type isn't going to be enough.
> 
> Is what you're really trying to say is "base64 encode any body that contains an octet outside <some limited range>"?  If so, you need something in the message that tells you that you've done this encoding.

I agree that maintaining a complete list of "binary" Content-Types gets tricky, and trying to base64 encode only portions of a multipart message becomes even more unwieldy. I think this is a problem best left to the implementer and not one we need to tackle. There are myriad ways to skin this cat. For example, the quick-and-dirty approach used by GNU grep and GNU diff is to check for a NUL byte to decide whether a file is "binary" or not:

    not_text = (((binary_files == BINARY_BINARY_FILES && !out_quiet)
                 || binary_files == WITHOUT_MATCH_BINARY_FILES)
                && memchr (bufbeg, eol ? '\0' : '\200', buflim - bufbeg));

Here are a variety of different checks using Python (with varying levels of efficacy):

http://stackoverflow.com/questions/898669/how-can-i-detect-if-a-file-is-binary-non-text-in-python/3002505#3002505
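
To make the idea concrete, here is a minimal Python sketch of two such heuristics: the NUL-byte test in the spirit of the grep check above, and a stricter variant that also treats a body that does not decode as UTF-8 as binary. The function names are hypothetical, and neither check is proposed as normative behavior for the draft.

```python
def contains_nul(body: bytes) -> bool:
    """Quick-and-dirty check: any NUL octet marks the body as binary."""
    return b"\x00" in body


def is_valid_utf8(body: bytes) -> bool:
    """Return True if the whole body decodes cleanly as UTF-8."""
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def looks_binary(body: bytes) -> bool:
    """Combined heuristic: a NUL octet, or octets that are not UTF-8."""
    return contains_nul(body) or not is_valid_utf8(body)
```

Under this heuristic an SDP body that carries non-ASCII UTF-8 characters is still treated as text, whereas typical compressed image data is flagged as binary because it contains octets that are not valid UTF-8.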

The verification of non-UTF-8 characters is something that should fall to the SIP CLF implementation IMO. Can we take this approach?

Thanks,

Gonzalo

> 
> RjS
> 
> 
>