Re: [sip-clf] I-D Action: draft-ietf-sipclf-format-07.txt
Peter Musgrave <musgravepj@gmail.com> Wed, 24 October 2012 17:01 UTC
In-Reply-To: <2771A52D-9AE5-460F-A896-888AB334153C@cisco.com>
References: <20121005015620.22856.1399.idtracker@ietfa.amsl.com> <869FCF91-1032-4411-A7D5-85CEE6F120E5@cisco.com> <50870CB8.40908@nostrum.com> <5A63A1D1-5D2A-4EA8-9E7A-CDA3C9668DE5@cisco.com> <508719B1.4090108@nostrum.com> <2771A52D-9AE5-460F-A896-888AB334153C@cisco.com>
Date: Wed, 24 Oct 2012 13:01:10 -0400
Message-ID: <CAJH01taV7oGn8DX6sRR26nt8JfX7J7WBZkfxG9ujTm-6JG=7TQ@mail.gmail.com>
From: Peter Musgrave <musgravepj@gmail.com>
To: Gonzalo Salgueiro <gsalguei@cisco.com>
Cc: "sip-clf@ietf.org Mailing" <sip-clf@ietf.org>
Subject: Re: [sip-clf] I-D Action: draft-ietf-sipclf-format-07.txt
Sorry to be so absent. Great that we're gonna get this puppy done.

I will read through front to back this weekend. Can I ask others to look at
the diffs and comment by Monday Oct 29th?

Then it seems we'll need a minor fix up to -08 and we're good to write this
up again.

If anyone wants more time, please let me know.

Regards,

Peter Musgrave (as WG chair)

On Wed, Oct 24, 2012 at 2:30 AM, Gonzalo Salgueiro <gsalguei@cisco.com> wrote:

> Adam -
>
> Let me first state that I'm certainly no expert in UTF-8 and will happily
> defer to you on this. I certainly get your point about octets versus
> characters, especially since UTF-8 allows for multi-byte characters. When
> I made the statement "outside the UTF-8 character range" I was referring to
> the discrete set of characters from U+0000 to U+7FFFFFFF (or U+10FFFF if
> using the restricted RFC 3629 definition). I assumed this was a clearly
> bounded set.
>
> To be clear, my only intent was to expand your original definition of
> 'unprintable' beyond the C0 control codes (U+0000 to U+001F, U+007F) to
> also include the well known C1 control codes (U+0080 to U+009F). To avoid
> confusion, I should have used the Unicode code point notation. I certainly
> don't want to try and go down the road of specifying what characters are
> 'printable', but I thought C1 control codes were worth stating explicitly
> in the same way you did C0 control codes. Your original definition seemed
> unnecessarily restrictive since the C1 control codes, for example, are not
> printable but are certainly a valid UTF-8 sequence of octets greater than
> or equal to 128. Does that make sense?
>
> If you think your original definition of 'unprintable' is broad enough,
> then I will use that.
>
> Cheers,
>
> Gonzalo
>
> On Oct 23, 2012, at 6:26 PM, Adam Roach wrote:
>
> > On 10/23/12 16:56, Oct 23, Gonzalo Salgueiro wrote:
> >> "For the purposes of this document, we define 'unprintable' to mean a
> >> string of octets that: (a) contains an octet with a value in the range
> >> of 0 to 31, inclusive; (b) contains an octet with a value of 127, (c)
> >> contains any octet greater than or equal to 128 which is a formatting
> >> or control character (such as 128 to 159) within the UTF-8 character
> >> set; or (d) falls outside the UTF-8 character range, as specified by
> >> [UNICODE]."
> >>
> >> Does that sound ok?
> >
> > I think we're still talking past each other here.
> >
> > "Outside the UTF-8 character range" simply isn't a sensible thing to
> > say. What we're talking about putting into a log record is a series of
> > *octets*, not a series of *characters*. UTF-8 is an encoding that
> > defines how octets are put together to make characters.
> >
> > Once you start talking about the octets as if they *are* characters,
> > you're conflating two very different things. So, for example, you can't
> > talk about "a string of octets that... falls outside the UTF-8
> > character range."
> >
> > You can talk about a string of bytes that does not form a valid UTF-8
> > sequence, and that's almost certainly what you want to say here.
> >
> > I'm also getting a bit lost in what you mean when you say "which is a
> > formatting or control character (such as 128 to 159)." Keep in mind
> > that we're still talking about *octets* here, not characters. In UTF-8,
> > there's nothing special about an octet with a value of 128. There's
> > nothing special about an octet with a value of 159. Both can appear as
> > the second octet in a two-octet character. Or the second or third octet
> > in a three-octet character. And so on. The same goes for everything
> > between 128 and 191.
> >
> > Now, octet values of 192, 193, and 245-255 won't appear in valid UTF-8.
> > If we wanted to be abundantly careful, we could call those out as being
> > invalid. But I think we catch those just fine if we talk about octets
> > that form valid UTF-8 sequences.
> >
> > Or are you meaning to call out UTF-8 code points like U+0080 (the
> > Latin-1 padding character)? Because that has nothing to do with an
> > *octet* with a value of 128. It would be encoded as a two-octet
> > sequence starting with 194. However, if we're intending to go down the
> > rabbit hole of making decisions about whether to Base-64 encode based
> > on which UTF-8 codepoints we want to consider "printable," then we've
> > got years of draft refinement ahead of us (I can already imagine the
> > right-to-left mark arguments). That way lies madness.
> >
> > All of which is a very long winded way to say: octets are not
> > characters and characters are not octets; and you need to write the
> > text in a way that does not mix them with each other.
> >
> > /a
> >
>
> _______________________________________________
> sip-clf mailing list
> sip-clf@ietf.org
> https://www.ietf.org/mailman/listinfo/sip-clf
>
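
For anyone following the octet-versus-character point in the quoted exchange, here is a rough Python sketch of an octet-level test. It is only an illustration, not text from the draft: the function name is_unprintable and the constant NEVER_VALID are made up here, and the rule it applies (C0 octets 0-31, octet 127, or "does not form a valid UTF-8 sequence") is an assumed reading of the wording discussed above. It relies on Python's strict UTF-8 decoder to do the validity check.

```python
def is_unprintable(octets: bytes) -> bool:
    """Hypothetical octet-level test, assumed from the thread's wording."""
    # Octets 0-31 and 127 are checked directly as octet values.
    if any(b <= 31 or b == 127 for b in octets):
        return True
    # Everything else is judged only by whether the octet string as a
    # whole forms a valid UTF-8 sequence.
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


# The code point U+0080 is not the octet 128: it encodes as two octets.
c1_pad = "\u0080".encode("utf-8")
print(list(c1_pad))  # [194, 128]

# Octets 128 and 159 are ordinary continuation octets in valid UTF-8.
print(list("\u00c0".encode("utf-8")))  # [195, 128]
print(list("\u00df".encode("utf-8")))  # [195, 159]

# Octet values that never appear in valid UTF-8.
NEVER_VALID = [192, 193] + list(range(245, 256))
print(all(is_unprintable(bytes([b, 0x80])) for b in NEVER_VALID))  # True

# A lone octet 128 is invalid, but the encoded code point U+0080 is valid,
# so an octet-level rule does not flag C1 control code points.
print(is_unprintable(b"\x80"), is_unprintable(c1_pad))  # True False
```

The last line shows the trade-off being argued: a rule written purely in terms of octets catches malformed sequences, but treating C1 code points as unprintable would need a separate rule stated in terms of decoded code points.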