Re: [sip-clf] I-D Action: draft-ietf-sipclf-format-07.txt

Adam Roach <adam@nostrum.com> Tue, 23 October 2012 22:27 UTC

Message-ID: <508719B1.4090108@nostrum.com>
Date: Tue, 23 Oct 2012 17:26:57 -0500
From: Adam Roach <adam@nostrum.com>
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:14.0) Gecko/20120713 Thunderbird/14.0
MIME-Version: 1.0
To: Gonzalo Salgueiro <gsalguei@cisco.com>
References: <20121005015620.22856.1399.idtracker@ietfa.amsl.com> <869FCF91-1032-4411-A7D5-85CEE6F120E5@cisco.com> <50870CB8.40908@nostrum.com> <5A63A1D1-5D2A-4EA8-9E7A-CDA3C9668DE5@cisco.com>
In-Reply-To: <5A63A1D1-5D2A-4EA8-9E7A-CDA3C9668DE5@cisco.com>
Content-Type: text/plain; charset="ISO-8859-1"; format="flowed"
Content-Transfer-Encoding: 7bit
Received-SPF: pass (nostrum.com: 99.152.144.32 is authenticated by a trusted mechanism)
Cc: "sip-clf@ietf.org Mailing" <sip-clf@ietf.org>
Subject: Re: [sip-clf] I-D Action: draft-ietf-sipclf-format-07.txt
Precedence: list

On 10/23/12 16:56, Gonzalo Salgueiro wrote:
> "For the purposes of this document, we define 'unprintable' to mean a string of octets that: (a) contains an octet with a value in the range of 0 to 31, inclusive; (b) contains an octet with a value of 127, (c) contains any octet greater than or equal to 128 which is a formatting or control character (such as 128 to 159) within the UTF-8 character set; or (d) falls outside the UTF-8 character range, as specified by [UNICODE]."
>
> Does that sound ok?

I think we're still talking past each other here.

"Outside the UTF-8 character range" simply isn't a sensible thing to
say. What we're talking about putting into a log record is a series of
*octets*, not a series of *characters*. UTF-8 is an encoding that
defines how octets are put together to make characters.

Once you start talking about the octets as if they *are* characters,
you're conflating two very different things. So, for example, you can't
talk about "a string of octets that... falls outside the UTF-8 character
range."

You can talk about a string of octets that does not form a valid UTF-8
sequence, and that's almost certainly what you want to say here.
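
To make "does not form a valid UTF-8 sequence" concrete, here is a minimal
sketch in Python. The function name is_valid_utf8 is my own illustration and
not anything from the draft; it simply leans on Python's strict UTF-8 decoder
to decide whether a string of octets is well formed.

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octet string decodes as well-formed UTF-8."""
    try:
        # The strict handler raises on any ill-formed sequence.
        octets.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8(b"sip:alice@example.com"))  # plain ASCII
print(is_valid_utf8(bytes([0xC3, 0xA9])))       # complete two-octet sequence
print(is_valid_utf8(bytes([0xC3])))             # lead octet with nothing after it
print(is_valid_utf8(bytes([0x80])))             # continuation octet on its own
```

The first two calls should report True and the last two False. Note that the
test is applied to the whole octet string, never to an individual octet.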

I'm also getting a bit lost in what you mean when you say "which is a
formatting or control character (such as 128 to 159)." Keep in mind that
we're still talking about *octets* here, not characters. In UTF-8,
there's nothing special about an octet with a value of 128. There's
nothing special about an octet with a value of 159. Both can appear as
the second octet in a two-octet character. Or the second or third octet
in a three-octet character. And so on. The same goes for everything
between 128 and 191.
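
A quick way to see this, sketched in Python: encode a few characters and look
at the resulting octets. The helper name is_continuation_octet and the sample
characters (U+00C0, U+00DF, U+1000) are just ones I picked for illustration.

```python
def is_continuation_octet(value: int) -> bool:
    """True for octets of the form 10xxxxxx, i.e. 128 through 191."""
    return 0x80 <= value <= 0xBF


# Octet 128 as the second octet of a two-octet character (U+00C0).
print(list("\u00c0".encode("utf-8")))  # [195, 128]

# Octet 159 as the second octet of a two-octet character (U+00DF).
print(list("\u00df".encode("utf-8")))  # [195, 159]

# Octet 128 as both the second and third octet of a three-octet character (U+1000).
print(list("\u1000".encode("utf-8")))  # [225, 128, 128]

print(is_continuation_octet(128), is_continuation_octet(159))  # True True
```

None of these octet strings is "unprintable" merely because it contains 128 or
159; those values are ordinary continuation octets.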

Now, octet values of 192, 193, and 245-255 won't appear in valid UTF-8.
If we wanted to be abundantly careful, we could call those out as being
invalid. But I think we catch those just fine if we talk about octets
that form valid UTF-8 sequences.
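
If anyone wants to check that list of octet values, a brute-force sketch in
Python follows: encode every Unicode scalar value (all code points except the
surrogates) and record which octet values never show up. The function name
octets_never_in_utf8 is mine, not the draft's.

```python
def octets_never_in_utf8() -> list:
    """Return the octet values that appear in no valid UTF-8 encoding."""
    # Every code point except the surrogate range U+D800..U+DFFF.
    scalars = (cp for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF)
    encoded = "".join(map(chr, scalars)).encode("utf-8")
    seen = set(encoded)  # iterating over bytes yields integer octet values
    return [value for value in range(256) if value not in seen]


print(octets_never_in_utf8())
# [192, 193, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255]
```

A validity check on the whole sequence rejects all of these without the text
having to enumerate them.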

Or are you meaning to call out Unicode code points like U+0080 (the
Latin-1 padding character)? Because that has nothing to do with an
*octet* with a value of 128. It would be encoded as a two-octet sequence
starting with 194. However, if we're intending to go down the rabbit
hole of making decisions about whether to Base-64 encode based on which
Unicode code points we want to consider "printable," then we've got years
of draft refinement ahead of us (I can already imagine the right-to-left
mark arguments). That way lies madness.
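
The difference between the code point U+0080 and an octet with a value of 128
can be shown in a few lines of Python. This is only an illustration; the
variable names are arbitrary.

```python
# The code point U+0080 is encoded as two octets in UTF-8.
encoded = "\u0080".encode("utf-8")
print(list(encoded))  # [194, 128]

# A lone octet with a value of 128 is not a character at all in UTF-8.
try:
    bytes([128]).decode("utf-8")
    lone_128_is_valid = True
except UnicodeDecodeError:
    lone_128_is_valid = False
print(lone_128_is_valid)  # False
```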

All of which is a very long-winded way to say: octets are not characters
and characters are not octets, and you need to write the text in a way
that does not mix the two.
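
For what it's worth, here is a sketch in Python of how a purely octet-level
rule might look. It assumes criteria (a) and (b) from the quoted text plus
"does not form a valid UTF-8 sequence" in place of (c) and (d). The names
needs_base64 and render_field are hypothetical, and the draft's actual rule
and its way of marking an encoded field may well differ.

```python
import base64


def needs_base64(octets: bytes) -> bool:
    """Decide on octet values and UTF-8 validity only, never on 'characters'."""
    # (a) any octet in 0..31, or (b) any octet equal to 127.
    if any(value <= 31 or value == 127 for value in octets):
        return True
    # Anything that is not a valid UTF-8 sequence.
    try:
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False


def render_field(octets: bytes) -> str:
    """Hypothetical helper: Base64 for 'unprintable' fields, text otherwise."""
    if needs_base64(octets):
        return base64.b64encode(octets).decode("ascii")
    return octets.decode("utf-8")


print(needs_base64(b"Alice"))         # no control octets, valid UTF-8
print(needs_base64(b"a\tb"))          # contains octet 9
print(needs_base64(bytes([0xFF])))    # not valid UTF-8
```

Nothing in this rule asks whether a decoded code point is "printable"; that is
the question I am suggesting we stay away from.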
