RE: Protocol Action: 'UTF-8, a transformation format of ISO 10646' to Standard (fwd)
Anton Okmianski <aokmians@cisco.com> Thu, 14 August 2003 22:19 UTC
Date: Thu, 14 Aug 2003 18:19:47 -0400
From: Anton Okmianski <aokmians@cisco.com>
Subject: RE: Protocol Action: 'UTF-8, a transformation format of ISO 10646' to Standard (fwd)
Message-ID: <20140418112159.2560.26771.ARCHIVE@ietfa.amsl.com>
Rainer et al:

I don't claim to have definitive answers here -- just some thoughts.

Can we define a new syslog standard that is UTF-8 based? It will be
backwards compatible for US-ASCII. If somebody fires a US-ASCII-only
message encoded in UTF-8, it will be the same 7-bit stuff, right? So
this will be compliant with older syslog implementations. If somebody
wants to fire a message with non-US-ASCII characters in the syslog
message, then it should only be fired to a syslog daemon implementation
that supports the new standard and UTF-8. Is this not OK?

Or do we want to state that it is our goal that legacy syslog
implementations should be able to receive and store internationalized
messages? Is it strictly necessary? Yes, it would reduce the need to
upgrade infrastructure, but it would tie us to the less compact UTF-7.

One other bad thing about UTF-7 is that it represents all US-ASCII
unchanged *except* for the "+" character, because it is used as an
escape character. I also think it has some restrictions on "\", "~" and
a trailing "-". So UTF-7 is not actually fully US-ASCII compatible,
while UTF-8 is. Right? Another nastiness/annoyance is that I think UTF-7
allows multiple ways to encode the same thing.

Because of the history of syslog, where implementations appeared before
the standard, I think it may be acceptable to eventually try to
standardize things instead of supporting legacy applications that are
not even known to follow the standard anyway.

The UTF-7 specification itself suggests that UTF-8 should be used
wherever possible:

"UTF-7 should normally be used only in the context of 7 bit transports,
such as mail. In other contexts, straight Unicode or UTF-8 is
preferred."

I think we can afford to say that legacy syslog implementations do not
have to deal with internationalized messages, since they were not
intended to. Then it follows that we can afford to support UTF-8. Right?

Supporting multiple encodings could be an answer, but not an elegant one.
It would actually mean that new implementations need to support both
UTF-7 and UTF-8. If we supported just UTF-8, then we may not even need
any new headers in the message (except perhaps one for language, which
is debatable).

Just as anecdotal evidence... There is definitely momentum behind
supporting just UTF-8 in new protocols. I was giving a related
presentation today and people were somewhat skeptical when I mentioned
potentially using UTF-7. Adopting UTF-7 does not seem to people like a
forward-looking move. I understand your concerns though.

Just my 2.5 cents. Thanks for investigating all this!

Anton.

> -----Original Message-----
> From: owner-syslog-sec@employees.org
> [mailto:owner-syslog-sec@employees.org]On Behalf Of Rainer Gerhards
> Sent: Thursday, August 14, 2003 4:41 PM
> To: Chris Lonvick; syslog-sec@employees.org
> Subject: RE: Protocol Action: 'UTF-8, a transformation format of ISO
> 10646' to Standard (fwd)
>
>
> Chris and all,
>
> I am still struggling with UTF-8 & ALL syslog RFCs.
>
> http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-05.txt, in
> 4. says:
>
> " For the convenience of implementors using ABNF, a definition of UTF-8
> in ABNF syntax is given here.
>
> A UTF-8 string is a sequence of octets representing a sequence of UCS
> characters. An octet sequence is valid UTF-8 only if it matches the
> following syntax, which is derived from the rules for encoding UTF-8
> and is expressed in the ABNF of [RFC2234].
>
> UTF8-octets = *( UTF8-char )
> UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
> UTF8-1      = %x00-7F
> UTF8-2      = %xC2-DF UTF8-tail
> UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
>               %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
> UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
>               %xF4 %x80-8F 2( UTF8-tail )
> UTF8-tail   = %x80-BF
> "
>
> If you look at this definition, 8 bit characters are required. All of
> the current RFCs/IDs describe 7 bit US-ASCII only. So I don't see any
> way to use UTF-8 in the current framework.
>
> Am I missing something?
>
> Rainer
>
>
> > -----Original Message-----
> > From: Chris Lonvick [mailto:clonvick@cisco.com]
> > Sent: Thursday, August 14, 2003 3:48 PM
> > To: syslog-sec@employees.org
> > Subject: Protocol Action: 'UTF-8, a transformation format of
> > ISO 10646' to Standard (fwd)
> >
> >
> > Since we're on the subject.
> >
> > Thanks,
> > Chris
> >
> > ---------- Forwarded message ----------
> > Date: Mon, 11 Aug 2003 16:17:04 -0400
> > From: The IESG <iesg-secretary@ietf.org>
> > To: IETF-Announce: ;
> > Cc: Internet Architecture Board <iab@iab.org>,
> > RFC Editor <rfc-editor@rfc-editor.org>
> > Subject: Protocol Action: 'UTF-8,
> > a transformation format of ISO 10646' to Standard
> >
> > The IESG has approved the Internet-Draft 'UTF-8, a
> > transformation format of ISO 10646'
> > <draft-yergeau-rfc2279bis-05.txt> as a Standard. This
> > document has been reviewed in the IETF but is not the product
> > of an IETF Working Group. The IESG contact person is Ted Hardie.
> >
> > Technical Summary
> >
> > This document updates the specification of UTF-8,
> > an encoding of the UCS which is designed to be
> > compatible with many current applications and protocols.
> > UTF-8 has the characteristic of preserving the full US-ASCII
> > range, providing compatibility with file systems, parsers and
> > other software that rely on US-ASCII values but are
> > transparent to other values. This memo obsoletes and replaces
> > RFC 2279.
> >
> >
> > Working Group Summary
> >
> > This draft and the interoperability reports associated with
> > it were discussed on the IETF-charsets@iana.org mailing list.
> > Archives may be found at
> > http://lists.w3.org/Archives/Public/ietf-charsets/ among other
> > places.
> >
> >
> > Protocol Quality
> >
> > This specification was reviewed for the IESG by Patrik Falstrom.
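The encoding claims in this exchange can be checked with a short Python
sketch. It is only an illustration, not part of any syslog draft: the
byte-level regular expression is a hand transcription of the ABNF quoted
above, the function names is_valid_utf8 and is_seven_bit are
hypothetical, and the UTF-7 behaviour shown is that of Python's built-in
"utf-7" codec, which other implementations need not match.

```python
import re

# Hand transcription of the quoted UTF-8 ABNF into a bytes regex
# (an assumption of this sketch, not text from the draft itself).
_UTF8_OCTETS = re.compile(
    rb"(?:[\x00-\x7F]"                    # UTF8-1
    rb"|[\xC2-\xDF][\x80-\xBF]"           # UTF8-2
    rb"|\xE0[\xA0-\xBF][\x80-\xBF]"       # UTF8-3, first alternative
    rb"|[\xE1-\xEC][\x80-\xBF]{2}"
    rb"|\xED[\x80-\x9F][\x80-\xBF]"
    rb"|[\xEE-\xEF][\x80-\xBF]{2}"
    rb"|\xF0[\x90-\xBF][\x80-\xBF]{2}"    # UTF8-4, first alternative
    rb"|[\xF1-\xF3][\x80-\xBF]{3}"
    rb"|\xF4[\x80-\x8F][\x80-\xBF]{2}"
    rb")*"
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if every octet of data is consumed by the ABNF rules."""
    return _UTF8_OCTETS.fullmatch(data) is not None


def is_seven_bit(data: bytes) -> bool:
    """Return True if no octet has the high bit set (7-bit clean)."""
    return all(octet < 0x80 for octet in data)


if __name__ == "__main__":
    ascii_msg = "<34>su: 'su root' failed"   # made-up US-ASCII message
    intl_msg = "Grüße aus Köln"              # made-up non-ASCII message

    # US-ASCII text is byte-for-byte identical when encoded as UTF-8.
    print(ascii_msg.encode("utf-8") == ascii_msg.encode("ascii"))

    # Non-ASCII text in UTF-8 is valid but needs 8-bit octets.
    print(is_valid_utf8(intl_msg.encode("utf-8")),
          is_seven_bit(intl_msg.encode("utf-8")))

    # UTF-7 stays 7-bit, but "+" is not passed through unchanged.
    print(is_seven_bit(intl_msg.encode("utf-7")), "a+b".encode("utf-7"))
```

The first check corresponds to Anton's backward-compatibility argument,
the second to Rainer's observation that UTF-8 requires 8-bit octets, and
the third to the "+" escape in UTF-7.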
- Protocol Action: 'UTF-8, a transformation format … Chris Lonvick
- RE: Protocol Action: 'UTF-8, a transformation for… Rainer Gerhards
- RE: Protocol Action: 'UTF-8, a transformation for… Anton Okmianski
- RE: Protocol Action: 'UTF-8, a transformation for… Glen Zorn
- RE: Protocol Action: 'UTF-8, a transformation for… Chris Lonvick
- RE: Protocol Action: 'UTF-8, a transformation for… Rainer Gerhards
- RE: Protocol Action: 'UTF-8, a transformation for… Rainer Gerhards
- RE: Protocol Action: 'UTF-8, a transformation for… Anton Okmianski