RE: Protocol Action: 'UTF-8, a transformation format of ISO 10646' to Standard (fwd)
Rainer Gerhards <rgerhards@hq.adiscon.com> Fri, 15 August 2003 19:57 UTC
Date: Fri, 15 Aug 2003 21:57:37 +0200
From: Rainer Gerhards <rgerhards@hq.adiscon.com>
Subject: RE: Protocol Action: 'UTF-8, a transformation format of ISO 10646' to Standard (fwd)
Message-ID: <20140418112159.2560.67045.ARCHIVE@ietfa.amsl.com>
Hi all,

I am replying to this message as one of the many pieces of great feedback I received. Summing it up, I am now convinced that I need to move toward UTF-8 support, which means that the syslog packet format needs to be re-specified. I am prepared to do so.

However, let us step back a little and look at why I was so hesitant to touch the current frame format. I strongly believe that protocols should be built in layers. As such, I saw RFC 3195 and syslog-sign as the documents that define the syslog message, and I intended to add *a layer* on top of them that supports international characters. I did not intend to re-specify the complete syslog protocol in a new ID. I have the impression that doing so mixes things that do not necessarily belong together. Keep in mind that flexible, extensible protocols are almost always implemented in layers. I do not like the idea that whenever I specify a new feature on top of the existing protocol, I need to re-specify its lower-layer workings. In the current example, support for UTF-8 means (and again, I am willing to take that route) that we cannot just add it on top of the current protocol but need to rewrite the syslog format itself.

As I have said, I am perfectly willing to take that route and will change the ID accordingly. However, I would like to propose a change in the way we look at the syslog protocol series. How about trying to define a basic syslog protocol format that does not need to change with each new feature, and THEN mapping this format both to the lower-level transports and to upper-level issues like signing and internationalization? I think we are now ready to take this route, thanks to the great work accomplished so far: we have a good understanding of the reliability, security, backward-compatibility and internationalization features. If the WG follows that route, we would probably have to draft some more IDs...
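Rainer's hesitation comes down to the 7-bit US-ASCII framing of the existing syslog documents: UTF-7 output always stays within 7 bits, while UTF-8 needs octets with the high bit set for every non-ASCII character. The minimal Python sketch below compares both encodings using Python's built-in `utf-8` and `utf-7` codecs. The `is_7bit_clean` and `compare` helpers are hypothetical, and "every octet below 0x80" is an assumed simplification of the framing rule, not text from any syslog RFC. The sketch also shows two UTF-7 quirks Anton raises in his quoted reply: "+" is escaped, and the same text can have more than one encoding.

```python
"""Compare UTF-8 and UTF-7 against a 7-bit (US-ASCII-only) frame.

Sketch only: is_7bit_clean() is a hypothetical simplification of the
framing rule, not text taken from any syslog RFC.
"""


def is_7bit_clean(data: bytes) -> bool:
    """Return True if every octet is within US-ASCII (0x00-0x7F)."""
    return all(octet < 0x80 for octet in data)


def compare(text: str) -> None:
    """Print both encodings of `text` and whether each fits in 7 bits."""
    for codec in ("utf-8", "utf-7"):
        data = text.encode(codec)
        print(f"{codec}: {data!r} 7-bit clean: {is_7bit_clean(data)}")


if __name__ == "__main__":
    compare("disk full")                           # pure US-ASCII
    compare("Gr\u00f6\u00dfe \u00fcberschritten")  # German, non-ASCII
    compare("a+b")                                 # '+' is UTF-7's escape

    # Two different UTF-7 byte strings that decode to the same text:
    # the direct form of "a" and a base64-shifted form of U+0061.
    print(b"a".decode("utf-7") == b"+AGE-".decode("utf-7"))
```

For pure ASCII text the UTF-8 bytes are identical to the input, which is the backward compatibility Anton describes. The German sample is 7-bit clean only in UTF-7, and "a+b" becomes `b'a+-b'` in UTF-7, so UTF-7 is not byte-for-byte ASCII compatible either.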
Let me provide a sample:

Syslog-format: the "core" standard describing the packet format of a syslog message. A large part of RFC 3195, together with syslog-sign and -international, could perhaps serve as its basis.

Syslog-sign, -international: added as upper layers on top of syslog-format. They extend the format with new features but do not change it. Other payload-oriented standards could follow here once the time is right for them.

Transport mappings: like RFC 3164 and RFC 3195. They describe how syslog-format is mapped onto a specific transport, so this is the lower-layer series. Again, these could be extended without changing the upper layers. For example, I could imagine a mapping that uses SNMP trap/inform messages to transport syslog-format messages.

As we are all currently concerned with BEEP, let me compare this approach with what BEEP does. Syslog-format would be much like RFC 3080, specifying the basic workings. Syslog-sign, -international and so on would be much like BEEP profiles. And 3164 and 3195 would be transport mappings, as RFC 3081 is for BEEP. Please note that BEEP is not alone in following this approach; many successful protocol families do.

I know I am proposing quite a change to the current RFC series, and I have probably overlooked something. I certainly know there are better and more informed opinions than mine in the WG. Anyhow, I thought I would raise this concern, which was the basis of my sticking to UTF-7 in -international. I sincerely hope it is helpful and will serve a good purpose. And as I said, I am ready to re-specify the complete syslog format in -international.
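To make the proposed split concrete, here is a minimal Python sketch, assuming a deliberately simplified message model. Every name in it (`SyslogMessage`, `Extension`, `Transport`, `LanguageTag`, `InMemoryTransport`, `emit`) and the serialization format are hypothetical, invented for illustration; none comes from RFC 3164, RFC 3195 or any draft. It only illustrates the dependency direction Rainer describes: upper layers add information without changing the core, and transport mappings only move the core's octets.

```python
"""Sketch of the proposed layering (all names and formats hypothetical)."""
from dataclasses import dataclass, field, replace
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class SyslogMessage:
    """'syslog-format' core: fixed fields plus an opaque extension bag."""
    priority: int
    hostname: str
    text: str
    extensions: Dict[str, str] = field(default_factory=dict)

    def to_octets(self) -> bytes:
        # Hypothetical serialization, chosen only for illustration.
        ext = "".join(f" [{k}={v}]" for k, v in sorted(self.extensions.items()))
        return f"<{self.priority}>{self.hostname} {self.text}{ext}".encode("utf-8")


class Extension(Protocol):
    """Upper layer (think -sign, -international): adds, never alters."""
    def apply(self, msg: SyslogMessage) -> SyslogMessage: ...


class Transport(Protocol):
    """Lower-layer mapping (think RFC 3164, RFC 3195): moves octets only."""
    def send(self, octets: bytes) -> None: ...


class LanguageTag:
    """Hypothetical '-international'-style layer that records a language."""
    def __init__(self, lang: str) -> None:
        self.lang = lang

    def apply(self, msg: SyslogMessage) -> SyslogMessage:
        # Returns a new message; the core fields are left untouched.
        return replace(msg, extensions={**msg.extensions, "lang": self.lang})


class InMemoryTransport:
    """Stand-in for a real mapping such as UDP or BEEP; records octets."""
    def __init__(self) -> None:
        self.sent: List[bytes] = []

    def send(self, octets: bytes) -> None:
        self.sent.append(octets)


def emit(msg: SyslogMessage, layers: List[Extension], transport: Transport) -> None:
    """Apply upper layers in order, then hand the octets to the transport."""
    for layer in layers:
        msg = layer.apply(msg)
    transport.send(msg.to_octets())


if __name__ == "__main__":
    t = InMemoryTransport()
    emit(SyslogMessage(13, "host1", "disk full"), [LanguageTag("en")], t)
    print(t.sent[0])
```

In this model, adding a signing layer or an SNMP-based transport would mean adding one class, with no change to `SyslogMessage`. Conversely, a change to the core itself (such as allowing 8-bit octets) affects every layer, which is the situation the thread is discussing.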
Looking forward to your comments,
Rainer

> -----Original Message-----
> From: Anton Okmianski [mailto:aokmians@cisco.com]
> Sent: Friday, August 15, 2003 12:20 AM
> To: Rainer Gerhards; Chris Lonvick; syslog-sec@employees.org
> Subject: RE: Protocol Action: 'UTF-8, a transformation format
> of ISO 10646' to Standard (fwd)
>
> Rainer et al:
>
> I don't claim to have definitive answers here -- just some thoughts.
>
> Can we define a new syslog standard that is UTF-8 based? It will be
> backward compatible with US-ASCII.
>
> If somebody fires a US-ASCII-only message encoded in UTF-8, it will
> be the same 7-bit stuff, right? So this will be compliant with older
> syslog implementations.
>
> If somebody wants to fire a message with non-US-ASCII characters in
> the syslog message, then it should only be fired to a syslog daemon
> implementation that supports the new standard and UTF-8. Isn't this
> OK?
>
> Or do we want to state that it is our goal that legacy syslog
> implementations should be able to receive and store
> internationalized messages? Is it strictly necessary? Yes, it would
> reduce the need to upgrade infrastructure, but it would tie us to the
> less compact UTF-7.
>
> Another bad thing about UTF-7 is that it represents all US-ASCII
> unchanged *except* for the "+" character, because it is used as an
> escape character. I also think it has some restrictions on "\", "~"
> and a trailing "-". So UTF-7 is not actually fully US-ASCII
> compatible, while UTF-8 is. Right? Another nastiness/annoyance is
> that I think UTF-7 allows multiple ways to encode the same thing.
>
> Because of the history of syslog, where implementations appeared
> before the standard, I think it may be acceptable to eventually try
> to standardize things instead of supporting legacy applications,
> which are not even known to follow the standard anyway.
>
> The UTF-7 IETF standard itself suggests that UTF-8 should be used
> wherever possible: "UTF-7 should normally be used only in the context
> of 7 bit transports, such as mail. In other contexts, straight Unicode
> or UTF-8 is preferred." I think we can afford to say that legacy
> syslog implementations do not have to deal with internationalized
> messages, since they were never intended to. Then it follows that we
> can afford to support UTF-8. Right?
>
> Supporting multiple encodings could be an answer, but not an elegant
> one. It would actually mean that new implementations need to support
> both UTF-7 and UTF-8. If we supported just UTF-8, then we might not
> even need any new headers in the message (except maybe for a
> language, which is debatable).
>
> Just as anecdotal evidence: there is definitely momentum behind
> supporting just UTF-8 in new protocols. I was giving a related
> presentation today, and people were somewhat skeptical when I
> mentioned potentially using UTF-7. Adopting UTF-7 does not seem to
> people like a forward-looking move.
>
> I understand your concerns, though. Just my 2.5 cents.
>
> Thanks for investigating all this!
>
> Anton.
>
> > -----Original Message-----
> > From: owner-syslog-sec@employees.org
> > [mailto:owner-syslog-sec@employees.org]On Behalf Of Rainer Gerhards
> > Sent: Thursday, August 14, 2003 4:41 PM
> > To: Chris Lonvick; syslog-sec@employees.org
> > Subject: RE: Protocol Action: 'UTF-8, a transformation format of ISO
> > 10646' to Standard (fwd)
> >
> > Chris and all,
> >
> > I am still struggling with UTF-8 and ALL syslog RFCs.
> >
> > Section 4 of
> > http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-05.txt
> > says:
> >
> > "  For the convenience of implementors using ABNF, a definition of
> >    UTF-8 in ABNF syntax is given here.
> >
> >    A UTF-8 string is a sequence of octets representing a sequence of
> >    UCS characters.
> >    An octet sequence is valid UTF-8 only if it matches the following
> >    syntax, which is derived from the rules for encoding UTF-8 and is
> >    expressed in the ABNF of [RFC2234].
> >
> >    UTF8-octets = *( UTF8-char )
> >    UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
> >    UTF8-1      = %x00-7F
> >    UTF8-2      = %xC2-DF UTF8-tail
> >    UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
> >                  %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
> >    UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
> >                  %xF4 %x80-8F 2( UTF8-tail )
> >    UTF8-tail   = %x80-BF
> > "
> >
> > If you look at this definition, 8-bit characters are required. All
> > of the current RFCs/IDs describe 7-bit US-ASCII only. So I don't see
> > any way to use UTF-8 in the current framework.
> >
> > Am I missing something?
> >
> > Rainer
> >
> > > -----Original Message-----
> > > From: Chris Lonvick [mailto:clonvick@cisco.com]
> > > Sent: Thursday, August 14, 2003 3:48 PM
> > > To: syslog-sec@employees.org
> > > Subject: Protocol Action: 'UTF-8, a transformation format of ISO
> > > 10646' to Standard (fwd)
> > >
> > > Since we're on the subject.
> > >
> > > Thanks,
> > > Chris
> > >
> > > ---------- Forwarded message ----------
> > > Date: Mon, 11 Aug 2003 16:17:04 -0400
> > > From: The IESG <iesg-secretary@ietf.org>
> > > To: IETF-Announce: ;
> > > Cc: Internet Architecture Board <iab@iab.org>,
> > >     RFC Editor <rfc-editor@rfc-editor.org>
> > > Subject: Protocol Action: 'UTF-8,
> > >     a transformation format of ISO 10646' to Standard
> > >
> > > The IESG has approved the Internet-Draft 'UTF-8, a transformation
> > > format of ISO 10646' <draft-yergeau-rfc2279bis-05.txt> as a
> > > Standard. This document has been reviewed in the IETF but is not
> > > the product of an IETF Working Group. The IESG contact person is
> > > Ted Hardie.
> > >
> > > Technical Summary
> > >
> > > This document updates the specification of UTF-8, an encoding of
> > > the UCS which is designed to be compatible with many current
> > > applications and protocols. UTF-8 has the characteristic of
> > > preserving the full US-ASCII range, providing compatibility with
> > > file systems, parsers and other software that rely on US-ASCII
> > > values but are transparent to other values. This memo obsoletes
> > > and replaces RFC 2279.
> > >
> > > Working Group Summary
> > >
> > > This draft and the interoperability reports associated with it
> > > were discussed on the IETF-charsets@iana.org mailing list.
> > > Archives may be found at
> > > http://lists.w3.org/Archives/Public/ietf-charsets/ among other
> > > places.
> > >
> > > Protocol Quality
> > >
> > > This specification was reviewed for the IESG by Patrik Falstrom.
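The UTF-8 ABNF that Rainer quotes from draft-yergeau-rfc2279bis-05 translates almost line for line into a byte-level validator. The following is a minimal Python sketch of that hand translation; the names `utf8_char_length` and `is_valid_utf8` are hypothetical, and in practice Python's strict `bytes.decode("utf-8")` applies the same rules. It also makes Rainer's observation concrete: every rule other than `UTF8-1` consumes octets in the range 0x80-0xFF, so no non-ASCII character can fit into a 7-bit frame.

```python
"""Hand translation of the UTF-8 ABNF (draft-yergeau-rfc2279bis-05, s. 4).

Function names are hypothetical; illustrative sketch only.
"""

# Multi-octet rules: (first-octet range, second-octet range, total length).
# Any octets after the second must be UTF8-tail (%x80-BF).
_RULES = (
    ((0xC2, 0xDF), (0x80, 0xBF), 2),  # UTF8-2 = %xC2-DF UTF8-tail
    ((0xE0, 0xE0), (0xA0, 0xBF), 3),  # %xE0 %xA0-BF UTF8-tail
    ((0xE1, 0xEC), (0x80, 0xBF), 3),  # %xE1-EC 2( UTF8-tail )
    ((0xED, 0xED), (0x80, 0x9F), 3),  # %xED %x80-9F UTF8-tail (no surrogates)
    ((0xEE, 0xEF), (0x80, 0xBF), 3),  # %xEE-EF 2( UTF8-tail )
    ((0xF0, 0xF0), (0x90, 0xBF), 4),  # %xF0 %x90-BF 2( UTF8-tail )
    ((0xF1, 0xF3), (0x80, 0xBF), 4),  # %xF1-F3 3( UTF8-tail )
    ((0xF4, 0xF4), (0x80, 0x8F), 4),  # %xF4 %x80-8F 2( UTF8-tail )
)


def utf8_char_length(data: bytes, i: int) -> int:
    """Length of the UTF8-char starting at data[i], or 0 if none matches."""
    first = data[i]
    if first <= 0x7F:  # UTF8-1 = %x00-7F
        return 1
    for (lo1, hi1), (lo2, hi2), length in _RULES:
        if not lo1 <= first <= hi1:
            continue
        if i + length > len(data):
            return 0  # truncated sequence
        if not lo2 <= data[i + 1] <= hi2:
            return 0
        if any(not 0x80 <= b <= 0xBF for b in data[i + 2 : i + length]):
            return 0
        return length
    return 0  # %x80-C1 and %xF5-FF never start a UTF8-char


def is_valid_utf8(data: bytes) -> bool:
    """UTF8-octets = *( UTF8-char )"""
    i = 0
    while i < len(data):
        length = utf8_char_length(data, i)
        if length == 0:
            return False
        i += length
    return True


if __name__ == "__main__":
    samples = [
        b"disk full",                       # plain US-ASCII
        "Gr\u00f6\u00dfe".encode("utf-8"),  # two-octet characters
        b"\xc0\xaf",                        # overlong form of "/"
        b"\xed\xa0\x80",                    # encoded surrogate U+D800
        b"\xe2\x82",                        # truncated three-octet char
    ]
    for s in samples:
        print(s, is_valid_utf8(s))
```

The first two samples are accepted and the last three rejected. Because the ranges for `%xE0`, `%xED`, `%xF0` and `%xF4` restrict the second octet, the grammar alone excludes overlong forms, surrogates and code points above U+10FFFF without any separate decoding step.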