RE: Protocol Action: 'UTF-8, a transformation format of ISO 10646' to Standard (fwd)

Rainer Gerhards <rgerhards@hq.adiscon.com> Fri, 15 August 2003 19:57 UTC

Date: Fri, 15 Aug 2003 21:57:37 +0200
From: Rainer Gerhards <rgerhards@hq.adiscon.com>
Subject: RE: Protocol Action: 'UTF-8, a transformation format of ISO 10646' to Standard (fwd)
Message-ID: <20140418112159.2560.67045.ARCHIVE@ietfa.amsl.com>

Hi all,

I am replying to this message as one of the many pieces of great feedback I
received. To sum it up, I am now convinced that I need to move
things toward UTF-8 support, which means that the syslog packet format
needs to be re-specified. I am prepared to do so.

However, let us step back a little and look at why I was so hesitant
to touch the current frame format. I strongly believe
that protocols should be done in layers. As such, I saw RFC 3195 and
syslog-sign as the documents that define the syslog message. I intended
to add *a layer* on top of them which supports international characters.
However, I also intended not to re-specify the complete syslog protocol
with a new ID. I have the impression that by doing so, we are mixing
things that do not necessarily belong together.

Keep in mind that flexible and expandable protocols are almost
always implemented in layers. I do not like the idea that whenever I
specify a new feature on top of the existing protocol, I need to
re-specify its lower-layer workings. In the current example, support for
UTF-8 means (and again, I am willing to take that route) that we can't
just add it on top of the current protocol but need to re-write/re-specify
the syslog format itself.
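
To make the 7-bit versus 8-bit point concrete, here is a minimal sketch
in Python (chosen only for illustration; nothing in the syslog documents
prescribes it). The two sample messages and the helper is_7bit are made
up for this example. It shows that US-ASCII text is unchanged by UTF-8,
that non-ASCII text needs octets with the high bit set in UTF-8, and
that UTF-7 stays within 7 bits.

```python
# Illustration only: compare the octets produced by UTF-8 and UTF-7.
ascii_text = "su: login failed"        # hypothetical US-ASCII-only message
intl_text = "Anmeldung f\u00fcr m\u00fcller"  # hypothetical message with non-ASCII characters


def is_7bit(octets: bytes) -> bool:
    """Return True if every octet fits in 7 bits (value below 0x80)."""
    return all(b < 0x80 for b in octets)


# US-ASCII text is byte-for-byte identical when encoded as UTF-8.
same_as_ascii = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Non-ASCII characters become octets with the high bit set in UTF-8,
# so a 7-bit-only format cannot carry them.
utf8_is_7bit = is_7bit(intl_text.encode("utf-8"))

# UTF-7 escapes the same characters and stays within 7 bits.
utf7_is_7bit = is_7bit(intl_text.encode("utf-7"))
```

Under these assumptions, same_as_ascii and utf7_is_7bit come out true
and utf8_is_7bit comes out false, which is why UTF-8 cannot simply be
layered on top of a format that is specified as 7-bit US-ASCII.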

As I have said, I am perfectly willing to take that route and will
change the ID in this regard.

However, I would like to propose a change in the way we look at the
syslog protocol series. How about (trying to define) a basic syslog
protocol format that does not need to be changed with each new feature
and THEN mapping this format both to the lower-level transports and to
upper-level issues like signing and internationalization? I think we
are now ready to take this route, thanks to the great work accomplished
so far. We have a good understanding of the reliability, security,
backward compatibility and internationalization features. If the WG
follows that route, we would probably have to draft some more IDs... Let me
provide a sample:

Syslog-format - the "core" standard describing the packet format of a
syslog message. Maybe a large part of RFC 3195 could be the basis,
together with syslog-sign and -international.

Syslog-sign, -international - added as upper layers on top of
syslog-format. They extend the format with new features but do not change
it. Other, payload-oriented standards could follow here once the time
is right for them.

Transport mappings - like RFC 3164 and RFC 3195. They describe how the
syslog-format is mapped onto a specific transport. So this is the
lower-layer series. Again, these could be extended without changing the
upper layers. For example, I could imagine a mapping that uses SNMP
trap/inform messages to transport syslog-format messages.

As we are right now all concerned with BEEP, let me compare this
approach with what BEEP does. Syslog-format would be much like RFC 3080,
specifying the basic workings. Syslog-sign, -international and so on
would be much like BEEP profiles. And 3164 & 3195 would be transport
mappings like RFC 3081 is for BEEP.

Please note that not only BEEP follows this approach - many successful
protocol families do.
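
The layering proposed above can be sketched roughly as follows. This is
only an illustration in Python: the class SyslogMessage, its fields, the
bracketed extension syntax and the two transport functions are all
hypothetical and are not taken from RFC 3164, RFC 3195 or any draft. The
point is merely that upper layers add to the core message and transport
mappings wrap its serialized form, so either side can grow without
touching the other.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SyslogMessage:
    """Stand-in for the proposed syslog-format core (fields are hypothetical)."""
    priority: int
    text: str
    extensions: Dict[str, str] = field(default_factory=dict)


def serialize(msg: SyslogMessage) -> bytes:
    """Core layer: one wire form, independent of transport and of extensions."""
    ext = "".join(f"[{k}={v}]" for k, v in sorted(msg.extensions.items()))
    return f"<{msg.priority}>{ext}{msg.text}".encode("utf-8")


def add_language(msg: SyslogMessage, lang: str) -> SyslogMessage:
    """Upper layer (in the spirit of -international): extends, does not change, the core."""
    return SyslogMessage(msg.priority, msg.text, {**msg.extensions, "lang": lang})


def datagram_mapping(payload: bytes) -> bytes:
    """Lower layer: hypothetical mapping that sends the payload as-is in one datagram."""
    return payload


def length_prefixed_mapping(payload: bytes) -> bytes:
    """Lower layer: hypothetical stream mapping that prefixes the octet count."""
    return str(len(payload)).encode("ascii") + b" " + payload


core = SyslogMessage(13, "disk full")
tagged = add_language(core, "en")
wire = serialize(tagged)
framed = length_prefixed_mapping(wire)
```

Adding a further transport mapping here would mean writing one more
function that takes the serialized payload; neither serialize nor
add_language would need to change.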

I know I am proposing quite a change in the current RFC series. I also
have probably overlooked something. I know for sure there are better and
more informed opinions than mine in the WG. Anyhow, I thought I would raise
this concern, which was the basis of my sticking to UTF-7 in
-international. I sincerely hope it is helpful and serves a good
purpose. And as I said, I am ready to re-specify the complete syslog format
in -international.

Looking forward to your comments,
Rainer

> -----Original Message-----
> From: Anton Okmianski [mailto:aokmians@cisco.com]
> Sent: Friday, August 15, 2003 12:20 AM
> To: Rainer Gerhards; Chris Lonvick; syslog-sec@employees.org
> Subject: RE: Protocol Action: 'UTF-8, a transformation format
> of ISO 10646' to Standard (fwd)
>
>
> Rainer et al:
>
> I don't claim to have definitive answers here -- just some thoughts.
>
> Can we define a new syslog standard that is UTF-8 based?  It
> will be backwards compatible for US-ASCII.
>
> If somebody fires a US-ASCII-only message encoded in UTF-8, it
> will be the same 7-bit stuff, right?  So, this will be
> compliant with older syslog implementations.
>
> If somebody wants to fire a message with non-US-ASCII
> characters in the syslog message, then it should only be
> fired to a syslog daemon implementation that supports the
> new standard and UTF-8.  Is this not OK?
>
> Or do we want to state it is our goal that the legacy syslog
> implementation should be able to receive and store
> internationalized messages?  Is it strictly necessary?  Yes,
> it would reduce the need to upgrade infrastructure, but it
> would tie us to a less compact UTF-7.
>
> One other bad thing about UTF-7 is that it represents all
> US-ASCII unchanged *except* for the "+" character because it is
> used as an escape character. I also think it has some
> restrictions on "\", "~" and a trailing "-".  So, UTF-7 is
> not actually fully US-ASCII compatible, while UTF-8 is.
> Right?  Another nastiness/annoyance is that I think UTF-7
> allows multiple ways to encode the same thing.
>
> Because of the history of syslog where implementations
> appeared before the standard, I think it may be acceptable to
> eventually try to standardize things instead of supporting
> legacy applications which are not even known to follow the
> standard anyway.
>
> The UTF-7 IETF standard itself suggests that UTF-8 should be
> followed anywhere where possible: "UTF-7 should normally be
> used only in the context of 7 bit transports, such as mail.
> In other contexts, straight Unicode or UTF-8 is preferred."
> I think we can afford to say that legacy syslog
> implementations do not have to deal with internationalized
> messages since they were not intended to. Then, it follows
> that we can afford to support UTF-8. Right?
>
> Supporting multiple encodings could be an answer, but not an
> elegant one.  It would actually mean that new implementations
> need to support both UTF-7 and UTF-8.  If we supported just
> UTF-8, then we may not even need any new headers in the
> message (except for maybe a language, which is debatable).
>
> Just as anecdotal evidence... There is definitely momentum
> behind supporting just UTF-8 in new protocols.  I was giving
> a related presentation today and people were somewhat
> skeptical when I mentioned potentially using UTF-7.  Adopting
> UTF-7 does not seem to people to be a forward-looking move.
>
> I understand your concerns though.  Just my 2.5 cents.
>
> Thanks for investigating all this!
>
> Anton.
>
>
>
>
>
> > -----Original Message-----
> > From: owner-syslog-sec@employees.org
> > [mailto:owner-syslog-sec@employees.org]On Behalf Of Rainer Gerhards
> > Sent: Thursday, August 14, 2003 4:41 PM
> > To: Chris Lonvick; syslog-sec@employees.org
> > Subject: RE: Protocol Action: 'UTF-8, a transformation format of ISO
> > 10646' to Standard (fwd)
> >
> >
> > Chris and all,
> >
> > I am still struggling with UTF-8 & ALL syslog RFCs.
> >
> > http://www.ietf.org/internet-drafts/draft-yergeau-rfc2279bis-05.txt,
> > in section 4, says:
> >
> > "   For the convenience of implementors using ABNF, a definition of
> >    UTF-8 in ABNF syntax is given here.
> >
> >    A UTF-8 string is a sequence of octets representing a sequence of
> >    UCS characters. An octet sequence is valid UTF-8 only if it matches
> >    the following syntax, which is derived from the rules for encoding
> >    UTF-8 and is expressed in the ABNF of [RFC2234].
> >
> >    UTF8-octets = *( UTF8-char )
> >    UTF8-char   = UTF8-1 / UTF8-2 / UTF8-3 / UTF8-4
> >    UTF8-1      = %x00-7F
> >    UTF8-2      = %xC2-DF UTF8-tail
> >    UTF8-3      = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2( UTF8-tail ) /
> >                  %xED %x80-9F UTF8-tail / %xEE-EF 2( UTF8-tail )
> >    UTF8-4      = %xF0 %x90-BF 2( UTF8-tail ) / %xF1-F3 3( UTF8-tail ) /
> >                  %xF4 %x80-8F 2( UTF8-tail )
> >    UTF8-tail   = %x80-BF
> > "
> >
> > If you look at this definition, 8 bit characters are required. All of
> > the current RFCs/IDs describe 7 bit US-ASCII only. So I don't see any
> > way to use UTF-8 in the current framework.
> >
> > Am I missing something?
> >
> > Rainer
> >
> >
> > > -----Original Message-----
> > > From: Chris Lonvick [mailto:clonvick@cisco.com]
> > > Sent: Thursday, August 14, 2003 3:48 PM
> > > To: syslog-sec@employees.org
> > > Subject: Protocol Action: 'UTF-8, a transformation format of ISO
> > > 10646' to Standard (fwd)
> > >
> > >
> > > Since we're on the subject.
> > >
> > > Thanks,
> > > Chris
> > >
> > > ---------- Forwarded message ----------
> > > Date: Mon, 11 Aug 2003 16:17:04 -0400
> > > From: The IESG <iesg-secretary@ietf.org>
> > > To: IETF-Announce:  ;
> > > Cc: Internet Architecture Board <iab@iab.org>,
> > >      RFC Editor <rfc-editor@rfc-editor.org>
> > > Subject: Protocol Action: 'UTF-8, a transformation format of ISO
> > >      10646' to Standard
> > >
> > > The IESG has approved the Internet-Draft 'UTF-8, a transformation
> > > format of ISO 10646' <draft-yergeau-rfc2279bis-05.txt> as a
> > > Standard. This document has been reviewed in the IETF but is not the
> > > product of an IETF Working Group. The IESG contact person is Ted
> > > Hardie.
> > >
> > > Technical Summary
> > >
> > > This document updates the specification of UTF-8,
> > > an encoding of the UCS which is designed to be
> > > compatible with many current applications and protocols. UTF-8 has
> > > the characteristic of preserving the full US-ASCII range, providing
> > > compatibility with file systems, parsers and other software that
> > > rely on US-ASCII values but are transparent to other values. This
> > > memo obsoletes and replaces RFC 2279.
> > >
> > >
> > > Working Group Summary
> > >
> > > This draft and the interoperability reports associated with it were
> > > discussed on the IETF-charsets@iana.org mailing list. Archives may
> > > be found at http://lists.w3.org/Archives/Public/ietf-charsets/
> > > among other places.
> > >
> > >
> > > Protocol Quality
> > >
> > > This specification was reviewed for the IESG by Patrik Falstrom.
> > >
> > >
> > >
> > >
> > >
> >
> >
>
>
>

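The ABNF from draft-yergeau-rfc2279bis-05 quoted in the thread above can
be transliterated into a small validator. The following is a minimal
sketch in Python, assuming the grammar exactly as quoted; the function
names utf8_char_len and is_valid_utf8 are made up for this illustration
and do not come from the draft or from any syslog document.

```python
def utf8_char_len(data: bytes, i: int) -> int:
    """Return the length of the UTF8-char starting at data[i], or 0 if none matches."""

    def tails(start: int, count: int) -> bool:
        # count octets matching UTF8-tail = %x80-BF, beginning at index start
        chunk = data[start:start + count]
        return len(chunk) == count and all(0x80 <= b <= 0xBF for b in chunk)

    def second_in(lo: int, hi: int) -> bool:
        # restricted range for the second octet (rules with %xE0, %xED, %xF0, %xF4)
        return i + 1 < len(data) and lo <= data[i + 1] <= hi

    b0 = data[i]
    if b0 <= 0x7F:                                   # UTF8-1
        return 1
    if 0xC2 <= b0 <= 0xDF:                           # UTF8-2
        return 2 if tails(i + 1, 1) else 0
    if b0 == 0xE0:                                   # UTF8-3 alternatives
        return 3 if second_in(0xA0, 0xBF) and tails(i + 2, 1) else 0
    if 0xE1 <= b0 <= 0xEC or 0xEE <= b0 <= 0xEF:
        return 3 if tails(i + 1, 2) else 0
    if b0 == 0xED:
        return 3 if second_in(0x80, 0x9F) and tails(i + 2, 1) else 0
    if b0 == 0xF0:                                   # UTF8-4 alternatives
        return 4 if second_in(0x90, 0xBF) and tails(i + 2, 2) else 0
    if 0xF1 <= b0 <= 0xF3:
        return 4 if tails(i + 1, 3) else 0
    if b0 == 0xF4:
        return 4 if second_in(0x80, 0x8F) and tails(i + 2, 2) else 0
    return 0                                         # %x80-C1 and %xF5-FF never start a char


def is_valid_utf8(data: bytes) -> bool:
    """UTF8-octets = *( UTF8-char ): consume characters until the input is exhausted."""
    i = 0
    while i < len(data):
        n = utf8_char_len(data, i)
        if n == 0:
            return False
        i += n
    return True
```

With this sketch, plain US-ASCII input is accepted through the UTF8-1
rule alone, while every other rule requires at least one octet of 0x80
or above, which is the 8-bit requirement discussed in the thread.
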
------------------------------