Re: drums2?

Chris Newman <Chris.Newman@Sun.COM> Thu, 22 August 2002 17:18 UTC

Date: Thu, 22 Aug 2002 10:17:17 -0700
From: Chris Newman <Chris.Newman@Sun.COM>
Subject: Re: drums2?
In-reply-to: <H1953w.CIC@clw.cs.man.ac.uk>
To: Charles Lindsey <chl@clw.cs.man.ac.uk>, ietf-822@imc.org
Message-id: <2147483647.1030011437@dsl108-043.brandx.net>
MIME-version: 1.0
X-Mailer: Mulberry/3.0.0a3 (Mac OS X)
Content-type: text/plain; charset="us-ascii"; format="flowed"
Content-transfer-encoding: 7bit
Content-disposition: inline
X-message-flag: Outlook: the best virus distribution system around
References: <200208211545.g7LFjL019594@astro.cs.utk.edu> <H1953w.CIC@clw.cs.man.ac.uk>
Sender: owner-ietf-822@mail.imc.org
Precedence: bulk
List-Archive: <http://www.imc.org/ietf-822/mail-archive/>
List-ID: <ietf-822.imc.org>
List-Unsubscribe: <mailto:ietf-822-request@imc.org?body=unsubscribe>

begin  quotation by Charles Lindsey on 2002/8/22 15:45 +0000:
> In <200208211545.g7LFjL019594@astro.cs.utk.edu> Keith Moore
> <moore@cs.utk.edu> writes:
>> 2. invalidate a huge number of existing user agents by creating a new
>> syntax for local parts that is incompatible with the old one.
>
> Do we actually know what part of the huge range of possibilities within
> the present syntax of local-part are actually in regular use in existing
> user agents or in actual email addresses?

No, although in RFC 2822 we deprecated the most problematic constructs
(particularly the ones permitted in 822 but not in 821).  It's risky to
retroactively declare previously compliant constructs noncompliant,
even when we don't know of any specific implementations that use them.
So this is a case where what's legal by the standard is a superset that
includes constructs which are not a good idea to use.

If you're curious, here's a writeup I did of special characters in email 
addresses:

Special Characters in Email Addresses

Summary: This discusses why it's safest to stick to alphanumerics when
creating email addresses on a system.  Five "conditionally safe"
characters (_.-=+) are reasonable to use in the middle of an address on
most systems, although if one of them is the system's primary subaddress
delimiter, that character should be forbidden.

RFC 822 permits all ASCII characters in email local-parts. RFC 2822
permits all but NUL (ASCII 0x00).  But just because a character is
permitted doesn't mean it will interoperate in practice.  The morass of
escape characters, quoting schemes and encoding schemes which impact
various parts of real-world email systems make most US-ASCII punctuation
characters problematic in practice.

Always Safe Characters: a-z A-Z 0-9
  US-ASCII alphanumerics are always safe in email address local-parts.
  Systems are permitted to be case-sensitive, but most are case-insensitive
  in practice.

Conditionally Safe Characters (avoid at beginning of local part): _.-=+
  '_' is almost always safe, but I've seen it have special meaning at the
      beginning of a local part.  Many sites use a First_Last@domain
      naming scheme so it's likely to work.
  '.' is safe between words, but requires quoting at beginning, end or if
      doubled.  Many sites use a First.Last@domain naming scheme, so it's
      likely to work.  The CMU Cyrus server uses it as a mailbox hierarchy
      delimiter, so it's forbidden in user names on that system.
  '-' is safe on systems which don't use it as a subaddress delimiter.  On
      systems which do use it as a subaddress delimiter (qmail), it'd be
      safer not to use it in user names.  Many sites use it in non-human
      mailbox names (e.g. mailing lists).  Primary subaddress use is for
      mailing list "-request" and "-owner" suffixes (RFC 2142).
  '=' is safe on most systems which don't use it as a subaddress delimiter.
      But it is a URL reserved character (see below).
  '+' is safe on most systems which don't use it as a subaddress delimiter.
      It is the primary subaddress delimiter for iPlanet Messaging Server,
      SIMS, PMDF and CMU Cyrus (among others).  It shouldn't be used when
      creating an email address for a user on these systems, but should be
      permitted in address lists and general-purpose email address entry
      forms.  Note that it is a URL reserved character and I've seen it
      cause problems on some ill-designed web forms with an email address
      entry field.  These web forms should be fixed.
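
As a rough illustration of the two categories above, here is a minimal
Python sketch of a check a provisioning tool might run when creating a new
address.  The function name and the subaddress_delimiter parameter are
hypothetical, and the rules are just one conservative reading of the
guidance above (alphanumeric first character, no trailing or doubled '.',
and the local subaddress delimiter forbidden), not a standard algorithm.

```python
import string

# "Always safe" and "conditionally safe" sets as listed in the writeup.
ALWAYS_SAFE = set(string.ascii_letters + string.digits)
CONDITIONALLY_SAFE = set("_.-=+")


def is_conservative_local_part(local_part, subaddress_delimiter="+"):
    """Return True if local_part sticks to the conservative character set.

    subaddress_delimiter is whatever the local mail system uses (an
    assumption the caller must supply); it is removed from the allowed set.
    """
    if not local_part:
        return False
    # Conditionally safe characters are avoided at the beginning.
    if local_part[0] not in ALWAYS_SAFE:
        return False
    # '.' requires quoting at the end or when doubled.
    if local_part.endswith(".") or ".." in local_part:
        return False
    allowed = ALWAYS_SAFE | (CONDITIONALLY_SAFE - {subaddress_delimiter})
    return all(ch in allowed for ch in local_part)
```

This is meant only for addresses being created locally.  As noted above,
address lists and general-purpose entry forms should still accept '+'.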

Confusing: ~`
  When email addresses are written down, these characters are likely to
  be confused with other characters (~ with - and ` with ').  Thus they
  are best avoided for human factors reasons.  In addition, not everyone
  is familiar with the word 'tilde'.  While '+' and 't' can also
  be confused, I haven't had any problems in practice since I started
  accentuating the horizontal line and reducing the vertical line when
  writing '+'.

Path Delimiter: /
  May be safe as long as it is not the first character (it sometimes
  indicates direct-file delivery as a leading character).  Many Unix
  systems forbid it in user mailbox names since the names are stored
  unquoted in the filesystem and '/' is the delimiter.  Sometimes used
  as a subaddress delimiter (e.g. RFC 2846), but not by iMS.

Shell Metacharacters: $&*?!~^()[]{}"'\|
  Internet mail is often stored in Unix directories whose name is the
  unquoted user name.  Thus characters which have special meanings to Unix
  shells are discouraged because they raise security issues on some
  Unix-based mail systems.  '*', '?', and '|' are most dangerous, while
  some of the others ('~') are not a big deal when not used at the
  beginning of a user name.
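
To make the shell concern concrete, here is a small Python sketch that
reports which of the characters listed above appear in a proposed user
name, and shows the standard library's shlex.quote as one way to protect a
name if it must be passed through a shell.  The helper name is
hypothetical; the character set is copied from the heading above.

```python
import shlex

# The shell metacharacters listed in the writeup.
SHELL_METACHARACTERS = set("$&*?!~^()[]{}\"'\\|")


def shell_metacharacters_in(name):
    """Return the sorted list of listed shell metacharacters found in name."""
    return sorted(set(name) & SHELL_METACHARACTERS)


# shlex.quote wraps a string in single quotes when it contains anything a
# POSIX shell might interpret, and leaves plain names untouched.
quoted_plain = shlex.quote("bob")
quoted_glob = shlex.quote("a*b")
```

Quoting only helps where the software remembers to do it, which is why
avoiding these characters in user names is the safer policy.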

URL Reserved Characters (RFC 2396): ;/?:@&=+$,
  These have special syntactic meaning in various portions of generic URL
  syntax and thus some web systems will choose to encode them when they're
  used for something other than their URL-specific meaning.  Thus they
  might introduce errors.
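
The '+' problem with web forms can be reproduced with Python's
urllib.parse.  This sketch uses a made-up address at example.com.  It
shows that an address survives a round trip when it is percent-encoded
properly, and that form-style decoding of an unencoded address silently
turns '+' into a space.

```python
from urllib.parse import quote, unquote, unquote_plus

address = "first+tag@example.com"  # hypothetical address

# Percent-encode every reserved character, including '+' and '@'.
encoded = quote(address, safe="")

# Decoding the properly encoded form recovers the original address.
round_trip = unquote(encoded)

# Form decoding applied to the raw, unencoded address treats '+' as a
# space, which is the failure mode seen on ill-designed web forms.
mangled = unquote_plus(address)
```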

URL Excluded (RFC 2396): {}|\^[]`<>#%" Controls (0x00-0x1F, 0x7F), SPACE (0x20)
  The standard requires these to be encoded in URLs, and many webmail
  services or email address entry forms are likely to make mistakes when
  it's necessary to encode or decode these characters.

Email quoting required (RFC 821/822): ()<>[]@,:;"\
  Also Controls (ASCII 0x00-1F, 0x7F) and SPACE (ASCII 0x20)
  These all require quoted local-parts which some systems don't
  implement according to the standards (e.g. Exchange), so their use
  is discouraged.  @, " and \ are particularly problematic.
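
As an illustration of what "requires quoted local-parts" means, here is a
Python sketch based on the RFC 2822 dot-atom and quoted-string rules.  The
function names are hypothetical.  It decides whether a local-part can be
written bare and otherwise wraps it in double quotes, backslash-escaping
'"' and '\'.  It does not try to handle control characters or the obsolete
syntax.

```python
import re
import string

# atext from RFC 2822: letters, digits and these punctuation characters.
ATEXT = string.ascii_letters + string.digits + "!#$%&'*+-/=?^_`{|}~"

# dot-atom: runs of atext separated by single dots.
DOT_ATOM = re.compile(r"[{0}]+(?:\.[{0}]+)*".format(re.escape(ATEXT)))


def needs_quoting(local_part):
    """Return True if local_part cannot be written as a bare dot-atom."""
    return DOT_ATOM.fullmatch(local_part) is None


def quote_local_part(local_part):
    """Return local_part in a form suitable for the left side of '@'."""
    if not needs_quoting(local_part):
        return local_part
    # Escape backslashes first so the escapes added for '"' stay intact.
    escaped = local_part.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

Even when the quoting is done correctly, the receiving system still has to
parse it correctly, so avoiding these characters remains the safer choice.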

Eight-bit (RFC 821, RFC 2821):
  Characters with the high bit set are not permitted by current email
  standards. In the future, standards may be changed to permit _only_
  UTF-8 when negotiated, likely with a downconversion to a 7-bit
  encoding scheme yet to be determined (likely the same one used for
  international domain names).

C: NUL (ASCII 0x00)
  The NUL character is used to terminate C/C++ strings.  Since the majority
  of Internet software is written in C/C++, NUL won't work on most mail
  systems.  RFC 822 permits it, but RFC 2822 forbids it.

Local Routing character conventions: !%@
  These have been used to express routing on local systems ('!' in UUCP
  bang paths, '%' in the "percent hack").  Their use in user email
  addresses is thus discouraged, since some systems may still interpret
  them as routing.

IMAP modified UTF-7 (RFC 2060): &
  The '&' character is the escape character for IMAP modified UTF-7.
  Thus it is likely to cause problems in user names on an IMAP-based
  mailstore.
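
To show why '&' is awkward here, this is a minimal Python sketch of an
encoder for IMAP modified UTF-7 as described in RFC 2060: printable
US-ASCII represents itself except '&', which becomes "&-", and anything
else is sent as "&" plus modified base64 of the UTF-16BE bytes plus "-".
The function name is hypothetical and no decoder is included.

```python
import base64


def encode_imap_utf7(name):
    """Encode a mailbox name in IMAP modified UTF-7 (RFC 2060, 5.1.3)."""
    out = []
    pending = []  # run of non-printable-ASCII characters awaiting encoding

    def flush():
        if pending:
            raw = "".join(pending).encode("utf-16-be")
            b64 = base64.b64encode(raw).decode("ascii")
            # Modified base64: no '=' padding, ',' instead of '/'.
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            pending.clear()

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7E:
            flush()
            # A literal '&' must be escaped because it starts a shift.
            out.append("&-" if ch == "&" else ch)
        else:
            pending.append(ch)
    flush()
    return "".join(out)
```

A user name such as "R&D" therefore becomes the mailbox name "R&-D", and
software that forgets this step will treat the '&' as the start of an
encoded sequence.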

LDAP search filter specials (RFC 2254): *()\ and NUL
  More systems, including iPlanet Messaging Server, are using LDAP to
  locally route email.  These five characters need to be quoted in LDAP
  search filters, but the traditional LDAP C SDK makes this step easy to
  forget.
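
The escaping step that is easy to forget looks like this.  This Python
sketch applies the RFC 2254 rule (each special character is replaced by a
backslash and its two-digit hex value) before building a filter.  The
function names and the use of a "mail" attribute are assumptions for
illustration, not taken from any particular directory schema or SDK.

```python
# RFC 2254 escapes for the five special characters.
LDAP_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value):
    """Escape value for safe use as an assertion value in a search filter."""
    # Mapping character by character avoids double-escaping the backslash.
    return "".join(LDAP_FILTER_ESCAPES.get(ch, ch) for ch in value)


def mail_filter(address):
    """Build an equality filter on a hypothetical 'mail' attribute."""
    return "(mail=" + escape_ldap_filter_value(address) + ")"
```

Without the escape, an address of "*" would produce the filter "(mail=*)",
which matches every entry that has a mail attribute.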

JavaScript: '"\
  Most client-side web form validation is done using JavaScript.  Thus the
  primary quoting characters in JavaScript may be problematic in email
  addresses.
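
One way a server-side page generator can sidestep the quoting problem is
to let a JSON encoder produce the string literal instead of concatenating
quotes by hand.  This Python sketch relies on the fact that the output of
json.dumps for a string is a double-quoted literal with '"' and '\'
escaped, which JavaScript also accepts.  The function name is
hypothetical, and embedding the result in an HTML page needs further care
(for example with "</script>") that is not handled here.

```python
import json


def js_string_literal(value):
    """Return value as a double-quoted string literal with '"' and '\\' escaped."""
    return json.dumps(value)


# A single quote needs no escape inside a double-quoted literal.
single = js_string_literal("o'brien@example.com")

# Double quotes and backslashes are backslash-escaped.
tricky = js_string_literal('a"b\\c')
```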