Re: empty quoted strings and other oddities

Keith Moore <moore@cs.utk.edu> Thu, 03 October 2002 13:16 UTC

Message-Id: <200210031316.g93DGJ002707@astro.cs.utk.edu>
X-URI: http://www.cs.utk.edu/~moore/
From: Keith Moore <moore@cs.utk.edu>
To: Gary Feldman <gaf@ziplink.net>
cc: ietf-822@imc.org
Subject: Re: empty quoted strings and other oddities
In-reply-to: (Your message of "Wed, 02 Oct 2002 22:28:34 EDT.") <000e01c26a84$9358e8a0$0201010a@alice>
Date: Thu, 03 Oct 2002 09:16:19 -0400

> The main problem with this analysis is that it ignores the comparison with
> semantic side catching of the problem.  Also, the issue at hand is a
> boundary condition, and hence the test cases need to be there, regardless 
> of how it's defined.

first of all, it's really not a boundary condition - it's just one example
of an invalid address.  second, your conclusion doesn't follow even if
it is a boundary condition - all that is necessary is that such an address
not cause any problems.

(""@example.com is not inherently invalid, nor should we restrict strings
to having one or more characters - because there are cases where some
strings need to be zero length)
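To illustrate the point about zero-length strings: in the RFC 2822 grammar a quoted-string is DQUOTE *([FWS] qcontent) [FWS] DQUOTE, so qcontent may repeat zero times and `""` falls out naturally. Below is a minimal C sketch of such a scanner, under stated simplifying assumptions: folding whitespace (CRLF followed by WSP) is not handled, and the names `scan_quoted_string` and `has_quoted_local_part` are hypothetical, not taken from any real implementation.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan an RFC 2822 quoted-string starting at s[0], which must be '"'.
 * Simplified sketch: folding (CRLF followed by WSP) is not handled;
 * plain SP/HTAB inside the quotes are accepted as ordinary content.
 * On success returns the number of bytes consumed, including both
 * DQUOTEs, and stores the length of the raw (still escaped) content
 * in *content_len.  Returns 0 on malformed input.
 * A zero-length string ("") is accepted, since the grammar allows
 * zero repetitions of qcontent.
 */
static size_t scan_quoted_string(const char *s, size_t *content_len)
{
    size_t i = 1;

    assert(s != NULL && content_len != NULL);
    if (s[0] != '"')
        return 0;
    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];
        if (c == '"') {
            *content_len = i - 1;
            return i + 1;
        }
        if (c == '\\') {
            /* quoted-pair: backslash followed by any "text" character */
            unsigned char next = (unsigned char)s[i + 1];
            if (next == '\0' || next == '\r' || next == '\n' || next > 127)
                return 0;
            i += 2;
            continue;
        }
        if (c == '\r' || c == '\n' || c > 127)
            return 0; /* bare CR/LF or 8-bit data is not allowed here */
        i++;
    }
    return 0; /* unterminated quoted-string */
}

/* Return 1 if addr starts with a quoted-string local part followed by '@'. */
static int has_quoted_local_part(const char *addr)
{
    size_t len;
    size_t n = scan_quoted_string(addr, &len);
    return n != 0 && addr[n] == '@';
}
```

With this scanner, ""@example.com simply yields a quoted local part of length 0; rejecting it would take an extra check on top of the grammar, not less code.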

> Finally, whether or not it actually causes that extra effort and cost
> of learning depends on how it's implemented in the first place.  

well, it's not like those thousands of existing implementations of email
software are going to change their implementation techniques just to
accommodate this case.

as for use of lex, I've tried it.  it's not a good fit.  RFC 2822 syntax
analysis is somewhat context-dependent (is "Mon" a day or an atom?), good
error recovery is difficult, and you need to have the lexical analyzer
recover from common syntax errors (such as putting a dot in a phrase) in
situations where look-ahead is required to disambiguate that kind of error
from other errors.
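A sketch of the "Mon" ambiguity, assuming a hypothetical hand-written scanner in which the parser passes its current context to the tokenizer. The same three characters are a day-of-week at the start of a Date: field value and an ordinary atom in a display-name phrase (e.g. "Mon Ami"). In lex, that context would have to be fed back from the parser through start conditions or similar machinery. The names `scan_context`, `classify_word` and friends are illustrative only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical parser states passed down to the tokenizer. */
enum scan_context { CTX_PHRASE, CTX_DATE };
enum token_kind { TOK_ATOM, TOK_DAY_NAME };

static const char *const day_names[] = {
    "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"
};

/* Case-insensitive comparison of a len-byte word against a 3-letter name. */
static int matches_day(const char *word, size_t len, const char *name)
{
    size_t i;

    if (len != 3)
        return 0;
    for (i = 0; i < 3; i++)
        if (tolower((unsigned char)word[i]) != tolower((unsigned char)name[i]))
            return 0;
    return 1;
}

/*
 * Classify the word at word[0..len).  The token kind depends on where
 * the parser is: only in the Date: context is "Mon" a day-of-week;
 * everywhere else it is just an atom.
 */
static enum token_kind classify_word(const char *word, size_t len,
                                     enum scan_context ctx)
{
    size_t d;

    assert(word != NULL);
    if (ctx == CTX_DATE) {
        for (d = 0; d < sizeof day_names / sizeof day_names[0]; d++)
            if (matches_day(word, len, day_names[d]))
                return TOK_DAY_NAME;
    }
    return TOK_ATOM;
}
```

In C the context is just another function argument, which is much of why the hand-written approach stays simple here.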

Having done both, I can confidently say that it's easier to write correct 
C code to do the scanning than to write correct lex code to do it.
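The look-ahead problem can be sketched the same way in plain C. In the hypothetical function below, a bare dot is ambiguous until the scanner reaches either '<' (so the dot was in a display-name phrase, an obsolete form such as "John Q. Public <jqp@example.com>" that the sketch accepts but flags) or '@' (so it was part of a dot-atom local part). Quoted strings, comments, dot placement and everything after the deciding character are deliberately left out; this illustrates the look-ahead, not a complete mailbox parser, and the `mailbox_shape` names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum mailbox_shape {
    SHAPE_ERROR,         /* neither form matched */
    SHAPE_NAME_ADDR,     /* optional phrase followed by "<" */
    SHAPE_NAME_ADDR_OBS, /* same, but the phrase contained a bare '.' */
    SHAPE_ADDR_SPEC      /* local part followed by "@" */
};

/* RFC 2822 atext: letters, digits and a fixed set of specials. */
static int is_atext(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
           (c >= '0' && c <= '9') ||
           (c != '\0' && strchr("!#$%&'*+-/=?^_`{|}~", c) != NULL);
}

/*
 * Scan atoms, spaces and dots until the first '<' or '@' decides what
 * was being scanned.  A dot seen before '<' means an obs-phrase:
 * accepted (error recovery) but flagged.  Spaces before '@' are
 * treated as an error in this sketch.
 */
static enum mailbox_shape classify_mailbox(const char *s)
{
    size_t i = 0;
    int saw_dot = 0, saw_space = 0;

    assert(s != NULL);
    while (s[i] != '\0') {
        unsigned char c = (unsigned char)s[i];
        if (c == '<')
            return saw_dot ? SHAPE_NAME_ADDR_OBS : SHAPE_NAME_ADDR;
        if (c == '@')
            return (i == 0 || saw_space) ? SHAPE_ERROR : SHAPE_ADDR_SPEC;
        if (c == '.')
            saw_dot = 1;
        else if (c == ' ' || c == '\t')
            saw_space = 1;
        else if (!is_atext(c))
            return SHAPE_ERROR;
        i++;
    }
    return SHAPE_ERROR; /* ran out of input before anything decided */
}
```

The design choice is to defer judgement on the dot rather than reject it at the point where it is read, which is exactly the kind of decision that is awkward to express as independent lex rules.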

> Granted, many implementations won't be so eloquent, but again, the 
> extent to which that should be allowed to hamper progress should be limited.

adding cruft to the standard (or to code) to detect cases which
never happen is not progress.  progress is increasing the reliability
of mail delivery.  if you want to make that kind of progress, you
need to do something about stupid spam filtering, DNS misconfiguration,
and MTA misconfiguration.  of course, it's much easier to think of 
solutions to problems that don't exist.

Keith