Re: Opaque right hand sides (was: Re: revised "generic syntax" internet draft)

Gary Adams - Sun Microsystems Labs BOS <> Thu, 24 April 1997 15:00 UTC

Date: Thu, 24 Apr 1997 10:14:32 -0400
From: Gary Adams - Sun Microsystems Labs BOS <>
Reply-To: Gary Adams - Sun Microsystems Labs BOS <>
Subject: Re: Opaque right hand sides (was: Re: revised "generic syntax" internet draft)
Message-Id: <libSDtMail.9704241014.18562.gra@zeppo>

> From: John C Klensin <>
> Edward Cherlin wrote:
> >...
> > If I am going to create an ftp: site, and I don't check what version of
> > what ftp server I'm using, I'm a fool, and likewise for gopher: and telnet:
> > and the others. If I put out an https: URL and I don't have a secure server
> > to receive it, I'm a fool. If I intend to accept encoded UTF-8, I need to
> > find out how my server can deal with it. If I don't intend to accept it, I
> > can regard encoded UTF-8 in URLs as plain ASCII, without breaking any
> > process that is not already broken.
> >...
> I think, with the help of the above, I've finally figured 
> out what is going on here and why we have a seemingly- 
> insurmountable communications disconnect.  In case others 
> have been as confused as I have been and in the hope that 
> this might help, let's step away from character sets for a 
> moment and look at a broader question.

Good idea!

> With email, we've made a very careful distinction between 
> the "local-part" and the "domain-name".    The latter has 
> to be resolvable by the DNS and must obey its rules, 
> whatever they are.   The local part is defined as opaque to 
> everything but the target system -- the one named in the 
> "domain-name" (or indirected via an MX record, which has no 
> analogy here).   There are some very low level syntax rules 
> to which it must conform --e.g., seven bits and quoting if 
> certain classes of characters appear-- but the 
> oft-repeated, and very important, rule is that _nothing_ 
> besides a delivery host gets to interpret or revise the 
> local part.  So, for example, sometimes a percent sign 
> denotes routing, and sometimes it is just part of an 
> address, and, in principle, sometimes it might introduce 
> encoding of something that, by prior agreement, sending and 
> receiving MUAs (but not anything in the intermediate 
> transport system) might construe as encoding for non-ASCII 
> characters.  As long as the "don't mess with the local 
> part" rules are strictly observed while the message is in 
> the transport system, everything works fine.

In practice the local-part can be open or closed in the information
it conveys, e.g., first.lastname@system.domain or 1234.5678@service.domain.
For truly opaque handles, applications have had to record additional
out-of-band "metadata", such as the "real name" kept by address book utilities.
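The "don't mess with the local part" rule, together with out-of-band metadata for opaque handles, can be made concrete with a small sketch. It assumes a simplified local-part@domain address (no comments or source routes). The helpers normalize_address and display_name and the ADDRESS_BOOK table are hypothetical. Only the domain is case-folded, since DNS names are case-insensitive. The local part, including any '%', is copied through untouched.

```python
# Hypothetical out-of-band metadata for opaque handles, like an address book.
ADDRESS_BOOK = {"1234.5678@service.domain": "Jane Doe"}


def normalize_address(address: str) -> str:
    """Case-fold the domain part only; leave the local part opaque.

    Splits at the last '@', since a quoted local part may itself contain '@'.
    """
    local_part, sep, domain = address.rpartition("@")
    if not sep or not local_part or not domain:
        raise ValueError(f"not a local-part@domain address: {address!r}")
    # DNS names are case-insensitive, so lowercasing the domain is safe.
    # The local part is copied unchanged: a '%' in it may denote routing,
    # be a literal character, or start an encoding only the endpoints understand.
    return local_part + "@" + domain.lower()


def display_name(address: str) -> str:
    """Return the recorded "real name" if one exists, else the address itself."""
    normalized = normalize_address(address)
    return ADDRESS_BOOK.get(normalized, normalized)


if __name__ == "__main__":
    print(normalize_address("First.Lastname%relay@System.Domain"))
    # prints First.Lastname%relay@system.domain
    print(display_name("1234.5678@Service.Domain"))
    # prints Jane Doe
```

Nothing in the transport path needs to understand the local part. Only the final system, or an address book the user maintains, attaches meaning to it.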

> That is, more or less, the position I think Edward and 
> others are taking -- we can safely treat all of the URL 
> that follows the domain name as opaque and as something 
> that will be interpreted, like the local-part, only by 
> systems that --by prearrangement or good sense-- will know 
> how to interpret it correctly.

I'd be happier to hear "prearrangement" defined in terms of some directory
service mechanism, and "good sense" in terms of an algorithm for catching
exceptions (e.g., Martin recently posted a "try utf8, else try native encoding"
scenario). Best of all would be a self-describing representation (wishful
thinking).
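Martin's "try utf8, else try native encoding" scenario can be written as a short algorithm. This is a minimal sketch assuming Python's standard urllib.parse and assuming Latin-1 as the server's "native" encoding. The function name decode_segment is hypothetical. It also shows Edward's point: once percent-encoded, UTF-8 in a URL is plain ASCII.

```python
from urllib.parse import quote, unquote_to_bytes


def decode_segment(segment: str, native_encoding: str = "latin-1") -> str:
    """Percent-decode a URL segment: try UTF-8 first, else a native encoding.

    With latin-1 as the fallback this never fails, because every byte value
    maps to some Latin-1 character. Other fallbacks may raise UnicodeDecodeError.
    """
    raw = unquote_to_bytes(segment)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(native_encoding)


if __name__ == "__main__":
    encoded = quote("café")            # UTF-8 bytes, percent-encoded
    print(encoded, encoded.isascii())  # caf%C3%A9 True
    print(decode_segment(encoded))     # café (valid UTF-8)
    print(decode_segment("caf%E9"))    # café (not UTF-8, falls back to Latin-1)
```

The heuristic can guess wrong. The Latin-1 string "Ã©" encodes to the bytes C3 A9, which are also valid UTF-8 for "é", so it would be silently misread. A self-describing representation would remove the guess.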

> FTP, for example, is actually pretty similar: the form and 
> syntax of file names is that of the server and it is the 
> responsibility of the client to figure out, out of band if
> necessary, what form the server uses and to adapt to it.  
> The protocol was carefully designed so that the arguments 
> to, e.g., RETR and STOR, could be treated as completely 
> opaque.
> Fortunately or unfortunately, URLs haven't been defined as
>       <protocol>://<domain>/<opaque-part>
> but, instead, with considerable syntax and semantics 
> attached to the RHS (after the domain-part -- I am 
> deliberately not using standard URI terminology here). 

Today in practice we actually have a <translucent-part> in the URL. It must
obey a hierarchical component lookup mechanism to support relative URLs (e.g.,
/a/b/c, ../d/e/f, etc.). And where Latin-1 characters are in use today, URLs
may also contain user-friendly characters. If URLs were truly opaque, the
perceived inequity in representation would not be considered a problem.
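The gap between John's <protocol>://<domain>/<opaque-part> and today's "translucent" part can be illustrated with a sketch. It assumes Python's standard urljoin and uses a hypothetical split_opaque helper and example host names. A purely opaque split gives a client no way to resolve ../d/e/f. Relative resolution works only because '/' is treated as a hierarchy separator.

```python
from typing import Tuple
from urllib.parse import urljoin


def split_opaque(url: str) -> Tuple[str, str, str]:
    """Hypothetical split into <protocol>://<domain>/<opaque-part>.

    Nothing after the first '/' following the domain is interpreted.
    """
    scheme, sep, rest = url.partition("://")
    if not sep:
        raise ValueError(f"no '://' in {url!r}")
    domain, _, opaque_part = rest.partition("/")
    return scheme, domain, opaque_part


if __name__ == "__main__":
    base = "http://host.example/a/b/c"
    print(split_opaque(base))          # ('http', 'host.example', 'a/b/c')
    # Relative resolution depends on reading '/' as a hierarchy separator.
    print(urljoin(base, "../d/e/f"))   # http://host.example/a/d/e/f
    print(urljoin(base, "/a/b/c"))     # http://host.example/a/b/c
    # For a scheme urljoin treats as non-hierarchical, the reference is returned as-is.
    print(urljoin("mailto:user@host.example", "../d/e/f"))  # ../d/e/f
```

Under the opaque model, the last case would be the only possible behavior for every scheme.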