Illegality of '~' in URLs

Olle Jarnefors <ojarnef@admin.kth.se> Mon, 15 January 1996 23:32 UTC

Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa20334; 15 Jan 96 18:32 EST
Received: from CNRI.Reston.VA.US by IETF.CNRI.Reston.VA.US id aa20329; 15 Jan 96 18:32 EST
Received: from [192.77.55.2] by CNRI.Reston.VA.US id aa21961; 15 Jan 96 18:32 EST
Received: (from daemon@localhost) by services.bunyip.com (8.6.10/8.6.9) id QAA27055 for uri-out; Mon, 15 Jan 1996 16:22:49 -0500
Received: from mocha.bunyip.com (mocha.Bunyip.Com [192.197.208.1]) by services.bunyip.com (8.6.10/8.6.9) with SMTP id QAA27050 for <uri@services.bunyip.com>; Mon, 15 Jan 1996 16:22:46 -0500
Received: from othello.admin.kth.se by mocha.bunyip.com with SMTP (5.65a/IDA-1.4.2b/CC-Guru-2b) id AA04359 (mail destined for uri@services.bunyip.com); Mon, 15 Jan 96 16:22:43 -0500
Received: from mercutio.admin.kth.se by othello.admin.kth.se (5.65+bind 1.8+ida 1.4.2/4.0b) id AA07673; Mon, 15 Jan 96 22:22:37 +0100
Received: by mercutio.admin.kth.se (5.65+bind 1.8+ida 1.4.2/4.0) id AA05415; Mon, 15 Jan 96 21:46:47 +0100
Date: Mon, 15 Jan 96 21:46:47 +0100
Message-Id: <9601152046.AA05415@mercutio.admin.kth.se>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Olle Jarnefors <ojarnef@admin.kth.se>
To: uri@bunyip.com
Cc: Olle Jarnefors <ojarnef@admin.kth.se>
Subject: Illegality of '~' in URLs
X-Orig-Sender: owner-uri@bunyip.com
Precedence: bulk

Repost of a message that probably got lost due to list manager
problems at bunyip.com. The phrase "this month" refers to the
time span 1 - 21 Dec 1995.

Date: Fri, 22 Dec 95 15:18:17 +0100
Message-Id: <9512221418.AA15880@mercutio.admin.kth.se>
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
From: Olle Jarnefors <ojarnef@admin.kth.se>
To: uri@bunyip.com
Cc: Peter Svanberg <psv@nada.kth.se>, Olle Jarnefors <ojarnef@admin.kth.se>
Subject: Illegality of '~' in URLs

14 % of the http: URLs used in messages to the html-wg mailing
list this month include the character tilde, '~'. (See the list
at the bottom of this message.) I assume that most or all of these
originate on Unix systems, where '~user' denotes a user's home
directory.
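As a check of the 14 % figure, here is the arithmetic using the counts given with the list at the bottom of this message: 8 URLs contain '~' and 49 do not. This assumes each distinct URL was counted once.

```latex
\frac{8}{8 + 49} = \frac{8}{57} \approx 0.140 \approx 14\,\%
```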

RFC 1738 doesn't allow this character in http: URLs, though, except
in escaped form (%7E):

> httpurl        = "http://" hostport [ "/" hpath [ "?" search ]]
> hpath          = hsegment *[ "/" hsegment ]
> hsegment       = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
> uchar          = unreserved | escape
> unreserved     = alpha | digit | safe | extra
> safe           = "$" | "-" | "_" | "." | "+"
> extra          = "!" | "*" | "'" | "(" | ")" | ","
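To make the restriction concrete, here is a minimal Python sketch that checks only the hpath part of an http: URL against the productions quoted above. It is an illustration under stated assumptions, not a complete RFC 1738 parser: `hpath_conforms` is a hypothetical helper, hostport is not validated, and any "?search" part is simply dropped.

```python
import re

# One hsegment per the RFC 1738 productions quoted above: any mix of
#   alpha | digit | safe ($ - _ . +) | extra (! * ' ( ) ,) | ; : @ & =
# or an escape "%" hex hex. '~' belongs to none of these classes.
HSEGMENT = r"(?:[A-Za-z0-9$\-_.+!*'(),;:@&=]|%[0-9A-Fa-f]{2})*"

# hpath = hsegment *[ "/" hsegment ]
HPATH = re.compile(HSEGMENT + r"(?:/" + HSEGMENT + r")*")


def hpath_conforms(url: str) -> bool:
    """Hypothetical helper: check only the hpath of an http: URL.

    Hostport is NOT validated, and an optional "?search" part is dropped.
    """
    prefix = "http://"
    if not url.lower().startswith(prefix):
        return False
    rest = url[len(prefix):].split("?", 1)[0]
    # hostport runs up to the first "/"; everything after it is hpath.
    _, _, hpath = rest.partition("/")
    return HPATH.fullmatch(hpath) is not None


for url in ["http://www.dsv.su.se/~jpalme", "http://www.dsv.su.se/%7Ejpalme"]:
    print(url, hpath_conforms(url))
```

Run as is, this prints `False` for the tilde form and `True` for the %7E form. Because '/' is excluded from the segment character class, the nested repetitions cannot match the same text in more than one way, so a failing path is rejected without heavy backtracking.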

Would it hurt to remove this restriction on URL syntax?
In http: URLs? In all URLs?

/Olle

--
Olle Jarnefors, Royal Institute of Technology, Stockholm <ojarnef@admin.kth.se>


Tilde-illegal URLs found in messages on the html-wg list 951201/951222
----------------------------------------------------------------------

http://homepage.interaccess.com/~driscoll/
http://infomatch.com/~haibeck
http://www.acl.lanl.gov/~rdaniel/
http://www.cs.columbia.edu/~william
http://www.cs.princeton.edu/~burchard/www/interactive/
http://www.dsv.su.se/~jpalme
http://www.spyglass.com/~eric/
http://www.ucc.ie/~pflynn/books/wwwbook.html

The other 49 http: URLs in messages from the same period did _not_ contain '~'.
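Under the current rules, each URL in the list above could be made conforming by writing '~' as %7E, its US-ASCII code in hex. The following minimal Python sketch does that rewrite. `escape_tilde` is a hypothetical helper introduced here for illustration, and the choice of Python is this edit's, not the original message's.

```python
# The tilde URLs listed above, copied verbatim.
TILDE_URLS = [
    "http://homepage.interaccess.com/~driscoll/",
    "http://infomatch.com/~haibeck",
    "http://www.acl.lanl.gov/~rdaniel/",
    "http://www.cs.columbia.edu/~william",
    "http://www.cs.princeton.edu/~burchard/www/interactive/",
    "http://www.dsv.su.se/~jpalme",
    "http://www.spyglass.com/~eric/",
    "http://www.ucc.ie/~pflynn/books/wwwbook.html",
]


def escape_tilde(url: str) -> str:
    """Hypothetical helper: write each '~' as the RFC 1738 escape %7E."""
    return url.replace("~", "%7E")


for url in TILDE_URLS:
    print(escape_tilde(url))
```

The sketch uses a plain `str.replace` rather than `urllib.parse.quote` because Python 3.7 and later leave '~' unquoted, so `quote` would not produce the %7E form.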