Re: [idn] conflicts with ACE and STD13

David Hopwood <david.hopwood@zetnet.co.uk> Sat, 10 November 2001 06:04 UTC

Received: from psg.com (exim@psg.com [147.28.0.62]) by ietf.org (8.9.1a/8.9.1a) with ESMTP id BAA09237 for <idn-archive@lists.ietf.org>; Sat, 10 Nov 2001 01:04:44 -0500 (EST)
Received: from lserv by psg.com with local (Exim 3.33 #1) id 162Quf-000CXY-00 for idn-data@psg.com; Fri, 09 Nov 2001 21:42:41 -0800
Received: from irwell.zetnet.co.uk ([194.247.47.48] helo=zetnet.co.uk) by psg.com with esmtp (Exim 3.33 #1) id 162Que-000CXR-00 for idn@ops.ietf.org; Fri, 09 Nov 2001 21:42:40 -0800
Received: from zetnet.co.uk (man-s011.dialup.zetnet.co.uk [194.247.41.138]) by zetnet.co.uk (8.11.3/8.11.3/Debian 8.11.2-1) with ESMTP id fAA5fWE31565; Sat, 10 Nov 2001 05:41:33 GMT
Message-ID: <3BEC9947.29826266@zetnet.co.uk>
Date: Sat, 10 Nov 2001 03:04:39 +0000
From: David Hopwood <david.hopwood@zetnet.co.uk>
X-Mailer: Mozilla 4.7 [en] (WinNT; I)
X-Accept-Language: en-GB,en,fr-FR,fr,de-DE,de,ru
MIME-Version: 1.0
To: "Eric A. Hall" <ehall@ehsco.com>, idn@ops.ietf.org
Subject: Re: [idn] conflicts with ACE and STD13
References: <3BEBCC4E.17E6216C@ehsco.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Sender: owner-idn@ops.ietf.org
Precedence: bulk

-----BEGIN PGP SIGNED MESSAGE-----

"Eric A. Hall" wrote:
> Three potential conflicts with ACE and STD13 labels:
> 
> 1) Easy one first. There is a potential security problem with ACE
> encodings of legacy LDH domains, in that it may be possible for a user to
> manually encode an LDH label and provide false glue by providing
> "bq--ehsco.com." which gets decoded as "ehsco.com.", particularly if a
> delegating entity doesn't prevent it. idna-02 says this is illegal for
> zones in particular, but it needs to happen anywhere that ACE is processed
> as rich data rather than LDH. We should just declare any ACE encoded LDH
> label as illegal to be rejected with extreme prejudice by any entity which
> encodes OR decodes ACE. I'm putting this in the next UDNS spec, btw.

This is also in the spec for my proposal. (Bear with me; it is very nearly
ready.)
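
As a rough illustration of the check being discussed, the sketch below
refuses any label that carries an ACE prefix yet decodes to a plain LDH
label. The names `is_ldh` and `decode_checked`, and the `decode` parameter,
are hypothetical; no particular ACE algorithm is assumed, and only the
test applied after decoding is shown.

```python
import string

# Letters, digits and hyphen: the LDH repertoire.
LDH = set(string.ascii_letters + string.digits + "-")

def is_ldh(label: str) -> bool:
    """True if the label is non-empty and uses only letters, digits, hyphen."""
    return bool(label) and all(ch in LDH for ch in label)

def decode_checked(label: str, prefix: str, decode) -> str:
    """Decode an ACE label, refusing results that are plain LDH.

    `decode` is a caller-supplied (hypothetical) function that turns the
    part of the label after the prefix into a character string.
    """
    if not label.lower().startswith(prefix):
        return label  # no ACE prefix: passed through unchanged
    decoded = decode(label[len(prefix):])
    if is_ldh(decoded):
        raise ValueError("ACE-encoded LDH label rejected: %r" % label)
    return decoded
```

With a toy identity decoder, `decode_checked("bq--ehsco", "bq--", lambda s: s)`
raises `ValueError`, while a label without the prefix is returned as is.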

> 2) ACE precludes certain characters from being stored, and delegates some
> of this process to idna's incoming filters.

ACE doesn't preclude any characters from being stored (if by ACE you mean
the encoding and decoding functions on labels). AFAICS neither does idna-04,
unless the name is explicitly required to be a hostname.

> However, idna is only
> concerned with host names, and some of the excluded values can be provided
> as binary domain names (hyphen at the beginning and end of a binary domain
> name is legal, for example). Will such strings blow up ACE? If so, these
> labels need to be excluded from ACE conversion the same way that LDH
> labels are. Can we get an enumeration on these?

The labels to exclude from ACE conversion are all ASCII-only labels,
i.e. *(U+0000..007F).

> 3) Similar problem exists with domain names that contain eight-bit
> characters outside LDH.

There is no such thing as an "eight-bit character". Domain names (in the
sense you mean here) are octet strings, not character strings. U+00xx is not
the same as the octet with hex value xx. The latter is allowed in a domain
name according to the current specs, but it has no defined interpretation
as a character.
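
To make the distinction concrete, here is a small sketch that reads the
same octet under several charsets. The helper name `interpretations` and
the choice of charsets are mine, picked only for illustration; the point
is that the octet itself selects none of them.

```python
def interpretations(octet: int) -> dict:
    """Decode one octet under several charsets; None means 'not decodable'."""
    raw = bytes([octet])
    result = {}
    for charset in ("latin-1", "cp437", "utf-8"):
        try:
            result[charset] = raw.decode(charset)
        except UnicodeDecodeError:
            result[charset] = None
    return result

# Octet 0xE9 is U+00E9 only if Latin-1 is assumed; under cp437 it is a
# different character, and on its own it is not valid UTF-8 at all.
readings = interpretations(0xE9)
```

An ASCII octet such as 0x41 gives the same character under all three,
which is why the confusion rarely shows up for LDH names.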

An IDN proposal is therefore free to define any mapping it likes between
an octet string domain name and the corresponding character string, as
long as it preserves the current mapping (i.e. US-ASCII) as a subset.
Actually, it may even change the current mapping, as long as this change
only affects names that are known not to be used in practice (for example,
names that start with an ACE prefix).
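
The constraint can be sketched as a property check on a candidate mapping.
Everything here is an assumption for illustration: `decoder` builds a
mapping from a Python codec name, `preserves_ascii` tests only single
octets plus a few sample labels, and none of the charsets shown is being
proposed for IDN.

```python
def decoder(charset: str):
    """Build an octet-string -> character-string mapping from a codec name.

    The mapping returns None where the octets have no interpretation.
    """
    def mapping(octets: bytes):
        try:
            return octets.decode(charset)
        except UnicodeDecodeError:
            return None
    return mapping

def preserves_ascii(mapping, samples=(b"ehsco", b"com", b"-a-")) -> bool:
    """True if `mapping` agrees with US-ASCII on every tested ASCII label."""
    single = [bytes([b]) for b in range(0x80)]
    return all(mapping(o) == o.decode("ascii") for o in single + list(samples))

ok_ascii = preserves_ascii(decoder("ascii"))    # the current mapping itself
ok_utf8 = preserves_ascii(decoder("utf-8"))     # a superset of US-ASCII
ok_ebcdic = preserves_ascii(decoder("cp500"))   # disagrees on ASCII octets
```

Note that both acceptable mappings are partial: `decoder("utf-8")(b"\xe9")`
is None, so some octet strings simply have no character string.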

> However, whenever a UCS character in the range of U+0000 through U+00FF is
> provided, the software has to generate two output formats: one for the
> STD13 octet encoding required for binary domain names, and another to
> provide the ACE encoded representation of the canonical UCS character.

You're assuming that the current domain name <-> character string mapping
is defined by ISO 8859-1 (Latin-1). No such assumption is warranted: not all
octet string domain names will correspond to a Unicode string (just as not
all octet string domain names correspond to an ASCII string now).

The current requirements draft has a similar confusion between code points
and octets, as I pointed out in my suggested changes to it:

=====
draft-ietf-idn-requirements-08.txt:
# [3] The DNS protocol (the packet formats that go on the wire) MUST
# NOT limit the codepoints that can be used. A service defined on top of
# the DNS, for instance the IDN-to-address function, MAY limit the
# codepoints that can be used. The service descriptions MUST describe
# what limitations are imposed.

The packet formats that go on the wire use octet strings, not strings
of codepoints. In order to maintain compatibility with the requirements
of RFC 2181, it is the set of octet strings that must not be limited.

Also, there may be other restrictions on host names besides the set of
allowed codepoints (for example relating to mixing of left-to-right and
right-to-left scripts, or names that start with an ACE prefix).

=> [3] The DNS protocol (the packet formats that go on the wire) MUST
=> NOT limit, apart from in length, the set of octet strings that can be
=> used as an encoded domain name. A service defined on top of the DNS,
=> for instance the IDN-to-address function, MUST define a mapping between
=> host name strings and these octet string encodings, and MAY impose
=> limitations on host names, for example by restricting the set of
=> allowed codepoints. The service descriptions MUST describe what
=> limitations are imposed.
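
For reference, a minimal sketch of the wire encoding this wording refers
to, assuming the uncompressed RFC 1035 form: each label is a length octet
followed by that many arbitrary octets, and a zero octet ends the name.
The function name `encode_name` is hypothetical, and name compression is
left out.

```python
MAX_LABEL = 63   # octets per label
MAX_NAME = 255   # octets per encoded name, length octets included

def encode_name(labels) -> bytes:
    """Encode a sequence of labels (bytes objects) in DNS wire format.

    Only lengths are checked; the octet values are not restricted.
    """
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL:
            raise ValueError("label length out of range")
        out.append(len(label))
        out += label
    out.append(0)  # the zero-length root label terminates the name
    if len(out) > MAX_NAME:
        raise ValueError("encoded name too long")
    return bytes(out)

wire = encode_name([b"ehsco", b"com"])
```

A label such as `bytes([0xE9, 0x00, 0x2E])` encodes without complaint,
since the only limit applied at this layer is length.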

- -- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO+yZITkCAxeYt5gVAQERuAgAsiEo2h1YuQbI5chEaoFOaR8L1wufT0iK
O8CJIHRQ2zTZWO5uJWLtEy+PtBaepEODR4ewTRoRwU8ZIAcZJrefpM8GAjYAhFTC
bcmH/A5U3DN5ujBwj3Mc/OImzAgx0iqDbFNKEqIT4h7y2vP3UXBN/pGxe5QnSXbN
4S//fEDu5NvKS+Epd5hDlmtXt2ShSyrja9CGzQcFYYJtL3YJ2/cgtkfGX8anMsLL
pjoLPR68XDt7pIX64JY2fihlmSgmkqfLCFPnV2H2/+hTAidZiCBbBobDALrXES+s
6og/wqCWUR+aBpm2lgLjq0lZjCfOwSucAVTiPXJDb30z8rsVI3F//g==
=P825
-----END PGP SIGNATURE-----