Re: [idn] Overspecifications in draft-ietf-idn-requirements-08

David Hopwood <david.hopwood@zetnet.co.uk> Sat, 03 November 2001 01:43 UTC

Message-ID: <3BE321E9.C208EE34@zetnet.co.uk>
Date: Fri, 02 Nov 2001 22:44:57 +0000
From: David Hopwood <david.hopwood@zetnet.co.uk>
X-Mailer: Mozilla 4.7 [en] (WinNT; I)
X-Accept-Language: en-GB,en,fr-FR,fr,de-DE,de,ru
MIME-Version: 1.0
To: James Seng/Personal <jseng@pobox.org.sg>, idn@ops.ietf.org
Subject: Re: [idn] Overspecifications in draft-ietf-idn-requirements-08
References: <3BC28B99.E4E7CFB0@zetnet.co.uk> <01e001c163c3$768f9b60$0201000a@jamessonyvaio>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Sender: owner-idn@ops.ietf.org
Precedence: bulk

-----BEGIN PGP SIGNED MESSAGE-----

James Seng/Personal wrote:
> Some general comments:
> 
> 1. It is important for requirements to be simple and clear. The
> difficulty lies in striking the right balance between too much detail
> (i.e. too restrictive) and too little information (i.e. too vague).

In no case did my changes make a requirement either too restrictive or
too vague, AFAICS. They intentionally gave about the same level of
detail as the originals; they just corrected some things that I consider
to be mistakes and overspecifications.

> 2. When the requirements were drafted, we worked on the premise that *any*
> solution, client or server, application or infrastructure, any
> character set or multiple character sets, any TES, any encoding, etc. is
> possible so long as it meets these requirements.

Of course.

> OTOH, your comments made one basic assumption that IDNA-NAMEPREP-ACE is
> the solution. If IDN Protocol is IDNA-NAMEPREP-ACE, then obviously a lot
> of your comments would be right but that is hindsight.

The comments did not make that assumption. (For what it's worth, I very
much hope that IDNA-NAMEPREP-ACE is not the adopted solution, at least in
its current form.) They used IDNA as an example or counterexample in
some cases.

> 3. Incidentally, the suggestion (not yours, David) that the requirements are
> biased to IDNA is groundless.

Since I didn't make that suggestion, it's not relevant here.
As I said in another post, IDNA violates some of the requirements as
stated in the current draft, as do other proposals that should not be
excluded.

> > > 6. A transfer encoding syntax (TES) is a reversible transform of
> > >    encoded data which may (or may not) include textual data
> > >    represented in one or more character encoding schemes. Examples:
> > >    8bit, Quoted-Printable, BASE64, UTF-7 (defunct), UTF-5, and RACE.
> >
> > This definition is never used.
> 
> ACE is a form of TES.

An ACE is not a TES, because the decoding function of a TES maps
restricted octet strings to arbitrary octet strings [*]. The decoding
function of an ACE maps restricted character strings to arbitrary
character strings. An ACE could be viewed as a specialised CES by
composing it with the US-ASCII decoding function (i.e.
ACE-as-CES-decode = ACE-decode o US-ASCII-decode), but it cannot
be viewed as a TES. UTF-7, UTF-5, and RACE are not TESes.

[*] 8bit, 7bit, quoted-printable, and base64 satisfy this definition,
    which is the definition used in MIME, where the term "transfer
    encoding syntax" comes from.
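
The type distinction above can be sketched in Python. This is only an
illustration under stated assumptions: base64 stands in for a TES, and
Python's built-in "punycode" codec stands in for an ACE (it is not one of
the encodings named in this thread). All function names are hypothetical.

```python
import base64

def tes_decode(octets: bytes) -> bytes:
    """TES decoding: restricted octet string -> arbitrary octet string."""
    return base64.b64decode(octets)

def us_ascii_decode(octets: bytes) -> str:
    """CES decoding for US-ASCII: octet string -> character string."""
    return octets.decode("ascii")

def ace_decode(chars: str) -> str:
    """ACE decoding: restricted character string -> arbitrary character string.

    Assumption: the "punycode" codec is a stand-in ACE. Python's codec API
    works on bytes, so the ASCII characters are re-encoded internally.
    """
    return chars.encode("ascii").decode("punycode")

def ace_as_ces_decode(octets: bytes) -> str:
    """ACE-as-CES-decode = ACE-decode o US-ASCII-decode."""
    return ace_decode(us_ascii_decode(octets))

raw = tes_decode(b"/w==")                # octets in, octets out
name = ace_as_ces_decode(b"bcher-kva")   # octets in, characters out
```

Note that tes_decode can yield octets (here 0xFF) that are not valid text
in any particular encoding, whereas ace_decode always yields characters.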

In any case, "transfer encoding syntax" and "TES" only appear once in
the document, in the above definition. The concept of a TES is not
important in IDN; we're trying to represent names that are character
strings, not octet strings. The DNS protocol can already handle opaque
octet strings in labels/names (except for folding of octets that
correspond to code units of uppercase and lowercase letters in US-ASCII).
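
A minimal sketch of that folding, assuming labels are compared as raw octet
strings (the helper names are hypothetical, and UTF-8 is used only as an
example of a non-ASCII encoding):

```python
def dns_fold(label: bytes) -> bytes:
    """Fold octets 0x41-0x5A ('A'-'Z') to 0x61-0x7A; leave all others alone."""
    return bytes(b + 0x20 if 0x41 <= b <= 0x5A else b for b in label)

def labels_match(a: bytes, b: bytes) -> bool:
    """Compare two labels the way an ASCII-case-insensitive matcher would."""
    return dns_fold(a) == dns_fold(b)

# Octets outside the US-ASCII letter range are opaque to the folding, so
# the UTF-8 forms of U+00C9 and U+00E9 are simply two different labels.
upper_e_acute = "\u00c9".encode("utf-8")
lower_e_acute = "\u00e9".encode("utf-8")

ascii_case_matches = labels_match(b"WWW", b"www")
non_ascii_case_matches = labels_match(upper_e_acute, lower_e_acute)
```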

> > > HTTP use the old service, it is a matter of great concern how the
> > > new and old services work together, and how other protocols can take
> > > advantage of the new service.
> >
> > IDN is not a new service; it makes more sense to consider it as an
> > extension of all the existing services.

I should have said here, "IDN is not necessarily a new service; it makes
at least as much sense to consider it as an extension of all the existing
services.".

> > For example, in IDNA, the
> > existing IP-to-hostname service can return an (ACE-encoded) IDN, or a
> > non-IDN query can follow a DNAME record that points to an IDN. These
> > cases wouldn't be possible if IDN was a separate service.
> 
> IDN is not a new service only if you assumed it is IDNA.

IDNA is a counterexample to the assertion that IDN is a new service.
Only one counterexample is needed.

> There are other proposals which make it a new service, e.g. IDNE, UDNS.

And there are still others for which it is not (e.g. just-send-UTF-8).
If it isn't a new service in all potential proposals, then it shouldn't
be stated as such in the requirements document.

> > > [1] The DNS is essential to the entire Internet. Therefore, the
> > > service MUST NOT damage present DNS protocol interoperability. It
> > > MUST make the minimum number of changes to existing protocols on
> > > all layers of the stack.
> >
> > Requiring the "minimum number of changes" fails to consider the cost
> > or feasibility of any change; it is requiring an absolute, which is
> > always a bad idea.
> >
> > > It MUST continue to allow any system anywhere that implements
> > > the IDN specification to resolve any internationalized domain name.
> >
> > "continue to" should be deleted. Obviously no system can resolve an
> > IDN at the moment.
> 
> Sounds reasonable to me.
> 
> > > [3] The DNS protocol (the packet formats that go on the wire) MUST
> > > NOT limit the codepoints that can be used. A service defined on top
> > > of the DNS, for instance the IDN-to-address function, MAY limit the
> > > codepoints that can be used. The service descriptions MUST describe
> > > what limitations are imposed.
> >
> > The packet formats that go on the wire use octet strings, not strings
> > of codepoints. In order to maintain compatibility with the
> > requirements of RFC 2181, it is the set of octet strings that must
> > not be limited.
> 
> But the "string of codepoints" did get sent over the wire.

If the name represents a character string, yes, but that's not the point:
RFC 2181 allows protocols that define DNS services to assume that the set
of octet strings that can be used in a label is not restricted (except for
length) [*]. This includes octet strings that are not valid in whatever
encoding is chosen for hostnames - just as octet strings that are not valid
in US-ASCII can be used now.

[*] The octet values 0x00 and 0x2E would cause lots of problems in practice,
    but that's not the point either.
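
As a rough sketch of what "not restricted (except for length)" means on the
wire, the following encodes a name in the uncompressed length-prefixed
format. The function name is hypothetical; the 63-octet label limit and
255-octet name limit are the standard DNS ones.

```python
from typing import Iterable

MAX_LABEL_OCTETS = 63
MAX_NAME_OCTETS = 255

def encode_name(labels: Iterable[bytes]) -> bytes:
    """Encode labels as length-prefixed octet strings, ending with the root."""
    out = bytearray()
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL_OCTETS:
            raise ValueError("label must be 1..63 octets")
        out.append(len(label))   # one length octet per label
        out += label             # the label octets themselves, unchecked
    out.append(0)                # zero-length root label terminates the name
    if len(out) > MAX_NAME_OCTETS:
        raise ValueError("name exceeds 255 octets")
    return bytes(out)

# Octets that are invalid in US-ASCII or UTF-8 are still encodable, as are
# 0x00 and 0x2E inside a label; only the lengths are checked.
wire = encode_name([b"\xff\xfe", b"com"])
awkward = encode_name([b"a.b\x00"])
```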

> > > [4] The protocol MUST work for all features of DNS, IPv4, and
> > > IPv6. The protocol MUST NOT allow an IDN to be returned to a
> > > requestor that requests the IP-to-(old)-domain-name mapping service.
> >
> > This is unclear. Returning an ACE name to an "old" requestor will
> > clearly not break anything, and an ACE name is an (encoded) IDN. It
> > also doesn't take into account that some resolver interfaces are already
> > Unicode-aware, in which case they would not require any distinction
> > between old and new requests (this is true for InetAddress.getHostName
> > in the Java API, for example, or for getipnodebyaddr, etc. in Plan-9).
> >
> > => [4] The proposal MUST work for all features of DNS, IPv4, and IPv6.
> > => The proposal MUST ensure that the responses to requests for an IP
> > => to domain name mapping will not break existing requestors.
> 
> Again, you assumed IDNA as the solution.

I didn't. Other solutions may return ACE names to the requestor as well,
and the point about existing Unicode-aware APIs is independent of which
solution is used.

> But I like your wordings.
> 
> > > [11] The protocol should handle with care new revisions of the CCS.
> > > Undefined codepoints should not be allowed unless a new revision of
> > > the protocol can handle it. Protocol revisions should be tagged.
> >
> > The current version of nameprep allows unassigned code points in
> > queries without revision tagging, for good reasons.
> >
> > => [11] The proposal should handle with care new revisions of the CCS.
> > => Proposals MUST discuss how undefined codepoints are handled.
                                  ^^^^^^^^^ I meant "currently unassigned"
> 
> This is hindsight based on some agreements we have now but not a
> requirement.

The text currently says "Protocol revisions should be tagged.".
That is not a genuine requirement. Adequate handling of currently
unassigned codepoints, OTOH, *is* a genuine requirement, and always
was.

> > The overspecification here is "at a *single* ... place". For example,
> > if canonicalization is specified by nameprep, it is idempotent, i.e.
> > nameprep(nameprep(x)) = x.

Thinko; I meant nameprep(nameprep(x)) = nameprep(x).

> > So doing it more than once only hurts
> > efficiency, not interoperability or any other requirement. It doesn't
> > even hurt efficiency very much, since the common case where a name is
> > already in the correct form can be optimised.
> 
> True but not significant.

It is significant: the proposal for which I intend to submit a draft in
the next week relies on it.
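
The idempotence property, and the fast path for names that are already
canonical, can be sketched as follows. This uses a stand-in canonicalization
(lowercasing followed by NFKC), which is an assumption made for the example
and is NOT the nameprep algorithm; the function names are hypothetical.

```python
import unicodedata

def canonicalize(name: str) -> str:
    """Stand-in canonicalization (not real nameprep): lowercase, then NFKC."""
    return unicodedata.normalize("NFKC", name.lower())

def canonicalize_fast(name: str) -> str:
    """Same result, skipping the work in the common already-canonical case."""
    if name.isascii() and name == name.lower():
        return name  # lowercase ASCII is a fixed point of canonicalize()
    return canonicalize(name)

# U+FF25 is FULLWIDTH LATIN CAPITAL LETTER E; U+00DC is U with diaeresis.
samples = ["example", "EXAMPLE", "\uff25xample", "B\u00dccher"]
for x in samples:
    once = canonicalize(x)
    assert canonicalize(once) == once       # applying it twice changes nothing
    assert canonicalize_fast(x) == once     # fast path agrees with slow path
```

The checks cover only these sample strings; they do not prove idempotence
of the stand-in function in general.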

> > > ... The protocol MUST specify canonicalization; ...
> >
> > This is meaningless without specifying what the goal of canonicalization
> > is. The minimum requirement is to ensure that characters that are
> > indistinguishable to users are treated the same, and so that is what
> > should be stated:
> 
> It was left vague so it could be defined or argued later. But of
> course, Nameprep has since made this much clearer.

I'll concede this point.

> > > [23] If other canonicalization is done, it MUST be done before the
> > > domain name is resolved.
> >
> > It makes perfect sense to do canonicalization as part of resolution,
> > not before it. Also, canonicalizing after resolution is certainly
> > feasible, even if it is inefficient.
> 
> Again, this is hindsight, not something known at the time we drafted the
> requirements.

I don't think it is hindsight; if I'd been involved with the WG at the
time the requirements were drafted, I would have seen this. The fact
remains that this sentence does not describe a genuine requirement.

> > > 3. Security Considerations
> > >
> > > Any solution that meets the requirements in this document MUST NOT
> > > be less secure than the current DNS.
> >
> > That is not necessarily achievable. The main issue is name spoofing
> > using look-alike characters: even if a proposal specifically tries to
> > address that (by registration procedures, for example), it can't
> > absolutely guarantee that there will not be cases of this that rely
> > on IDNs.
> 
> Name spoofing with "look-alike" characters already exists in the DNS. IDN
> introduces more of the same problem, but nothing new.

Characters that are exact look-alikes in almost all fonts do not exist in
the current DNS, and there are far fewer near look-alikes. This makes the
problem qualitatively more serious, and it means that we will have to rely
to some extent on registration procedures to try to prevent it. There is no
technical means, AFAICS, to guarantee that this will not result in spoofing
being easier than it is now. (If there is, I'd be happy to hear about it.)
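
A small illustration of the look-alike problem, assuming Latin "a" (U+0061)
and Cyrillic "a" (U+0430) as the confusable pair; the sample name is made
up, and the stand-in check uses Unicode NFKC rather than any IDN proposal's
canonicalization.

```python
import unicodedata

latin = "bank"
mixed = "b\u0430nk"  # U+0430 CYRILLIC SMALL LETTER A in place of Latin "a"

def describe(s: str) -> list:
    """Return the Unicode character name of each code point in s."""
    return [unicodedata.name(ch) for ch in s]

# The two strings are distinct code point sequences, and compatibility
# normalization does not merge them, so both remain separate names.
same_string = latin == mixed
same_after_nfkc = unicodedata.normalize("NFKC", mixed) == latin
second_char = describe(mixed)[1]
```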

- -- 
David Hopwood <david.hopwood@zetnet.co.uk>

Home page & PGP public key: http://www.users.zetnet.co.uk/hopwood/
RSA 2048-bit; fingerprint 71 8E A6 23 0E D3 4C E5  0F 69 8C D4 FA 66 15 01
Nothing in this message is intended to be legally binding. If I revoke a
public key but refuse to specify why, it is because the private key has been
seized under the Regulation of Investigatory Powers Act; see www.fipr.org/rip


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: noconv

iQEVAwUBO+MgcDkCAxeYt5gVAQEJQAf/eDvEJ2ml7oE/QfR05udW2FuVt5m5NW9O
wDigWyS68MwAPvMjOyePcW96zzxEt/EGWlA+P7jP6nWMrsqxZtcBiGaCZJsBleNZ
LCrlV+aDr37PJ6cYq9Ui2HaPzXhLNhT/yg749HzTgTpUR7H5J0zUqcJHbAoLcRSn
EeW9t1qyQc/hkKhYI0KNgUaTFa51mIsWUGJUj5mZAJgeCw7LEp5v8TIIQg/uiXmu
hbVLjtqtFw+V10MoWCRebjTlhfkYKNInGdUtGFt+JD5pTVc8Mze8OaFLjTMilxMe
VhYXT/QE9tbAOVJsLmdHiXw0NxhcXmUa31ZFhCMt1aRpucJm12tiAg==
=sXHI
-----END PGP SIGNATURE-----