RE: IDNA and getnameinfo() and getaddrinfo()
Dave Thaler <dthaler@microsoft.com> Mon, 14 June 2010 20:21 UTC
From: Dave Thaler <dthaler@microsoft.com>
To: Nicolas Williams <Nicolas.Williams@oracle.com>
Subject: RE: IDNA and getnameinfo() and getaddrinfo()
Date: Mon, 14 Jun 2010 20:20:55 +0000
Cc: "cheshire@apple.com" <cheshire@apple.com>, "john+ietf@jck.com" <john+ietf@jck.com>, "idna-update@alvestrand.no" <idna-update@alvestrand.no>
> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams@oracle.com]
> Sent: Monday, June 14, 2010 12:32 PM
> To: Dave Thaler
> Cc: idna-update@alvestrand.no; john+ietf@jck.com; cheshire@apple.com
> Subject: Re: IDNA and getnameinfo() and getaddrinfo()
>
> On Mon, Jun 14, 2010 at 07:14:12PM +0000, Dave Thaler wrote:
> > > Over in the NFSv4 WG we're discussing how to fix NFSv4.1 to properly
> > > handle IDNA. In the process of doing so I ran into
> > > draft-iab-idn-encoding, which has a cogent discussion of name
> > > service switches (pictured in figure 2).
> > >
> > > draft-iab-idn-encoding aims for Informational status. I'm wondering
> > > if we could publish a Standards-Track document describing how
> > > getnameinfo() and getaddrinfo() should handle IDNA.
> > >
> > > For example, one could say that when using DNS getnameinfo() should:
> >
> > Be careful not to confuse getnameinfo() with DNS.
>
> I explicitly pointed out the name service switch architecture usually
> implemented. I thought that'd suffice to clarify that I really meant
> "the DNS plug-in to the getnameinfo() entry point in the name service
> switch" -- I just didn't want to be too redundant.
>
> > As noted in draft-iab-idn-encoding and RFC 3493, DNS is just one of a
> > number of mechanisms used under getnameinfo().
>
> Right, and I believe the failure to acknowledge this in the original
> IDNA architecture was a significant failure. I'm disappointed that
> though this is being acknowledged now, it's not in a standards-track
> document.
>
> > > - perform the DNS lookup
> > > - apply ToUnicode() to the resulting domainname
> > > - attempt to convert the address' name to the caller's locale's codeset
> > >   if that codeset is not UTF-8
> > > - if failure, then return the A-label as the canonical hostname
> > > - if success return the U-label (in the caller's locale's codeset)
> > >   as the canonical hostname and the A-label as an alias
> > >
> > > And that when using DNS getaddrinfo() should:
> > >
> > > - convert the given host/domainname from the caller's locale's codeset
> > >   to UTF-8 if necessary
> > > - apply ToASCII(), perform DNS lookups
> >
> > As discussed in draft-iab-idn-encoding section 3, it's not that simple.
> > The ACE form applies in the public DNS but does not apply in many
> > private DNS clouds.
>
> I'm not sure I care about those, but one could always implement lists
> of domains below which to apply alternative algorithms.

You may not care about them but unfortunately people who provide
getaddrinfo/getnameinfo libraries for applications in general need
to care about them.

>
> I was specifically interested in what name should be returned as
> canonical and what name should be returned as an alias, if any.
>
> > > - if success, return the IP address(es) found, the given name as the
> > >   canonical hostname, the A-label form of the hostname as an alias,
> > >   and the U-label form (converted to the caller's locale's codeset)
> > >   as an alias if different from the given hostname.
> >
> > The addrinfo structure returned by getaddrinfo() does not return
> > "aliases" per se. It can return a single string which is:
> >     char *ai_canonname; /* canonical name for nodename */
>
> Oh, right. How depressing. I'd for some reason thought them similar
> enough to gethostbyname/gethostbyaddr().

Unfortunately they're not.
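
To make the ai_canonname point concrete, here is a minimal sketch using Python's socket module, which wraps both API families. It is an illustration only: the helper name canonical_name is hypothetical, and the exact strings returned depend on the local resolver configuration.

```python
import socket


def canonical_name(host: str) -> str:
    """Return the single canonical-name string getaddrinfo() reports for host.

    Hypothetical helper, for illustration only.
    """
    infos = socket.getaddrinfo(
        host, None, type=socket.SOCK_STREAM, flags=socket.AI_CANONNAME
    )
    # Each result is (family, type, proto, canonname, sockaddr).  There is no
    # alias list anywhere in the tuple; as in the C API, the canonical name is
    # normally carried by the first entry only.
    return infos[0][3]


if __name__ == "__main__":
    # A numeric literal is parsed locally, so this needs no DNS query.
    print(canonical_name("127.0.0.1"))
    # By contrast, the gethostbyname() family returns (name, aliases, addresses).
    print(socket.gethostbyname_ex("localhost"))
```

The older gethostbyname_ex() call has a slot for aliases; getaddrinfo() has only the one string, so any proposal has to pick a single form to return.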
>
> So remove all references to aliases from my previous post; instead
> these functions should return the A-label as the canon name only when
> the U-label cannot be converted to the caller's locale's codeset
> losslessly, else they should return the U-label (in the caller's
> locale's codeset) as the canon name.

I'd argue that the "canon name" should be the form in which it was
resolved over the wire. So the A-label form if it was resolved in the
public DNS, and another form (typically the U-label form) if it was
resolved via something else (e.g., mDNS or DNS in a private namespace
using UTF-8 or whatever else).

Also note that Windows treats "char *" as ANSI (which has no guarantee
of interoperability) and hence deprecates getaddrinfo/getnameinfo, and
defines UTF-16 versions (GetAddrInfoW/GetNameInfoW). MacOS on the
other hand treats "char *" as UTF-8. RFC 3493 doesn't say either way
whether "char *" is ANSI or UTF-8 or whatever else, and as far as I know,
neither does POSIX
(http://www.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html).
Hence this is an issue for anyone proposing to make a standards-track
RFC for getaddrinfo/getnameinfo.

>
> > In my view, yes you're on the right track in having NFSv4 not want to
> > do encoding conversion itself for name resolution but in expecting it
> > to be done under getaddrinfo/getnameinfo.
>
> Would more advice to protocol designers be appropriate then? When
> should application protocols (ignoring domainname registration related
> protocols) care to specify A-labels-only, U-labels-only, both, or
> un-pre-processed Unicode?
>
> If we could assume IDNA-aware getnameinfo()/getaddrinfo() then is there
> any reason for application protocols [that don't involve domainname
> registration] to do anything other than allow all three forms (A-label,
> U-label and un-pre-processed Unicode) on the wire?

I'd argue any new application protocol ought to specify the encoding
rather than allowing multiple.
Specifying UTF-8 would be good :-)

-Dave

>
> > > Unfortunately we probably cannot rely on getnameinfo()/getaddrinfo()
> > > doing the Right Thing. A Standards-Track RFC on this would probably
> > > help.
> >
> > Well API RFCs (like RFC 3493 for getnameinfo/getaddrinfo) are
> > Informational, not Standards-Track. But yes an RFC would probably help.
>
> We have plenty of Standards-Track API RFCs. (Yes, we really do.) I
> think it'd be entirely appropriate to have a Standards-Track RFC
> specifying how these two functions (or new variants thereof) should
> handle IDNA. Indeed, I think it's necessary, and a major, perhaps the
> only serious shortcoming of IDNAbis.
>
> Nico
> --
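
The canon-name rule Nico proposes in the quoted text (return the U-label when it converts losslessly to the caller's codeset, otherwise the A-label) can be sketched in a few lines. This is a minimal illustration, assuming Python's built-in "idna" codec, which implements the IDNA2003 ToASCII/ToUnicode operations of RFC 3490 rather than IDNA2008. The function names and the example domain are hypothetical, and the sketch does not model Dave's alternative of returning whichever form was resolved over the wire.

```python
def to_a_label(name: str) -> str:
    """Apply ToASCII to every label, e.g. before a lookup in the public DNS."""
    return name.encode("idna").decode("ascii")


def to_u_label(name: str) -> str:
    """Apply ToUnicode to every label of an ASCII (possibly ACE) name."""
    return name.encode("ascii").decode("idna")


def canon_name_for_codeset(a_label_name: str, codeset: str) -> str:
    """Pick the name to report as canonical for a caller using codeset.

    Returns the U-label form when it can be encoded losslessly in the
    caller's codeset, and falls back to the A-label form otherwise.
    """
    u_name = to_u_label(a_label_name)
    try:
        u_name.encode(codeset)  # strict mode raises if a character cannot be encoded
    except UnicodeEncodeError:
        return a_label_name
    return u_name


if __name__ == "__main__":
    # Hypothetical example name; \u00fc is LATIN SMALL LETTER U WITH DIAERESIS.
    ace = to_a_label("b\u00fccher.example")
    print(ace)
    print(canon_name_for_codeset(ace, "latin-1"))
    print(canon_name_for_codeset(ace, "ascii"))
```

A caller in a UTF-8 locale always gets the U-label under this rule; only callers in narrower codesets ever see the A-label.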