RE: IDNA and getnameinfo() and getaddrinfo()

Dave Thaler <dthaler@microsoft.com> Mon, 14 June 2010 20:21 UTC

From: Dave Thaler <dthaler@microsoft.com>
To: Nicolas Williams <Nicolas.Williams@oracle.com>
Subject: RE: IDNA and getnameinfo() and getaddrinfo()
Date: Mon, 14 Jun 2010 20:20:55 +0000
Message-ID: <9B57C850BB53634CACEC56EF4853FF652C060349@TK5EX14MBXW604.wingroup.windeploy.ntdev.microsoft.com>
References: <20100614172631.GQ9605@oracle.com> <9B57C850BB53634CACEC56EF4853FF652C05FF86@TK5EX14MBXW604.wingroup.windeploy.ntdev.microsoft.com> <20100614193133.GS9605@oracle.com>
In-Reply-To: <20100614193133.GS9605@oracle.com>
Cc: "cheshire@apple.com" <cheshire@apple.com>, "john+ietf@jck.com" <john+ietf@jck.com>, "idna-update@alvestrand.no" <idna-update@alvestrand.no>

> -----Original Message-----
> From: Nicolas Williams [mailto:Nicolas.Williams@oracle.com]
> Sent: Monday, June 14, 2010 12:32 PM
> To: Dave Thaler
> Cc: idna-update@alvestrand.no; john+ietf@jck.com; cheshire@apple.com
> Subject: Re: IDNA and getnameinfo() and getaddrinfo()
> 
> On Mon, Jun 14, 2010 at 07:14:12PM +0000, Dave Thaler wrote:
> > > Over in the NFSv4 WG we're discussing how to fix NFSv4.1 to properly
> > > handle IDNA.  In the process of doing so I ran into draft-iab-idn-encoding,
> > > which has a cogent discussion of name service switches (pictured in
> > > figure 2).
> > >
> > > draft-iab-idn-encoding aims for Informational status.  I'm wondering
> > > if we could publish a Standards-Track document describing how
> > > getnameinfo() and getaddrinfo() should handle IDNA.
> > >
> > > For example, one could say that when using DNS, getnameinfo() should:
> >
> > Be careful not to confuse getnameinfo() with DNS.
> 
> I explicitly pointed out the name service switch architecture usually
> implemented.  I thought that'd suffice to clarify that I really meant "the DNS
> plug-in to the getnameinfo() entry point in the name service switch" -- I just
> didn't want to be too redundant.
> 
> > As noted in draft-iab-idn-encoding and RFC 3493, DNS is just one of a
> > number of mechanisms used under getnameinfo().
> 
> Right, and I believe the failure to acknowledge this in the original IDNA
> architecture was a significant mistake.  I'm disappointed that, though this is
> now being acknowledged, it's not in a standards-track document.
> 
> > >  - perform the DNS lookup
> > >  - apply ToUnicode() to the resulting domainname
> > >  - attempt to convert the address' name to the caller's locale's codeset
> > >    if that codeset is not UTF-8
> > >     - if failure, then return the A-label as the canonical hostname
> > >     - if success return the U-label (in the caller's locale's codeset)
> > >       as the canonical hostname and the A-label as an alias
> > >
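ToUnicode() as defined in RFC 3490 does more than decode Punycode: it checks for and strips the "xn--" prefix, re-applies ToASCII() and verifies that the result round-trips. Its central step can nevertheless be sketched in C by following the RFC 3492 decoding procedure. This is a minimal sketch, not a complete ToUnicode(): the function name `punycode_decode` is hypothetical, the caller is assumed to have already removed the "xn--" prefix, and the RFC 3492 overflow checks are omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Punycode parameters from RFC 3492, section 5 */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Bias adaptation function, RFC 3492 section 6.1 */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
    uint32_t k = 0;
    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Digit value of a Punycode character, or BASE if it is not a digit */
static uint32_t decode_digit(char c)
{
    if (c >= '0' && c <= '9') return (uint32_t)(c - '0') + 26;
    if (c >= 'a' && c <= 'z') return (uint32_t)(c - 'a');
    if (c >= 'A' && c <= 'Z') return (uint32_t)(c - 'A');
    return BASE;
}

/* Hypothetical helper: decodes the part of an A-label after "xn--" into at
   most cap code points. Returns the number of code points written, or -1 on
   malformed input or a too-small buffer. Overflow checks are omitted. */
long punycode_decode(const char *in, uint32_t *out, size_t cap)
{
    size_t len = strlen(in), b = 0, o = 0, p = 0, j;
    const char *dash = strrchr(in, '-');   /* last '-' ends the basic part */
    uint32_t n = INITIAL_N, i = 0, bias = INITIAL_BIAS;

    if (dash != NULL) {
        b = (size_t)(dash - in);
        p = b + 1;
    }
    if (b > cap) return -1;
    for (j = 0; j < b; j++) {              /* copy the basic (ASCII) code points */
        if ((unsigned char)in[j] >= 0x80) return -1;
        out[o++] = (unsigned char)in[j];
    }
    while (p < len) {
        uint32_t oldi = i, w = 1;
        for (uint32_t k = BASE; ; k += BASE) {  /* read one variable-length integer */
            if (p >= len) return -1;
            uint32_t d = decode_digit(in[p++]);
            if (d >= BASE) return -1;
            i += d * w;
            uint32_t t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias);
            if (d < t) break;
            w *= BASE - t;
        }
        bias = adapt(i - oldi, (uint32_t)(o + 1), oldi == 0);
        n += i / (uint32_t)(o + 1);             /* i encodes code point and position */
        i %= (uint32_t)(o + 1);
        if (o >= cap) return -1;
        memmove(out + i + 1, out + i, (o - i) * sizeof *out);  /* open a slot at i */
        out[i++] = n;
        o++;
    }
    return (long)o;
}
```

For example, decoding "bcher-kva" (from the A-label "xn--bcher-kva") yields the six code points of "bücher".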
> > > And that when using DNS, getaddrinfo() should:
> > >
> > >  - convert the given host/domainname from the caller's locale's codeset
> > >    to UTF-8 if necessary
> > >  - apply ToASCII(), perform DNS lookups
> >
> > As discussed in draft-iab-idn-encoding section 3, it's not that simple.
> > The ACE form applies in the public DNS but does not apply in many
> > private DNS clouds.
> 
> I'm not sure I care about those, but one could always implement lists of domains
> below which to apply alternative algorithms.
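A per-domain exception list of that kind could be as simple as a suffix match on the queried name. In this sketch the function names and the example suffixes ("local" for multicast DNS, "corp.example" standing in for a private namespace) are illustrative assumptions, and the comparison is ASCII case-insensitive only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* True if name equals suffix or ends with "." + suffix, ignoring ASCII case
   and a trailing root dot on name. */
static bool under_suffix(const char *name, const char *suffix)
{
    size_t nl = strlen(name), sl = strlen(suffix);
    if (nl > 0 && name[nl - 1] == '.') nl--;   /* ignore trailing root dot */
    if (sl > nl) return false;
    if (strncasecmp(name + nl - sl, suffix, sl) != 0) return false;
    return nl == sl || name[nl - sl - 1] == '.';  /* match on a label boundary */
}

/* Hypothetical policy: use the ACE (A-label) form unless the name falls
   under one of the configured private suffixes. */
bool should_use_ace(const char *name, const char *const *private_suffixes,
                    size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (under_suffix(name, private_suffixes[i])) return false;
    return true;
}
```

Matching only on label boundaries keeps a name such as "notlocal" from being treated as if it were under "local".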

You may not care about them, but unfortunately people who provide
getaddrinfo/getnameinfo libraries for applications in general need to
care about them.

> 
> I was specifically interested in what name should be returned as canonical and
> what name should be returned as an alias, if any.
> 
> > >     - if success, return the IP address(es) found, the given name as the
> > >       canonical hostname, the A-label form of the hostname as an alias,
> > >       and the U-label form (converted to the caller's locale's codeset)
> > >       as an alias if different from the given hostname.
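Likewise, the core of ToASCII() for a single label is Punycode encoding (RFC 3492) plus the "xn--" prefix. The sketch below assumes the label has already been converted from the caller's codeset and decoded into Unicode code points, and it omits the mapping and validation steps (Nameprep in IDNA2003, the IDNA2008 validity rules) as well as the RFC 3492 overflow checks. The name `label_to_ace` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Punycode parameters from RFC 3492, section 5 */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128 };

/* Bias adaptation function, RFC 3492 section 6.1 */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int firsttime)
{
    uint32_t k = 0;
    delta = firsttime ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    while (delta > ((BASE - TMIN) * TMAX) / 2) {
        delta /= BASE - TMIN;
        k += BASE;
    }
    return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

static char encode_digit(uint32_t d)   /* 0..25 -> 'a'..'z', 26..35 -> '0'..'9' */
{
    return (char)(d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Appends c if there is room for it plus the terminating NUL */
static int put(char *out, size_t outlen, size_t *o, char c)
{
    if (*o + 1 >= outlen) return -1;
    out[(*o)++] = c;
    return 0;
}

/* Hypothetical helper: returns the label unchanged if it is all ASCII,
   otherwise "xn--" + Punycode. Returns 0 on success, -1 if out is too small. */
int label_to_ace(const uint32_t *cp, size_t len, char *out, size_t outlen)
{
    size_t o = 0, basic = 0, h, i;
    uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;

    for (i = 0; i < len; i++)
        if (cp[i] < 0x80) basic++;
    if (basic < len)                          /* non-ASCII: needs the ACE prefix */
        for (i = 0; i < 4; i++)
            if (put(out, outlen, &o, "xn--"[i]) != 0) return -1;
    for (i = 0; i < len; i++)                 /* copy the basic code points */
        if (cp[i] < 0x80 && put(out, outlen, &o, (char)cp[i]) != 0) return -1;
    if (basic > 0 && basic < len && put(out, outlen, &o, '-') != 0) return -1;

    for (h = basic; h < len; ) {
        uint32_t m = UINT32_MAX;
        for (i = 0; i < len; i++)             /* smallest code point >= n */
            if (cp[i] >= n && cp[i] < m) m = cp[i];
        delta += (m - n) * (uint32_t)(h + 1);
        n = m;
        for (i = 0; i < len; i++) {
            if (cp[i] < n) {
                delta++;
            } else if (cp[i] == n) {          /* emit delta as a variable-length integer */
                uint32_t q = delta;
                for (uint32_t k = BASE; ; k += BASE) {
                    uint32_t t = k <= bias ? TMIN : (k >= bias + TMAX ? TMAX : k - bias);
                    if (q < t) break;
                    if (put(out, outlen, &o, encode_digit(t + (q - t) % (BASE - t))) != 0)
                        return -1;
                    q = (q - t) / (BASE - t);
                }
                if (put(out, outlen, &o, encode_digit(q)) != 0) return -1;
                bias = adapt(delta, (uint32_t)(h + 1), h == basic);
                delta = 0;
                h++;
            }
        }
        delta++;
        n++;
    }
    if (outlen == 0) return -1;
    out[o] = '\0';
    return 0;
}
```

For example, the code points of "bücher" encode to "xn--bcher-kva", while an all-ASCII label such as "example" is returned unchanged.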
> >
> > The addrinfo structure returned by getaddrinfo() does not return
> > "aliases" per se.  It can return a single string which is:
> > 	char   *ai_canonname; /* canonical name for nodename */
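For reference, this is how a caller obtains that single string with the RFC 3493/POSIX interface: set AI_CANONNAME in the hints and read ai_canonname from the first returned entry (there is no alias list). The wrapper name `get_canonname` is hypothetical; getaddrinfo(), freeaddrinfo() and AI_CANONNAME are the standard calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical wrapper: copies the canonical name reported for node into buf.
   Returns 0 on success, nonzero on failure (an EAI_* code or -1). */
int get_canonname(const char *node, char *buf, size_t buflen)
{
    struct addrinfo hints, *res = NULL;
    int rc;

    if (node == NULL || buflen == 0) return -1;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;  /* one entry per address, not per socket type */
    hints.ai_flags = AI_CANONNAME;    /* request ai_canonname on the first result */

    rc = getaddrinfo(node, NULL, &hints, &res);
    if (rc != 0) return rc;

    /* Only the first addrinfo in the list carries ai_canonname. */
    if (res->ai_canonname == NULL || strlen(res->ai_canonname) >= buflen) {
        freeaddrinfo(res);
        return -1;
    }
    strcpy(buf, res->ai_canonname);
    freeaddrinfo(res);
    return 0;
}
```

Whatever encoding policy is chosen, it has to fit into that one `char *` field.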
> 
> Oh, right.  How depressing.  I'd for some reason thought them similar enough to
> gethostbyname/gethostbyaddr().

Unfortunately they're not.

> 
> So remove all references to aliases from my previous post; instead these
> functions should return the A-label as the canon name only when the U-label
> cannot be converted to the caller's locale's codeset losslessly, else they should
> return the U-label (in the caller's locale's codeset) as the canon name.
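The revised rule can be sketched with the POSIX iconv() interface: try to convert the UTF-8 U-label into the caller's codeset (in practice the value of nl_langinfo(CODESET) after setlocale()), and fall back to the A-label if that codeset cannot represent it. The function name and the convention of passing in both label forms are assumptions for illustration; the sketch treats any conversion that iconv() reports as irreversible as lossy.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

static int copy_str(const char *s, char *out, size_t outlen)
{
    size_t n = strlen(s);
    if (n >= outlen) return -1;
    memcpy(out, s, n + 1);
    return 0;
}

/* Hypothetical helper: writes the canon name into out. Returns 1 if the
   U-label (in tocode) was used, 0 if the A-label was used, -1 on error. */
int choose_canonname(const char *alabel, const char *ulabel_utf8,
                     const char *tocode, char *out, size_t outlen)
{
    if (outlen == 0) return -1;
    if (strcmp(tocode, "UTF-8") == 0)          /* no conversion needed */
        return copy_str(ulabel_utf8, out, outlen) == 0 ? 1 : -1;

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd != (iconv_t)-1) {
        char *in = (char *)ulabel_utf8;        /* iconv() takes char **, not const */
        size_t inleft = strlen(ulabel_utf8);
        char *op = out;
        size_t outleft = outlen - 1;           /* reserve room for the NUL */
        size_t r = iconv(cd, &in, &inleft, &op, &outleft);
        /* flush shift state for stateful target codesets */
        size_t f = (r == (size_t)-1) ? (size_t)-1
                                     : iconv(cd, NULL, NULL, &op, &outleft);
        iconv_close(cd);
        if (r == 0 && f == 0) {                /* every character converted exactly */
            *op = '\0';
            return 1;
        }
    }
    /* codeset unavailable or conversion lossy: fall back to the A-label */
    return copy_str(alabel, out, outlen) == 0 ? 0 : -1;
}
```

With tocode "ISO-8859-1", "bücher" converts exactly and the U-label form is returned; with "ASCII" the conversion fails and the result is "xn--bcher-kva".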

I'd argue that the "canon name" should be the form in which it was 
resolved over the wire.  So the A-label form if it was resolved in the public DNS,
and another form (typically the U-label form) if it was resolved via something 
else (e.g., mDNS or DNS in a private namespace using UTF-8 or whatever else).
Also note that Windows treats "char *" strings as being in the ANSI code page
(which varies with the system locale and so has no guarantee of
interoperability), and hence deprecates getaddrinfo/getnameinfo and defines
UTF-16 versions (GetAddrInfoW/GetNameInfoW).
MacOS, on the other hand, treats "char *" as UTF-8.

RFC 3493 doesn't say either way whether "char *" is ANSI or UTF-8 or whatever
else, and as far as I know, neither does POSIX 
(http://www.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html).

Hence this is an issue for anyone proposing to make a standards-track RFC for 
getaddrinfo/getnameinfo.

> 
> > In my view, yes, you're on the right track in having NFSv4 not want to
> > do encoding conversion itself for name resolution, but instead expect it
> > to be done under getaddrinfo/getnameinfo.
> 
> Would more advice to protocol designers be appropriate then?  When should
> application protocols (ignoring domainname registration related
> protocols) care to specify A-labels-only, U-labels-only, both, or
> un-pre-processed Unicode?
> 
> If we could assume IDNA-aware getnameinfo()/getaddrinfo() then is there any
> reason for application protocols [that don't involve domainname registration] to
> do anything other than allow all three forms (A-label, U-label and un-pre-
> processed Unicode) on the wire?
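Allowing all three forms means every receiver has to tell them apart. A byte-level check can separate plain ASCII labels, candidate A-labels and non-ASCII labels, but it cannot distinguish a valid U-label from un-pre-processed Unicode; that requires the full IDNA validity rules. The enum and function names below are illustrative assumptions.

```c
#include <stddef.h>
#include <strings.h>

enum label_form {
    LABEL_ASCII,    /* all ASCII, no ACE prefix: used unchanged */
    LABEL_ALABEL,   /* ASCII with "xn--" prefix: only a candidate A-label */
    LABEL_UNICODE   /* non-ASCII bytes: U-label or un-pre-processed Unicode */
};

/* Hypothetical classifier for a single NUL-terminated label */
enum label_form classify_label(const char *label)
{
    for (const unsigned char *p = (const unsigned char *)label; *p; p++)
        if (*p >= 0x80) return LABEL_UNICODE;
    return strncasecmp(label, "xn--", 4) == 0 ? LABEL_ALABEL : LABEL_ASCII;
}
```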

I'd argue any new application protocol ought to specify a single encoding rather
than allowing multiple encodings.  Specifying UTF-8 would be good :-)
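Specifying UTF-8 also implies that receivers reject byte strings that are not well-formed UTF-8. A minimal validator following the RFC 3629 byte-sequence table (no overlong forms, no surrogates, nothing above U+10FFFF) might look like this; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if s[0..len) is well-formed UTF-8 per RFC 3629 */
bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t n;                              /* number of continuation bytes */
        unsigned char lo = 0x80, hi = 0xBF;    /* allowed range for the 2nd byte */
        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) n = 1;
        else if (c == 0xE0) { n = 2; lo = 0xA0; }   /* no overlong 3-byte forms */
        else if (c >= 0xE1 && c <= 0xEC) n = 2;
        else if (c == 0xED) { n = 2; hi = 0x9F; }   /* no surrogates */
        else if (c >= 0xEE && c <= 0xEF) n = 2;
        else if (c == 0xF0) { n = 3; lo = 0x90; }   /* no overlong 4-byte forms */
        else if (c >= 0xF1 && c <= 0xF3) n = 3;
        else if (c == 0xF4) { n = 3; hi = 0x8F; }   /* nothing above U+10FFFF */
        else return false;         /* C0, C1, F5..FF, or a stray continuation byte */
        if (len - i - 1 < n) return false;          /* truncated sequence */
        if (s[i + 1] < lo || s[i + 1] > hi) return false;
        for (size_t j = 2; j <= n; j++)
            if (s[i + j] < 0x80 || s[i + j] > 0xBF) return false;
        i += n + 1;
    }
    return true;
}
```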

-Dave

> 
> > > Unfortunately we probably cannot rely on getnameinfo()/getaddrinfo()
> > > doing the Right Thing.  A Standards-Track RFC on this would probably help.
> >
> > Well, API RFCs (like RFC 3493 for getnameinfo/getaddrinfo) are
> > Informational, not Standards-Track.  But yes, an RFC would probably help.
> 
> We have plenty of Standards-Track API RFCs.  (Yes, we really do.)  I think it'd be
> entirely appropriate to have a Standards-Track RFC specifying how these two
> functions (or new variants thereof) should handle IDNA.  Indeed, I think it's
> necessary, and its absence is a major (perhaps the only serious) shortcoming of IDNAbis.
> 
> Nico
> --