Re: IDNA and getnameinfo() and getaddrinfo()

Nicolas Williams <Nicolas.Williams@oracle.com> Mon, 14 June 2010 23:42 UTC

Date: Mon, 14 Jun 2010 18:42:19 -0500
From: Nicolas Williams <Nicolas.Williams@oracle.com>
To: Dave Thaler <dthaler@microsoft.com>
Subject: Re: IDNA and getnameinfo() and getaddrinfo()
Message-ID: <20100614234218.GB24077@oracle.com>
References: <20100614172631.GQ9605@oracle.com> <9B57C850BB53634CACEC56EF4853FF652C05FF86@TK5EX14MBXW604.wingroup.windeploy.ntdev.microsoft.com> <20100614193133.GS9605@oracle.com> <9B57C850BB53634CACEC56EF4853FF652C060349@TK5EX14MBXW604.wingroup.windeploy.ntdev.microsoft.com>
In-Reply-To: <9B57C850BB53634CACEC56EF4853FF652C060349@TK5EX14MBXW604.wingroup.windeploy.ntdev.microsoft.com>
Cc: "cheshire@apple.com" <cheshire@apple.com>, "john+ietf@jck.com" <john+ietf@jck.com>, "idna-update@alvestrand.no" <idna-update@alvestrand.no>

On Mon, Jun 14, 2010 at 08:20:55PM +0000, Dave Thaler wrote:
> > > As discussed in draft-iab-idn-encoding section 3, it's not that simple.
> > > The ACE form applies in the public DNS but does not apply in many
> > > private DNS clouds.
> > 
> > I'm not sure I care about those, but one could always implement lists of domains
> > below which to apply alternative algorithms.
> 
> You may not care about them but unfortunately people who provide 
> getaddrinfo/getnameinfo libraries for applications in general need to
> care about them.

For the purposes of this discussion, I don't care.  If I were implementing
this, I'd consider providing a local administrative configuration interface by
which to provide lists of private cloud domains that use alternative IDN
schemes.  (Actually, I'd probably want a distributed configuration
method for that, preferably using DNS itself, but really, that's a
tangent I don't want to go on because it's a distraction from the
purpose of this thread.)
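
For concreteness only, here is a minimal sketch of such a list as a static
suffix table.  Everything in it is an assumption made for illustration: the
names idn_scheme, idn_domain_rule, idn_rules and idn_scheme_for are
hypothetical, as are the example domains, and a real resolver would load the
list from configuration rather than compile it in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical schemes a resolver might apply below a given domain. */
enum idn_scheme { IDN_SCHEME_ACE, IDN_SCHEME_UTF8 };

struct idn_domain_rule {
    const char *suffix;      /* domain at or below which the rule applies */
    enum idn_scheme scheme;  /* scheme to use for names under that domain */
};

/* Hypothetical administrator-supplied list of private cloud domains. */
static const struct idn_domain_rule idn_rules[] = {
    { "corp.example", IDN_SCHEME_UTF8 },
    { "local",        IDN_SCHEME_UTF8 },
};

/* Return the scheme for name; anything not listed is treated as public DNS. */
enum idn_scheme idn_scheme_for(const char *name)
{
    assert(name != NULL);

    size_t nlen = strlen(name);
    if (nlen > 0 && name[nlen - 1] == '.')
        nlen--;                              /* ignore one trailing dot */

    for (size_t i = 0; i < sizeof idn_rules / sizeof idn_rules[0]; i++) {
        size_t slen = strlen(idn_rules[i].suffix);
        if (slen > nlen)
            continue;

        const char *tail = name + (nlen - slen);
        if (strncasecmp(tail, idn_rules[i].suffix, slen) != 0)
            continue;

        /* Only match on a label boundary, not in the middle of a label. */
        if (slen == nlen || tail[-1] == '.')
            return idn_rules[i].scheme;
    }
    return IDN_SCHEME_ACE;
}
```

With this table, idn_scheme_for("printer.corp.example") selects the UTF-8
scheme, while "www.example.com" and "notcorp.example" fall through to ACE.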

> > So remove all references to aliases from my previous post; instead these
> > functions should return the A-label as the canon name only when the U-label
> > cannot be converted to the caller's locale's codeset losslessly, else they should
> > return the U-label (in the caller's locale's codeset) as the canon name.
> 
> I'd argue that the "canon name" should be the form in which it was 
> resolved over the wire.  So the A-label form if it was resolved in the public DNS,
> and another form (typically the U-label form) if it was resolved via something 
> else (e.g., mDNS or DNS in a private namespace using UTF-8 or whatever else).
> Also note that Windows treats "char *" as ANSI (which has no guarantee of
> interoperability) and hence deprecates getaddrinfo/getnameinfo, and defines
> UTF-16 versions (GetAddrInfoW/GetNameInfoW).
> MacOS on the other hand treats "char *" as UTF-8.  

Better yet, Simon's proposal allows the caller to decide which name
should be returned as canonical.  That works for me.
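
As a rough illustration of the fallback rule I described earlier (return the
U-label when it converts losslessly to the caller's codeset, else the
A-label), here is a sketch using POSIX iconv().  The helper names
converts_losslessly and pick_canon_name are hypothetical, the codeset is
passed in explicitly instead of being taken from the locale, and the function
returns the original UTF-8 U-label where a real implementation would return
the converted bytes.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the UTF-8 string converts to codeset with no loss, else 0. */
static int converts_losslessly(const char *utf8, const char *codeset)
{
    iconv_t cd = iconv_open(codeset, "UTF-8");
    if (cd == (iconv_t)-1)
        return 0;                    /* unknown codeset: treat as lossy */

    char buf[1024];                  /* scratch space; output is discarded */
    char *in = (char *)utf8;         /* iconv() takes a non-const pointer */
    size_t inleft = strlen(utf8);
    char *out = buf;
    size_t outleft = sizeof buf;

    /* iconv() returns (size_t)-1 on error, or the number of
     * non-reversible conversions; anything but 0 counts as lossy here. */
    size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);

    return rc == 0 && inleft == 0;
}

/* Pick the name to hand back as the canon name (hypothetical helper). */
const char *pick_canon_name(const char *u_label, const char *a_label,
                            const char *codeset)
{
    assert(u_label != NULL && a_label != NULL && codeset != NULL);
    return converts_losslessly(u_label, codeset) ? u_label : a_label;
}
```

For example, with the U-label for "bücher.example" and its A-label
"xn--bcher-kva.example", a caller whose codeset is ASCII would get the
A-label and a caller whose codeset is UTF-8 would get the U-label.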

> RFC 3493 doesn't say either way whether "char *" is ANSI or UTF-8 or whatever
> else, and as far as I know, neither does POSIX 
> (http://www.opengroup.org/onlinepubs/9699919799/functions/getaddrinfo.html).

See Simon's reply.

> Hence this is an issue for anyone proposing to make a standards-track RFC for 
> getaddrinfo/getnameinfo.

I'd be willing to specify new functions with different names if need be.
But it seems to me that between getaddrinfo()'s hints and
getnameinfo()'s flags arguments we have enough room for extensibility
without resorting to new function names.
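
To show where that room is, here is a small sketch of a wrapper that ORs
caller-supplied bits into hints.ai_flags.  The wrapper name lookup_canon and
its extra_flags parameter are hypothetical; the only flags used are the
standard AI_CANONNAME and whatever the caller passes in.  An IDNA-related
behaviour would simply be one more bit in that argument.

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Resolve host and copy its canon name into canon (hypothetical wrapper).
 * Returns 0 on success or a getaddrinfo() error code. */
int lookup_canon(const char *host, int extra_flags,
                 char *canon, size_t canonlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    assert(host != NULL && canon != NULL && canonlen > 0);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    /* The extension point: new behaviours are just more flag bits. */
    hints.ai_flags = AI_CANONNAME | extra_flags;

    int rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* Fall back to the input if the resolver supplied no canon name. */
    const char *name = res->ai_canonname != NULL ? res->ai_canonname : host;
    strncpy(canon, name, canonlen - 1);
    canon[canonlen - 1] = '\0';

    freeaddrinfo(res);
    return 0;
}
```

Calling lookup_canon("127.0.0.1", AI_NUMERICHOST, buf, sizeof buf) exercises
the path without touching the network; getnameinfo()'s flags argument could
be extended in the same way.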

> > If we could assume IDNA-aware getnameinfo()/getaddrinfo() then is there any
> > reason for application protocols [that don't involve domainname registration] to
> > do anything other than allow all three forms (A-label, U-label and un-pre-
> > processed Unicode) on the wire?
> 
> I'd argue any new application protocol ought to specify the encoding rather than 
> allowing multiple.   Specifying UTF-8 would be good :-)

Just UTF-8, un-pre-processed, raw user input?  Or did you mean U-labels?

Also, with respect to deployed protocols that have protocol elements for
carrying domainnames, where those protocol elements are defined as
carrying UTF-8, but where in practice most implementors did not actually
code those slots as IDN-aware, wouldn't it be a strong presumption that
the slots are IDN-unaware?

Nico
--