Re: [Technical Errata Reported] RFC5890 (4695)

Vint Cerf <vint@google.com> Wed, 28 September 2016 22:53 UTC

In-Reply-To: <058F1697-DF1D-40EC-9C88-3A79E55F1A21@sparkpost.com>
References: <20160517155800.74B3D180004@rfc-editor.org> <d3c26d83-15bb-4a8c-25d2-72a69d5a7669@it.aoyama.ac.jp> <058F1697-DF1D-40EC-9C88-3A79E55F1A21@sparkpost.com>
From: Vint Cerf <vint@google.com>
Date: Wed, 28 Sep 2016 09:49:00 -0400
Message-ID: <CAHxHggeEmKFA3t0DvosY0ROzB0cMbCW=Ke0nVwk9+xbBHecSqA@mail.gmail.com>
Subject: Re: [Technical Errata Reported] RFC5890 (4695)
To: Juan Altmayer Pizzorno <juan@sparkpost.com>
Content-Type: multipart/alternative; boundary="089e014941629cc76e053d919f53"
Cc: ben@nostrum.com, aamelnikov@fastmail.fm, alissa@cooperw.in, Harald Tveit Alvestrand <harald@alvestrand.no>, john+ietf@jck.com, "idna-update@alvestrand.no" <idna-update@alvestrand.no>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, RFC Errata System <rfc-editor@rfc-editor.org>

Juan, "move ahead" means what? Publish a revised RFC? Something else?

v


On Wed, Sep 28, 2016 at 9:44 AM, Juan Altmayer Pizzorno <juan@sparkpost.com>
wrote:

> Hi!
>
> I think it would be good to move these errata ahead, be it as
> “verified” or with alternate wording:  the issue of the required
> buffer sizes came up while my team implemented SMTPUTF8, and these
> (incorrect) sizes given in the RFC created confusion.
>
> .. Juan
>
> > On May 22, 2016, at 2:40 AM, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
> >
> > [What I say below also applies to erratum 4696. If it's desirable to
> > reply to that with the same comment, please let me know.]
> >
> > I believe that Juan is essentially right.
> >
> > This has come up before, and was possibly already noted by John
> > Klensin for fixing in an eventual update.
> >
> > I provided some more detailed calculations with examples in the mail to
> > idna-update@alvestrand.no with the following identifying details:
> > Message-ID: <4AACA7E6.1070503@it.aoyama.ac.jp>
> > Date: Sun, 13 Sep 2009 17:05:58 +0900
> >
> > Unfortunately, when I currently try to access the archive at
> > http://www.alvestrand.no/pipermail/idna-update/ from
> > https://www.ietf.org/wg/concluded/idnabis.html, I get the following:
> >
> > ----
> > Forbidden
> >
> > You don't have permission to access /pipermail/idna-update/ on this
> > server.
> >
> > Apache/2.4.7 (Ubuntu) Server at www.alvestrand.no Port 80
> > ----
> >
> > I have cc'ed Harald in the hope that the archive can be fixed soon.
> >
> >
> > I'm copying the relevant part of that mail here:
> >
> > >>>>>>>>
> > Here are my calculations. After a few tests, one finds out that punycode
> > uses a single 'a' to express 'one more of the same character'. The
> > question is then how many characters it takes punycode to express the
> > first character. Expressing that first character takes more and more
> > punycode characters as its Unicode number gets higher, so one has to
> > test with the smallest Unicode character that needs a certain number of
> > bytes in UTF-8. Going through lengths 1, 2, 3, and 4 per character in
> > UTF-8, we find:
> >
> > 1 octet per character in UTF-8:
> > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > gives
> > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > and has 63 characters, so 63 octets in UTF-8, 126 octets in UTF-16, and
> > 252 octets in UTF-32.
> >
> > 2 octets per character in UTF-8:
> > ¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢.org gives
> > xn--8aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > and has 58 characters, so 116 octets in UTF-8, 116 octets in UTF-16, and
> > 232 octets in UTF-32. 59 seems possible in theory, but impossible in
> > practice.
> >
> > ँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँ.org (using
> > the currently lowest encoded character that needs 3 bytes,
> > U+0901, DEVANAGARI SIGN CANDRABINDU), gives
> > xn--h1baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > and has 57 characters, so 171 octets in UTF-8, 114 octets in UTF-16, and
> > 228 octets in UTF-32. Please note that even characters in the U+0800
> > range would need as many Punycode characters for the first character,
> > because already a character such as 'ü' needs that many.
> >
> > Trying to assess how many characters one could use of
> > 𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀
> 𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀.org
> > (using U+10300, OLD ITALIC LETTER A, the lowest character in Unicode 3.2
> > that needs 4 bytes in UTF-8) gives
> > xn--097caaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > and has 56 characters, so 224 octets in UTF-8, 224 octets in UTF-16, and
> > 224 octets in UTF-32.
> >
> > Overall, we get a maximum label length in octets of 252 octets for
> > UTF-32 (with US-ASCII), and 224 octets in UTF-8 and UTF-16 (with Old
> > Italic and the like).
> > >>>>>>>>
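
The character counts in the excerpt above can be re-checked mechanically. The following is a minimal sketch, assuming Python's built-in "punycode" codec (which emits the RFC 3492 encoding without the "xn--" prefix); the helper names a_label_length, longest_run and octets are made up for this illustration and do not come from the thread. It covers only the three non-ASCII test characters (U+00A2, U+0901, U+10300), since the all-ASCII case is simply the 63-octet DNS limit.

```python
# Sketch only: relies on Python's built-in "punycode" codec, which emits
# the RFC 3492 encoding without the "xn--" prefix.
ACE_PREFIX = "xn--"
MAX_A_LABEL_OCTETS = 63  # DNS label limit, prefix included


def a_label_length(u_label: str) -> int:
    """Length in octets of the A-label for a U-label with no ASCII in it."""
    return len(ACE_PREFIX) + len(u_label.encode("punycode"))


def longest_run(ch: str) -> int:
    """Largest n for which ch repeated n times still fits in 63 octets."""
    n = 0
    while a_label_length(ch * (n + 1)) <= MAX_A_LABEL_OCTETS:
        n += 1
    return n


def octets(label: str) -> dict:
    """Encoded size per encoding form (the -be variants avoid a BOM)."""
    return {enc: len(label.encode(enc))
            for enc in ("utf-8", "utf-16-be", "utf-32-be")}


# The three non-ASCII test characters used in the excerpt.
for ch in ("\u00a2", "\u0901", "\U00010300"):
    n = longest_run(ch)
    print(f"U+{ord(ch):04X}: {n} code points, {octets(ch * n)}")
```

The printed counts can be compared with the 58, 57 and 56 characters given in the excerpt.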
> >
> > Regards,   Martin.
> >
> > On 2016/05/18 00:58, RFC Errata System wrote:
> >> The following errata report has been submitted for RFC5890,
> >> "Internationalized Domain Names for Applications (IDNA): Definitions
> >> and Document Framework".
> >>
> >> --------------------------------------
> >> You may review the report below and at:
> >> http://www.rfc-editor.org/errata_search.php?rfc=5890&eid=4695
> >>
> >> --------------------------------------
> >> Type: Technical
> >> Reported by: Juan Altmayer Pizzorno <juan@sparkpost.com>
> >>
> >> Section: 2.3.2.1
> >>
> >> Original Text
> >> -------------
> >> expansion of the A-label form to a U-label may produce strings that are
> >> much longer than the normal 63 octet DNS limit (potentially up to 252
> >> characters)
> >>
> >> Corrected Text
> >> --------------
> >> expansion of the A-label form to a U-label may produce strings that are
> >> much longer than the normal 63 octet DNS limit (potentially up to 59
> >> Unicode code points or 236 octets)
> >>
> >> Notes
> >> -----
> >> The contents of U-labels are encoded in the up to 59 ASCII characters
> >> (see 2.3.2.1 itself) output by the Punycode algorithm in their
> >> corresponding A-labels.  The Punycode decoder
> >> (https://tools.ietf.org/html/rfc3492#section-6.2) consumes at least one
> >> of those ASCII characters for each code point inserted into the
> >> U-label. A U-label, thus, can contain at most 59 Unicode code points.
> >>
> >> Since U-labels are defined (in 2.3.2.1) to be expressed in a standard
> >> Unicode Encoding Form, and UTF-32, UTF-16 and UTF-8 (as revised by
> >> RFC3629) all can encode a code point in at most 4 octets, 236 octets is
> >> an upper bound for a U-label's length.
> >>
> >> I think it should be possible to derive a tighter bound, but its
> >> rationale would likely be less straightforward.
> >>
> >> I imagine the number 252 was originally derived by multiplying 63, the
> >> maximum length of an A-label (including the "xn--" prefix), by 4, the
> >> maximum number of octets needed to represent a code point.
> >>
> >> Instructions:
> >> -------------
> >> This erratum is currently posted as "Reported". If necessary, please
> >> use "Reply All" to discuss whether it should be verified or
> >> rejected. When a decision is reached, the verifying party (IESG)
> >> can log in to change the status and edit the report, if necessary.
> >>
> >> --------------------------------------
> >> RFC5890 (draft-ietf-idnabis-defs-13)
> >> --------------------------------------
> >> Title               : Internationalized Domain Names for Applications
> >>                       (IDNA): Definitions and Document Framework
> >> Publication Date    : August 2010
> >> Author(s)           : J. Klensin
> >> Category            : PROPOSED STANDARD
> >> Source              : Internationalized Domain Names in Applications
> >>                       (Revised)
> >> Area                : Applications
> >> Stream              : IETF
> >> Verifying Party     : IESG
> >>
> >> _______________________________________________
> >> Idna-update mailing list
> >> Idna-update@alvestrand.no
> >> http://www.alvestrand.no/mailman/listinfo/idna-update
> >>
>
>
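
For reference, the arithmetic behind the two figures in the erratum quoted above (236 octets proposed, 252 in the RFC text) can be written out as below. This is a sketch under the erratum's own premises, not a statement of what the RFC author computed; the variable names are illustrative, and the spot check at the end leans on Python's built-in "punycode" codec with a handful of arbitrary sample strings.

```python
MAX_A_LABEL_OCTETS = 63        # DNS label limit
ACE_PREFIX = "xn--"
MAX_OCTETS_PER_CODE_POINT = 4  # worst case in UTF-8, UTF-16 and UTF-32

# ASCII characters left for Punycode output once the prefix is removed.
punycode_budget = MAX_A_LABEL_OCTETS - len(ACE_PREFIX)

# Erratum's premise: the decoder consumes at least one ASCII character
# per inserted code point, so the budget also caps the code point count.
max_code_points = punycode_budget
erratum_bound = max_code_points * MAX_OCTETS_PER_CODE_POINT

# Erratum's guess at how the RFC figure was obtained: 63 * 4.
rfc_figure = MAX_A_LABEL_OCTETS * MAX_OCTETS_PER_CODE_POINT

print(max_code_points, erratum_bound, rfc_figure)

# Spot check of the premise on a few arbitrary samples: the Punycode
# output is never shorter than the number of code points in the input.
for sample in ("\u00fc", "b\u00fccher", "\u0901\u0901", "\U00010300" * 5):
    assert len(sample.encode("punycode")) >= len(sample)
```

The bound is deliberately loose: it multiplies the largest possible code point count by the largest possible per-code-point size, although the two maxima do not occur together in the examples from Martin's mail.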


-- 
New postal address:
Google
1875 Explorer Street, 10th Floor
Reston, VA 20190