Re: [Technical Errata Reported] RFC5890 (4695)
Vint Cerf <vint@google.com> Wed, 28 September 2016 22:53 UTC
In-Reply-To: <058F1697-DF1D-40EC-9C88-3A79E55F1A21@sparkpost.com>
References: <20160517155800.74B3D180004@rfc-editor.org> <d3c26d83-15bb-4a8c-25d2-72a69d5a7669@it.aoyama.ac.jp> <058F1697-DF1D-40EC-9C88-3A79E55F1A21@sparkpost.com>
From: Vint Cerf <vint@google.com>
Date: Wed, 28 Sep 2016 09:49:00 -0400
Message-ID: <CAHxHggeEmKFA3t0DvosY0ROzB0cMbCW=Ke0nVwk9+xbBHecSqA@mail.gmail.com>
Subject: Re: [Technical Errata Reported] RFC5890 (4695)
To: Juan Altmayer Pizzorno <juan@sparkpost.com>
Cc: ben@nostrum.com, aamelnikov@fastmail.fm, alissa@cooperw.in, Harald Tveit Alvestrand <harald@alvestrand.no>, john+ietf@jck.com, "idna-update@alvestrand.no" <idna-update@alvestrand.no>, "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, RFC Errata System <rfc-editor@rfc-editor.org>

Juan,

"move ahead" means what? Publish a revised RFC? Other?

v

On Wed, Sep 28, 2016 at 9:44 AM, Juan Altmayer Pizzorno <juan@sparkpost.com> wrote:

> Hi!
>
> I think it would be good to move these errata ahead, be it as
> “verified” or with alternate wording: the issue of the required
> buffer sizes came up while my team implemented SMTPUTF8, and these
> (incorrect) sizes given in the RFC created confusion.
>
> .. Juan
>
>
> On May 22, 2016, at 2:40 AM, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
> >
> > [What I say below also applies to erratum 4696. If it's desirable to
> > reply to that with the same comment, please let me know.]
> >
> > I believe that Juan is essentially right.
> >
> > This has come up before, and was possibly already noted by John
> > Klensin for fixing in an eventual update.
> >
> > I provided some more detailed calculations with examples in the mail to
> > idna-update@alvestrand.no with the following identifying details:
> > Message-ID: <4AACA7E6.1070503@it.aoyama.ac.jp>
> > Date: Sun, 13 Sep 2009 17:05:58 +0900
> >
> > Unfortunately, when I currently try to access the archive at
> > http://www.alvestrand.no/pipermail/idna-update/ from
> > https://www.ietf.org/wg/concluded/idnabis.html, I get the following:
> >
> > ----
> > Forbidden
> >
> > You don't have permission to access /pipermail/idna-update/ on this
> > server.
> >
> > Apache/2.4.7 (Ubuntu) Server at www.alvestrand.no Port 80
> > ----
> >
> > I have cc'ed Harald in the hope that the archive can be fixed soon.
> >
> >
> > I'm copying the relevant part of that mail here:
> >
> > >>>>>>>>
> > Here are my calculations. After a few tests, one finds out that punycode
> > uses a single 'a' to express 'one more of the same character'. The
> > question is then how many characters it takes punycode to express the
> > first character. Expressing that first character takes more and more
> > punycode characters as its Unicode number gets higher, so one has to
> > test with the smallest Unicode character that needs a certain number of
> > bytes in UTF-8. Going through lengths 1, 2, 3, and 4 per character in
> > UTF-8, we find:
> >
> > 1 octet per character in UTF-8:
> > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > gives
> > aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > and has 63 characters, so 63 octets in UTF-8, 126 octets in UTF-16, and
> > 252 octets in UTF-32.
> >
> > 2 octets per character in UTF-8:
> > ¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢.org
> > gives
> > xn--8aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > and has 58 characters, so 116 octets in UTF-8, 116 octets in UTF-16, and
> > 232 octets in UTF-32. 59 seems possible in theory, but impossible in
> > practice.
> >
> > ँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँ.org
> > (using the currently lowest encoded character that needs 3 bytes,
> > U+0901, DEVANAGARI SIGN CANDRABINDU), gives
> > xn--h1baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > and has 57 characters, so 171 octets in UTF-8, 114 octets in UTF-16, and
> > 228 octets in UTF-32. Please note that even characters in the U+0800
> > range would need that much, because already a character such as 'ü'
> > needs that much.
> >
> > Trying to assess how many characters one could use of
> > 𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀.org
> > (using U+10300, OLD ITALIC LETTER A, the lowest character in Unicode 3.2
> > that needs 4 bytes in UTF-8) gives
> > xn--097caaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> > and has 56 characters, so 224 octets in UTF-8, 224 octets in UTF-16, and
> > 224 octets in UTF-32.
> >
> > Overall, we get a maximum label length in octets of 252 octets for
> > UTF-32 (with US-ASCII), and 224 octets in UTF-8 and UTF-16 (with Old
> > Italic and the like).
> > >>>>>>>>
> >
> > Regards, Martin.
> >
> > On 2016/05/18 00:58, RFC Errata System wrote:
> >> The following errata report has been submitted for RFC5890,
> >> "Internationalized Domain Names for Applications (IDNA): Definitions
> >> and Document Framework".
> >>
> >> --------------------------------------
> >> You may review the report below and at:
> >> http://www.rfc-editor.org/errata_search.php?rfc=5890&eid=4695
> >>
> >> --------------------------------------
> >> Type: Technical
> >> Reported by: Juan Altmayer Pizzorno <juan@sparkpost.com>
> >>
> >> Section: 2.3.2.1
> >>
> >> Original Text
> >> -------------
> >> expansion of the A-label form to a U-label may produce strings that are
> >> much longer than the normal 63 octet DNS limit (potentially up to 252
> >> characters)
> >>
> >> Corrected Text
> >> --------------
> >> expansion of the A-label form to a U-label may produce strings that are
> >> much longer than the normal 63 octet DNS limit (potentially up to 59
> >> Unicode code points or 236 octets)
> >>
> >> Notes
> >> -----
> >> The contents of U-labels are encoded in the up to 59 ASCII characters
> >> (see 2.3.2.1 itself) output by the Punycode algorithm in their
> >> corresponding A-labels. The Punycode decoder
> >> (https://tools.ietf.org/html/rfc3492#section-6.2) consumes at least one
> >> of those ASCII characters for each code point inserted into the
> >> U-label. A U-label, thus, can contain at the most 59 Unicode code
> >> points.
> >>
> >> Since U-labels are defined (in 2.3.2.1) to be expressed in a standard
> >> Unicode Encoding Form, and UTF-32, UTF-16 and UTF-8 (as revised by
> >> RFC3629) all can encode a code point in at most 4 octets, 236 octets is
> >> an upper bound for a U-label's length.
> >>
> >> I think it should be possible to derive a tighter bound, but its
> >> rationale would likely be less straightforward.
> >>
> >> I imagine the number 252 was originally derived by multiplying 63, the
> >> maximum length of an A-label (including the "xn--" prefix), by 4, the
> >> maximum number of octets needed to represent a code point.
> >>
> >> Instructions:
> >> -------------
> >> This erratum is currently posted as "Reported". If necessary, please
> >> use "Reply All" to discuss whether it should be verified or
> >> rejected. When a decision is reached, the verifying party (IESG)
> >> can log in to change the status and edit the report, if necessary.
> >>
> >> --------------------------------------
> >> RFC5890 (draft-ietf-idnabis-defs-13)
> >> --------------------------------------
> >> Title            : Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
> >> Publication Date : August 2010
> >> Author(s)        : J. Klensin
> >> Category         : PROPOSED STANDARD
> >> Source           : Internationalized Domain Names in Applications (Revised)
> >> Area             : Applications
> >> Stream           : IETF
> >> Verifying Party  : IESG
> >>
> >> _______________________________________________
> >> Idna-update mailing list
> >> Idna-update@alvestrand.no
> >> http://www.alvestrand.no/mailman/listinfo/idna-update
> >>
> >

--
New postal address:
Google
1875 Explorer Street, 10th Floor
Reston, VA 20190
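
As a rough cross-check of the label lengths in the calculation quoted above, the following Python sketch finds, for each test character named in that mail, how many repetitions still fit in a 63-octet DNS label, and how many octets the resulting label takes in UTF-8, UTF-16 and UTF-32. It assumes that Python's built-in "punycode" codec implements RFC 3492 and that the A-label is simply "xn--" followed by the Punycode output. The helper names (dns_form, octets, longest_repeat) are illustrative only and do not come from the thread, and no IDNA validity checks are performed.

```python
# Sketch only: reproduces the repetition counts from the quoted mail.
# Assumption: Python's built-in "punycode" codec follows RFC 3492.

ACE_PREFIX = "xn--"
MAX_DNS_LABEL = 63  # octets allowed in one DNS label


def dns_form(label: str) -> str:
    """Return the ASCII (DNS) form of a label; no IDNA validation is done."""
    if all(ord(ch) < 0x80 for ch in label):
        return label  # already ASCII, so no Punycode step is needed
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def octets(label: str) -> dict:
    """Octet length of the label in each Unicode encoding form (no BOM)."""
    return {
        "UTF-8": len(label.encode("utf-8")),
        "UTF-16": len(label.encode("utf-16-be")),
        "UTF-32": len(label.encode("utf-32-be")),
    }


def longest_repeat(ch: str) -> int:
    """Largest n for which ch repeated n times fits in a 63-octet label."""
    n = 1
    while len(dns_form(ch * (n + 1))) <= MAX_DNS_LABEL:
        n += 1
    return n


# The first four characters are the ones used in the quoted mail.
# U+0080 is added here as an assumption: it is the lowest non-ASCII
# code point and is not a valid character for a real label.
CASES = {
    "U+0061 LATIN SMALL LETTER A": "a",
    "U+00A2 CENT SIGN": "\u00a2",
    "U+0901 DEVANAGARI SIGN CANDRABINDU": "\u0901",
    "U+10300 OLD ITALIC LETTER A": "\U00010300",
    "U+0080 (control character, theoretical only)": "\u0080",
}

for name, ch in CASES.items():
    n = longest_repeat(ch)
    print(f"{name}: {n} code points, {octets(ch * n)}")
```

Under these assumptions, the U+0080 case is one way to read the remark that 59 characters are possible in theory but not in practice: that code point costs only a single Punycode digit, yet it is a control character that no real label could contain.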