Re: [Technical Errata Reported] RFC5890 (4695)
Juan Altmayer Pizzorno <juan@sparkpost.com> Wed, 28 September 2016 14:37 UTC
Subject: Re: [Technical Errata Reported] RFC5890 (4695)
From: Juan Altmayer Pizzorno <juan@sparkpost.com>
In-Reply-To: <d3c26d83-15bb-4a8c-25d2-72a69d5a7669@it.aoyama.ac.jp>
Date: Wed, 28 Sep 2016 09:44:03 -0400
Content-Transfer-Encoding: quoted-printable
Message-Id: <058F1697-DF1D-40EC-9C88-3A79E55F1A21@sparkpost.com>
References: <20160517155800.74B3D180004@rfc-editor.org> <d3c26d83-15bb-4a8c-25d2-72a69d5a7669@it.aoyama.ac.jp>
To: "\"Martin J. Dürst\"" <duerst@it.aoyama.ac.jp>
X-Mailer: Apple Mail (2.3124)
Cc: ben@nostrum.com, aamelnikov@fastmail.fm, alissa@cooperw.in, Harald Tveit Alvestrand <harald@alvestrand.no>, john+ietf@jck.com, idna-update@alvestrand.no, vint@google.com, RFC Errata System <rfc-editor@rfc-editor.org>
Hi!

I think it would be good to move these errata forward, whether as “verified” or with alternate wording: the question of the required buffer sizes came up while my team was implementing SMTPUTF8, and the (incorrect) sizes given in the RFC caused confusion.

.. Juan

> On May 22, 2016, at 2:40 AM, Martin J. Dürst <duerst@it.aoyama.ac.jp> wrote:
>
> [What I say below also applies to erratum 4696. If it's desirable to reply to that with the same comment, please let me know.]
>
> I believe that Juan is essentially right.
>
> This has come up before, and was possibly already noted by John Klensin for fixing in an eventual update.
>
> I provided some more detailed calculations with examples in the mail to idna-update@alvestrand.no with the following identifying details:
> Message-ID: <4AACA7E6.1070503@it.aoyama.ac.jp>
> Date: Sun, 13 Sep 2009 17:05:58 +0900
>
> Unfortunately, when I currently try to access the archive at
> http://www.alvestrand.no/pipermail/idna-update/ from https://www.ietf.org/wg/concluded/idnabis.html, I get the following:
>
> ----
> Forbidden
>
> You don't have permission to access /pipermail/idna-update/ on this server.
>
> Apache/2.4.7 (Ubuntu) Server at www.alvestrand.no Port 80
> ----
>
> I have cc'ed Harald in the hope that the archive can be fixed soon.
>
>
> I'm copying the relevant part of that mail here:
>
> >>>>>>>>
> Here are my calculations. After a few tests, one finds out that punycode
> uses a single 'a' to express 'one more of the same character'. The
> question is then how many characters it takes punycode to express the
> first character. Expressing that first character takes more and more
> punycode characters as its Unicode number gets higher, so one has to
> test with the smallest Unicode character that needs a certain number of
> bytes in UTF-8. Going through lengths 1, 2, 3, and 4 per character in
> UTF-8, we find:
>
> 1 octet per character in UTF-8:
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org gives
> aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> and has 63 characters, so 63 octets in UTF-8, 126 octets in UTF-16, and
> 252 octets in UTF-32.
>
> 2 octets per character in UTF-8:
> ¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢¢.org gives
> xn--8aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> and has 58 characters, so 116 octets in UTF-8, 116 octets in UTF-16, and
> 232 octets in UTF-32. 59 seems possible in theory, but impossible in
> practice.
>
> ँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँँ.org (using the currently lowest encoded character that needs 3 bytes,
> U+0901, DEVANAGARI SIGN CANDRABINDU), gives
> xn--h1baaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> and has 57 characters, so 171 octets in UTF-8, 114 octets in UTF-16, and
> 228 octets in UTF-32. Please note that even characters in the U+0800
> range would need that much, because already a character such as 'ü'
> needs that much.
>
> Trying to assess how many characters one could use of
> 𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀𐌀.org
> (using U+10300, OLD ITALIC LETTER A, the lowest character in Unicode 3.2
> that needs 4 bytes in UTF-8) gives
> xn--097caaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.org
> and has 56 characters, so 224 octets in UTF-8, 224 octets in UTF-16, and
> 224 octets in UTF-32.
>
> Overall, we get a maximum label length in octets of 252 octets for
> UTF-32 (with US-ASCII), and 224 octets in UTF-8 and UTF-16 (with Old
> Italic and the like).
> >>>>>>>>
>
> Regards, Martin.
>
> On 2016/05/18 00:58, RFC Errata System wrote:
>> The following errata report has been submitted for RFC5890,
>> "Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework".
>>
>> --------------------------------------
>> You may review the report below and at:
>> http://www.rfc-editor.org/errata_search.php?rfc=5890&eid=4695
>>
>> --------------------------------------
>> Type: Technical
>> Reported by: Juan Altmayer Pizzorno <juan@sparkpost.com>
>>
>> Section: 2.3.2.1
>>
>> Original Text
>> -------------
>> expansion of the A-label form to a U-label may produce strings that are
>> much longer than the normal 63 octet DNS limit (potentially up to 252
>> characters)
>>
>> Corrected Text
>> --------------
>> expansion of the A-label form to a U-label may produce strings that are
>> much longer than the normal 63 octet DNS limit (potentially up to 59
>> Unicode code points or 236 octets)
>>
>> Notes
>> -----
>> The contents of U-labels are encoded in the up to 59 ASCII characters (see 2.3.2.1 itself)
>> output by the Punycode algorithm in their corresponding A-labels. The Punycode
>> decoder (https://tools.ietf.org/html/rfc3492#section-6.2) consumes at least one
>> of those ASCII characters for each code point inserted into the U-label. A U-label,
>> thus, can contain at the most 59 Unicode code points.
>>
>> Since U-labels are defined (in 2.3.2.1) to be expressed in a standard Unicode Encoding
>> Form, and UTF-32, UTF-16 and UTF-8 (as revised by RFC3629) all can encode a code
>> point in at most 4 octets, 236 octets is an upper bound for a U-label's length.
>>
>> I think it should be possible to derive a tighter bound, but its rationale would likely be
>> less straightforward.
>>
>> I imagine the number 252 was originally derived by multiplying 63, the maximum
>> length of an A-label (including the "xn--" prefix), by 4, the maximum number of
>> octets needed to represent a code point.
>>
>> Instructions:
>> -------------
>> This erratum is currently posted as "Reported". If necessary, please
>> use "Reply All" to discuss whether it should be verified or
>> rejected. When a decision is reached, the verifying party (IESG)
>> can log in to change the status and edit the report, if necessary.
>>
>> --------------------------------------
>> RFC5890 (draft-ietf-idnabis-defs-13)
>> --------------------------------------
>> Title : Internationalized Domain Names for Applications (IDNA): Definitions and Document Framework
>> Publication Date : August 2010
>> Author(s) : J. Klensin
>> Category : PROPOSED STANDARD
>> Source : Internationalized Domain Names in Applications (Revised)
>> Area : Applications
>> Stream : IETF
>> Verifying Party : IESG
>>
>> _______________________________________________
>> Idna-update mailing list
>> Idna-update@alvestrand.no
>> http://www.alvestrand.no/mailman/listinfo/idna-update
>>
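
The per-character limits in the quoted calculation, and the looser bound from the erratum notes, can be checked mechanically. What follows is only a minimal sketch, assuming Python 3.7+ and its built-in "punycode" codec, which implements the RFC 3492 encoding but none of the IDNA validity rules. The helper names max_repeats and octet_sizes are made up for illustration and do not come from any of the mails or from the RFC.

```python
# Sketch: how many repetitions of one character fit in a 63-octet A-label,
# and how many octets the resulting U-label needs in each encoding form.

MAX_LABEL_OCTETS = 63  # DNS label limit
ACE_PREFIX = "xn--"    # prefix that marks an A-label

# Loose bound from the erratum notes: at most 59 code points, 4 octets each.
LOOSE_BOUND_OCTETS = (MAX_LABEL_OCTETS - len(ACE_PREFIX)) * 4


def max_repeats(ch: str) -> int:
    """Largest n such that ch * n still fits in a 63-octet label."""
    if ch.isascii():
        # A plain ASCII label is not Punycode-encoded at all.
        return MAX_LABEL_OCTETS
    n = 0
    # Each extra copy of the same character costs one more output character,
    # so the encoded length grows monotonically with n.
    while len(ACE_PREFIX) + len((ch * (n + 1)).encode("punycode")) <= MAX_LABEL_OCTETS:
        n += 1
    return n


def octet_sizes(ch: str, n: int) -> tuple:
    """Octets needed for ch * n in UTF-8, UTF-16 and UTF-32 (no BOM)."""
    label = ch * n
    return tuple(len(label.encode(enc)) for enc in ("utf-8", "utf-16-be", "utf-32-be"))


if __name__ == "__main__":
    for ch in ("a", "\u00a2", "\u0901", "\U00010300"):
        n = max_repeats(ch)
        print(f"U+{ord(ch):04X}: {n} code points, octets (UTF-8/16/32) = {octet_sizes(ch, n)}")
    print("loose bound from the erratum:", LOOSE_BOUND_OCTETS, "octets")
```

For U+00A2, U+0901 and U+10300 this should reproduce the 58, 57 and 56 repetitions reported in the quoted mail, all below the 59 code points / 236 octets given as an upper bound in the erratum.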