RE: Lower casing
Shawn Steele <Shawn.Steele@microsoft.com> Sat, 29 January 2011 18:13 UTC
Return-Path: <Shawn.Steele@microsoft.com>
X-Original-To: idna-update@alvestrand.no
Delivered-To: idna-update@alvestrand.no
Received: from localhost (localhost [127.0.0.1]) by eikenes.alvestrand.no (Postfix) with ESMTP id B9F0439E13B for <idna-update@alvestrand.no>; Sat, 29 Jan 2011 19:13:03 +0100 (CET)
X-Virus-Scanned: Debian amavisd-new at eikenes.alvestrand.no
Received: from eikenes.alvestrand.no ([127.0.0.1]) by localhost (eikenes.alvestrand.no [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id LeaCkx0PVf22 for <idna-update@alvestrand.no>; Sat, 29 Jan 2011 19:12:58 +0100 (CET)
X-Greylist: from auto-whitelisted by SQLgrey-1.6.8
Received: from smtp.microsoft.com (smtp.microsoft.com [131.107.115.215]) by eikenes.alvestrand.no (Postfix) with ESMTPS id 646D339E0FC for <idna-update@alvestrand.no>; Sat, 29 Jan 2011 19:12:58 +0100 (CET)
Received: from TK5EX14MLTC104.redmond.corp.microsoft.com (157.54.79.159) by TK5-EXGWY-E802.partners.extranet.microsoft.com (10.251.56.168) with Microsoft SMTP Server (TLS) id 8.2.176.0; Sat, 29 Jan 2011 10:13:24 -0800
Received: from TK5EX14MBXC133.redmond.corp.microsoft.com ([169.254.2.63]) by TK5EX14MLTC104.redmond.corp.microsoft.com ([157.54.79.159]) with mapi id 14.01.0270.002; Sat, 29 Jan 2011 10:13:24 -0800
From: Shawn Steele <Shawn.Steele@microsoft.com>
To: John C Klensin <klensin@jck.com>, Mark Davis ☕ <mark@macchiato.com>
Subject: RE: Lower casing
Thread-Topic: Lower casing
Thread-Index: AQHLvhRbwjCNdCZt+UCpzaBuuPPbpJPlkC2AgAArWICAAC8ugIABQeGggAAAs0CAAYLGgP//gaSh
Date: Sat, 29 Jan 2011 18:13:24 +0000
Message-ID: <E14011F8737B524BB564B05FF748464A11C8AF59@TK5EX14MBXC133.redmond.corp.microsoft.com>
References: <8762u4o1ty.fsf@latte.josefsson.org> <AANLkTin5CYOt=h6FsMsAQXQnjC-V+LjCmkS1_Dk96PT-@mail.gmail.com> <87d3nkwqy4.fsf@latte.josefsson.org> <AANLkTi=6LLkiVRGG9S9VAC5_4EQst+HfvP7F67OnJnpt@mail.gmail.com> <87sjweppa0.fsf_-_@latte.josefsson.org> <AANLkTikU+p+AMWf3RzxfxTqZQnOK-tk397Mfs8E3wzdh@mail.gmail.com> <4F23BD940E50BCA3F707ADEA@PST.JCK.COM> <AANLkTimsK3HPp-3iy=NnC_-LLHJsxP2fFTGp2198Nnss@mail.gmail.com> <E14011F8737B524BB564B05FF748464A11C899E7@TK5EX14MBXC133.redmond.corp.microsoft.com>, <2661898E90BEF63FFC0A7092@[192.168.1.128]>
In-Reply-To: <2661898E90BEF63FFC0A7092@[192.168.1.128]>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [157.54.123.12]
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
MIME-Version: 1.0
Cc: Simon Josefsson <simon@josefsson.org>, "idna-update@alvestrand.no" <idna-update@alvestrand.no>, Peter Constable <petercon@microsoft.com>, Dave Thaler <dthaler@microsoft.com>
X-BeenThere: idna-update@alvestrand.no
X-Mailman-Version: 2.1.13
Precedence: list
List-Id: IDNA update work <idna-update.alvestrand.no>
List-Unsubscribe: <http://www.alvestrand.no/mailman/options/idna-update>, <mailto:idna-update-request@alvestrand.no?subject=unsubscribe>
List-Archive: <http://www.alvestrand.no/pipermail/idna-update>
List-Post: <mailto:idna-update@alvestrand.no>
List-Help: <mailto:idna-update-request@alvestrand.no?subject=help>
List-Subscribe: <http://www.alvestrand.no/mailman/listinfo/idna-update>, <mailto:idna-update-request@alvestrand.no?subject=subscribe>
X-List-Received-Date: Sat, 29 Jan 2011 18:13:04 -0000
Well, that's why there's UTS#46 :) (Because IDNA2008 didn't allow for compatibility.) Mark Davis, myself, and others tried to point out the importance of compatibility and mappings.

Note that, for web sites, the eszett, etc. is needed for display, not matching. People would like their display to be correct, however matching cannot change.

Here's the problem (actually only one) with just turning on a switch and dropping the old behavior:

* I get my bright new shiny browser and go to myßbank.com. Cool! It works in IDNA2008.
* Of course there are maybe a billion computers on the planet?
* Now I have to take a business trip and need to check my balance. So I go to the local library and visit myßbank.com. It takes me to myssbank.com, which just happens to be spoofing myßbank.com.

We also asked that bundling be required to avoid this, but it isn't.

I'm not going to make a change that has this severe of a security problem. It's one thing to be spoofed by a typo. It's completely different to be spoofed by the right name.

We are agreed that eszett, etc. are necessary for display, but I'm not going to intentionally cause a state where the same domain name goes to different web sites on different machines. We still have machines running Windows 2000, Windows XP. (And probably older.) If I could guarantee that everyone upgraded their system instantly, or even within a month, maybe this could work. Unfortunately the evidence is that there's a very long tail of people that don't take the necessary updates.

I am definitely NOT saying that "I know German better than Germans." Eszett is important for proper display of words, in German, especially in Germany. However I see that I cannot register aaa.com for Aardvarks Advocates of Antarctica because it's already taken, even though the current aaa.com owners have nothing to do with Aardvarks (AFAIK). So I have no problem with saying "sorry, fußball.com is disallowed" because someone already has "fussball.com", or the opposite.
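The mismatch described above can be reproduced with a minimal sketch. It relies on the fact that Python's built-in "idna" codec implements the IDNA2003 rules (RFC 3490 with nameprep), so it maps ß to "ss" before lookup. The helper names idna2003_key and is_taken and the one-entry registry are hypothetical and exist only for illustration; this is not how any registry or browser is known to implement the check.

```python
def idna2003_key(name: str) -> bytes:
    """Return the IDNA2003 lookup form of a domain name.

    Python's stdlib "idna" codec applies nameprep, which folds
    U+00DF (eszett) to "ss" before the name goes on the wire.
    """
    return name.encode("idna")


# Hypothetical registry holding names already registered under IDNA2003.
registered = {idna2003_key("fussball.com")}


def is_taken(name: str) -> bool:
    """Treat a name as taken if its IDNA2003 form is already registered."""
    return idna2003_key(name) in registered


# An IDNA2003 client asked for myßbank.com actually looks up myssbank.com.
print(idna2003_key("myßbank.com"))  # b'myssbank.com'

# Under the same mapping, fußball.com collides with fussball.com.
print(is_taken("fußball.com"))      # True
```

An IDNA2008 client without the UTS#46 mapping would instead look up a distinct A-label for the ß form, which is why the two clients can end up at different sites for the same typed name.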
Sure, there are a few cases where there is a semantic difference between two words in Germany that would be spelled the same in Switzerland, but I thought we were pretty clear on numerous other threads that DNS labels are intended as conveniences, and not a way to enable every semantic concept in every language. For example, I seriously doubt that the case matching of i and I will ever be changed in DNS to allow for the Turkish expectations. Even in English there are concepts that are mutually exclusive but that we'd see as different if CamelCased.

I believe my company is pretty unanimous on this; I've talked with other experts in IE, DNS infrastructure, and other teams. That includes one of your co-authors of draft-iab-idn-encoding. We cannot implement IDNA2008 without UTS#46; it is a potentially serious security problem. We made that fairly clear before the RFCs were published. It's also clear that we aren't the only company with this belief. I won't speak for them, but would note that Mark wrote UTS#46 and made the comment that started this branch of the thread.

I think you miscounted: With native UTF-8 not being normalized or mapped or anything, IDNA2008 by itself would cause three lookup systems, not two: UTF-8, IDNA2003 & IDNA2008. I believe that reconciling UTF-8 lookup with the IDNA2003/UTS#46 mappings is a huge problem that needs to be solved, likely outside the scope of this thread.

-Shawn

PS: FWIW: My PERSONAL preference would be that there be an additional record that explicitly states the desired display form. Then DNS can be used for matching/lookup (as it should be, and has been), and domain owners could still state their intent, with CamelCasing or other differences that are important to them, which may not even be semantic, but could be.
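As a rough illustration of the "three lookup systems" point above, the following sketch produces three different lookup keys for one label. The function name lookup_forms is hypothetical. The "idna2008_style" entry only Punycode-encodes the label unchanged with the "xn--" prefix; it assumes the input is already lowercase and NFC-normalized and performs none of the IDNA2008 validity checks, so it approximates the A-label rather than implementing the protocol.

```python
def lookup_forms(label: str) -> dict:
    """Return three candidate lookup keys for a single DNS label."""
    if label.isascii():
        # Plain ASCII labels are identical in all three systems.
        ascii_form = label.lower().encode("ascii")
        return {"utf8": ascii_form, "idna2003": ascii_form,
                "idna2008_style": ascii_form}
    return {
        # Native UTF-8 lookup: raw bytes, no mapping or normalization.
        "utf8": label.encode("utf-8"),
        # IDNA2003: the stdlib "idna" codec applies nameprep (ß -> ss).
        "idna2003": label.encode("idna"),
        # IDNA2008-style: no mapping, so ß survives into the A-label.
        "idna2008_style": b"xn--" + label.encode("punycode"),
    }


for system, key in lookup_forms("fußball").items():
    print(system, key)
```

For "fußball" the three keys all differ, which is the situation a resolver or directory service would have to reconcile.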
http://blogs.msdn.com/shawnste

________________________________________
From: John C Klensin [klensin@jck.com]
Sent: Saturday, January 29, 2011 8:39 AM
To: Shawn Steele; Mark Davis ☕
Cc: Simon Josefsson; idna-update@alvestrand.no
Subject: RE: Lower casing

Shawn,

The problem here is that there is no "transition" for those four characters. If browsers and other client systems provide the IDNA2003/TR46 mapping there are only:

-- IDNA2003 behavior forever
-- Rolling flag day now
-- Rolling flag day at some indefinite point in the future.

By "rolling flag day" I mean that a client computer has one behavior or the other on a given day but that not all client systems will convert on the same day (or even in the same month or year).

IMO, the reason why the WG was willing to make the change was because of significant input that the ability to distinguish between the characters that are, under UTS#46, source and targets of mappings was important on both input and output (remember that there is a display issue here too because an IDNA2008 A-label that encodes the four characters is essentially invalid under IDNA2003). For those groups for whom the distinction among one or more of those character pairs (including "ignore" as the pair for the Joiner set) actually is important, "register both" is not meaningful: "we are applying the UTS#46 rules, including those for 'deviation' characters" is equivalent to "you lose; we know what your language needs better than you do". It is telling that all of the registries who are focused on those strings and from whom we've received reports (other than the somewhat-conflicting reports about Greek) have basically said "ok, let's do it and get it over with".

There is another element of this depending on when the mapping is applied: the "native UTF-8 in lookups outside the public DNS" situation that is addressed in draft-iab-idn-encoding is, in general, UTF-8 without even any normalization, much less encoding.
By applying UTS#46 mappings, you compound the problem of having to support two lookup encodings by having strings that are fully-valid and accessible under IDNA2008 _and_ the internal databases/directories but that are not accessible from your browser (at all for the public DNS and maybe not from the private databases if you guess wrong about when to apply the mapping). That is also another way to look at the "incompatible change" problem, which is that this is either about maintaining compatibility with the public DNS names that were registered or used assuming the IDNA2003 rules or about restoring compatibility with the strings that are valid and sensible in those internal databases that you support and encourage.

As long as you understand all of those tradeoffs, you should make whatever decisions make sense to you. I'm glad I don't have to make the decision.

    john

--On Saturday, January 29, 2011 1:35 AM +0000 Shawn Steele <Shawn.Steele@microsoft.com> wrote:

> (& I've been describing that behavior, including UTS#46
> transitional behavior and mappings, as IDNA2008 + UTS#46 to
> make it clear).
>
> -Shawn
>
> From: Shawn Steele
> Sent: Friday, January 28, 2011 5:34 PM
> To: 'Mark Davis ☕'; John C Klensin
> Cc: Simon Josefsson; idna-update@alvestrand.no
> Subject: RE: Lower casing
>
> It is worth mentioning that our code will follow the
> transitional guidelines, as we will otherwise break existing
> IDNA2003 users. Presumably people who want both versions to
> work will register both versions.