[EAI] [IETF] Homographic Attacks [was: Internationalized Email Internet Draft]
<nalini.elkins@insidethestack.com> Fri, 14 October 2016 13:50 UTC
Date: Fri, 14 Oct 2016 13:50:30 +0000
From: nalini.elkins@insidethestack.com
To: John C Klensin <klensin@jck.com>, "HANSEN, TONY L" <tony@att.com>, "ima@ietf.org" <ima@ietf.org>
Message-ID: <648244759.241596.1476453030456@mail.yahoo.com>
In-Reply-To: <E125B6AC26988823306936BF@JcK-HP5.jck.com>
References: <20161006055447.32573.qmail@pro-236-157.rediffmailpro.com> <9EC0EB65-9C58-43ED-9A80-1DA32C58E3E0@att.com> <E125B6AC26988823306936BF@JcK-HP5.jck.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ima/JE2iADZm0ZfJRYXhO8XkNcf3baA>
John,

I am splitting your comment #2 into two parts: homographic attacks and display of email addresses.

First, homographic attacks:

> (2) Within an address, there is, as the I-D points out and consistent with RFC 5321, a local part and a domain part. RFCs 6530 and 6531 make it quite clear (at least we thought they
> did) that they are handled differently. For the domain part, the rules are laid out in the IDNA2008 specs (RFC 5890ff). Issues about look-alike characters have been extensively
> discussed and written about (even though some of us have questioned the quality of some of that work). It does not seem useful to me to revisit those issues here, especially without
> reference to the prior work and discussions or if some of the discussion here is wrong or contains obvious omissions.

We can certainly refer to the prior work. I will contact you offline for your kind assistance.

But I think that the area of homographic attacks is definitely worth talking about. If I might use the example of IPv6: I think people at the IETF thought that IPv6 was very well defined, but operationally, in the "real" world, many people had (and still have!) absolutely no idea how it works.

Many people have NO idea of how homographic attacks work, and as we go forth into a truly internationalized Internet, this is very definitely a topic that needs to be discussed - and often!

> As an example from the first paragraph of Section 6.1, Latin "c" (U+0063) and Cyrillic "c" (U+0441) are typically written with identical graphemes, but are not on the list. More
> important, while the "paypal" example with U+0430 substituted for "a" (U+0061) has been used repeatedly, including in a careful study in an article that is not cited in this draft,

I will contact you for the reference.

> it is possible to write "раура1" with the first five characters in Cyrillic and the last one a digit (which is script independent) (\u'0440'\u'0430'\u'0443'\u'0440'\u'0430'\u'0031' [1]), therefore
> not even violating conventions prohibiting mixed-script labels. There is, of course, no ambiguity in the A-label form, although the authors quite properly point out that it is not
> user-friendly.

Indeed. I think this topic merits much discussion, including what the ICANN rules REALLY are in this area and whether registrars are complying.

As we get "new blood" into this area, some of the topics that are so familiar to you, Tony, and many of the long-time members of this WG will need to be revisited as the rest of us try to catch up to you. But I hope it will be worth it for the new energy that new people bring. Not to mention that as concepts proceed into operations, issues that no one had thought of surface.

Thanks,

Nalini Elkins
Inside Products, Inc.
www.insidethestack.com
(831) 659-8360

________________________________
From: John C Klensin <klensin@jck.com>
To: "HANSEN, TONY L" <tony@att.com>; ima@ietf.org
Sent: Sunday, October 9, 2016 7:57 PM
Subject: Re: [EAI] [IETF] Internationalized Email Internet Draft

--On Thursday, October 06, 2016 4:39 PM +0000 "HANSEN, TONY L" <tony@att.com> wrote:

> I think getting deployment feedback from EAI is important, and
> this draft is an excellent start.
>
> I'm not convinced that section 1.2 describes a real problem.
> People do this all the time today with various combinations of
> languages. Why is the combination of Russian and Chinese any
> different? If you think it is, then please expand on the
> aspect that does make it more difficult.
>
> I forwarded a number of nits to the authors.

Hi.
I was going to hold off until some later and more mature version of this draft, but since Tony has commented, while I believe the issues with EAI deployment are important, I see several problems with this draft, some of which were actually discussed in the WG but appear to be ignored here. Perhaps more important, it is seriously incomplete relative to issues that have been discussed at great length in the EAI WG, at the APEC meeting on internationalized email in Beijing in October 2014, the May 2015 workshop in Thailand, and elsewhere. I strongly suggest that, if there is going to be a discussion in Seoul, this document is in need of a great deal of work first. Some of those issues are:

(1) The so-called EAI standards, as listed in the Introduction, are about email envelope and header information presented directly (e.g., in UTF-8) as non-ASCII characters. A good deal of the document appears to address mail content information, such as textual message bodies, in other scripts. With the possible exception of language selection when a message is sent with the same basic text in several languages (multipart/alternative was designed with that case in mind but has been used in other ways), we thought we solved that content problem with MIME in 1992. If MIME is inadequate, the authors or others should produce a document explaining the issues and not confuse them with EAI / SMTPUTF8. If it is adequate, then, like Tony although perhaps for different reasons, I don't see what Section 1.2 is doing here, what the relevance of Section 3.2 is, and several other statements should be examined carefully to be sure they are talking about addresses and/or headers and not content.

(2) Within an address, there is, as the I-D points out and consistent with RFC 5321, a local part and a domain part. RFCs 6530 and 6531 make it quite clear (at least we thought they did) that they are handled differently. For the domain part, the rules are laid out in the IDNA2008 specs (RFC 5890ff). Issues about look-alike characters have been extensively discussed and written about (even though some of us have questioned the quality of some of that work). It does not seem useful to me to revisit those issues here, especially without reference to the prior work and discussions or if some of the discussion here is wrong or contains obvious omissions. As an example from the first paragraph of Section 6.1, Latin "c" (U+0063) and Cyrillic "c" (U+0441) are typically written with identical graphemes, but are not on the list. More important, while the "paypal" example with U+0430 substituted for "a" (U+0061) has been used repeatedly, including in a careful study in an article that is not cited in this draft, it is possible to write "раура1" with the first five characters in Cyrillic and the last one a digit (which is script independent) (\u'0440'\u'0430'\u'0443'\u'0440'\u'0430'\u'0031' [1]), therefore not even violating conventions prohibiting mixed-script labels. There is, of course, no ambiguity in the A-label form, although the authors quite properly point out that it is not user-friendly.

By contrast, Section 1.1 talks about display of email addresses, including the local part ("in Punycode" [2]). While a mail delivery server is free to create whatever aliases for a mailbox local part it likes, including "xn-t2bmh3a" or "123456", "george" or "example", in general converting a local part using the Punycode algorithm and displaying the result is prohibited by the EAI standards (and, incidentally, RFC 5321). More important, it will often lose information and is potentially very dangerous.

(3) Arabic should not be confused with a strictly right-to-left writing system. I am not aware of any such systems in wide use for contemporary languages today. The problem is that numerals, whether written in European digits, Arabic or Arabic-Indic digits, Chinese (Han) digits, or many others, have been written left to right since that type of positional notation was invented and became widely used. As a result, the scripts are referred to (in Unicode-speak) as "bidirectional" or "bidi" [3]. Their implications for domain names and IDNA are the subject of RFC 5893.

(4) Multiple addresses for one user (and Section 4). Keeping in mind that many people maintain a number of identities, and even multiple email addresses, for different purposes, I don't understand what point you are trying to make with this section. Many of us believe that users who have mailboxes whose names involve non-ASCII local parts and who engage in communications outside their primary language group will find it necessary to maintain either separate all-ASCII mailboxes or all-ASCII aliases to their primary mailboxes and to do so for a very long time. That issue has been extensively analyzed and discussed but this document avoids that work, which is both a problem and an opportunity.

(5) Section 2.1 asserts that email servers, implying all of them, store data (messages?) in relational databases. That is simply false. Some do; others don't. Even for those that do, there may be a difference between Unicode-capable data storage and Unicode-capable keys or indexes. There is also absolutely no requirement that any such system store Unicode strings encoded in UTF-8; many do not.

(6) There is a necessary difficulty with SMTPUTF8, which is that one cannot transmit a message with non-ASCII characters in addresses or headers to a system that does not support them. Final delivery systems should probably not accept messages unless they have reason to predict that the mail store will handle them _and_ that the user associated with the target mailbox will be able to retrieve them. Since a user with an all-ASCII mailbox name might still receive a message with, e.g., a non-ASCII backward-pointing address in the envelope or headers, making that decision is not straightforward. That leads to a strong case that, if one wants broad deployment of SMTPUTF8, the place to start is with the MUAs (including the Webmail systems) and associated POP and IMAP servers and clients. The "to various extents" list in the first part of Section 3 is not particularly helpful in that regard.

(7) Finally, this is an internationalization (i18n) problem as much as it is an email problem. Terminology (and, where characters or code points are referred to, their precise identification) is very important because the alternative is typically a good deal of user confusion about what you are talking about and other impediments to making progress. Saying "English" where you mean "Basic Latin Script" or "ASCII" is not helpful, especially given that 5321 local parts can include any ASCII character and that ASCII is not sufficient to write English. Conversely, it appears that there are a few places where, correctly or incorrectly, you really do mean "English" when you say that. Similarly, talking about one particular encoding when you mean "Unicode" is confusing and may be misleading. RFC 6365 may give you a start on some of the issues.

regards,
john

-------------

[1] I recommend the authors have a look at RFC 5137.
[2] Punycode is an encoding method, not a display format. See RFC 5890, Section 2.3.4.
[3] http://unicode.org/reports/tr9/

_______________________________________________
IMA mailing list
IMA@ietf.org
https://www.ietf.org/mailman/listinfo/ima
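
For readers new to the look-alike issue raised under item (2) above, the "раура1" example can be checked with a short Python sketch. It is an illustration only, not part of the original messages. The helper names (describe, letter_scripts, to_a_label) are hypothetical, the script guess simply reads the first word of each character's Unicode name, and the A-label is built with Python's raw "punycode" codec plus the "xn--" prefix, which skips the validation and mapping that IDNA2008 (RFC 5890ff) requires.

```python
import unicodedata

# Code points quoted in the message: five Cyrillic letters, then DIGIT ONE.
CODE_POINTS = [0x0440, 0x0430, 0x0443, 0x0440, 0x0430, 0x0031]


def describe(label):
    """Return a (U+XXXX, Unicode name) pair for each character in label."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in label]


def letter_scripts(label):
    """Rough script guess (assumption): first word of each letter's Unicode name.

    Digits and other non-letters are skipped, so they count as script independent.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def to_a_label(label):
    """Raw Punycode with the 'xn--' prefix; no IDNA2008 validation is done."""
    return "xn--" + label.encode("punycode").decode("ascii")


lookalike = "".join(chr(cp) for cp in CODE_POINTS)
ascii_label = "paypa1"

if __name__ == "__main__":
    for code_point, name in describe(lookalike):
        print(code_point, name)
    print("same string as ASCII label:", lookalike == ascii_label)
    print("letter scripts:", letter_scripts(lookalike), letter_scripts(ascii_label))
    print("A-label:", to_a_label(lookalike))
```

The two labels render almost identically yet compare unequal, and every letter in the first one comes from a single script, so a rule that only rejects mixed-script labels would not flag it. The A-label form is plain ASCII and visibly different from "paypa1", which is the sense in which it is unambiguous.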