Re: [Idna-update] IDNA and combining sequences (was: Re: Expiration impending: <draft-klensin-idna-rfc5891bis-01.txt>)
"Asmus Freytag (c)" <asmusf@ix.netcom.com> Thu, 15 March 2018 21:01 UTC
To: John Levine <johnl@taugh.com>, idna-update@ietf.org
References: <20180315194256.CF68F22C3BA1@ary.local>
From: "Asmus Freytag (c)" <asmusf@ix.netcom.com>
Message-ID: <647d97cf-4b2a-c5f6-194b-c6887c5e4947@ix.netcom.com>
Date: Thu, 15 Mar 2018 14:01:28 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/idna-update/cxqOqiyjjyIPt4hHRn3mXOauU_g>

On 3/15/2018 12:42 PM, John Levine wrote:
> In article <1420573e-4d9d-7853-ffdc-7fc7c2290598@ix.netcom.com> you write:
>>> I think I can follow them OK. The characters each are characterized
>>> as consonant, various types of vowels, tones, and diacritics. The
>>> ordering rules would make more sense to me as regular expressions.
>> Check out the bottom of the HTML version of each scripts LGR file.
>>
>> Click on
>>
>> https://www.icann.org/sites/default/files/lgr/lgr-2-thai-script-01jun17-en.html
>>
>> and go to WLE Rules. You should see a table of regexes.
> I was more thinking of regexes for an entire valid string, not for
> each rule, but close enough.

John,

the rules are designed this way for a purpose. The most common scenario is the need to prevent some Y from following some X, where the Y is typically a combining mark. Sometimes script rendering will "ligate" certain combinations but fail if incompatible ones are encountered; in that case you may see right-hand contexts, or both left and right contexts.

We originally thought that, for the Indic scripts, we could start with an Akshar (syllable), for which BNF definitions exist. A label would then be a concatenation of valid Akshars. The attempt resulted in horrifically complex-looking rules that were impossible to review and tended to be overly restrictive: there are many strings that are a bit unusual or uncommon, but not "broken". Sometimes the available Akshar rules were language-specific.

For all of these reasons, but mainly to be able to break things down into easily understood bits, we ended up with the system we have now.

Now, given that an XML file per RFC 7940 is fully machine-parsable, it could be an exercise for the reader to compile a regex for full labels. I think that should be possible; however, I've not attempted to prove this. There may be some syntax construction that is difficult or impossible to translate into a regex for the whole label.

All I know is that I can translate each of the context rules to regexes (that's how the tool I wrote evaluates labels based on an LGR), but I apply them iteratively at each character position in the string rather than using the regex mechanism.

A./
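
As a rough illustration of applying context rules at each character position, here is a minimal Python sketch. It is not the tool mentioned above. The ContextRule class, its field names, and the two sample rules are assumptions made up for the example; a real LGR would supply its own rules, and the repertoire here (a-z plus two combining marks) is a placeholder rather than any actual script.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextRule:
    """Hypothetical stand-in for one context rule, already translated to regexes."""
    name: str
    applies_to: str  # regex matching the single code point the rule is attached to
    behind: str      # regex for the text immediately before the position ("" = any)
    ahead: str       # regex for the text immediately after the position ("" = any)
    required: bool   # True: context must match; False: context must NOT match


def check_label(label, rules):
    """Walk the label one position at a time; return (index, rule name) violations."""
    violations = []
    for i, ch in enumerate(label):
        for rule in rules:
            if not re.fullmatch(rule.applies_to, ch):
                continue  # rule is not attached to this code point
            # Left context must end exactly at position i.
            before_ok = re.search(rule.behind + r"\Z", label[:i]) is not None
            # Right context must start exactly after position i.
            after_ok = re.match(rule.ahead, label[i + 1:]) is not None
            if (before_ok and after_ok) != rule.required:
                violations.append((i, rule.name))
    return violations


# Two made-up rules over a placeholder repertoire:
# U+0300 and U+0301 stand in for combining marks, a-z for base letters.
RULES = [
    # A mark must directly follow a base letter.
    ContextRule("mark-needs-base", "[\u0300\u0301]", "[a-z]", "", True),
    # "Prevent some Y from following some X": no U+0301 right after "x".
    ContextRule("no-acute-after-x", "\u0301", "x", "", False),
]

if __name__ == "__main__":
    for candidate in ["ca\u0301t", "x\u0301", "\u0301a", "a\u0300\u0301"]:
        print(repr(candidate), check_label(candidate, RULES))
```

Each rule stays small enough to review on its own, which is the property argued for above; whether the same set of rules can always be folded into a single regex for the whole label is left open here as well.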