Re: [I18nrp] Confusion among characters and strings

Asmus Freytag <asmusf@ix.netcom.com> Mon, 15 October 2018 11:36 UTC

To: Larry Masinter <LMM@acm.org>, 'John C Klensin' <john-ietf@jck.com>, i18nrp@ietf.org
References: <145D45F77511A9B1281FE35D@PSB> <033401d461f1$7d181590$774840b0$@acm.org>
From: Asmus Freytag <asmusf@ix.netcom.com>
Message-ID: <4df1f049-bbdd-9c1c-7752-496fd3ff474c@ix.netcom.com>
Date: Mon, 15 Oct 2018 04:36:19 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/x5VI5rBaI37Nsq_oeBHevvyKV5M>

On 10/11/2018 11:04 PM, Larry Masinter wrote:
> This
>
> An experiment:

This isn't a useful experiment, given how poorly OCR performs. (?)

Try it with Chinese characters, and each string would have a huge "halo"
of OCR confusables.
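To make the "halo" concrete: if each character has some set of look-alike alternatives, the number of strings a label can be misread as is the product, over its characters, of one plus the number of alternatives, minus the label itself. The following is a minimal sketch assuming a hypothetical, hand-written confusable table (not taken from any OCR engine or from Unicode's confusables data); a script with many multi-way look-alikes makes the product explode.

```python
from itertools import product
from math import prod

# Hypothetical confusable table, for illustration only.
# A character must not list itself as its own confusable.
CONFUSABLES: dict[str, set[str]] = {
    "o": {"0"},
    "0": {"o", "O"},
    "l": {"1", "I"},
}


def halo(label: str) -> set[str]:
    """All strings reachable by keeping each character or swapping it
    for one of its confusables, excluding the label itself."""
    choices = [[c, *sorted(CONFUSABLES.get(c, ()))] for c in label]
    return {"".join(p) for p in product(*choices)} - {label}


def halo_size(label: str) -> int:
    """Size of the halo computed without enumerating it."""
    return prod(1 + len(CONFUSABLES.get(c, ())) for c in label) - 1
```

With only three table entries, halo_size("google") is already 11; each additional look-alike per character multiplies that count rather than adding to it.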

That said, I see nothing wrong with making the letter o and the digit zero
(and some other pairs) either outright blocked variants or something that's
flagged as potentially malicious. For the Root Zone we don't have
digits, so of course I didn't have to investigate them.
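One way a registry could implement either policy is to map every label to a canonical "skeleton" (folding look-alikes to one representative) and compare a new label's skeleton against those already registered, in the spirit of the skeleton approach in Unicode's confusable detection (UTS #39). This sketch is illustrative only: the FOLD table, the check_label helper, and its "block"/"flag" policy values are hypothetical and not part of any registry's actual rules.

```python
# Hypothetical folding table: each look-alike maps to one representative.
FOLD: dict[str, str] = {"0": "o", "1": "l"}


def skeleton(label: str) -> str:
    """Lowercase the label and fold look-alikes to a single representative."""
    return "".join(FOLD.get(c, c) for c in label.lower())


def check_label(label: str, registered: set[str], policy: str = "block") -> str:
    """Return 'exists' for an already registered label, the policy value
    ('block' or 'flag') for a skeleton collision, else 'ok'."""
    if policy not in ("block", "flag"):
        raise ValueError("policy must be 'block' or 'flag'")
    if label.lower() in {r.lower() for r in registered}:
        return "exists"
    taken = {skeleton(r) for r in registered}
    return policy if skeleton(label) in taken else "ok"
```

For example, check_label("g00gle", {"google"}) returns "block", while passing policy="flag" returns "flag" so the label could be registered but marked for review.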

It all comes down to whether registries are willing to police such
things and whether, for those that are, we can define either minimal
standards or best practices (perhaps on a per-script basis).

A./

> Given a string, convert the string to an image, OCR the image, and see whether
> you get back the same string, code point by code point.
> Vary the font to cover the repertoire on common platforms (Android, iOS, Windows, Mac).
>
> Note: this section contains lots of puns.
>
> G00GLE.com OCRs to GOOGLE.com consistently.
> Larrч turns into Larry and Зcom into 3com.
> Toys-Я-Us.com turns into Toys-A-Us.com, even when language is Russian.
>
> I was using OpenOffice to turn text into an image:
> soffice --convert-to jpg test.txt
> and https://ocr.space/compare-ocr-software for OCR.
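The round-trip test in the quoted experiment can be phrased as a small harness: render the string, OCR it, and report every position where the recognized code point differs. This sketch keeps rendering and OCR behind a pluggable function because the original setup used OpenOffice and a web OCR service. The stub fake_ocr is hypothetical: it does not run any OCR engine, it only replays the substitutions reported in the message (0 to O, ч to y, З to 3, Я to A).

```python
from typing import Callable


def round_trip_mismatches(
    text: str, render_and_ocr: Callable[[str], str]
) -> list[tuple[int, str, str]]:
    """Compare text with its OCR result code point by code point.

    Returns (index, expected, recognized) for each differing position;
    positions past the end of the shorter string use an empty string.
    """
    result = render_and_ocr(text)
    mismatches = []
    for i in range(max(len(text), len(result))):
        expected = text[i] if i < len(text) else ""
        recognized = result[i] if i < len(result) else ""
        if expected != recognized:
            mismatches.append((i, expected, recognized))
    return mismatches


# Hypothetical stand-in for rendering + OCR: replays the substitutions
# reported in the message instead of calling a real OCR engine.
REPORTED = str.maketrans({"0": "O", "ч": "y", "З": "3", "Я": "A"})


def fake_ocr(text: str) -> str:
    return text.translate(REPORTED)
```

For a real run, render_and_ocr would render the string in each target platform font and pass the image to an OCR engine; any non-empty result means the label does not survive the round trip.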