Re: [I18nrp] Confusion among characters and strings
Larry Masinter <LMM@acm.org> Fri, 12 October 2018 06:04 UTC
Sender: Larry Masinter <masinter@gmail.com>
From: Larry Masinter <LMM@acm.org>
X-Google-Original-From: "Larry Masinter" <lmm@acm.org>
To: "'John C Klensin'" <john-ietf@jck.com>,
<i18nrp@ietf.org>
References: <145D45F77511A9B1281FE35D@PSB>
In-Reply-To: <145D45F77511A9B1281FE35D@PSB>
Date: Thu, 11 Oct 2018 23:04:50 -0700
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18nrp/UIWgX0wj37SDzPC0UWVTKEbkmVE>
An experiment: given a string, convert the string to an image, OCR the image, and see whether you get back the same string, code point by code point. Vary the font to cover the repertoire on common platforms (Android, iOS, Windows, Mac).

Note that this section contains lots of puns. G00GLE.com OCRs to GOOGLE.com consistently. Larrч turns into Larry and Зcom into 3com. Toys-Я-Us.com turns into Toys-A-Us.com, even when the language is Russian.

I was using OpenOffice to turn text into an image (soffice --convert-to jpg test.txt) and https://ocr.space/compare-ocr-software for OCR.
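The comparison step of this experiment could be sketched roughly as follows. This is only an illustration, not the procedure actually used: the rendering and OCR steps are left out, the recognized string is assumed to be pasted in by hand from whatever OCR tool is used, and the function names (codepoint_diff, describe, survives_round_trip) are made up for this sketch.

```python
import unicodedata
from itertools import zip_longest


def codepoint_diff(original, recognized):
    """List (index, original_char, recognized_char) for each differing position.

    Strings are compared code point by code point, with no normalization.
    A missing character (strings of unequal length) is reported as None.
    """
    diffs = []
    for i, (a, b) in enumerate(zip_longest(original, recognized)):
        if a != b:
            diffs.append((i, a, b))
    return diffs


def describe(ch):
    """Render a character as 'U+XXXX NAME' for a readable report."""
    if ch is None:
        return "<missing>"
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


def survives_round_trip(original, recognized):
    """True only if the OCR output equals the input at every code point."""
    return not codepoint_diff(original, recognized)


if __name__ == "__main__":
    # Pairs taken from the examples in this message: (input, OCR output).
    pairs = [
        ("G00GLE.com", "GOOGLE.com"),
        ("Larr\u0447", "Larry"),
        ("\u0417com", "3com"),
        ("Toys-\u042f-Us.com", "Toys-A-Us.com"),
    ]
    for original, recognized in pairs:
        for index, a, b in codepoint_diff(original, recognized):
            print(f"{original!r} @ {index}: {describe(a)} -> {describe(b)}")
```

A string that fails survives_round_trip under some font is one that a reader (or an OCR engine) may plausibly confuse with another string, which is the point of the experiment.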