Re: [Lucid] [mark@macchiato.com: Re: Non-normalizable diacritics - new property]

Andrew Sullivan <ajs@anvilwalrusden.com> Wed, 11 March 2015 20:09 UTC

Date: Wed, 11 Mar 2015 16:09:41 -0400
From: Andrew Sullivan <ajs@anvilwalrusden.com>
To: lucid@ietf.org
Message-ID: <20150311200941.GV15037@mx1.yitter.info>
References: <20150311013300.GC12479@dyn.com> <CA+9kkMDZW9yPtDxtLTfY1=VS6itvHtXHF1qdZKtXdwwORwqnew@mail.gmail.com> <55008F97.8040701@ix.netcom.com> <CA+9kkMAcgSA1Ch0B9W1Np0LMn2udegZ=AzU1b26dAi+SDcbGgg@mail.gmail.com> <CY1PR0301MB07310C68F6CFDD46AE22086F82190@CY1PR0301MB0731.namprd03.prod.outlook.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <CY1PR0301MB07310C68F6CFDD46AE22086F82190@CY1PR0301MB0731.namprd03.prod.outlook.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Archived-At: <http://mailarchive.ietf.org/arch/msg/lucid/mjRhjHuihrbp9lDynCYR5vafB4Y>
Subject: Re: [Lucid] [mark@macchiato.com: Re: Non-normalizable diacritics - new property]
Precedence: list

On Wed, Mar 11, 2015 at 07:43:12PM +0000, Shawn Steele wrote:
> This document makes a hard line between “homoglyphs” and “visually similar”.

No, it does not.  It says right there in section 2.2.1:

   Any character that can be confused for another one can be called
   confusable, and confusability can be thought of as a spectrum with
   "visually similar" at one end, and "homoglyphs" at the other.  (We
   use the term "homoglyph" strictly: code points that normally use the
   same glyph when rendered.)

> The ʻokina is a decent case where it looks a lot like another character, and often fonts may even use the same glyph, however sometimes font designers choose to make a distinction.  It’s nearly impossible to tell a developer to “use the right font”.
> 

It's funny that you should pick ʻokina, because you're sort of making
the point the draft is after.  In any properly-designed font, U+02BB
will be rendered as though it is a single opening curly-quote, in the
way English (not American) quotation marks were historically typeset.
But U+02BC (ʼ) is a letter, and it's actually named "MODIFIER LETTER APOSTROPHE", and
in any decent font will look like the single closing curly-quote,
which is what was historically used in English for the apostrophe as
well.  This is not U+0027 ('), of course.  So all three can be
distinguished in a proper font.  There's moreover an argument to be
made that U+0027 is closer to U+02BC than it is to U+02BB, though of
course the context might make a difference.  (Also, of course, U+0027
has other different properties.)
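A quick way to check the distinctions drawn above is to ask Python's standard `unicodedata` module for each code point's name, general category, and decomposition mapping. This is a minimal sketch; the results come from whatever Unicode version the local Python ships with, although character names are fixed by Unicode's stability policy.

```python
import unicodedata

# The three code points discussed above: the ʻokina, the modifier-letter
# apostrophe, and the ASCII apostrophe.
for ch in ("\u02bb", "\u02bc", "'"):
    print(
        f"U+{ord(ch):04X}  {unicodedata.category(ch)}  {unicodedata.name(ch)}"
        f"  decomposition={unicodedata.decomposition(ch)!r}"
    )
# U+02BB  Lm  MODIFIER LETTER TURNED COMMA  decomposition=''
# U+02BC  Lm  MODIFIER LETTER APOSTROPHE  decomposition=''
# U+0027  Po  APOSTROPHE  decomposition=''
```

The two modifier letters are general category Lm (letters), while U+0027 is punctuation (Po), and none of the three has a decomposition mapping that would tie it to another.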

None of this is like the case of, say, the precomposed e-with-acute
(U+00E9) and the combining sequence (U+0065 U+0301), which should
never, even in principle, show a difference.  That particular case is
solved by NFC, but it is
clearly different from the cases where a font should or could, in
principle, make them distinguishable.
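The contrast can be sketched with Python's `unicodedata.normalize`: the precomposed and combining forms of e-with-acute are canonically equivalent, so NFC maps them to the same sequence, whereas no normalization form relates the ʻokina to U+02BC. This is an illustration of the point above, not a confusability check.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # 'e' followed by COMBINING ACUTE ACCENT

# Canonically equivalent: NFC turns the combining sequence into U+00E9.
print(unicodedata.normalize("NFC", combining) == precomposed)  # True

# Not equivalent: every normalization form leaves these two distinct.
okina, modifier_apostrophe = "\u02bb", "\u02bc"
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    same = (unicodedata.normalize(form, okina)
            == unicodedata.normalize(form, modifier_apostrophe))
    print(form, same)  # False for every form
```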

I think we're all aware that you can't tell developers what font to
use, but there is clearly an in-principle difference between "could be
mitigated with font" and "cannot possibly be mitigated with font".
And as the draft notes, this is all a spectrum with many small
gradations.  The purpose of the discussion is to make distinctions
apparent where we can, so that we can talk sensibly about them.  So
trying to say they're all the same doesn't help make those
distinctions clear.

> Additionally it continues to treat these newly noticed characters as a special case without considering the many existing problems.
>

Where, please?

> I’m also confused by the document’s attention to the need for unique identifiers at the beginning, but then looks at the existing IDNA problem.  I don’t consider IDNA able to provide “secure” (meaning unconfusable) identifiers.
> 

IDNA is supposed to be providing unique identifiers.  It in fact does,
in the sense that when a series of U-labels or A-labels are put
together they (respectively) produce exactly one FQDN that can be
looked up.  That's why it's part of the problem.
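The one-to-one correspondence between U-labels and A-labels can be sketched with Python's built-in Punycode codec (RFC 3492). The helper names below are hypothetical, and the sketch deliberately skips the IDNA2008 validity, normalization, and contextual rules, so it is not a conforming IDNA implementation; it only shows that each label maps to exactly one ASCII form and back.

```python
# Hypothetical helpers: raw Punycode plus the "xn--" ACE prefix only.
# No IDNA2008 code point validation or mapping is performed here.

def to_a_label(label: str) -> str:
    """Map one label to its ASCII form; pure-ASCII labels pass through."""
    if label.isascii():
        return label.lower()
    return "xn--" + label.encode("punycode").decode("ascii")

def to_u_label(label: str) -> str:
    """Invert to_a_label for labels carrying the xn-- prefix."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label

def to_ascii_fqdn(name: str) -> str:
    """Convert each dot-separated label independently and rejoin them."""
    return ".".join(to_a_label(lbl) for lbl in name.split("."))

print(to_ascii_fqdn("bücher.example"))  # xn--bcher-kva.example
print(to_u_label("xn--bcher-kva"))      # bücher

# Labels that differ only in ʻokina vs. U+02BC are distinct identifiers.
print(to_a_label("hawai\u02bbi"))
print(to_a_label("hawai\u02bci"))
```

Because the mapping is reversible, two labels that look nearly identical on screen still yield two different names to look up, which is exactly where confusability bites.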

> 
> IMO the reason to solve the problem with this character

Which character, exactly?

Best regards,

A
-- 
Andrew Sullivan
ajs@anvilwalrusden.com
