Re: [I18ndir] Review of new characters for Unicode 12.0.0

"Patrik Fältström" <paf@frobbit.se> Tue, 19 March 2019 06:11 UTC

From: Patrik Fältström <paf@frobbit.se>
To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>
Cc: i18ndir@ietf.org
Date: Tue, 19 Mar 2019 07:11:34 +0100
X-Mailer: MailMate (1.12.4r5597)
Message-ID: <6CF90187-2E45-4160-97D6-5571BE02EC58@frobbit.se>
In-Reply-To: <8aa72ac4-1eb9-5df1-8d56-165e12202456@it.aoyama.ac.jp>
References: <e0174987-056d-d74e-c3fa-5b457a72f8c3@it.aoyama.ac.jp> <12f6742d-081b-5ef0-097c-d571e7fe1e9f@it.aoyama.ac.jp> <ADFDCB3A-BAEA-46BD-991A-F9D4FC863ED1@dmarc.ietf.org> <8aa72ac4-1eb9-5df1-8d56-165e12202456@it.aoyama.ac.jp>
Archived-At: <https://mailarchive.ietf.org/arch/msg/i18ndir/K4jAb7c4_AUFGWtEJ09aXzg8ISc>
Subject: Re: [I18ndir] Review of new characters for Unicode 12.0.0

On 19 Mar 2019, at 0:47, Martin J. Dürst wrote:

>> Martin, as I have implemented IDNA2008 from scratch, i.e. not using any
>> libraries at all,
>
> I'm also not using any libraries at all, just relying on the Unicode version in Ruby to be the same as the target version. But I understand what you're saying.

Well, Ruby does ultimately use libidn2 for IDNA2008, so yes, you do use libraries.

>> let me suggest we sync on what output format we use so
>> that we can do "diff" between your and my lists. My list for 12.0.0 is
>> btw attached.
>
> I think it's a good idea to have this for some cross-checks. But one of the features of my program was that it listed only the new characters, because these were the ones I wanted to review. So I'll definitely keep this ability.

Ok, I of course have that as well :-)

That operates on the file format I showed you.

>> May I suggest what my program is doing (of course), which is as follows:
>>
>> <Codepoint>;<Derived property value>;<Rule(s) that matched>;<Name>
>
> I'm fine with that. I'll have to do more work on supporting ranges in UnicodeData.txt and supporting the rules that I ignored in my first quick attempt. So please don't rely on my work for moving forward.

No problem, I already have two implementations myself: one in python3 that uses libidn2, and one in ruby that does not. I then compare both with IANA, as I know they have a completely separate code base as well.
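
As a rough illustration of the cross-check discussed above, two lists in the `<Codepoint>;<Derived property value>;<Rule(s) that matched>;<Name>` format could be compared along these lines. This is only a sketch and not the program of either participant. The function names `parse_list` and `compare` are hypothetical, and it assumes one code point per line (no ranges), written in hexadecimal without a `U+` prefix.

```python
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    value: str  # derived property value, e.g. PVALID or DISALLOWED
    rules: str  # rule(s) that matched, kept as an opaque string
    name: str   # character name


def parse_list(text: str) -> Dict[str, Entry]:
    """Parse lines of '<Codepoint>;<Derived property value>;<Rule(s)>;<Name>'.

    Assumption: each line holds a single hexadecimal code point, not a range.
    """
    entries: Dict[str, Entry] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first three semicolons only, so the name stays intact.
        codepoint, value, rules, name = line.split(";", 3)
        entries[codepoint.upper()] = Entry(value, rules, name)
    return entries


def compare(
    mine: Dict[str, Entry], theirs: Dict[str, Entry]
) -> Tuple[List[str], List[str], List[str]]:
    """Return (only in mine, only in theirs, present in both but value differs)."""

    def by_codepoint(cp: str) -> int:
        return int(cp, 16)  # sort numerically, not lexicographically

    only_mine = sorted(set(mine) - set(theirs), key=by_codepoint)
    only_theirs = sorted(set(theirs) - set(mine), key=by_codepoint)
    differing = sorted(
        (cp for cp in set(mine) & set(theirs) if mine[cp].value != theirs[cp].value),
        key=by_codepoint,
    )
    return only_mine, only_theirs, differing
```

The comparison deliberately looks only at the derived property value: two independent implementations may label the matching rules differently while still agreeing on the result. Feeding the lists for two Unicode versions to `compare` and reading the "only in theirs" part would also give the list of newly added characters.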

   Patrik