Re: [xml2rfc] <country> in v3

Julian Reschke <julian.reschke@gmx.de> Sat, 04 January 2020 07:24 UTC

To: Henrik Levkowetz <henrik@levkowetz.com>, Brian E Carpenter <brian.e.carpenter@gmail.com>, xml2rfc@ietf.org
References: <918df451-55de-9bd2-688b-37d2d83762b5@gmail.com> <d8763b85-2a6c-e927-b50b-1d6fa9a8f361@gmx.de> <1b6c8b07-df98-6b8e-9144-c203d23f632c@gmail.com> <f7615a5e-a4bb-bc0d-6cb4-5f1297c24c23@levkowetz.com>
From: Julian Reschke <julian.reschke@gmx.de>
Message-ID: <143ad0e7-3112-7db7-8e85-e9d24ddecbcc@gmx.de>
Date: Sat, 04 Jan 2020 08:23:48 +0100
In-Reply-To: <f7615a5e-a4bb-bc0d-6cb4-5f1297c24c23@levkowetz.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/AYY-LCfalkhcVs1g4iIzae9YzgM>

On 03.01.2020 21:24, Henrik Levkowetz wrote:
> Hi Brian,
>
> On 2020-01-03 20:43, Brian E Carpenter wrote:
>> On 03-Jan-20 20:36, Julian Reschke wrote:
>>> On 02.01.2020 20:58, Brian E Carpenter wrote:
>>>> For some reason, it appears that <country>P.R. China</country> causes problems in v3, but <country>China</country> is OK.
>>>>
>>>> The problem I see is that <country> is simply ignored, and <code> is also ignored,
>>>> when the "P.R" was included.
>>>>
>>>> I think there is a similar issue with commas included in <street>.
>>>>
>>>> For both, RFC7991 says "Content model: only text content." I don't see that this
>>>> should exclude punctuation marks.
>>>>
>>>> Regards
>>>>      Brian Carpenter
>>>
>>> xml2rfc wants an "ISO short name" here (see
>>> <https://www.iso.org/obp/ui#search> for a lookup UI).
>>
>> I really hope it doesn't "want" an ISO code, since RFC7991 says nothing about
>> that. It's very convenient that they are available, though.
>
> xml2rfc recognizes the ISO 3166 2-letter, 3-letter, official, and common names
> of countries.  In cases where a name or abbreviation in common use isn't
> recognized, I'm happy to add extra entries.  I've already done so for 'UK',
> and will do so for "P.R. China" in the next release.
> ...

If this is the case, it should be properly documented (otherwise we
don't have a spec but just a single implementation that has some
heuristics that affect the generated RFC text). In particular, extra
entries would need to be documented.

Looking at the Taiwan case:

>       <author surname="Taiwan" initials="T." fullname="Tai Taiwan">
>          <address>
>             <postal>
>                <extaddr>EXTADDR-1</extaddr>
>                <extaddr>EXTADDR-2</extaddr>
>                <extaddr>EXTADDR-3</extaddr>
>                <street>STREET-1</street>
>                <street>STREET-2</street>
>                <street>STREET-3</street>
>                <pobox>POBOX</pobox>
>                <sortingcode>SORTINGCODE</sortingcode>
>                <code>CODE</code>
>                <cityarea>CITYAREA</cityarea>
>                <city>CITY</city>
>                <region>REGION</region>
>                <country>Taiwan</country>
>             </postal>
>          </address>
>       </author>

produces

>    Tai Taiwan
>    EXTADDR-1
>    EXTADDR-2
>    EXTADDR-3
>    STREET-1
>    STREET-2
>    STREET-3
>    POBOX
>    CITY, REGION CODE
>    Taiwan

So it falls back to US address formatting, but doesn't generate any warning.

Now,

>       <author surname="Taiwan" initials="T." fullname="Tai Taiwan2">
>          <address>
>             <postal>
>                <extaddr>EXTADDR-1</extaddr>
>                <extaddr>EXTADDR-2</extaddr>
>                <extaddr>EXTADDR-3</extaddr>
>                <street>STREET-1</street>
>                <street>STREET-2</street>
>                <street>STREET-3</street>
>                <pobox>POBOX</pobox>
>                <sortingcode>SORTINGCODE</sortingcode>
>                <code>CODE</code>
>                <cityarea>CITYAREA</cityarea>
>                <city>CITY</city>
>                <region>REGION</region>
>                <country>Taiwan (Province of China)</country>
>             </postal>
>          </address>
>       </author>

yields:

>    Tai Taiwan
>    EXTADDR-1
>    EXTADDR-2
>    EXTADDR-3
>    STREET-1
>    STREET-2
>    STREET-3
>    POBOX
>    CITY

So the country name is simply dropped (and REGION and CODE with it),
again without warning.

(I would open a ticket, but it seems that tickets opened by me are
ignored anyway)

Using TW or TWN as country names gets me:

>    Tai Taiwan
>    EXTADDR-1
>    EXTADDR-2
>    EXTADDR-3
>    STREET-1
>    STREET-2
>    STREET-3
>    POBOX
>    CITY, REGION CODE
>    Taiwan, Province of China

which indicates that the expected short name is "Taiwan, Province of
China", not "Taiwan (Province of China)". The latter is what I get from
<https://www.iso.org/obp/ui#search>, and it agrees with
<https://en.wikipedia.org/wiki/List_of_ISO_3166_country_codes>.

I continue to believe that we are wasting time and energy here. There's
a reason why RFC 7991 has "postalLine"; it allows authors to format
their postal address exactly the way they want it; and that should be
sufficient. We should just get rid of any requirements made by the HTML
spec with respect to hCard annotations and be done with it.
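
As an illustration only, here is a minimal sketch of the Taiwan test case
rewritten with <postalLine>. The line breaks and ordering are an arbitrary
choice made for this example, and the values are the same placeholders used
above. As far as I read RFC 7991, <postal> takes either <postalLine>
elements or the structured elements, not a mix of both:

```xml
<author surname="Taiwan" initials="T." fullname="Tai Taiwan">
   <address>
      <postal>
         <!-- Each postalLine is meant to be rendered as one output line,
              in document order, with no country-specific reordering. -->
         <postalLine>EXTADDR-1</postalLine>
         <postalLine>STREET-1</postalLine>
         <postalLine>POBOX</postalLine>
         <postalLine>CODE CITY</postalLine>
         <postalLine>Taiwan</postalLine>
      </postal>
   </address>
</author>
```

With this form the processor would not need to recognize the country name
at all, so nothing should get dropped or reordered behind the author's back.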

Best regards, Julian