[xml2rfc] formatting <postal>, was: [xml2rfc-dev] [Rfc-markdown] New xml2rfc release: v2.12.1

Julian Reschke <julian.reschke@gmx.de> Wed, 15 May 2019 07:20 UTC

From: Julian Reschke <julian.reschke@gmx.de>
To: Henrik Levkowetz <henrik@levkowetz.com>, Miek Gieben <miek@miek.nl>
Cc: xml2rfc@ietf.org, xml2rfc-dev@ietf.org
References: <E1gH6Wn-0002ca-CU@durif.tools.ietf.org> <20181029204446.xwjkdfdpj7sbhvsr@miek.nl> <0455bd48-3392-bb4c-f510-c57489792606@levkowetz.com> <20181029214325.5i3rrv55dpamvbsa@miek.nl> <a614b58e-0d4c-67d0-77f9-f5b6b66d7ff1@levkowetz.com> <d0a3693b-87b5-c8e9-8137-9d61805f58f0@gmx.de> <f01715aa-ee1b-ec80-0657-7d19b23da98a@levkowetz.com> <1d03c466-bd4a-934e-18f0-f9ac5bc2225e@gmx.de>
Message-ID: <b54c55cd-1add-ae83-4dfe-841700980de2@gmx.de>
Date: Wed, 15 May 2019 09:19:47 +0200
Archived-At: <https://mailarchive.ietf.org/arch/msg/xml2rfc/YjlH-3UNyyk49tl_3mO7NxMQyq8>

On 30.10.2018 08:20, Julian Reschke wrote:
> On 2018-10-30 07:29, Henrik Levkowetz wrote:
>>
>>
>> On 2018-10-30 05:57, Julian Reschke wrote:
>>> On 2018-10-29 22:57, Henrik Levkowetz wrote:
>>>> ...
>>>> Country-specific address formatting.  Without it, you'll be stuck
>>>> with the fallback, which is very americentric for the text output,
>>>> somewhat less so for the html output (but not very refined).  I'd
>>>> hate to see it go.
>>>> ...
>>>
>>> Example? Smells a bit like overkill, given that v3 has simplified
>>> address formatting (postalLine).
>>
>> Example: In Norwegian and Swedish address format, the postal code goes
>> before the city, not after the region code.  This is correct:
>>
>>    Midtskogveien 18
>>    2020 Skedsmokorset
>>    Norway
>>
>> The current html specification would instead have produced, depending on
>> how you work around its problems, one of
>>
>>    Midtskogveien 18
>>    Skedsmokorset
>>    2020
>>    Norway
>>
>> or
>>
>>    Midtskogveien 18
>>    Skedsmokorset, 2020
>>    Norway
>>
>> which is a lesser degree of mangling than for places such as various
>> areas of Japan, which need city-region.  Here is a Japanese example.
>> This is correct (except that it's also a translation, not just a
>> romanization):
>>
>>    Japan
>>    112-0001
>>    Tokyo-to
>>    Bunkyo-ku
>>    4-3-2 Hakusan
>>    3rd Floor Room B
>>
>> and the RFC7992 specification would have rendered it as either this,
>> preserving the semantic labelling of the parts, but losing the
>> city-region part:
>>
>>    3rd Floor Room B
>>    4-3-2 Hakusan
>>    Tokyo-to, 112-0001
>>    Japan
>>
>> or would lose all semantic labelling through forced use of postalLine
>> for all entries.
>>
>> This problem has however been solved to a large degree by modern
>> libraries.  If you can do this *right*, why continue to do it wrong?
>
> Because it's complex, and now you are relying on a specific
> implementation. Is the algorithm documented?
> ...

Coming back to this because I'm looking into how much work it would
take to implement it.

Apart from complexity, my concern here is that the output now depends on
a library that is specific to a certain language. If you can point to a
definition of what that library does, it would be less of a concern.
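
To illustrate what a documented, implementation-independent definition
could look like, here is a minimal sketch of table-driven formatting.
The templates, the "NO" key, and the names TEMPLATES and format_postal
are assumptions made up for this example; they are not the rules used
by xml2rfc or by any particular library.

```python
# Illustrative only: these templates are assumptions for this sketch,
# not the rules applied by xml2rfc or by any address-formatting library.
TEMPLATES = {
    # Norwegian style: postal code before the city.
    "NO": ["{street}", "{code} {city}", "{country}"],
    # Fallback in the US style: city, region and code on one line.
    None: ["{street}", "{city}, {region} {code}", "{country}"],
}


class _Blank(dict):
    """Dictionary that renders missing address parts as empty strings."""

    def __missing__(self, key):
        return ""


def format_postal(parts, country_code=None):
    """Return the address lines for parts, using the matching template."""
    template = TEMPLATES.get(country_code, TEMPLATES[None])
    lines = []
    for pattern in template:
        filled = pattern.format_map(_Blank(parts))
        # Collapse doubled spaces and drop separators left by empty parts.
        filled = " ".join(filled.split()).strip(" ,")
        if filled:
            lines.append(filled)
    return lines


address = {
    "street": "Midtskogveien 18",
    "code": "2020",
    "city": "Skedsmokorset",
    "country": "Norway",
}
print("\n".join(format_postal(address, "NO")))
```

With the "NO" template this prints the Norwegian form quoted above;
without a country code, the fallback yields the "Skedsmokorset, 2020"
variant. If the rules were published as a table like this, the output
would not depend on one language's library.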

Looking at
<https://tools.ietf.org/html/draft-levkowetz-xml2rfc-v3-implementation-notes-08#section-3.1.13>:

> 3.1.13.  In Section 2.37, <postal>
>
>    The enhancement to <postal>, adding a <postalLine> element, is a fair
>    step on the way to permitting better representation of the wealth of
>    postal addresses around the globe which don't match the American
>    postal addresses.
>
>    Unfortunately, it manages to throw the baby out with the bathwater by
>    constraining postalLine to be used only if none of the other elements
>    are used.  This makes it impossible to apply hCard [HCARD] labels
>    (based on vCard [RFC6350] properties) to the elements of an address,
>    as [RFC7992] requires.  Applying the schema from [RFC7991] would make

I agree that there's a disconnect between RFC 7991 and 7992. However, I
would prefer to solve that disconnect by just removing the hCard
requirement from RFC 7992. Is anybody aware of any real-world code that
actually processes that information (and can demonstrate it with an
example)?

(Disclaimer: rfc2629.xslt used to produce hCard information 14 years ago, see
<https://github.com/reschke/xml2rfc/commit/e36d8b12968f60781bdc4d5ea77b6b16b3895ed4>,
but I removed it 6 years ago due to complexity and unclear benefits.)

>    country information and hCard tags unavailable for any locality with
>    a postal address scheme that needs to use <postalLine> because it
>    does not match the American scheme.  This would make statistics such
>    as the author origin statistics either miss authors with such
>    addresses, or make the statistics harder to compile than is
>    necessary, and make for instance the data on this page skewed:
>    <https://datatracker.ietf.org/stats/document/yearly/continent/>

I note that this page currently works based on the text versions.

That said, I agree that if we want these stats, they should be easy to
extract from the XML. The current proposal however seems to be a bit
over the top to me.
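
For illustration, here is a rough sketch of how such statistics could
be pulled from the XML whenever <country> is present. The sample
document and the name country_counts are made up for this example; the
element names (author, address, postal, country, postalLine) are those
of the v3 vocabulary.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical input; the authors and addresses are made up for this sketch.
SAMPLE = """
<rfc>
  <front>
    <author fullname="A. Example">
      <address>
        <postal>
          <city>Skedsmokorset</city>
          <country>Norway</country>
        </postal>
      </address>
    </author>
    <author fullname="B. Example">
      <address>
        <postal>
          <postalLine>4-3-2 Hakusan</postalLine>
        </postal>
      </address>
    </author>
  </front>
</rfc>
"""


def country_counts(xml_text):
    """Count authors per <country>; authors without one are '(unknown)'."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for author in root.iter("author"):
        country = author.findtext("address/postal/country")
        counts[(country or "").strip() or "(unknown)"] += 1
    return counts


print(country_counts(SAMPLE))
```

The second author uses only <postalLine>, so no country can be
extracted for that entry; that is the gap the implementation notes
point at.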

>    The current implementation maps <postalLine> to the hCard property
>    "extended-address", and permits it to be used together with other
>    elements, in particular <country>, <region>, and <city>.  This is a
>    change to the schema.
>
>    The current implementation also provides a full set of hCard- and
>    [RFC6350]-compatible address elements, including <extaddr> and
>    <pobox>.  The hCard locality address component is mapped to the
>    current <city> element, however; not renamed to '<locality>'.

I note that the implementation notes do not describe all new elements,
nor how they are rendered.

Looking at the currently checked-in grammar
(<https://trac.tools.ietf.org/tools/xml2rfc/trac/browser/trunk/cli/xml2rfc/data/v3.rnc?rev=3042>):

>    postal =
>      element postal {
>        attribute xml:base { text }?,
>        attribute xml:lang { text }?,
>        ((city | code | country | region | street)*
>         | postalLine+
>         | (city?
>            & cityarea?
>            & code?
>            & country
>            & extaddr*
>            & pobox?
>            & region?
>            & sortingcode?
>            & street*))
>      }

This defines cityarea, extaddr, pobox and sortingcode. We need
descriptions of these.

I also note that the grammar now has three different content models for
<postal>, and that it does not always seem to be deterministic whether a
given source instance matches the first (classic) or the third (new)
model. If this is kept, we should try to reduce this to two cases.
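
To make the overlap concrete, here is a small sketch that hand-encodes
my reading of the three alternatives quoted above and reports which of
them accept a given list of child element names. It is not a RELAX NG
validator, and the labels "classic", "postalLine" and "new" as well as
the function name are just names chosen for this example.

```python
from collections import Counter

# Element sets transcribed from the grammar excerpt quoted above.
CLASSIC = {"city", "code", "country", "region", "street"}
AT_MOST_ONCE = {"city", "cityarea", "code", "pobox", "region", "sortingcode"}
REPEATABLE = {"extaddr", "street"}


def matching_models(children):
    """Return the labels of the <postal> content models accepting children."""
    counts = Counter(children)
    names = set(counts)
    matches = []
    # Model 1: (city | code | country | region | street)*
    if names <= CLASSIC:
        matches.append("classic")
    # Model 2: postalLine+
    if children and names == {"postalLine"}:
        matches.append("postalLine")
    # Model 3: interleave; exactly one <country>, the rest optional.
    if (counts["country"] == 1
            and names <= AT_MOST_ONCE | REPEATABLE | {"country"}
            and all(counts[name] <= 1 for name in AT_MOST_ONCE)):
        matches.append("new")
    return matches


print(matching_models(["street", "city", "code", "country"]))
print(matching_models(["pobox", "country"]))
```

Under this reading, an ordinary address with street, city, code and
country is accepted by both the classic and the new model, while one
containing <pobox> is accepted only by the new model.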

Best regards, Julian