Re: [ire] CSV woes

"Gould, James" <> Tue, 01 October 2013 13:18 UTC

List-Id: "Internet Registration Escrow discussion list." <>


I respond to your feedback below.


James Gould
Principal Software Engineer
703-948-3271 (Office)
12061 Bluemont Way
Reston, VA 20190

On 10/1/13 6:31 AM, "Klaus Malorny" <> wrote:

>Hi all,
>unfortunately, I now have to deal with the CSV format and have read through
>the -05 version* of the document (previously, I skipped the CSV parts). I
>have the following questions:
>- the separator: a separator different to the default comma separator can
>   be specified by the "sep" attribute. Following the schema definition,
>   it can be any token, including multi-character strings or even an empty
>   string (which makes the file unparsable, of course). I don't think that
>   this is intended

Good catch. We can update the schema to restrict the separator to a single
character, as shown below:

    <attribute name="sep" type="rdeCsv:sepType" default=","/>

    <simpleType name="sepType">
       <restriction base="string">
          <minLength value="1"/>
          <maxLength value="1"/>
       </restriction>
    </simpleType>

Do you have any additional proposals on restrictions?
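
As a rough illustration of the restriction proposed above, a deposit
reader could apply the same check before parsing. This is only a sketch:
the function name validate_sep is hypothetical, and rejecting double quotes
and line breaks as separators goes beyond the schema change and is just one
possible additional restriction.

```python
def validate_sep(sep: str = ",") -> str:
    """Return sep if it is usable as a CSV field separator, else raise ValueError."""
    # Mirrors minLength=1 / maxLength=1 from the proposed sepType.
    if len(sep) != 1:
        raise ValueError("separator must be exactly one character")
    # Assumption, not part of the schema change: a quote or line break
    # used as separator would make rows ambiguous to parse.
    if sep in ('"', "\r", "\n"):
        raise ValueError("separator must not be a quote or line break")
    return sep
```

Calling validate_sep() with no argument returns the default comma, while an
empty or multi-character string raises ValueError.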

>- it is left open how to deal with special characters, including, but
>   not limited to, quotes, separators and newlines. I wonder why there
>   is no reference to RFC 4180

Additional clarity could be added here.  The CSV format in
draft-arias-noguchi-dnrd-objects-mapping does not conform to RFC 4180
(e.g., it has no header line and supports UTF-8), but elements of RFC 4180
could be used to clarify the handling of quotes, separators, and newlines
in the CSV files.
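
To make the RFC 4180 conventions concrete, here is a small sketch using
Python's standard csv module, whose default dialect follows the same
quoting rules (fields containing the separator, a quote, or a line break
are enclosed in double quotes, and embedded quotes are doubled). The helper
names and sample values are made up for illustration; nothing here is
mandated by the draft.

```python
import csv
import io


def encode_row(fields, sep=","):
    """Serialize one record, quoting only the fields that need it."""
    buf = io.StringIO()
    # Default dialect: minimal quoting, doubled quotes, CRLF record terminator.
    csv.writer(buf, delimiter=sep).writerow(fields)
    return buf.getvalue()


def decode_row(text, sep=","):
    """Parse one record back into its list of fields."""
    # newline="" leaves line breaks untouched so the csv module can
    # handle the ones embedded inside quoted fields.
    return next(csv.reader(io.StringIO(text, newline=""), delimiter=sep))


# Illustrative record with a quote, a line break, and a separator in its fields.
row = ["example.tld", 'Example "Quoted" Org', "Line one\nLine two", "a,b"]
encoded = encode_row(row)
decoded = decode_row(encoded)
```

With these rules the record survives a round trip unchanged, which is the
property the CSV section currently leaves unspecified.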

>- it is unclear to me why the "deletes" entries may have subtables
>   e.g. the "csvDomain:deletes" the "domainContacts", as depicted in
>   the example in section 5.1.2 on page 23. Does it mean that in
>   incremental or differential escrows, the CSV format allows the addition
>   and removal of individual multi-value entries? For example, if I have
>   the domain
>   "example.tld", and the technical contact has changed from "abc123" to
>   "def456", the "domainContacts" table in the <csvDomain:contents>
>   section would *only* contain the line
>     example.tld,def456,tech
>   (and missing the admin and billing references) and the "domainContacts"
>   table in the <csvDomain:deletes> section would contain an entry
>     example.tld,abc123,tech
>   This would be a large difference from the XML format, as there the full
>   object is always replaced, and it would also be quite a hassle and
>   error-prone to generate or read and process this.

Yes, the CSV model is relational rather than object-based, so
incremental and differential deposits are done at the record level and not
at the object level.  The "domainContacts" CSV file, whether used under the
<csvDomain:contents> or <csvDomain:deletes> element, represents the link
table between the domain and the contact CSV files.  With the CSV files,
you apply the deltas at the record level as opposed to the object level.
That is the fundamental difference between the XML and CSV models in
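
A minimal sketch of record-level deltas, using the "domainContacts" example
from the question: each record is a (domain, contact, type) tuple, and a
differential deposit removes and adds individual records. The function
name, the admin and billing contact IDs, and the choice to apply deletes
before contents are all assumptions made for illustration.

```python
def apply_delta(current, deletes, contents):
    """Return the link-table records after removing deletes and adding contents."""
    # Assumption: deletes are applied first, then the new contents records.
    return (set(current) - set(deletes)) | set(contents)


# State of the domainContacts link table before the differential deposit.
before = {
    ("example.tld", "abc123", "tech"),
    ("example.tld", "adm001", "admin"),    # hypothetical contact ID
    ("example.tld", "bil001", "billing"),  # hypothetical contact ID
}

after = apply_delta(
    before,
    deletes=[("example.tld", "abc123", "tech")],
    contents=[("example.tld", "def456", "tech")],
)
```

Only the tech link changes; the admin and billing records are untouched
because they appear in neither the deletes nor the contents file.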

>- I do not understand why in "domainNameServers", hosts are referenced
>   by their ROIDs, whereas in "domainContacts", contacts are referenced
>   by their IDs, especially as this seems to have been changed for the
>   worse. The specification argues with supporting both hostAttr and
>   hostObj models, however, it leaves completely open how the hostAttr model
>   is represented.

The hosts are referenced by ROIDs instead of host names because a
natural key (the host name) will not work for some host models (hostAttr,
as well as having a separate set of external hosts per registrar) and will
not support host name changes.  A host name change should only result in a
change to a single record and not to all links to that host record.  The
contact identifier is unique and can't be changed, so it can be used
directly without a surrogate key such as the ROID.
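
The effect of the surrogate key can be sketched as follows. The ROID value,
host names, and helper function are invented for illustration and do not
come from the draft.

```python
# Host table keyed by ROID (the surrogate key). Values are illustrative.
hosts = {"H1-TEST": "ns1.example.net"}

# domainNameServers link table: (domain name, host ROID).
domain_name_servers = [
    ("example.tld", "H1-TEST"),
    ("example2.tld", "H1-TEST"),
]


def name_servers(domain):
    """Resolve the host names linked to a domain through the ROID."""
    return [hosts[roid] for name, roid in domain_name_servers if name == domain]


links_before = list(domain_name_servers)

# Renaming the host changes exactly one record in the host table.
hosts["H1-TEST"] = "ns1.example.org"
```

After the rename, both domains resolve to the new host name while the link
table is identical to what it was before. With the host name as the key,
every link record would have needed an update.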

>These are the questions for now, maybe more come later ;-)

Thank you for your feedback.
