Re: [ire] CSV and RFC4180

Gustavo Lozano <gustavo.lozano@icann.org> Thu, 13 December 2012 19:03 UTC


James,

The encoding of the file could be UTF-8, but my reading of the ABNF
grammar (Section 2 of RFC 4180) is that the code point repertoire is
limited to %x20-21 / %x23-2B / %x2D-7E.
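
A minimal sketch of that strict reading, assuming Python and its standard
"re" module. It is not taken from RFC 4180 or from any CSV library; the
helper names is_strict_textdata and offending_chars are hypothetical and
only illustrate which characters fall outside the three ranges quoted above.

```python
import re

# TEXTDATA ranges as quoted above: %x20-21 / %x23-2B / %x2D-7E
# (printable US-ASCII without DQUOTE %x22 and COMMA %x2C).
_TEXTDATA = re.compile(r"[\x20-\x21\x23-\x2B\x2D-\x7E]*")


def is_strict_textdata(field):
    """Return True if every character of the field is inside the ranges."""
    return _TEXTDATA.fullmatch(field) is not None


def offending_chars(field):
    """Return the characters of the field that fall outside the ranges."""
    return [ch for ch in field if _TEXTDATA.fullmatch(ch) is None]


if __name__ == "__main__":
    # "\u00fc" is U+00FC (u with diaeresis), a non-US-ASCII code point.
    for value in ["example.com", "m\u00fcnchen.example"]:
        print(repr(value), is_strict_textdata(value), offending_chars(value))
```

A validator written this way would flag any field carrying non-US-ASCII
data, regardless of how the file itself is encoded.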

I created a text file on my computer containing non-US-ASCII code points,
with fields separated by ",", and saved it as UTF-8. I opened the file
with several applications without problems, but I am not sure that this
file can be considered compliant with the CSV format.
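
The experiment described above can be approximated as follows. This is a
sketch only, assuming Python and its standard "csv" and "io" modules; the
function names roundtrip_utf8 and has_non_ascii and the sample rows are
hypothetical. It shows that one lenient implementation round-trips such
data, which says nothing about whether the result is RFC 4180 compliant.

```python
import csv
import io


def roundtrip_utf8(rows):
    """Serialize rows as CSV, encode as UTF-8, then decode and parse back."""
    out = io.StringIO(newline="")
    csv.writer(out).writerows(rows)
    data = out.getvalue().encode("utf-8")  # the bytes a file would contain
    parsed = list(csv.reader(io.StringIO(data.decode("utf-8"), newline="")))
    return data, parsed


def has_non_ascii(data):
    """Return True if any byte is outside the 7-bit US-ASCII range."""
    return any(byte > 0x7F for byte in data)


if __name__ == "__main__":
    # Hypothetical sample rows; "\u00fc" and "\u00e9" are non-US-ASCII.
    rows = [["name", "city"], ["Jos\u00e9", "M\u00fcnchen"]]
    data, parsed = roundtrip_utf8(rows)
    print(parsed == rows, has_non_ascii(data))
```

Python's csv module does not enforce the TEXTDATA ranges, so the values
survive here; a stricter parser could behave differently.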

I am not trying to be picky here; the escrow deposit is a fundamental
piece of the registry transition process, and RFC 4180 made me feel that
CSV is not as well defined as XML.


Regards,

Gustavo


On 12/13/12 10:32 AM, "Gould, James" <JGould@verisign.com> wrote:

>Gustavo,
>
>Doesn't it state in RFC 4180 the following?
>
>      Common usage of CSV is US-ASCII, but other character sets defined
>      by IANA for the "text" tree may be used in conjunction with the
>      "charset" parameter.
>
>It does not preclude the use of UTF-8 or any other character set
>(http://www.iana.org/assignments/character-sets/character-sets.xml), so
>we should be good.
>
>
>
>-- 
>
>JG
> 
>
> 
>James Gould
>Principal Software Engineer
>jgould@verisign.com
>
>703-948-3271 (Office)
>12061 Bluemont Way
>Reston, VA 20190
>VerisignInc.com
>
>
>
>
>
>
>
>On 12/13/12 1:26 PM, "Gustavo Lozano" <gustavo.lozano@icann.org> wrote:
>
>>James,
>>
>>I understand that we can produce text files encoded in UTF-8.
>>
>>My concern is the TEXTDATA ABNF grammar defined in RFC 4180: %x20-21 /
>>%x23-2B / %x2D-7E, which supports only a subset of US-ASCII.
>>
>>After pre-delegation testing, ICANN will have evidence that the escrow
>>deposit file is correct, but this is only a snapshot in time. Registry
>>platforms are updated, RDBMSs are updated, libraries are updated, and in
>>general all the components of the SRS evolve over time. My concern is
>>finding in the future that the escrow deposit file of a registry operator
>>is corrupted because some library is now following the ABNF grammar in
>>RFC 4180 or other validation rules. The same applies to the EBERO system.
>>
>>XML, being a well-defined standard, makes me more comfortable in this
>>regard.
>>
>>Regards,
>>
>>Gustavo Lozano
>>
>>On 12/13/12 10:04 AM, "Gould, James" <JGould@verisign.com> wrote:
>>
>>>Gustavo,
>>>
>>>I don't believe that the file encoding is the key determinant of the file
>>>format decision.  The CSV draft includes an encoding attribute with the
>>>default of UTF-8.  It's up to the producer and consumer to support the
>>>appropriate encoding of the data in any case, whether we're talking about
>>>XML as the file format or CSV.  I don't believe we would have any issue
>>>producing or consuming UTF-8 encoded CSV files.
>>>
>>>-- 
>>>
>>>JG
>>> 
>>>
>>> 
>>>James Gould
>>>Principal Software Engineer
>>>jgould@verisign.com
>>>
>>>703-948-3271 (Office)
>>>12061 Bluemont Way
>>>Reston, VA 20190
>>>VerisignInc.com
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 12/13/12 1:00 PM, "Gustavo Lozano" <gustavo.lozano@icann.org> wrote:
>>>
>>>>Colleagues,
>>>>
>>>>I find the proposal of Comma-Separated Values (CSV) Objects Mapping
>>>>interesting, and I think that both approaches for escrow data, CSV and
>>>>XML, have their advantages and disadvantages.
>>>>
>>>>The only RFC related to CSV that I have found is RFC 4180.
>>>>
>>>>This text from RFC 4180 concerns me:
>>>>TEXTDATA =  %x20-21 / %x23-2B / %x2D-7E
>>>>
>>>>The escrow deposit will contain non-US-ASCII data.
>>>>
>>>>How can we be sure that the libraries/database tools used to implement
>>>>the export/import of CSV will work adequately with non-US-ASCII data?
>>>>There are different platforms and architectures used by different players
>>>>(EBEROs, registry operators and data escrow agents) that will be upgraded
>>>>and will evolve over time.
>>>>
>>>>In this regard I feel more comfortable with XML because Unicode support
>>>>has been present since the beginning.
>>>>
>>>>Thoughts?
>>>>
>>>>Regards,
>>>>Gustavo Lozano
>>>>
>>>>_______________________________________________
>>>>ire mailing list
>>>>ire@ietf.org
>>>>https://www.ietf.org/mailman/listinfo/ire
>>>
>>
>
>
>