Re: [ire] CSV and RFC4180

"Gould, James" <JGould@verisign.com> Thu, 13 December 2012 20:22 UTC

From: "Gould, James" <JGould@verisign.com>
To: Gustavo Lozano <gustavo.lozano@icann.org>
Date: Thu, 13 Dec 2012 20:21:54 +0000
Message-ID: <B365DDCA-8371-49AA-A090-05AC61EA7819@verisign.com>
References: <C41D7AF7FCECBE44940E9477E8E70D7A0D742264@BRN1WNEXMBX02.vcorp.ad.vrsn.com>, <CCEF6231.69F6%gustavo.lozano@icann.org>
In-Reply-To: <CCEF6231.69F6%gustavo.lozano@icann.org>
Cc: "ire@ietf.org" <ire@ietf.org>
Subject: Re: [ire] CSV and RFC4180

Gustavo,

The RFC does not restrict it to ASCII, and CSV is heavily used in the industry for data export and import, so I don't believe XML has an inherent advantage in supporting non-ASCII data.
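
For illustration only, here is a minimal Python sketch of the two readings discussed in this thread. It is not taken from RFC 4180 or from the CSV draft; the function names (is_strict_textdata, roundtrip_utf8) and the sample rows are hypothetical. The first function applies the TEXTDATA rule from the RFC 4180 ABNF literally, and the second round-trips rows through UTF-8 bytes with Python's standard csv module, which is one example of a library that does not enforce that rule.

```python
import csv
import io
import re

# TEXTDATA from the RFC 4180 ABNF: %x20-21 / %x23-2B / %x2D-7E
# (printable US-ASCII without the double quote and the comma).
TEXTDATA_ONLY = re.compile(r'[\x20-\x21\x23-\x2B\x2D-\x7E]*')


def is_strict_textdata(field: str) -> bool:
    """Return True if every character of an unquoted field is in TEXTDATA."""
    return TEXTDATA_ONLY.fullmatch(field) is not None


def roundtrip_utf8(rows):
    """Write rows as CSV, encode to UTF-8 bytes, decode, and parse again."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    data = buf.getvalue().encode("utf-8")
    return list(csv.reader(io.StringIO(data.decode("utf-8"), newline="")))


# Hypothetical sample data, not taken from any escrow deposit.
rows = [["example.com", "José Núñez"], ["xn--bcher-kva.example", "Bücher GmbH"]]

# Whether the non-ASCII fields survive a UTF-8 round trip unchanged.
print(roundtrip_utf8(rows) == rows)

# Which fields of the first row a literal TEXTDATA validator would accept.
print([is_strict_textdata(f) for f in rows[0]])
```

A consumer that validated fields with something like is_strict_textdata would reject the non-ASCII name, while the csv module reads it back unchanged, so the outcome depends on which interpretation a given tool implements.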

JG

James F. Gould
Principal Engineer
Verisign

jgould@verisign.com

On Dec 13, 2012, at 2:03 PM, "Gustavo Lozano" <gustavo.lozano@icann.org> wrote:

> James,
> 
> The encoding of the file could be UTF-8, but my reading of the ABNF
> grammar (Section 2 of RFC4180) is that the code point repertoire is
> limited to %x20-21 / %x23-2B / %x2D-7E.
> 
> I have created a text file with non-US-ASCII code points, with fields
> separated by ",", on my computer and saved it as UTF-8. I opened the file
> with different applications without problems, but I am not sure that this
> file can be considered CSV format compliant.
> 
> I am not trying to be picky here; the escrow deposit is a fundamental
> piece of the registry transition process, and RFC 4180 made me feel that
> CSV is not as well defined as XML.
> 
> 
> Regards,
> 
> Gustavo
> 
> 
> On 12/13/12 10:32 AM, "Gould, James" <JGould@verisign.com> wrote:
> 
>> Gustavo,
>> 
>> Doesn't it state in RFC 4180 the following?
>> 
>>     Common usage of CSV is US-ASCII, but other character sets defined
>>     by IANA for the "text" tree may be used in conjunction with the
>>     "charset" parameter.
>> 
>> It does not preclude the use of UTF-8 or any other character set
>> (http://www.iana.org/assignments/character-sets/character-sets.xml), so
>> we should be good.
>> 
>> 
>> 
>> -- 
>> 
>> JG
>> 
>> 
>> 
>> James Gould
>> Principal Software Engineer
>> jgould@verisign.com
>> 
>> 703-948-3271 (Office)
>> 12061 Bluemont Way
>> Reston, VA 20190
>> VerisignInc.com
>> 
>> On 12/13/12 1:26 PM, "Gustavo Lozano" <gustavo.lozano@icann.org> wrote:
>> 
>>> James,
>>> 
>>> I understand that we can produce text files encoded in UTF-8.
>>> 
>>> My concern is the TEXTDATA ABNF grammar defined in RFC4180: %x20-21 /
>>> %x23-2B / %x2D-7E, which supports only a subset of US-ASCII.
>>> 
>>> After pre-delegation testing, ICANN will have evidence that the escrow
>>> deposit file is correct, but this is only a snapshot in time. Registry
>>> platforms are updated, RDBMSs are updated, libraries are updated, and in
>>> general all the components of the SRS evolve over time. My concern is
>>> that we will find in the future that the escrow deposit file of a
>>> registry operator is corrupted because some library is now following
>>> the ABNF grammar in RFC4180 or other validation rules. The same applies
>>> to the EBERO system.
>>> 
>>> XML, being a well-defined standard, makes me more comfortable in this
>>> regard.
>>> 
>>> Regards,
>>> 
>>> Gustavo Lozano
>>> 
>>> On 12/13/12 10:04 AM, "Gould, James" <JGould@verisign.com> wrote:
>>> 
>>>> Gustavo,
>>>> 
>>>> I don't believe that the file encoding is the key determinant of the
>>>> file format decision.  The CSV draft includes an encoding attribute
>>>> with the default of UTF-8.  It's up to the producer and consumer to
>>>> support the appropriate encoding of the data in any case, whether
>>>> we're talking about XML as the file format or CSV.  I don't believe we
>>>> would have any issue producing or consuming UTF-8 encoded CSV files.
>>>> 
>>>> -- 
>>>> 
>>>> JG
>>>> 
>>>> 
>>>> 
>>>> James Gould
>>>> Principal Software Engineer
>>>> jgould@verisign.com
>>>> 
>>>> 703-948-3271 (Office)
>>>> 12061 Bluemont Way
>>>> Reston, VA 20190
>>>> VerisignInc.com
>>>> 
>>>> On 12/13/12 1:00 PM, "Gustavo Lozano" <gustavo.lozano@icann.org> wrote:
>>>> 
>>>>> Colleagues,
>>>>> 
>>>>> I find the proposal of Comma-Separated Values (CSV) Objects Mapping
>>>>> interesting, and I think that both approaches, CSV or XML, for escrow
>>>>> data have their advantages and disadvantages.
>>>>> 
>>>>> The only RFC related to CSV that I have found is RFC4180.
>>>>> 
>>>>> This text from RFC4180 concerns me:
>>>>> TEXTDATA =  %x20-21 / %x23-2B / %x2D-7E
>>>>> 
>>>>> The escrow deposit will contain non-US-ASCII data.
>>>>> 
>>>>> How can we be sure that the libraries/database tools used to
>>>>> implement the export/import of CSV will adequately work with
>>>>> non-US-ASCII data? There are different platforms and architectures
>>>>> used by different players (EBEROs, registry operators and data escrow
>>>>> agents) that will be upgraded and will evolve over time.
>>>>> 
>>>>> In this regard I feel more comfortable with XML because Unicode
>>>>> support has been present since the beginning.
>>>>> 
>>>>> Thoughts?
>>>>> 
>>>>> Regards,
>>>>> Gustavo Lozano
>>>>> 
>>>>> _______________________________________________
>>>>> ire mailing list
>>>>> ire@ietf.org
>>>>> https://www.ietf.org/mailman/listinfo/ire