Re: [ire] CSV and RFC4180
"Gould, James" <JGould@verisign.com> Thu, 13 December 2012 20:22 UTC
From: "Gould, James" <JGould@verisign.com>
To: Gustavo Lozano <gustavo.lozano@icann.org>
Date: Thu, 13 Dec 2012 20:21:54 +0000
Cc: "ire@ietf.org" <ire@ietf.org>

Gustavo,

The RFC does not restrict it to ASCII, and CSV is heavily used in the
industry for data export and import, so I don't believe XML has an
inherent advantage in supporting non-ASCII data.

JG

James F. Gould
Principal Engineer
Verisign
jgould@verisign.com

On Dec 13, 2012, at 2:03 PM, "Gustavo Lozano" <gustavo.lozano@icann.org> wrote:

> James,
>
> The encoding of the file could be UTF-8, but my reading of the ABNF
> grammar (Section 2 of RFC4180) is that the code point repertory is
> limited to %x20-21 / %x23-2B / %x2D-7E.
>
> I created a text file on my computer with non-US-ASCII code points and
> fields separated by ",", and saved it as UTF-8. I opened the file with
> different applications without problems, but I am not sure that this
> file can be considered compliant with the CSV format.
>
> I am not trying to be picky here. The escrow deposit is a fundamental
> piece of the registry transition process, and RFC 4180 made me feel that
> CSV is not as well defined as XML.
>
> Regards,
>
> Gustavo
>
> On 12/13/12 10:32 AM, "Gould, James" <JGould@verisign.com> wrote:
>
>> Gustavo,
>>
>> Doesn't RFC 4180 state the following?
>>
>>    Common usage of CSV is US-ASCII, but other character sets defined
>>    by IANA for the "text" tree may be used in conjunction with the
>>    "charset" parameter.
>>
>> It does not preclude the use of UTF-8 or any other character set
>> (http://www.iana.org/assignments/character-sets/character-sets.xml), so
>> we should be good.
>>
>> --
>> JG
>>
>> James Gould
>> Principal Software Engineer
>> jgould@verisign.com
>>
>> 703-948-3271 (Office)
>> 12061 Bluemont Way
>> Reston, VA 20190
>> VerisignInc.com
>>
>> On 12/13/12 1:26 PM, "Gustavo Lozano" <gustavo.lozano@icann.org> wrote:
>>
>>> James,
>>>
>>> I understand that we can produce text files encoded in UTF-8.
>>>
>>> My concern is the TEXTDATA ABNF grammar defined in RFC4180: %x20-21 /
>>> %x23-2B / %x2D-7E, which supports only a subset of US-ASCII.
>>>
>>> After pre-delegation testing, ICANN will have evidence that the escrow
>>> deposit file is correct, but this is only a snapshot in time. Registry
>>> platforms are updated, RDBMSs are updated, libraries are updated, and in
>>> general all the components of the SRS evolve over time. My concern is
>>> that we find in the future that the escrow deposit file of a registry
>>> operator is corrupted because some library now follows the ABNF grammar
>>> in RFC4180 or other validation rules. The same applies to the EBERO
>>> system.
>>>
>>> XML, being a well-defined standard, makes me more comfortable in this
>>> regard.
>>>
>>> Regards,
>>>
>>> Gustavo Lozano
>>>
>>> On 12/13/12 10:04 AM, "Gould, James" <JGould@verisign.com> wrote:
>>>
>>>> Gustavo,
>>>>
>>>> I don't believe that the file encoding is the key determinant of the
>>>> file format decision. The CSV draft includes an encoding attribute
>>>> with a default of UTF-8. It's up to the producer and consumer to
>>>> support the appropriate encoding of the data in any case, whether
>>>> we're talking about XML or CSV as the file format. I don't believe we
>>>> would have any issue producing or consuming UTF-8 encoded CSV files.
>>>>
>>>> --
>>>> JG
>>>>
>>>> On 12/13/12 1:00 PM, "Gustavo Lozano" <gustavo.lozano@icann.org> wrote:
>>>>
>>>>> Colleagues,
>>>>>
>>>>> I find the proposal of Comma-Separated Values (CSV) Objects Mapping
>>>>> interesting, and I think that both approaches, CSV and XML, have
>>>>> their advantages and disadvantages for escrow data.
>>>>>
>>>>> The only RFC related to CSV that I have found is RFC4180.
>>>>>
>>>>> This text from RFC4180 concerns me:
>>>>> TEXTDATA = %x20-21 / %x23-2B / %x2D-7E
>>>>>
>>>>> The escrow deposit will contain non-US-ASCII data.
>>>>>
>>>>> How can we be sure that the libraries/database tools used to
>>>>> implement the export/import of CSV will work adequately with
>>>>> non-US-ASCII data? There are different platforms and architectures
>>>>> used by different players (EBEROs, registry operators and data
>>>>> escrow agents) that will be upgraded and will evolve over time.
>>>>>
>>>>> In this regard I feel more comfortable with XML because Unicode
>>>>> support has been present since the beginning.
>>>>>
>>>>> Thoughts?
>>>>>
>>>>> Regards,
>>>>> Gustavo Lozano
>>>>>
>>>>> _______________________________________________
>>>>> ire mailing list
>>>>> ire@ietf.org
>>>>> https://www.ietf.org/mailman/listinfo/ire
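
The two positions in this thread can be illustrated with a small sketch, assuming Python and its standard csv module (neither is mentioned in the thread). The helper names is_strict_textdata and roundtrip_utf8 and the sample values are hypothetical. The first function applies the TEXTDATA ranges quoted above literally, which is the reading Gustavo is worried a future validator might adopt; the second shows that one common CSV library reads and writes UTF-8 fields without complaint, which is the behaviour James expects. It says nothing about other libraries or about what RFC 4180 intends.

```python
import csv
import io
import re

# TEXTDATA exactly as quoted from the RFC 4180 ABNF:
#   %x20-21 / %x23-2B / %x2D-7E
# (printable US-ASCII without the double quote %x22 and the comma %x2C)
TEXTDATA_ONLY = re.compile(r"[\x20-\x21\x23-\x2B\x2D-\x7E]*")


def is_strict_textdata(field: str) -> bool:
    """Return True if every character of the field falls in the TEXTDATA ranges."""
    return TEXTDATA_ONLY.fullmatch(field) is not None


def roundtrip_utf8(rows):
    """Write rows as CSV, encode to UTF-8 bytes, decode and parse them back."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    encoded = buf.getvalue().encode("utf-8")  # what would be stored in a deposit file
    decoded = io.StringIO(encoded.decode("utf-8"), newline="")
    return list(csv.reader(decoded))


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real escrow deposit.
    rows = [["münchen.example", "José, Ltd."]]
    for field in rows[0]:
        print(repr(field), "strict TEXTDATA:", is_strict_textdata(field))
    print("round trip preserved:", roundtrip_utf8(rows) == rows)
```

A strict validator built on the first function would reject any field containing a non-US-ASCII character, so a check like this could be run against sample deposits whenever an export or import library is upgraded.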