Re: [ire] DNRD CSV Draft
"Gould, James" <JGould@verisign.com> Thu, 06 December 2012 14:32 UTC
Return-Path: <JGould@verisign.com>
X-Original-To: ire@ietfa.amsl.com
Delivered-To: ire@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 0F1D421F86FD for <ire@ietfa.amsl.com>; Thu, 6 Dec 2012 06:32:24 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.598
X-Spam-Level:
X-Spam-Status: No, score=-6.598 tagged_above=-999 required=5 tests=[AWL=0.001, BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2yyADf2ec1Ju for <ire@ietfa.amsl.com>; Thu, 6 Dec 2012 06:32:21 -0800 (PST)
Received: from exprod6og113.obsmtp.com (exprod6og113.obsmtp.com [64.18.1.31]) by ietfa.amsl.com (Postfix) with ESMTP id 1986021F86B5 for <ire@ietf.org>; Thu, 6 Dec 2012 06:32:20 -0800 (PST)
Received: from peregrine.verisign.com ([216.168.239.74]) (using TLSv1) by exprod6ob113.postini.com ([64.18.5.12]) with SMTP ID DSNKUMCsc5NGLr7jLkuckhhW3kmlxX0F3JiU@postini.com; Thu, 06 Dec 2012 06:32:21 PST
Received: from BRN1WNEXCHM01.vcorp.ad.vrsn.com (brn1wnexchm01.vcorp.ad.vrsn.com [10.173.152.255]) by peregrine.verisign.com (8.13.6/8.13.4) with ESMTP id qB6EWBd4030707 (version=TLSv1/SSLv3 cipher=AES128-SHA bits=128 verify=FAIL); Thu, 6 Dec 2012 09:32:15 -0500
Received: from BRN1WNEXMBX01.vcorp.ad.vrsn.com ([::1]) by BRN1WNEXCHM01.vcorp.ad.vrsn.com ([::1]) with mapi id 14.02.0318.004; Thu, 6 Dec 2012 09:32:10 -0500
From: "Gould, James" <JGould@verisign.com>
To: Francisco Obispo <fobispo@isc.org>
Thread-Topic: [ire] DNRD CSV Draft
Thread-Index: AQHNsvFe/aJ3RS8H6U+/LvAXD9hPVJgLmLEAgAB+WAA=
Date: Thu, 06 Dec 2012 14:32:09 +0000
Message-ID: <C41D7AF7FCECBE44940E9477E8E70D7A0D72F1C6@BRN1WNEXMBX01.vcorp.ad.vrsn.com>
In-Reply-To: <0B69D760-4713-4BCB-8FA1-2F0034BF8CEE@isc.org>
Accept-Language: en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
user-agent: Microsoft-MacOutlook/14.2.2.120421
x-originating-ip: [10.173.152.4]
Content-Type: text/plain; charset="Windows-1252"
Content-ID: <DB5C145A56F2CA4AB1CE78005F1068B1@verisign.com>
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
Cc: "Thippeswamy, Chethan" <CThippeswamy@verisign.com>, "ire@ietf.org" <ire@ietf.org>
Subject: Re: [ire] DNRD CSV Draft
X-BeenThere: ire@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: "Internet Registration Escrow discussion list." <ire.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ire>, <mailto:ire-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/ire>
List-Post: <mailto:ire@ietf.org>
List-Help: <mailto:ire-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ire>, <mailto:ire-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 06 Dec 2012 14:32:24 -0000
Francisco,

Thanks for the reply. My feedback is embedded below.

-- 

JG

James Gould
Principal Software Engineer
jgould@verisign.com

703-948-3271 (Office)
12061 Bluemont Way
Reston, VA 20190
VerisignInc.com

On 12/5/12 8:59 PM, "Francisco Obispo" <fobispo@isc.org> wrote:

>So I'm now trying to go in more detail on the CSV implementation, to try
>to understand its benefits.
>
>So far I've found the following cons of this approach:
>
>1) It does not eliminate the need of having to code, whoever implements
>   this, is going to have to write some scripts to generate the enclosing
>   XML file, and generate all the CSVs in the right format. In fact, I
>   believe there is going to be more work associated with this approach,
>   see #2.

I disagree with this. First, the XML definition that follows
draft-arias-noguchi-registry-data-escrow-04 would be primarily static
content, with the dynamic content being the file names and optionally the
file checksums. You could use a templating language (e.g. Apache Velocity)
or simple string replacement to replace the file names and checksums in
the static definition XML file. This is very straightforward and might
take less time than I'm taking to write this e-mail.

You define the fields that are applicable to your registry in the XML
definition, and then you're free to use any tool (off the shelf or custom)
to generate the CSV files. The CSV files map one-to-one with a relational
schema, so there is very little transformation required. The all-XML
approach of draft-arias-noguchi-dnrd-objects-mapping-01 requires a
relational-to-object conversion that is far more complex and far more
work than dumping fields from a relational schema to a set of CSV files.

>
>2) There's the issue of Validation of the data. The fields are defined
>   using an XML Schema (i.e.: <rdeCsv:fName> Name field with
>   type="eppcom:labelType") which will create the need of writing a
>   program to implement all of the types used in the schema for
>   validation.

I do agree that there is a need to create a custom validation program that
validates the CSV data fields against the XSD field format definitions. I
believe that the community can create this tool, and we would certainly
like to participate in that effort. The fact that you can validate XML
using an XML parser doesn't mean that we should make the generation and
consumption of the data escrows more complex just to avoid building a new
validation program. The field types defined in
draft-gould-thippeswamy-dnrd-csv-mapping-00 utilize XSD types, so it's a
matter of reusing parts of the XML parsers to validate individual field
elements.

>
>3) We now have more files to process/store/sign, instead of just one.

The files could be placed and compressed into a ZIP file per the AGB, so
the number of contained files is not relevant for transfer, signing, and
storage.

>
>
>I'm not done with my tests yet, but I've been able to compress an XML file
>with BZIP2 to 1/15th of its original size. I'm not envisioning that this
>is going to be a problem for most registries.

You can also compress the CSV files down to 1/15th of their original size.
You can't ignore the uncompressed size when it comes to processing, though.
The compression can be done to reduce the transfer time and the storage
costs. You should compare the uncompressed and compressed deposits using
CSV and XML with incrementally larger randomized data sets. I believe you
will find a large advantage with the use of CSV from a size perspective.

I fundamentally believe that the draft should address all registries
independent of their size. To me, saying that some of the registries will
have a problem with the use of XML is a showstopper for XML deposits.
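
To make the suggested comparison concrete, here is a rough sketch of how
one might measure uncompressed and BZIP2-compressed sizes for the same
randomized records written as CSV and as XML. The record fields, the
element names, and the helper functions are all hypothetical and are not
taken from either draft; real deposits would carry many more fields, so
the numbers this produces only illustrate the method.

```python
import bz2
import random
import string


def random_label(rng, length=12):
    """Return a random lowercase label (stand-in for a domain label)."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def make_records(n, seed=0):
    """Build n toy (name, roid, registrar) records; fields are assumptions."""
    rng = random.Random(seed)
    return [
        (random_label(rng) + ".example", f"ROID{i}-EX", f"registrar{rng.randint(1, 50)}")
        for i in range(n)
    ]


def to_csv(records):
    """Serialize records as one comma-separated line per record."""
    return "\n".join(",".join(r) for r in records).encode("utf-8")


def to_xml(records):
    """Serialize records with illustrative element names (not the draft schema)."""
    rows = [
        f"<domain><name>{name}</name><roid>{roid}</roid><clID>{clid}</clID></domain>"
        for name, roid, clid in records
    ]
    return ("<deposit>" + "".join(rows) + "</deposit>").encode("utf-8")


def compare(n, seed=0):
    """Return raw and bz2-compressed byte counts for both formats."""
    records = make_records(n, seed)
    sizes = {}
    for fmt, data in (("csv", to_csv(records)), ("xml", to_xml(records))):
        sizes[fmt] = {"raw": len(data), "bz2": len(bz2.compress(data))}
    return sizes


if __name__ == "__main__":
    # Incrementally larger data sets, as suggested above.
    for n in (1_000, 10_000, 100_000):
        print(n, compare(n))
```

Running it with growing values of n shows how both the raw and the
compressed sizes scale for each format; the generation and validation
time could be measured in the same loop.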
>
>I'll provide more feedback later,
>
>
>Francisco Obispo
>Director of Applications and Services - ISC
>email: fobispo@isc.org
>Phone: +1 650 423 1374 || INOC-DBA *3557* NOC
>PGP KeyID = B38DB1BE
>
>On Oct 25, 2012, at 1:43 PM, "Gould, James" <JGould@verisign.com> wrote:
>
>> All,
>>
>> We have created a draft of the Domain Name Registration Data (DNRD)
>> Comma-Separated Values (CSV) Objects Mapping that is attached for review
>> and feedback. We intend to post it to the IETF as an I-D once the
>> submission page is available on November 5th. This draft fully supports
>> the Registry Data Escrow Specification
>> (draft-arias-noguchi-registry-data-escrow-04). It defines the CSV files
>> and the order and format of the CSV fields for the data escrow of domain
>> name, host, contact, registrar, and IDN language objects. If there is
>> interest in this model we can consider merging it with the Domain Name
>> Registration Data (DNRD) Objects Mapping
>> (draft-arias-noguchi-dnrd-objects-mapping-01). The basis of using CSV
>> files for DNRD objects includes:
>>
>> * CSV is a natural format for exporting and importing data from and to
>>   a database. This could greatly simplify the generation of the data
>>   escrow files as well as the consumption of the files by an EBERO
>>   provider.
>> * XML is a highly verbose format that will adversely affect the
>>   processing of large data sets. With the draft, XML can be used for
>>   definition and CSV can be used for data, so the duplication of the
>>   descriptive information does not have to be used for every record.
>> * If you take the domain object (<rdeDomain:domain>) example from
>>   draft-arias-noguchi-dnrd-objects-mapping-01 and convert it to CSV files
>>   (domain, dnssec, and domainTransfer) you save around 75% uncompressed
>>   and 82% compressed using CSV.
>> * Extrapolating out the uncompressed size of a <rdeDomain:domain>
>>   records (1718 bytes versus 416 bytes uncompressed per record) for XML
>>   and CSV, you get to 1.7 GB with XML and 443 MB for CSV with 1 million
>>   records and 170 GB with XML and 44.3 GB for CSV with 100 million
>>   records.
>> * The deposits are generated uncompressed, validated, compressed and
>>   transferred by registry, and uncompressed, validated and stored by the
>>   data escrow provider. Both the size difference and the processing
>>   resources required for both the registry and the data escrow provider
>>   should be considered when comparing the two models.
>> * EBERO providers must transfer, uncompress, validate, and import the
>>   data into their database from the data escrow deposits, where the
>>   larger the files and the processing resources required, the longer it
>>   will take to recover the TLD.
>> * The full deposit is done weekly, so it is a weekly hit for all
>>   registries, where the larger the registry the bigger the hit.
>>
>> Please review the attached draft and provide any feedback for
>> consideration.
>>
>> Thanks,
>>
>> --
>>
>> JG
>>
>> <86BF0728-DD04-4F90-8380-5AA8A9AB5D0B[81].png>
>>
>> James Gould
>> Principal Software Engineer
>> jgould@verisign.com
>>
>> 703-948-3271 (Office)
>> 12061 Bluemont Way
>> Reston, VA 20190
>> VerisignInc.com
>>
>>
>>
>> <draft-gould-thippeswamy-dnrd-csv-mapping.txt>
>> _______________________________________________
>> ire mailing list
>> ire@ietf.org
>> https://www.ietf.org/mailman/listinfo/ire
>
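
For the string-replacement approach mentioned in the reply to point 1
above, a minimal sketch might look like the following. The XML layout,
the placeholder names, and the use of SHA-256 for the checksum are all
assumptions made for illustration only; the actual definition format and
checksum algorithm are whatever the drafts specify.

```python
import hashlib
from string import Template
from xml.sax.saxutils import quoteattr

# Mostly static definition; only file names and checksums vary per deposit.
# The element and attribute names here are hypothetical, not the draft schema.
DEFINITION_TEMPLATE = Template(
    "<csvDefinition>\n"
    "  <file name=$domain_file checksum=$domain_sum/>\n"
    "  <file name=$host_file checksum=$host_sum/>\n"
    "</csvDefinition>\n"
)


def checksum(data: bytes) -> str:
    """Hex digest of the file content (SHA-256 is an assumed placeholder)."""
    return hashlib.sha256(data).hexdigest()


def render_definition(files):
    """Fill the template from a mapping of key -> (file name, file content).

    Template.substitute raises KeyError if a placeholder is left unfilled,
    so a missing file is caught before the deposit is written.
    """
    values = {}
    for key, (file_name, content) in files.items():
        # quoteattr adds the surrounding quotes and escapes XML specials.
        values[f"{key}_file"] = quoteattr(file_name)
        values[f"{key}_sum"] = quoteattr(checksum(content))
    return DEFINITION_TEMPLATE.substitute(values)


if __name__ == "__main__":
    print(render_definition({
        "domain": ("domain-1.csv", b"example.test,ROID1\n"),
        "host": ("host-1.csv", b"ns1.example.test,ROID2\n"),
    }))
```

The CSV files themselves can then come from any database export tool,
with this step run once per deposit to produce the enclosing XML.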