Re: File metadata format draft

Miguel Garcia <Miguel.Garcia@nsn.com> Thu, 15 November 2007 09:04 UTC

Message-ID: <473C0B7C.8090308@nsn.com>
Date: Thu, 15 Nov 2007 11:03:56 +0200
From: Miguel Garcia <Miguel.Garcia@nsn.com>
User-Agent: Thunderbird 1.5.0.12 (Windows/20070509)
MIME-Version: 1.0
To: Trace Bond <tbond@ctv.ca>
Subject: Re: File metadata format draft
References: <F8EBAC189452844C808799CDF8B1F7F707C52C4E@vacoms04.corp.ctv.ca>
In-Reply-To: <F8EBAC189452844C808799CDF8B1F7F707C52C4E@vacoms04.corp.ctv.ca>
Cc: discuss@apps.ietf.org, METS@LOC.gov

Hi Trace:

Thanks for your comments. See some inline discussion.

Trace Bond wrote:
> Hi Miguel,
> Thanks for this draft. I think this is a very important topic!
> 
> For the past two years, I've been searching for an XML format for describing
> computer files that I could use for creating catalogs of files for backup
> purposes.
> Such a format could be a very useful tool for making, duplicating and
> sharing vendor-independent file systems, as well as cataloging.
> 
> There seem to be two barriers to the embrace of some such standard:
> 1) The tendency of the community to focus on "resources" which may be
> misguided since most of the "resources" are files.

And as a matter of fact, earlier versions of the draft, which were at
that time part of a SIP event package draft, discussed "resources"
rather than "files". We found it difficult to get a general format for
describing any kind of resource, so we decided to focus on "files",
which are our main target.

> 2) There are already several file metadata XML formats and the
> community/communities cannot focus on one.
> 	Examples:
> 	a) the WebDAV format mentioned below
> 	b) FLUTE/RFC3926's file description format
> 	c) <file><fileGrp> schema from the library community
> (http://www.loc.gov/standards/mets/)
> 	d) the DublinCore/RDF format from the resource-centric SemanticWeb
> community
> 
> One of my many questions is.. could this draft be the core of a
> general-purpose XML format for describing computer files and file systems?

That is a good question. Honestly, it seems that creating a generic 
format that can serve any possible application is a real challenge. 
Perhaps we should take the approach you suggest... creating a core 
extensible specification that would be the foundation for describing 
files. If applications need to express additional characteristics of a
file, those could be expressed through extensions.

This has the benefit of making additional applications easier to
implement once you have implemented the first one.

This would also require some sort of commitment from the IETF community
to the approach taken. I don't think we are there yet; perhaps this is
the first step.
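
As a rough illustration of the core-plus-extensions idea, here is a small
Python sketch. It is only a sketch under assumptions: the namespace URIs,
the <file>, <name>, <size> and <backup-set> element names, and the helper
functions are all made up for this example and are not taken from the
draft. Only <identity> is an element name that the draft itself uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this sketch (not from the draft).
CORE_NS = "urn:example:file-metadata-core"
EXT_NS = "urn:example:backup-extension"


def build_file_description(name, size, extensions=None):
    """Build a core description and append extension elements unchanged."""
    root = ET.Element(f"{{{CORE_NS}}}file")
    identity = ET.SubElement(root, f"{{{CORE_NS}}}identity")
    ET.SubElement(identity, f"{{{CORE_NS}}}name").text = name
    ET.SubElement(identity, f"{{{CORE_NS}}}size").text = str(size)
    # Extensions live in their own namespace, next to the core elements.
    for ext in extensions or []:
        root.append(ext)
    return root


def core_children(root):
    """Return only the children that a core-only consumer understands."""
    prefix = f"{{{CORE_NS}}}"
    return [child for child in root if child.tag.startswith(prefix)]


# A hypothetical backup application adds one element of its own.
backup = ET.Element(f"{{{EXT_NS}}}backup-set")
backup.text = "weekly"
doc = build_file_description("report.txt", 1024, [backup])
```

A consumer that only knows the core can call core_children(doc) and skip
the rest, which is what would let a second application reuse the first
implementation.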

> Is such a question or format meaningful or possible? 
> 
> Some questions on this draft
> http://www.ietf.org/internet-drafts/draft-garcia-app-area-file-data-format-00.txt:
> 1) From page 5:  
> "Each <instance> element provides information that is related to a
> particular instance of the file, rather than the file itself." 
> This is unclear to me.. isn't each instance of the file an actual physical
> copy/clone?

That's right. Each instance is a copy of the same size. I'll try to find
better words to describe what the <instance> element is.


> 2) From page 6:
> "The <modification-date> element indicates the date and time at which the
> file was last modified."
> If this is the case, then this element should be in <identity> (not
> <instance>) since it is my understanding that all instances/copies of a file
> must be identical.

Hmm... this is an interesting question, and I have been thinking about
it for a while, but I don't have a straight answer. These are some
considerations:

- If I open a file to edit it and then save it without changes... does
the modification date change? I guess this might depend on the file
system. But the idea is: if the file is simply saved with no changes, it
should be exactly the same file, right? If we add the
<modification-date> to the <identity> element, then it would
automatically imply it is a different file. Is this the desired behavior?

- I want to understand what happens if I receive a file somehow and
store it, but I don't have the modification date. Most likely, the
operating system writes a modification date that equals the creation
date (I have to verify this point, though). So, if the
<modification-date> is part of the <identity> element, this file would
be different from the same file stored at another endpoint. I think this
isn't the desired effect.
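
To make the two points above concrete, here is a minimal Python sketch.
It assumes, purely for illustration, that <identity> carries a name, a
size and a content hash, and that <instance> carries the hosting endpoint
and the modification date. The class names, field names, host names and
dates are hypothetical and not taken from the draft.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Properties assumed to be the same for every copy of a file."""
    name: str
    size: int
    sha256: str


@dataclass(frozen=True)
class Instance:
    """Properties of one particular copy hosted at one endpoint."""
    endpoint: str
    modification_date: str  # written by the local file system


def describe(name, content, endpoint, modification_date):
    """Split the metadata of one stored copy into identity and instance."""
    digest = hashlib.sha256(content).hexdigest()
    identity = Identity(name, len(content), digest)
    instance = Instance(endpoint, modification_date)
    return identity, instance


# The same bytes stored at two endpoints, each with its own local timestamp.
content = b"same bytes on both endpoints"
id_a, inst_a = describe("notes.txt", content,
                        "alice.example.com", "2007-11-12T19:58:50Z")
id_b, inst_b = describe("notes.txt", content,
                        "bob.example.com", "2007-11-15T09:03:56Z")

same_file = id_a == id_b          # identities match
same_instance = inst_a == inst_b  # instances differ
```

With the date kept in the instance, the two copies still compare as the
same file. Moving modification_date into Identity would make id_a and
id_b unequal even though the bytes are identical.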


BR,

    Miguel
> 
> Please forgive me for posting to two email forums. 
> 	Best regards,
> 	Trace Bond
> 	Vancouver
> 
> @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
> At the moment the integration with the WebDAV properties is not done. I
> believe this is a topic for discussion. I have here more questions than
> answers. For example:
> 
> Do we want to propose a single data format for describing file metadata? The
> WebDAV properties are already standardized and deployed, whereas the
> file-data-format draft is not. However, the requirements of both
> applications are not totally the same. For example, the file data format
> does not care about the creator, but needs to signal different endpoints
> (instances) that host the same file.
> 
> Or can we integrate the WebDAV properties into the file data format? This
> will effectively create a superset of the WebDAV properties, but I am not
> sure it will have a footprint on the implementations, due to the advanced
> state of deployment of WebDAV.
> 
> Or do we want to continue with two separate paths, perhaps with this mapping
> that you mentioned?
> 
> BR,
> Miguel
> 
> 
> Cyrus Daboo wrote:
> 
> 	Hi Miguel,
> 
> 	--On November 12, 2007 9:58:50 PM +0200 Miguel Garcia <Miguel.Garcia at nsn.com> wrote:
> 
> 		I have submitted a draft that defines an XML schema for describing files
> 		and associated metadata. I am referring to this draft:
> 
> 		http://www.ietf.org/internet-drafts/draft-garcia-app-area-file-data-format-00.txt
> 
> 		The background of this draft is as follows: We first submitted a SIP
> 		event package for subscribing to changes in files stored in a remote
> 		endpoint. The idea is to use SIP to emulate the "shared folder"
> 		functionality that is available in common instant messaging and presence
> 		clients. A part of the overall function is a format for describing files
> 		and associated metadata.
> 
> 		At the last IETF meeting, I got a few comments indicating that the data
> 		format could be reused by other protocols, namely HTTP, and thus, it
> 		should be split out from the draft and discussed in the Apps area. So
> 		this is what I am doing.
> 
> 		So, the authors would like to get comments and questions about the
> 		document.
> 
> 	Had a quick look. In the HTTP world, meta-data about files is
> 	already covered by WebDAV properties. So the question I have is what, if
> 	any, thought has been given to integration with WebDAV. e.g. a lot of
> 	elements in your schema have exact equivalents in WebDAV. It might be
> 	useful, at a minimum, to define a mapping between your schema and what
> 	WebDAV provides.
> 
> 
> --
> Miguel A. Garcia           tel:+358-50-4804586
> Nokia Siemens Networks     Espoo, Finland
> 
> 
> 

-- 
Miguel A. Garcia           tel:+358-50-4804586
Nokia Siemens Networks     Espoo, Finland