Data Set Identifier Interoperability BoF, IETF 84
Chairs: Beth Plale, Ted Hardie
Agenda: http://www.ietf.org/proceedings/84/agenda/agenda-84-dsii
Presentations: https://datatracker.ietf.org/meeting/84/materials.html#DSII
Recordings: http://www.ietf.org/audio/ietf84/ietf84-regencyc-20120731-1520-pm2.mp3 , http://connect.iu.edu/p2hoj2awzs5/
Note taker: Robert Ping


Ted Hardie welcomed the group and went through the Note Well statement for the IETF, reiterating that this BoF was not intended to form a working group.


Beth Plale then reviewed the conceptual framework for work in this area, discussing both the framework for scientific data sets and the key role played by associating metadata with the generated data.  Core issues for data sets and their identifiers are: discovery, data access, access control, logical arrangement, governance, distribution models, costs, relationship interoperability and service interoperability.  These issues are particularly problematic for long-tail data, where the funds and effort available to curate the data may be low.




 The group then reviewed several current Data Set ID Systems (please see slides):


      EZID - Janee
      Handle System - Lannom
      EPIC - Wittenberg - CLARIN - EUDAT
      NI URI scheme - Farrell


 Discussion of current systems - Plale


 Question of data and data sharing in the Earth Sciences: adoption of DOIs is common, but others are creating their own identifiers or winging it. Being able to do discovery on top of this will take some agreement.  Key questions are:   Are we at a pain point where we can get some agreement?  Should we get agreement on information types and use that to create a larger platform?  Do we need something like the IETF to get this going?  How do we collaborate?
Also note that there are commercial uses: creating a collection to stream from a cache on the network, including audio, video, closed captioning and ads, each potentially with a different data identifier type.  We don't want to do this manually.


           Scott Bradner - Added as a comment: this discussion should include localization. I want to get the copy that is correct for me; Harvard may have a local copy, and I need that copy rather than the IU copy.  That’s a pretty powerful aspect of this.


           Leif Johannson - Another point: what does the end game look like for success here? Do we pick a winner, or does it remain a little of this and a little of that? (Chairs reply that this is not “pick a winner”).


           John Levine - Email and malware use management is a potential use case; they keep large files of spam. (He also asked a question about DOIs resolving to a document.)


           Melinda Shore - Are we talking about standardizing metadata or search interfaces? Not sure what is being asked in this discussion. How does the IETF fit into this? What part of the squishy whole has things that could be done right now? One candidate is mappings between existing systems: not a definitive metadata set, but a way to map them together with some kind of registry. Is a registry of those things a minimal success story? What other discovery or indirection would then be possible?


           Single comment - Need to get very specific about vocabulary on this issue.


           Andy Buffet - Woods Hole - This is clearly identified as an interop problem, and we have techniques and approaches for interop.  We can add that into the work that has been going on for some time in the sciences.  There is a limited number of PID schemes for datasets; an API for dataset identifiers, as EZID already has, is useful.  When it becomes domain specific there is lots to be done.


      Comments from Nasseed Usar - UNC Chapel-Hill


 Interoperability considerations - Hardie


The group discussed interoperability mechanisms briefly.  The chairs concluded by thanking the group for the day's discussion and asked folks to continue the discussion on the list.