Data Set Identifier Interoperability BoF, IETF 84

Chairs: Beth Plale, Ted Hardie
Agenda: http://www.ietf.org/proceedings/84/agenda/agenda-84-dsii
Presentations: https://datatracker.ietf.org/meeting/84/materials.html#DSII
Recordings: http://www.ietf.org/audio/ietf84/ietf84-regencyc-20120731-1520-pm2.mp3, http://connect.iu.edu/p2hoj2awzs5/
Note taker: Robert Ping

Ted Hardie welcomed the group and went through the IETF Note Well statement, reiterating that this BoF was not intended to form a working group.

Beth Plale then reviewed the conceptual framework for work in this area, discussing both the framework for scientific data sets and the key role played by associating metadata with the generated data. Core issues for data sets and their identifiers are: discovery, data access, access control, logical arrangement, governance, distribution models, costs, relationship interoperability and service interoperability. These issues are particularly problematic for long-tail data, where the funding and effort available to curate the data may be low.

The group then reviewed several current data set ID systems (please see slides):
- EZID - Janee
- Handle System - Lannom
- EPIC - Wittenburg - CLARIN - EUDAT
- NI URI scheme - Farrell

Discussion of current systems - Plale

Data and data sharing in the Earth sciences: adoption of DOIs is common, but others are creating their own identifiers or winging it. Being able to do discovery on top of this will take some agreement. The key questions are: Are we at a pain point where we can get some agreement? Should we agree on information types and use that to create a larger platform? Do we need something like the IETF to get this going? How do we collaborate?

There are also commercial uses: for example, creating a collection to stream from a cache on the network that includes audio, video, closed captioning and ads, each potentially with a different data identifier type. We don't want to do this manually.
Scott Bradner (comment) - The discussion should include localization: I want to get the copy that is correct for me. Harvard may have a local copy, and I need that copy rather than the IU copy. That is a pretty powerful aspect of this.

Leif Johansson - Another point: what does the end game look like for success here? Do we pick a winner, or does it remain a little of this and a little of that? (The chairs replied that this is not about picking a winner.)

John Levine - Email and malware management is a potential use case, since those communities keep large files of spam. (He also asked a question about DOIs resolving to a document.)

Melinda Shore - Are we talking about standardizing metadata or search interfaces? It is not clear what is being asked in this discussion, or how the IETF fits into it. What part of this squishy whole has things that could be done right now? One candidate is mappings between existing systems: not a definitive metadata set, but a way to map them together with some kind of registry. Is a registry of those things a minimal success story, and what other discovery or indirection would then be possible?

Another comment - We need to get very specific about vocabulary on this issue.

Andy Buffet (Woods Hole) - This is clearly identified as an interoperability problem, and we have techniques and approaches for interoperability. We can add that to the work that has been going on for some time in the sciences. There is a limited number of PID schemes for datasets, so an API for dataset identifiers, as EZID already has, is useful. When it becomes domain-specific, there is a lot to be done.

Comments from Nasseed Usar (UNC Chapel Hill)

Interoperability considerations - Hardie

The group discussed interoperability mechanisms briefly. The chairs concluded by thanking the group for the day's discussion and asked everyone to continue the discussion on the list.