Re: [dsii] Potential IETF Work Items

This is an important conversation. Data identification issues are surfacing in science for two reasons: science data exists at multiple levels of granularity (e.g., national, state, metro area, street), and giving proper credit to data creators is of burgeoning importance in the sciences. Commercial video can have granularity issues too, but once copyright issues are resolved, ownership is clear. It is the issue of ownership and attribution that is driving the urgency to find solutions to the dataset identifier problem.

I see interoperability across ID schemes as something the IETF can help us think about and propose a solution to. We're not going to accomplish much by trying to mandate a single ID scheme, not with several already in existence and enjoying good adoption. Ted and Andrew identified this problem as well. I wish we'd had more time to discuss interoperability at the BOF.

I like the connections Andrew made to work going on in other IETF groups. They give hope that there is existing expertise we can draw on.

I see this topic as cloud-agnostic. Clouds are heavily researched and used in academia, but identifiers would describe data sets wherever they "live". Clouds are likely to be, or already are, heavily used for replication (caching).

The library folks bring a lot to the table, but they are only a subset of those interested in this topic. Leif's remark that libraries are increasingly operated by content providers (who are at IETF) is a strong tie.

Finally, Andrew's three suggested options for engagement (copied below) are very good.

On Aug 14, 2012, at 12:35 PM, Andrew Maffei wrote:
> 
> Three options for engagement seem worthwhile considering:
> 
> 1. More dsii-interested folks currently outside IETF could start participating
> in WGs with cross-cutting interests, once they are identified.
> 
> 2. More IETF'ers could be engaged to participate in current dsii
> initiatives outside the IETF and be offered a platform from which
> an IETF perspective can be heard. ("Big Data" seems to be getting
> big these days for better or worse).
> 
> 3. A dsii working-group might someday be formed within IETF.
> 
> I think that the first 2 options are prerequisites for the 3rd so
> that we can gain familiarity with each other's use cases and cultures
> and thus lower the risk of a "bad start". As I have gotten older I
> have learned how important "good starts" are to initiatives.

best

beth

: Beth Plale
: Director, Data to Insight Center
: Professor of Computer Science
: Indiana University Bloomington

On Aug 22, 2012, at 10:08 AM, Andrew Maffei wrote:

> On Aug 20, 2012, at 3:32 PM, Melinda Shore wrote:
> 
>> I'm still trying to figure out what's being proposed here and I
>> realized that my mental model might be considerably different from
>> that being used by the work's proponents.  Where I'm coming from,
>> someone who needs a chunk of data and isn't sure where it is (or,
>> in some cases, whether or not it exists) does a search, and the
>> search returns a set of stuff, where "stuff" includes descriptive
>> information (metadata) and an identifier that's actually a
>> locator.  The locator is used to access the data.
>> 
>> Is that consistent with what proponents have in mind?
> 
> Hi Melinda.
> 
> The above is the primary use case. I think the "stuff" is all metadata (attribute/value pairs about the dataset) that includes the "locator" you mention. 
> 
> I'd like to comment on some of the things I saw of value at the Vancouver meeting. I don't claim to be an identifier or metadata expert so perhaps some of these ideas were derived outside of IETF. But they were new to me.
> 
> One idea would be to work together to agree on the "core metadata" that would be returned about scientific datasets to support data object access, delivery, etc.
> 
> One of the more interesting IETF WG docs I found was the CDN Interconnection Metadata I-D (draft-cjlmw-cdni-metadata-00). The sections on ACLs, ACLRules, and Delivery seemed directly applicable to the delivery of science datasets, some of which are proprietary and some of which are not. All sorts of issues related to delivering proprietary scientific data and very large datasets (or their subsets) seem applicable.
> 
> I noticed in another I-D (I can't find it right now) the practice of allowing attribute values of type "URI" to be either an explicit, fully qualified URI *or* a regular-expression substitution that can be applied to a previously defined URI attribute.
> 
> So, for example, if the URI for my identity as a WHOI employee was "http://www.whoi.edu/1912/241.11", the URI for a picture of me might be specified in the metadata associated with this URI as "s/$/.jpg/", indicating that appending .jpg to the end of the locator URI yields the URI of a picture of the person.
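
A minimal Python sketch of the mechanism described in the two paragraphs above, assuming a metadata record holds a base "locator" URI and a derived attribute whose value is a sed-style substitution resolved against it. The attribute names ("locator", "picture"), the rule for telling a substitution from an explicit URI, and the use of Python's `re` replacement syntax (`\1`, `\g<0>` rather than sed's `&`) are all hypothetical choices for illustration, not taken from any I-D. It also assumes the delimiter never appears escaped inside the expression, which matters for URIs full of "/".

```python
import re

def parse_substitution(expr):
    """Split a sed-style 's/pattern/replacement/flags' expression.

    Assumption: the delimiter is the character right after 's' and is
    never escaped inside the pattern or the replacement.
    """
    if len(expr) < 2 or expr[0] != "s":
        raise ValueError(f"not a substitution: {expr!r}")
    delim = expr[1]
    parts = expr[2:].split(delim)
    if len(parts) != 3:
        raise ValueError(f"malformed substitution: {expr!r}")
    pattern, replacement, flags = parts
    return pattern, replacement, flags

def apply_substitution(expr, base_uri):
    """Apply the substitution to base_uri using Python's re syntax."""
    pattern, replacement, flags = parse_substitution(expr)
    # Like sed: replace only the first match unless the 'g' flag is present.
    count = 0 if "g" in flags else 1
    return re.sub(pattern, replacement, base_uri, count=count)

def is_substitution(value):
    # Hypothetical rule: 's' followed by a punctuation delimiter marks a
    # substitution; anything else is an explicit, fully qualified URI.
    return re.match(r"s[^\w\s]", value) is not None

def resolve(metadata, attribute, base="locator"):
    """Return an explicit URI, or derive one from the base locator."""
    value = metadata[attribute]
    if is_substitution(value):
        return apply_substitution(value, metadata[base])
    return value

person = {
    "locator": "http://www.whoi.edu/1912/241.11",
    "picture": "s/$/.jpg/",  # hypothetical attribute name
}
print(resolve(person, "picture"))  # http://www.whoi.edu/1912/241.11.jpg
```

The heuristic in `is_substitution` keeps schemes such as "sftp:" or "s3:" from being mistaken for substitutions; a real specification would more likely carry an explicit type marker.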
> 
> Another example might be a substitution string for receiving metadata about a time slice of a video pointed to by the locator of a scientific data object of type "Video". If the original locator was http://www.whoi.edu/1912/2342.234, there might be metadata that declares how to modify this URI into one that specifies a start time and an end time.
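
The time-slice case could be sketched the same way, assuming the metadata carries a substitution template with hypothetical `{start}` and `{end}` placeholders that are filled in before the substitution is applied. Appending `#t=start,end` follows the temporal syntax of the W3C Media Fragments URI spec; that is one possible choice, not something proposed in the thread, and the "timeslice" attribute name is made up for illustration.

```python
import re

def apply_substitution(expr, base_uri):
    """Apply a sed-style 's/pattern/replacement/flags' expression.

    Assumption: the delimiter is the character after 's' and is never escaped.
    """
    delim = expr[1]
    pattern, replacement, flags = expr[2:].split(delim)
    count = 0 if "g" in flags else 1
    return re.sub(pattern, replacement, base_uri, count=count)

def timeslice_uri(metadata, start, end):
    """Derive a URI for a time slice of a 'Video' data object."""
    if metadata.get("type") != "Video":
        raise ValueError("time slices are only defined for Video objects")
    # Fill the hypothetical {start}/{end} placeholders, then substitute.
    expr = metadata["timeslice"].format(start=start, end=end)
    return apply_substitution(expr, metadata["locator"])

video = {
    "type": "Video",
    "locator": "http://www.whoi.edu/1912/2342.234",
    # Hypothetical attribute; '#t=start,end' mirrors W3C Media Fragments.
    "timeslice": "s/$/#t={start},{end}/",
}
print(timeslice_uri(video, 10, 20))  # http://www.whoi.edu/1912/2342.234#t=10,20
```

Keeping the template in the metadata, rather than hard-coding it in clients, lets each data provider choose its own slicing syntax while clients only need the generic substitution step.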
> 
> I'm interested in finding "lessons learned" by the IETF that would be worth considering in the realm of dataset identifier interoperability. The information in the I-Ds represents hours and hours of discussion, argument, and trial in past meetings about what works and what does not.
> 
> It would be a shame if we could not find some way to take advantage of this past work to help with dataset identifiers and certain types of the metadata that would sit behind them.
> 
> --Andy
> 
> _______________________________________________
> dsii mailing list
> dsii@ietf.org
> https://www.ietf.org/mailman/listinfo/dsii