[arcmedia] [DISCUSS] archive fragment identifiers

Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk> Mon, 29 June 2015 12:32 UTC

From: Stian Soiland-Reyes <soiland-reyes@cs.manchester.ac.uk>
Date: Mon, 29 Jun 2015 13:32:19 +0100
Message-ID: <CAPRnXtmQQKWP3HBiD=yuPP3qxeceqg3m7X5UmVAbmef=4mmmsw@mail.gmail.com>
To: arcmedia@ietf.org
Content-Type: text/plain; charset=UTF-8
Archived-At: <http://mailarchive.ietf.org/arch/msg/arcmedia/XkoRClinLVDAo19UnaJfokVWExQ>
Subject: [arcmedia] [DISCUSS] archive fragment identifiers


was added as my suggested text for fragment identifiers.

As I said, to foster discussion it is a bit 'brave' in that it is not
the usual "Nothing defined, we don't care" - but defines a standard
behaviour for any fragment identifiers that start with #/ - leaving
anything else open for individual registrations.
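
To make the #/ rule concrete, here is a minimal sketch in Python. It is
only an illustration of the split described above, not part of the
suggested text: the function name resource_path is hypothetical, and
percent-decoding the fragment as strict UTF-8 is my assumption about how
the encoding rule would apply.

```python
from urllib.parse import urlsplit, unquote


def resource_path(uri):
    """Return the archive-internal path for a '#/...' fragment, else None.

    Hypothetical helper: fragments starting with '/' get the standard
    interpretation; anything else is left to individual registrations.
    """
    fragment = urlsplit(uri).fragment
    if not fragment.startswith("/"):
        return None  # not a #/ fragment: open for individual registrations
    # Assumption: percent-encoded octets in the fragment are UTF-8.
    return unquote(fragment[1:], encoding="utf-8", errors="strict")


print(resource_path("http://example.com/a.zip#/dir/file.txt"))
print(resource_path("http://example.com/a.zip#page=2"))
```

A tool could then use the returned path to look up or highlight the
matching entry, and fall back to type-specific handling when the result
is None.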

This means that for archive/file or archive/* there would still be an
interpretation for fragment identifiers that would be very useful for
several purposes:

1) For semantic description of resources within any type of archive
2) For telling an archive tool to highlight a particular file
3) Giving a good default for registrations - e.g. compare with
application/zip, which strangely does not define any fragment identifier.

*) But not their own subresources

q1: Should fragment identifiers always be exemplified with the leading #?
The current text is not consistent here. (The # is not formally part
of the URI fragment identifier - and fragments could be used outside

q2: Should registrations be allowed to specify that they do NOT support
#/* ? I am not sure of any use-cases - say a dmg file without any
readable file system.  The question here is if those use cases really
should be archive/* at all.

q3: What to call the #/ pattern? I called it "Resource Path", but it
is still a bit unclear that the rest of the paragraph only specifies
the #/ pattern rules.

q4: case (in)sensitivity might need to be specified - e.g. preferred lowercase?

q5: Is the encoding paragraph precise enough?
(I want to mandate UTF-8 where possible to make it IRI-compatible --
which means there could be many archives (zip, tar) where you only
know the bytes of a filename and have no idea which file name
encoding was used on the system that made it -- there could even be
archives with mixed encodings. Registrations could, however, use the
non-/ fragments to access these.)
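
As a rough sketch of what the UTF-8 mandate would mean in practice,
assuming a reader only has the raw bytes of each stored file name: names
that decode as valid UTF-8 become addressable through #/, and the rest
are reported as not addressable that way. The function name
addressable_name is hypothetical and this is not taken from the
suggested text.

```python
def addressable_name(raw_name: bytes):
    """Return the file name as text if it is valid UTF-8, else None.

    Hypothetical helper: None means the entry cannot be named by a #/
    fragment, and a registration would need another kind of fragment.
    """
    try:
        return raw_name.decode("utf-8")
    except UnicodeDecodeError:
        return None


# Stored names as raw bytes; the second is Latin-1, not valid UTF-8.
stored = [b"caf\xc3\xa9.txt", b"caf\xe9.txt", b"readme.txt"]
for raw in stored:
    print(raw, addressable_name(raw))
```

Note that this cannot detect a name that happens to be valid UTF-8 but
was written in another encoding, which is part of why the paragraph may
need tightening.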

Any other thoughts?

Stian Soiland-Reyes, eScience Lab
School of Computer Science
The University of Manchester
http://soiland-reyes.com/stian/work/    http://orcid.org/0000-0001-9842-9718