RE: [Uri-review] Request for review

On 7/5/2006, Roy T. Fielding <fielding@gbiv.com> wrote:

>On Jul 5, 2006, at 5:45 PM, Andrey Shur wrote:
>> On 7/5/2006, Roy T. Fielding wrote:
>>> This entire design space was covered by the discussions regarding MHTML
>>> (encapsulating packages of HTML + inline resources via email) almost ten
>>> years ago, and the same techniques apply here.  The packaging format
>>> needs to contain a catalog table of "package part" --> "real URI" and
>>> treat the package itself as a local cache.  All references to those URIs
>>> are resolved by the package manager to the part given in the catalog.
>>
>> This is also the case for the packaging technology Microsoft is
>> coming up with. Saying this I assume that a catalog table does not
>> necessarily needs to be persisted in actual bits within the
>> package. It could be virtual as long as it is well-defined. The
>> "pack:" Uri schema establishes unambiguous rules for "package part"
>> --> "real Uri" table.

>No, that is the opposite of a catalog.  The purpose of an embedded
>catalog is to create an environment in which all the parts should be
>interpreted regardless of how they were obtained.  For example, an archive
>that represents a complete tree of resources at a given time X can be
>browsed just like the original.

This is correct: we do not have such an embedded catalog in our packages. Our customers did not request the scenario you mention as a target. Many of our target scenarios involve creating the package from in-memory objects, not from a tree of resources.
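
To make the contrast concrete, here is a minimal Python sketch of the embedded-catalog design described in the quoted text: a table of "package part" --> "real URI", with the package acting as a local cache. The class name PackageCache, its fields, and the example.org URIs are hypothetical and chosen only for illustration; nothing here is taken from MHTML or from any shipping package format.

```python
class PackageCache:
    """Hypothetical package manager that treats a package as a local cache."""

    def __init__(self, catalog, parts):
        # catalog: part name -> "real URI" the part was captured from
        # parts:   part name -> (media type, content bytes)
        self.parts = parts
        self.part_by_uri = {uri: name for name, uri in catalog.items()}

    def resolve(self, uri, fetch_remote=None):
        """Return (media type, content) for uri, preferring the packaged copy."""
        part_name = self.part_by_uri.get(uri)
        if part_name is not None:
            return self.parts[part_name]
        if fetch_remote is None:
            raise LookupError(f"{uri} is not in the package catalog")
        # Not packaged: fall back to whatever fetcher the caller supplies.
        return fetch_remote(uri)


package = PackageCache(
    catalog={
        "part1": "http://example.org/report/index.html",
        "part2": "http://example.org/report/logo.png",
    },
    parts={
        "part1": ("text/html", b"<img src='logo.png'>"),
        "part2": ("image/png", b"\x89PNG"),
    },
)
media_type, content = package.resolve("http://example.org/report/logo.png")
```

In this sketch the parts keep their original URIs, so references inside them need no rewriting; only the resolver knows that a packaged copy exists.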

>In addition, the package format defines another
>set of URIs, which can be relative to the base URI of the package (e.g.,
>"./part1"), to form a virtual identifier space wherein each part is
>treated as an individual resource with its own media type (e.g.,
>"text/html").  All of this can be defined by the media type
>specification(s) for the package format.

We have this feature in our packaging technology. In a package, each part is uniquely identified by its "part name", which is a relative URI that conforms to a specific pattern within the URI grammar. In addition, each part has an associated media type.
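
As a rough illustration of such a virtual identifier space, the following Python sketch resolves relative part names against a package base URI using ordinary RFC 3986 reference resolution, and pairs each resulting identifier with a media type. The base URI, the part names, and the helper name part_identifiers are assumptions made for the example; the actual part-name pattern is not reproduced here.

```python
from urllib.parse import urljoin

# Hypothetical base URI for the package; the trailing "/" makes the package
# behave like a folder containing its parts.
PACKAGE_BASE = "http://example.org/docs/report.pkg/"

# Hypothetical part names, each with its own media type.
PART_MEDIA_TYPES = {
    "part1": "text/html",
    "images/logo.png": "image/png",
}


def part_identifiers(base_uri, media_types):
    """Map each part's absolute identifier to its media type."""
    return {
        urljoin(base_uri, "./" + name): media_type
        for name, media_type in media_types.items()
    }


identifiers = part_identifiers(PACKAGE_BASE, PART_MEDIA_TYPES)
```

Each part then behaves as an individual resource: it has its own identifier and its own media type, independent of the media type of the enclosing package.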

>In contrast, creating a new identifier for "pack" duplicates all of URI
>space and ties the identifier to a specific media type.

This is correct: the "pack" URI does tie the identifier to a family of media types. In doing so we are consistent with the Guidelines for new URL Schemes (see RFC 2718, section 2.2.3 "Definition of non-protocol URL schemes").

>Placing metadata within the URI is wrong because it makes references
>brittle and undermines future extensibility. What happens, for example,
>when the pack format is changed?  Do you deploy yet another URI scheme for
>the new format?

Extensibility is a fair concern that applies not only to the "pack" scheme but to packaging as a whole, as a framework for various media types. Changing the package format is extremely costly. In fact, shortly after we ship Vista and Office 2007, millions of circulating documents (Word, Excel, PowerPoint, XPS) will be package-based.

As for the "pack" URI, we put a lot of effort into simplifying its design and minimizing the chance that it would be affected by package format changes, which are themselves unlikely. For instance, decoding of the authority component is defined purely in terms of RFC 3986 and has no dependency on the packaging format.

>>> Each part has a defined base URI, usually its original URI at the time
>>> the package was generated.  The individual parts do not need to be
>>> altered in any way (a requirement for digital sigs).  It also defines
>>> a set of new URIs by treating the individual part names as hierarchical
>>> identifiers within the package (i.e., a folder containing those parts).
>>
>>> The above design is implemented in the media type handler for the
>>> package type by inserting a local cache handler and associating each
>>> part with its corresponding base URI.  Any browser supporting the package
>>> type (internally or via plug-in) will require special code to do so,
>>> regardless of how references are handled.  The rest of the browser is
>>> generic, and the package can be served from any source.
>>
>> The substantial feature of the packaging technology is its purpose
>> to be the framework for multiple media types. It is already the
>> foundation for MS Office 2007 files, and Windows .XPS files.
>> Microsoft has solid plans to extend the line of media-types based
>> on this packaging within and outside of the company.

>There is nothing preventing multiple media types (profiles) from
>being based on a single packaging definition.  There are many media types
>based on XML. It doesn't change the design at all.

This is true, and I like the analogy. Media-type-agnostic XML techniques, such as XPath expressions, apply to all of these XML-based media types.

>> Therefore we made a special effort to move the packaging addressing
>> model outside of the particular media type(s) handling. That is why
>> we picked URI schema as a mechanism we can support across the open
>> family of package-based media types.

>The "addressing model" is URIs. If you define a special "pack" URI,
>then you are creating an addressing mechanism specific to one media type.
>If you do what I explained, all URI schemes will work fine.

The packaging technology does not itself define a media type for a package; the file formats built on top of the package have media types. Given that, I would say: by defining a special "pack" URI, we are creating an addressing mechanism specific to the family of package-based media types.

>> More information on this can be found in MSDN article at
>> http://msdn.microsoft.com/windowsvista/default.aspx?pull=/library/en-us/dnlong/html/opcadmdl.asp

>Yeah, I know, and you will have to throw out that work and start over.
>Microsoft will be embarrassed again if they ship an implementation of
>a proprietary URI scheme that directly contradicts the Web architecture.

>>> Creating a "pack" media-type-specific URI that encapsulates all other
>>> URIs is a terrible idea, especially when it merely applies media-type
>>> instructions on how a browser should handle embedded resources.
>>> It would invert the current relation between identifiers and media types,
>>> which is a gross violation of web architecture.  I will never implement
>>> such a beast.
>>
>> I'm a bit confused by the term "terrible idea" you've used to
>> denote the "pack" Uri. In fact when the "pack" Uri is supported,
>> browser navigates to the part without knowing the mime-type of the
>> package file. Part does not appear as an embedded resource for the
>> browser, but rather as a regular resource identified by a Uri. The
>> mime-type of the part is returned to caller, so browser can run the
>> appropriate media type handler for the part.

>The browser can't "navigate to the part" without knowing the media type
>of the package or having an intermediary to do so for them.  If the
>browser must receive the entire package prior to obtaining the part, then
>it will know the package media type and its own implementation will be
>doing the "navigating" within the package.  If some intermediary, such as
>an HTTP server, is doing the "navigating" within each package and simply
>providing an identifier for each part, then the browser is going to be
>using the URI scheme for that intermediary -- not pack.  The latter is
>what Mark was describing for "http" URIs.

This is correct: navigating to the part requires an intermediary. Because one of our design goals was to avoid any server code changes, the intermediary we use is the client code that handles "pack" URIs (more specifically: the managed PackWebRequest(), which we ship with Vista, and a native pluggable protocol, a URL moniker for the "pack" URI, which is in our near-term post-Vista plans).

The intermediary takes a "pack" URI, decodes the authority component, and retrieves the package file; it then uses the path component to get the package part. The weak point here is that the media type of the retrieved file may not tell the intermediary unambiguously that the file is a package. This is the cost of being a framework: we cannot list the complete set of media types that are packages.
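
The first half of that procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the shipped implementation: it assumes the authority component carries the package URI with "/" written as "," and with reserved characters such as ":" percent-encoded (the convention used in the later published Open Packaging Conventions, which this message does not spell out), and the function name split_pack_uri is hypothetical. Retrieving the package and extracting the part are left out.

```python
from urllib.parse import urlsplit, unquote


def split_pack_uri(pack_uri):
    """Split a "pack" URI into (package URI, part name or None)."""
    parts = urlsplit(pack_uri)
    if parts.scheme != "pack":
        raise ValueError(f"not a pack URI: {pack_uri!r}")
    # Assumed encoding: "/" in the package URI was written as ",", and
    # reserved characters such as ":" were percent-encoded. Commas are
    # mapped back first so that an encoded comma (%2C) stays a comma.
    package_uri = unquote(parts.netloc.replace(",", "/"))
    # The path component names the part; an empty path means the URI
    # refers to the package as a whole.
    part_name = parts.path or None
    return package_uri, part_name


package_uri, part_name = split_pack_uri(
    "pack://http%3a,,www.example.org,my.package/a/b/foo.xml"
)
```

Note that nothing in the sketch depends on the package format itself: the split uses only generic URI parsing and percent-decoding.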

A few improvements enable the intermediary to get the part before the entire package is loaded, but they are not essential to this discussion.

The browsing application calls the intermediary and gets back the part, remaining unaware of the package file's media type and even of the protocol (http, ftp) used to retrieve the package file.

>> Clearly the pattern of the "pack" Uri, implied by its purpose to be
>> the foundation of the addressing model for the family of package-
>> based media types, is to some extent unusual. We have done a lot of
>> usability and security analysis before coming up with the current
>> design. If you have in mind scenarios which make the "pack" Uri
>> experience "terrible", please let us know.

>As I said, review the MHTML discussions of ten years ago.  I am sure
>they are archived somewhere.  The scenario you describe is not "unusual"
>at all.
>The same applies to all compound document formats.  What is unusual
>is the suggested solution of a "pack" URI that inverts the entire
>addressing space to satisfy a single media type.  That simply isn't done
>because it is contrary to the way the Web has been designed to work.

Strictly speaking, inversion of the addressing space happens quite often with URIs that use the query component for redirection. Yet, unlike the "pack" URI, you would not claim that such URIs are contrary to the Web design.
Let me ask what you consider the significant difference here: the fact that the inversion is implied by the scheme, the fact that the inversion happens on the client side, or something else?

Obviously, having the inversion implied by the scheme grammar is an unusual solution, and this was clearly understood from the very beginning of the design work. Although it is unusual, it does not violate any normative Web architecture guideline.

>    http://www.w3.org/TR/webarch/#URI-scheme
>    http://www.w3.org/TR/webarch/#orthogonal-specs
>    http://www.w3.org/2001/tag/doc/mime-respect.html#external
>    http://www.w3.org/2001/tag/doc/metaDataInURI-31

_______________________________________________
Uri-review mailing list
Uri-review@ietf.org
https://www1.ietf.org/mailman/listinfo/uri-review