[Wpack] Counter-proposal where bundles only contain a single origin

Jeffrey Yasskin <jyasskin@google.com> Wed, 26 August 2020 18:00 UTC

From: Jeffrey Yasskin <jyasskin@google.com>
Date: Wed, 26 Aug 2020 11:00:34 -0700
Message-ID: <CANh-dXkC8i6F1gxoD6nTJ4bp=7TVyy1fcN3v1vurj6h4+cZqiQ@mail.gmail.com>
To: WPACK List <wpack@ietf.org>
Cc: Martin Thomson <mt@mozilla.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/8fFVJv0AIksODEha8iyJrVDvOek>

I worked with Martin to flesh out his ideas for simplifying the bundle URL
and origin design, and we came up with the following. I still prefer the
proposal I presented a couple of weeks ago
<https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-origins.md#proposal-a-package-scheme>,
based on the tradeoffs down at the bottom here, but it's quite possible
I've missed things. I'd appreciate this group's input on which way we
should go.

The basic idea here is that:

* Each bundle defines a single origin.
* Bundles identify their contents with paths, not full URLs.
* Metadata can provide a base URL to resolve those paths against.
* We allow a single file to contain multiple origins using nested bundles.
  Use cases:
   * Users should be able to share subsets of the web that can link within
themselves, in a way that different sites within those subsets don't have
their storage collide.
   * One request for ads should be able to return the contents of multiple
iframes in a way that those contents can't modify each other.


# URLs for nested bundle resources

```
package://distributor.example/bundle.wbn$app/foo.wbn$bar.html
package:///c:/Users/name/Downloads/bundle.wbn$app/foo.wbn$bar.html
```

This fetches the outer bundle by removing the first `$` and everything
after it, and then:

* If the URL has an authority, replacing the `package:` with `https:`.
* If the URL doesn't have an authority, replacing the `package:` with
`file:`.

To allow the outer bundle to be fetched using a third scheme, we would need
to add a matching new scheme for addressing inside it.

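After that, each `$`-separated segment is the path to look up in the bundle
found so far: in the first example, `app/foo.wbn` is looked up in the outer
bundle and `bar.html` in the nested one. Any `$`s inside a segment are
percent-encoded.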
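
As a rough sketch of the fetch rule above (not a spec algorithm), the
splitting could look like the following. The function name
`split_package_url` is made up for illustration, and decoding only `%24`
back to `$` is my assumption about how the percent-encoding would be undone.

```python
from typing import List, Tuple


def split_package_url(url: str) -> Tuple[str, List[str]]:
    """Split a package: URL into (URL to fetch the outer bundle, nested paths).

    Hypothetical helper: it only illustrates the rule described in the text.
    """
    prefix = "package:"
    if not url.startswith(prefix):
        raise ValueError("not a package: URL")
    # Everything before the first `$` locates the outer bundle; each later
    # `$`-separated segment is a path inside the bundle found so far.
    outer, *segments = url[len(prefix):].split("$")
    if not outer.startswith("//"):
        raise ValueError("expected '//' after 'package:'")
    # The authority is whatever sits between '//' and the next '/'.
    authority = outer[2:].split("/", 1)[0]
    scheme = "https:" if authority else "file:"
    # Assumption: only an encoded `$` (%24) needs decoding inside a segment.
    return scheme + outer, [s.replace("%24", "$") for s in segments]


print(split_package_url(
    "package://distributor.example/bundle.wbn$app/foo.wbn$bar.html"))
print(split_package_url(
    "package:///c:/Users/name/Downloads/bundle.wbn$app/foo.wbn$bar.html"))
```

The first call yields an `https:` fetch URL and the second a `file:` one,
each followed by the segments `app/foo.wbn` and `bar.html`.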


# Origins for nested bundle resources

The origin of a nested bundle resource holds the information from its URL
up to the last `$`. For example, the URL below has the origin components
listed after it:

```
package://distributor.example/bundle.wbn$app/foo.wbn$bar.html
```

* scheme: package:
* host: distributor.example/bundle.wbn$app/foo.wbn
* port: null
* domain: null

We could also hold the part after the first `/` in a new component, similar
to Gecko's OriginAttributes
<https://wiki.mozilla.org/Security/Contextual_Identity_Project/Containers#An_extended_origin>,
but OriginAttributes are mostly undocumented, and URLs define "opaque
hosts <https://url.spec.whatwg.org/#opaque-host>" to encode this sort of
information into the existing host field.
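
To make the origin rule concrete, here is a minimal sketch that builds the
four components listed above. The `Origin` tuple and `single_origin_of`
are illustrative names, not anything from a spec or an implementation, and
the sketch only handles URLs that start with `package://`.

```python
from typing import NamedTuple, Optional


class Origin(NamedTuple):
    scheme: str
    host: str
    port: Optional[int]
    domain: Optional[str]


def single_origin_of(url: str) -> Origin:
    """Compute the origin for a resource in a single-origin nested bundle."""
    prefix = "package://"
    if not url.startswith(prefix):
        raise ValueError("expected a package:// URL")
    # Everything up to (not including) the last `$` identifies the bundle,
    # and so the origin; the final segment is just a path inside it.
    host, sep, _path = url[len(prefix):].rpartition("$")
    if not sep:
        raise ValueError("URL does not address anything inside a bundle")
    return Origin(scheme="package:", host=host, port=None, domain=None)


print(single_origin_of(
    "package://distributor.example/bundle.wbn$app/foo.wbn$bar.html"))
```

Two resources in the same innermost bundle get equal tuples, while a
resource one level up (say `...bundle.wbn$index.html`) gets a different
host and so a different origin.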


# Navigating across nested bundles

Within something like El Paquete Semanal or the Web Archive, it's
straightforward to store each origin in separate nested bundles, but we
want links from one origin to another to also work within the same
top-level bundle.

Addressing something in a sibling or parent bundle is reasonably
straightforward, with something similar to the following syntax:

```
<a href="package:..$/https/other.site.example.wbn$/path.html">
```

The downside is that if a user wants to take one site out of the big bundle
and use it on its own, and they want links outside that site to land on the
internet, they have to know the mapping from URLs to bundle names and undo
it in all the links. If they don't do this, the links are broken instead.

Similarly, if someone wants to compose a couple of pre-existing bundles,
they have to rewrite links to point to sibling bundles appropriately.
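
To illustrate the rewriting burden, here is a sketch of the two directions
a tool would need. It assumes a hypothetical naming convention taken from
the example link above, where the sibling bundle for `https://HOST` is
stored at `/https/HOST.wbn`; a real packager could pick a different mapping,
which is exactly why the user has to know it. Query strings and fragments
are ignored to keep the sketch short.

```python
import re
from urllib.parse import urlsplit

# Matches links of the assumed form package:..$/<scheme>/<host>.wbn$<path>
_SIBLING_LINK = re.compile(
    r"^package:\.\.\$/(?P<scheme>[a-z][a-z0-9+.-]*)/(?P<host>[^/$]+)\.wbn"
    r"\$(?P<path>/.*)$"
)


def to_sibling_link(url: str) -> str:
    """Rewrite an internet URL into a link to a sibling bundle."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"package:..$/{parts.scheme}/{parts.netloc}.wbn${path}"


def to_internet_url(href: str) -> str:
    """Undo the mapping; links that don't match it are left unchanged."""
    m = _SIBLING_LINK.match(href)
    if m is None:
        return href
    return f"{m['scheme']}://{m['host']}{m['path']}"


link = to_sibling_link("https://other.site.example/path.html")
print(link)
print(to_internet_url(link))
```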


# Comparison

We need to compare "single-origin bundles", described above, with
"multi-origin bundles", described at
<https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-origins.md#proposal-a-package-scheme>.

Both use a package:bundle-location$within-bundle format.

Because the single-origin proposal chooses not to encode the whole origin
to fit in the authority URL component, and implies rather than states the
bundle's scheme, I'll compare it to a variant of the multi-origin proposal
that does the same, yielding:

```
package://archive.example/2020-04-01.wbn$https://camera.example/edit.html
```

The difference becomes whether the origin computation includes an authority
component in the last $-delimited segment.
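
A sketch of that difference, under the assumption that the multi-origin
variant takes the scheme and authority of the URL in the last segment and
appends them to the part before the last `$`. The function name
`multi_origin_key` and the string form of the result are illustrative only.

```python
from urllib.parse import urlsplit


def multi_origin_key(url: str) -> str:
    """Origin-identifying string for the multi-origin variant."""
    head, sep, last = url.rpartition("$")
    if not sep:
        raise ValueError("URL does not address anything inside a bundle")
    # Extra step compared with single-origin bundles: the last segment is
    # itself a URL, and its scheme and authority become part of the origin.
    inner = urlsplit(last)
    if not inner.scheme or not inner.netloc:
        raise ValueError("last segment must be a URL with an authority")
    return f"{head}${inner.scheme}://{inner.netloc}"


url = "package://archive.example/2020-04-01.wbn$https://camera.example/edit.html"
print(url.rpartition("$")[0])   # single-origin rule: stop at the last `$`
print(multi_origin_key(url))    # multi-origin rule: also keep the authority
```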


## Single-origin bundles are better:

* Implementers can use a simpler algorithm to compute the origin, ending at
the last $ instead of having to also parse a URL from the last component.

* Including just a single origin may avoid the need for signatures to
specify a subset of the bundle, which could simplify that section.

* Naming subresources with paths instead of URLs is more consistent with
other archiving formats.


## Multi-origin bundles are better:

* Users can expect simple tools to combine and split pre-existing bundles.

* When authors are composing a bundle, cross-origin resources can go
directly into the bundle, instead of needing to rewrite them to same-origin
or put them in a nested bundle.

* Implementations only need to spin up the bundle parser once, which could
improve performance.

* Implementers don't need to write and maintain tools to rewrite
cross-origin URLs when saving a bundle from a website.