Re: [Wpack] package: URL scheme

Jeffrey Yasskin <jyasskin@chromium.org> Wed, 08 July 2020 22:47 UTC

References: <CANh-dXndPaue3zAADhpc+wyNb8dxs=nVKOAp1n=6SMCKoUe=eQ@mail.gmail.com> <97bcac95-c220-41ae-b957-d93fc57f4a74@www.fastmail.com> <CANh-dXkXnvi+1YK-+CjPaiiN9VhAecLEEjpever7D-gVB-sN0A@mail.gmail.com> <c52ed6da-a7fa-4802-8923-d9782f498daf@www.fastmail.com> <CANh-dXmoWarbW=9wy1=t6rcFh2T8ph7LhhBg5aZ_6auLSYp7-w@mail.gmail.com>
In-Reply-To: <CANh-dXmoWarbW=9wy1=t6rcFh2T8ph7LhhBg5aZ_6auLSYp7-w@mail.gmail.com>
From: Jeffrey Yasskin <jyasskin@chromium.org>
Date: Wed, 08 Jul 2020 15:47:11 -0700
Message-ID: <CANh-dXnBSpOdCEt7s2tb8f441QraXHGpA1QLb-yRzurSj27GEg@mail.gmail.com>
To: Jeffrey Yasskin <jyasskin@chromium.org>
Cc: Martin Thomson <mt@lowentropy.net>, mknodel@cdt.org, WPACK List <wpack@ietf.org>
Content-Type: multipart/alternative; boundary="0000000000001af3ba05a9f5e4e1"
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/iEUT-102ThCcoUXnf3OezMIMRig>
Subject: Re: [Wpack] package: URL scheme

I've put an overall discussion of the new scheme and the plausible
alternatives at
https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-origins.md.
The URL encoding variants section
<https://github.com/WICG/webpackage/blob/master/explainers/bundle-urls-and-origins.md#url-encoding-variants>
discusses several of the alternatives that have come up in this thread,
along with their downsides. We can definitely switch to one of them (or
another new option) if folks think the tradeoffs are worth it.

Jeffrey

On Mon, Jun 15, 2020 at 11:16 AM Jeffrey Yasskin <jyasskin@chromium.org>
wrote:

> Replies inline.
>
> On Sun, Jun 14, 2020 at 6:38 PM Martin Thomson <mt@lowentropy.net> wrote:
>
>> On Sat, Jun 13, 2020, at 09:07, Jeffrey Yasskin wrote:
>> > 1. https://foo.example/page.html
>> > 2. https://bar.example/page.html
>> > 3. A resource in the bundle at https://foo.example/bundle.wbn named
>> > https://bar.example/page.html.
>> > 4. A resource in the bundle at https://foo.example/bundle.wbn named
>> > https://quux.example/page.html.
>> > 5. A resource in the bundle at https://foo.example/otherbundle.wbn
>> > named https://bar.example/page.html.
>> >
>> > If we say (1) and (3) get the same origin, it means the result can't
>> > help the Internet Archive serve their pages more safely, and we force
>> > (3), (4), and (5) to have the same origins.
>>
>> Yes, I think that having 1 and 3 being distinct is an important
>> property.  I tend to stop at this point and not try to add too much more,
>> but there you go.
>>
>
> I think distinguishing (1) and (3) requires a new scheme.
>
>> > If we say (2) and (3) get the same origin, we break the entire web
>> > origin security model. :-D
>>
>> (1) and (2) more so.
>>
>> Part of what we're trying to do is find a way to bring (2) and (3) closer
>> (and (5) too). That doesn't necessarily need to mean that we treat them
>> identically.  Though we might find a path to doing so.
>>
>
> I think the signing or transfer effort described
> in draft-thomson-wpack-content-origin
> and draft-yasskin-http-origin-signed-responses tries to get (2) and (3)
> closer together, but here I'm focusing on use cases that don't need the
> browser to recognize (2) and (3) as related at all.
>
>> > If we say (3) and (4) get the same origin, it means that El Paquete
>> > Semanal couldn't safely put multiple websites into the same bundle
>> > without risking them stepping on each other's storage. It could put
>> > them in separate files, but then they'd have trouble linking to each
>> > other.
>>
>> What cannot be solved with one level of abstraction might be solved with
>> two.
>>
>> If the distinctive characteristic of a bundle is that it creates a new
>> origin, then a bundle that contains other bundles can be used to isolate
>> content from different origins.  The El Paquete Semanal as a whole could be
>> assembled from multiple bundles from different "real" origins.
>>
>> As you say, that complicates the process of inter-origin references.  Say
>> the El Paquete Semanal bundle was served by non-determined means, then you
>> can't rely on having a shared identifier for the outer bundle, unless you
>> have some means of minting one.  (This is a case where you can't refer to
>> the outer bundle from the inner one using content identification, for
>> reasons that I hope are obvious.)  So you mint one (a signing key might
>> suffice) and then you can refer from one inner bundle to another inner
>> bundle via that identifier.
>>
>> The reference chain might become convoluted, but I don't see a need to
>> constrain URL design to fit that use case.  As long as references are
>> possible, then we can work on making it more usable.  If you have
>> siteA.example and siteB.example served from bundle.example from
>> distributor.example, then my hope would be that your inter-site references
>> are of the form:
>>
>> <something>://siteX.example/<a hint that mentions bundle.example and
>> maybe distributor.example, but the latter might be best left implied>
>>
>> I say that because while siteA and siteB might expect to be served from
>> the same meta-bundle, there shouldn't need to be a strong binding to that
>> situation.  If one is and the other not, then I would hope that the
>> identifiers would support that without requiring strong changes.
>>
>
> There's a downside that this still requires rewriting all the cross-origin
> links inside the documents, but as El Paquete Semanal is already rewriting
> links to put everything on a filesystem, maybe that's not a big problem.
>
> I'm curious what you think the hint could look like. You've sketched it in
> the path area, but these URLs also have paths, so we'd need something to
> separate the two parts of the path in the same way my first attempt
> separated two parts of the authority.
>
>> > If we say (3) and (5) get the same origin, it means that if an archive
>> > stores multiple versions of the same website, but those versions use
>> > storage differently, users couldn't easily try more than one version.
>>
>> I think that this is a key question.  My assertion is that maintaining a
>> clean abstraction for bundles is important and that would dictate having
>> (3) and (5) distinct.  At least initially.  Mechanisms that allow transfer
>> of state or content between origins might allow an upgrade to occur from
>> (3) to (5).
>>
>> > Abandoning some of these use cases definitely makes the URL design
>> > easier, if there's consensus to go that direction.
>> >
>> > However, if we want to keep the use cases, and we want to put the
>> > bundle's server in the authority position of the URL, we get something
>> > like Larry's suggestion:
>> > pkg+https://foo.example/bundle.wbn?query#https://bar.example/page.html?q=query%23fragment.
>> > To give (3)-(5) distinct origins, the origin algorithm
>> > <https://url.spec.whatwg.org/#origin> for pkg+https needs to take the
>> > fragment into account, returning something like ("pkg+https",
>> > "foo.example/bundle.wbn?query#https://bar.example", null, null). This
>> > design also makes it possible to resolve relative URLs relative to a
>> > pkg+https:// base URL, and it gives what's probably the wrong answer,
>> > moving relative to the bundle instead of the active subresource. That's
>> > probably ok: links inside a bundle need to explicitly search the bundle so
>> > that absolute references search the bundle first.
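As a rough illustration of the fragment-aware origin described above, here is a minimal Python sketch. The function name pkg_https_origin is hypothetical, and the parsing rule (split at the first '#', then keep only the scheme and host of the inner URL) is an assumption made to reproduce the example tuple, not a specified algorithm.

```python
from urllib.parse import urlsplit


def pkg_https_origin(url: str):
    """Hypothetical origin tuple for a pkg+https URL (illustrative only)."""
    scheme, sep, rest = url.partition("://")
    if scheme != "pkg+https" or not sep:
        raise ValueError("not a pkg+https URL")
    # Assumption: everything before the first '#' locates the bundle,
    # and the fragment is the full URL of the subresource.
    bundle, hash_sep, inner = rest.partition("#")
    if not hash_sep:
        raise ValueError("missing subresource fragment")
    inner_parts = urlsplit(inner)
    if not inner_parts.scheme or not inner_parts.netloc:
        raise ValueError("fragment is not an absolute URL")
    # Keep only the origin part of the subresource URL.
    inner_origin = f"{inner_parts.scheme}://{inner_parts.netloc}"
    return ("pkg+https", f"{bundle}#{inner_origin}", None, None)


case3 = pkg_https_origin(
    "pkg+https://foo.example/bundle.wbn#https://bar.example/page.html")
case4 = pkg_https_origin(
    "pkg+https://foo.example/bundle.wbn#https://quux.example/page.html")
case5 = pkg_https_origin(
    "pkg+https://foo.example/otherbundle.wbn#https://bar.example/page.html")
```

Under these assumptions, cases (3), (4), and (5) from the list earlier in the thread each get a different tuple.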
>>
>> Rather than try to work the identity of the serving entity into the
>> origin, it might be better to look to the limits of the origin concept.
>> We're seeing browsers move more toward placing origins within a greater
>> context.  Double-keying storage by top-level browsing context shows us how
>> maybe the origin isn't able to capture the entire context.  The same might
>> apply here.  Maybe it is important that this bundle originally came from
>> foo.example, but it might not be important to the concept of the origin
>> model.
>>
>
> This is an interesting point: we could use the new storage keys
> <https://storage.spec.whatwg.org/#storage-keys> being created by
> https://github.com/privacycg/storage-partitioning to declare that an
> environment settings object with a bundle attached uses a storage key
> derived from both the bundle's URL and the subresource's URL, instead of
> trying to encapsulate both into a single origin.
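A minimal Python sketch of what such a two-part storage key might look like. StorageKey, origin_of, and storage_key are hypothetical names, and using the subresource's (scheme, host, port) origin as the second component is an assumption; the storage-partitioning work does not define this shape.

```python
from typing import NamedTuple, Optional, Tuple
from urllib.parse import urlsplit


class StorageKey(NamedTuple):
    # Hypothetical shape: the bundle's URL (None when the resource was
    # not loaded from a bundle) plus the subresource's ordinary origin.
    bundle_url: Optional[str]
    origin: Tuple[str, Optional[str], Optional[int]]


def origin_of(url: str):
    """(scheme, host, port) for a URL, filling in http/https default ports."""
    parts = urlsplit(url)
    default_ports = {"https": 443, "http": 80}
    port = parts.port or default_ports.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def storage_key(resource_url: str, bundle_url: Optional[str] = None) -> StorageKey:
    return StorageKey(bundle_url, origin_of(resource_url))


# The five cases from the start of the thread.
keys = [
    storage_key("https://foo.example/page.html"),
    storage_key("https://bar.example/page.html"),
    storage_key("https://bar.example/page.html", "https://foo.example/bundle.wbn"),
    storage_key("https://quux.example/page.html", "https://foo.example/bundle.wbn"),
    storage_key("https://bar.example/page.html", "https://foo.example/otherbundle.wbn"),
]
```

In this sketch all five cases get distinct storage keys, while (2), (3), and (5) still share the plain origin of bar.example.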
>
>> Relative references within a bundle should be simple and possible, but I
>> fear that expecting fragments to support that is likely to run afoul of all
>> sorts of existing expectations.  Using fragments for bundles is likely good
>> if you don't want to mint a new URI scheme, but if you are in the business
>> of defining a new scheme, then go for it.
>>
>> If you are defining a new media type, then the fragment can do whatever
>> you like.  With no new URI scheme needed:
>>
>> https://foo.example/bundle.wbn?query#<use whatever you feel like, but
>> don't expect internal references to work>
>>
>> Similarly, once you are defining a new URI scheme, then you have a lot of
>> options available.  The URI standard is perhaps unnecessarily narrow in its
>> definition of authority, but there's a lot of room for creative
>> interpretation: a registered name doesn't need to be domain name, and there
>> are no real mechanism that ensure that it is "registered" (whatever that
>> means).
>>
>> pkg://<the authority for the above bundle, which might not be
>> bar.example>/page.html?q=query%23fragment
>>
>
> This is how I got
> to package:https:,,distributor.example,package.wbn;q=query$https:,,publisher.example/page.html?q=query
> (or package://...). It claims that the "authority" for a subresource inside
> a bundle consists of both the bundle's location and the subresource's
> origin. arcp:// did the same, but without the subresource's origin, which
> would have the effect of unifying (3) and (4).
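To make the shape of that package: URL concrete, here is a minimal Python sketch that builds one from a bundle URL and a subresource URL. The substitutions ('/' to ',' and '?' to ';', with '$' joining the two parts) are inferred only from the single example above. The function names are hypothetical, and how literal ',', ';', or '$' characters would be escaped is left unspecified, so the sketch rejects them.

```python
from urllib.parse import urlsplit


def _encode_part(text: str) -> str:
    # Assumption: only '/' and '?' are rewritten; escaping rules for
    # characters that collide with the encoding are not defined here.
    if any(c in text for c in ",;$"):
        raise ValueError("escaping of ',', ';' and '$' is not covered by this sketch")
    return text.replace("/", ",").replace("?", ";")


def to_package_url(bundle_url: str, resource_url: str) -> str:
    """Combine a bundle location and a subresource URL into one package: URL."""
    res = urlsplit(resource_url)
    origin = f"{res.scheme}://{res.netloc}"
    rest = resource_url[len(origin):]  # path, query, and fragment, unchanged
    return f"package:{_encode_part(bundle_url)}${_encode_part(origin)}{rest}"


def package_authority(package_url: str) -> str:
    """Everything between 'package:' and the first '/' (the combined authority)."""
    if not package_url.startswith("package:"):
        raise ValueError("not a package: URL")
    return package_url[len("package:"):].split("/", 1)[0]


example = to_package_url(
    "https://distributor.example/package.wbn?q=query",
    "https://publisher.example/page.html?q=query",
)
```

Because the part before the first '/' contains both the bundle's location and the subresource's origin, two subresources from different publishers in the same bundle end up with different authorities.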
>
>> The definition of origin necessarily becomes more flexible.  Rather than
>> insist on the reduction to a simple tuple for an origin, we should regard
>> these as having a same-origin comparison operation, which sometimes allows two
>> origins to be regarded the same, even with vastly different inputs.  The
>> reduction to a tuple is useful, but potentially quite confining.
>>
>> If pkg://<bar> and https://<foo> happen to be same origin because we
>> decide that is useful (and safe), then we should be free to define that as
>> we choose, within the constraints of the origin tuple or not.
>>
>> Despite it being quite clearly invalid as a domain name and port, this is
>> a completely valid URI if we so choose: <pkg://$:1000000/>. That doesn't mean
>> that it couldn't be same-origin with <https://example.com/>, but it
>> might be a little tricky to reduce to an origin tuple.
>>
>
> I don't mind diverging from the origin-tuple concept, but so far I haven't
> found a need to.
>
> Thanks,
> Jeffrey
>