Re: [Wpack] package: URL scheme

Larry Masinter <LMM@acm.org> Sun, 14 June 2020 03:40 UTC

Sender: Larry Masinter <masinter@gmail.com>
From: Larry Masinter <LMM@acm.org>
X-Google-Original-From: "Larry Masinter" <lmm@acm.org>
To: 'Jeffrey Yasskin' <jyasskin@chromium.org>, 'Martin Thomson' <mt@lowentropy.net>, mknodel@cdt.org
Cc: 'WPACK List' <wpack@ietf.org>
References: <CANh-dXndPaue3zAADhpc+wyNb8dxs=nVKOAp1n=6SMCKoUe=eQ@mail.gmail.com> <97bcac95-c220-41ae-b957-d93fc57f4a74@www.fastmail.com> <CANh-dXkXnvi+1YK-+CjPaiiN9VhAecLEEjpever7D-gVB-sN0A@mail.gmail.com>
In-Reply-To: <CANh-dXkXnvi+1YK-+CjPaiiN9VhAecLEEjpever7D-gVB-sN0A@mail.gmail.com>
Date: Sat, 13 Jun 2020 20:40:45 -0700
Message-ID: <006801d641fd$96c67c50$c45374f0$@acm.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/tG0nDSrLKtT2BFxbgOfHP7B8CiU>
Subject: Re: [Wpack] package: URL scheme

The problem I keep coming back to is that there is no firm line between the “archival” case and the immediate use case.

As soon as you get a representation of the publisher’s resource, it could change. 

I looked through all of the links you provided and couldn’t find any definition
of what a “signed bundle” is, or at least of what a bundle that isn’t “unsigned” is.

Who signs what, and when? And what happens over time to a signed bundle
when sites change or certs expire? How are signatures validated?

 

You should look at https://url.spec.whatwg.org/#url-rendering for the rules you’ll need to modify to get your “address bar” conventions.

==

https://LarryMasinter.net  https://going-remote.info

 

From: Wpack <wpack-bounces@ietf.org> On Behalf Of Jeffrey Yasskin
Sent: Friday, June 12, 2020 4:08 PM
To: Martin Thomson <mt@lowentropy.net>; mknodel@cdt.org
Cc: WPACK List <wpack@ietf.org>
Subject: Re: [Wpack] package: URL scheme

 

Hi Martin and Mallory,

 

As Larry mentioned, when the bundle is unsigned, it's the entity serving the bundle -- distributor.example here -- who has the ability to change the content, so they're probably the "true" authority. They're quoting the original publisher of the content, but there's no way for the user to verify that the quote is accurate.

 

Do you support the general goal of giving pairwise-distinct origins to all of:

 

1. https://foo.example/page.html

2. https://bar.example/page.html

3. A resource in the bundle at https://foo.example/bundle.wbn named https://bar.example/page.html.

4. A resource in the bundle at https://foo.example/bundle.wbn named https://quux.example/page.html.

5. A resource in the bundle at https://foo.example/otherbundle.wbn named https://bar.example/page.html.

 

If we say (1) and (3) get the same origin, it means the result can't help the Internet Archive serve their pages more safely, and we force (3), (4), and (5) to have the same origin.

If we say (2) and (3) get the same origin, we break the entire web origin security model. :-D

If we say (3) and (4) get the same origin, it means that El Paquete Semanal couldn't safely put multiple websites into the same bundle without risking them stepping on each other's storage. It could put them in separate files, but then they'd have trouble linking to each other.

If we say (3) and (5) get the same origin, it means that if an archive stores multiple versions of the same website, but those versions use storage differently, users couldn't easily try more than one version.
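
To make the five cases above concrete, here is a minimal sketch of an origin key that keeps all of them pairwise distinct. The `Origin` tuple and the `origin_of` helper are hypothetical names invented for illustration; they are not taken from any spec or browser implementation. The only assumption is that the origin of a bundled resource includes the URL of the bundle it was loaded from.

```python
from itertools import combinations
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Origin(NamedTuple):
    """Hypothetical origin key, for illustration only."""
    scheme: str
    host: str
    bundle: Optional[str] = None  # URL of the enclosing bundle, if any


def origin_of(resource_url: str, bundle_url: Optional[str] = None) -> Origin:
    """Derive an origin from the resource's URL plus the bundle that carried it."""
    parts = urlsplit(resource_url)
    return Origin(parts.scheme, parts.netloc, bundle_url)


def all_pairwise_distinct(origins) -> bool:
    """True when no two origins in the collection compare equal."""
    return all(a != b for a, b in combinations(origins, 2))


# The five cases from the list above, keyed by their number.
cases = {
    1: origin_of("https://foo.example/page.html"),
    2: origin_of("https://bar.example/page.html"),
    3: origin_of("https://bar.example/page.html",
                 "https://foo.example/bundle.wbn"),
    4: origin_of("https://quux.example/page.html",
                 "https://foo.example/bundle.wbn"),
    5: origin_of("https://bar.example/page.html",
                 "https://foo.example/otherbundle.wbn"),
}

print(all_pairwise_distinct(cases.values()))
```

Dropping the `bundle` field from the key would make (2) and (3) compare equal, which is the case described above as breaking the web origin security model.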

 

Abandoning some of these use cases definitely makes the URL design easier, if there's consensus to go that direction.

 

However, if we want to keep the use cases, and we want to put the bundle's server in the authority position of the URL, we get something like Larry's suggestion: pkg+https://foo.example/bundle.wbn?query#https://bar.example/page.html?q=query%23fragment. To give (3)-(5) distinct origins, the origin algorithm <https://url.spec.whatwg.org/#origin> for pkg+https needs to take the fragment into account, returning something like ("pkg+https", "foo.example/bundle.wbn?query#https://bar.example", null, null). This design also makes it possible to resolve relative URLs against a pkg+https:// base URL, but it gives what's probably the wrong answer, resolving relative to the bundle instead of the active subresource. That's probably ok: links inside a bundle need to explicitly search the bundle so that absolute references search the bundle first.
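
As a rough sketch of what "taking the fragment into account" could look like, the function below derives the origin tuple quoted above from a pkg+https URL. The name `pkg_origin` is hypothetical, and the splitting rules (bundle part before the first `#`, inner resource URL after it) are an assumption read off the example URL, not an algorithm from the URL spec.

```python
from urllib.parse import urlsplit


def pkg_origin(url: str):
    """Return a (scheme, host-like key, port, domain) tuple for a pkg+https URL.

    Assumption: everything before the first '#' names the bundle, and the
    fragment is the full URL of the resource inside the bundle.
    """
    scheme, sep, rest = url.partition("://")
    if not sep or scheme != "pkg+https":
        raise ValueError("not a pkg+https URL")

    bundle_part, hash_sep, inner = rest.partition("#")
    if not hash_sep or not inner:
        raise ValueError("missing inner resource URL in fragment")

    # Keep only scheme and authority of the inner resource, as an origin would.
    inner_parts = urlsplit(inner)
    inner_origin = f"{inner_parts.scheme}://{inner_parts.netloc}"

    # Python's None stands in for the nulls in the tuple from the message.
    return (scheme, f"{bundle_part}#{inner_origin}", None, None)


example = ("pkg+https://foo.example/bundle.wbn?query"
           "#https://bar.example/page.html?q=query%23fragment")
print(pkg_origin(example))
```

Under these assumptions, two resources from the same publisher in the same bundle share an origin, while the same resource name in a different bundle, or a different publisher in the same bundle, does not.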

 

If we want to prevent package URLs from being used as base URLs, we could move to something like "package:https://foo.example/bundle.wbn?query#https://bar.example/page.html?q=query%23fragment", which has a similar syntax to blob: URLs and is still more readable than my original proposal.

 

Jeffrey

 

On Thu, Jun 11, 2020 at 5:02 PM Martin Thomson <mt@lowentropy.net> wrote:

I have a bunch of concerns about this approach, but let's start with a fairly major one:

This buries the authority.  If your authority is truly publisher.example, then that should be the authority component.  If you regard the rest as a split between the remainder of the identity of the resource itself and some secondary information about how that resource might be obtained, then you might have something closer to how you might want to structure a URI.

On Thu, Jun 11, 2020, at 08:00, Jeffrey Yasskin wrote:
> Hi all,
> 
> I wanted to raise awareness of a discussion about the URL scheme for 
> addressing resources within bundles 
> (draft-yasskin-wpack-bundled-exchanges).
> 
> We seem to be heading toward a URL of the form 
> package:<encoded-package-url>$<encoded-resource-uri>, which for a 
> package URL of https://distributor.example/package.wbn and resource URI 
> of https://publisher.example/page.html?q=query would lead to a URL of:
> 
> *package:https:,,distributor.example,package.wbn;q=query$https:,,publisher.example/page.html?q=query*
> 
> This arises from several considerations:
> 1. A bundle is served from a URL.
> 2. After a user downloads the bundle, it gets a new URL, often 
> file:///...
> 3. We can also hash the bundle to get a URI that stays stable across 
> transfers.
> 4. Resources inside a bundle are named by URIs (which, since the bundle 
> has an index, are also URLs even if, like urn:uuid:..., they wouldn't 
> normally be locators).
> 5. Once a user downloads a bundle, for web browsers to give its content 
> storage that's persistent across reloads, as requested in 
> https://github.com/WICG/webpackage/issues/498, the content needs to be 
> assigned a non-opaque origin.
> 
> I'm updating one of the documents about this in 
> https://github.com/WICG/webpackage/pull/584 and would welcome comments 
> here or there.
> 
> The URLs are obviously gross, so 
> https://github.com/WICG/webpackage/pull/560 suggests that browsers 
> avoid showing them to users in most cases. 
> 
> We could potentially simplify things if packages named things with just 
> paths instead of full URIs. We'd then name things based on the bundle's 
> origin. However, this loses archiving use cases.
> 
> This is all further discussed in the following documents and issues, 
> but you shouldn't feel responsible to read everything here:
> 
> * https://docs.google.com/document/d/1BYQEi8xkXDAg9lxm3PaoMzEutuQAZi1r8Y0pLaFJQoo/edit#
> * https://chromium-review.googlesource.com/c/chromium/src/+/2226248/7#message-0a3efda5aff84770a1729422a5b26aeca3ee4e80
> * https://github.com/WICG/webpackage/issues/583
> * https://github.com/WICG/webpackage/blob/master/explainers/navigation-to-unsigned-bundles.md#urls-for-bundle-components
> * https://lists.w3.org/Archives/Public/uri/2019Nov/0000.html
> 
> Thanks,
> Jeffrey
> _______________________________________________
> Wpack mailing list
> Wpack@ietf.org
> https://www.ietf.org/mailman/listinfo/wpack
>
