[Wpack] WPACK and Web Archiving-focused bundle format
Ilya Kreymer <ikreymer@gmail.com> Tue, 05 October 2021 19:18 UTC
From: Ilya Kreymer <ikreymer@gmail.com>
Date: Tue, 05 Oct 2021 12:18:37 -0700
Message-ID: <CANAUx6iHU2ip8af0Z32Hiy_nLw24cNX3GcHQcv7WL4UrJRcrPw@mail.gmail.com>
To: WPACK List <wpack@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/wpack/m4ofhMwnMZq4vLR6WosFDeYAG7o>
Subject: [Wpack] WPACK and Web Archiving-focused bundle format
Hello,

We would like to start the process of standardizing a format that fully supports all the requirements for bundling web data, HTTP request/response pairs, page lists, and associated metadata for web archiving use cases: a "web archive bundle" format. I wanted to ask whether the WPACK working group would be the right place to introduce such an effort, or whether it would be out of scope.

Looking at the charter, there is definite overlap with some, but not all, of its goals. In particular, bullet points #1, 2, 3, and 5 of https://datatracker.ietf.org/doc/charter-ietf-wpack/ are shared goals of the web archive bundle format, while some of the other goals are less important for it. The current CBOR-based bundle proposal is not sufficient to address the full scope of web archiving requirements, and that is entirely fine, as the key goals for that format are quite different.

I am wondering whether this group would be open to accepting a proposal for a different standard geared specifically toward all of the web archiving use cases, or whether we should pursue other paths, such as another existing working group or a new working group within the IETF.

Here is a (very) brief summary of some of the web archiving requirements that this new format will fulfill:

- forward and backward compatibility with the existing ISO WARC format, which will be used to store the raw HTTP request/response data
- an index based on URL + timestamp
- index support for multiple request/response pairs for the same URL, at the same or different timestamps
- support for storing the request body, e.g. for POST requests
- a URL + timestamp index that can be partially loaded via random access
- support for web archive bundles of different sizes, from a single page to very large bundles consisting of many GBs of data or hundreds of pages that are loaded entirely via random access
- support for a text index to allow full-text search
- support for combining multiple web archive bundles into a growing web archive collection

We of course plan to elaborate on all of these, but first we want to understand whether this group is the appropriate place for this work. Any advice or additional guidance would be appreciated, as would any next steps for pursuing this effort if this is the right place for it.

Thank you,
Ilya
Webrecorder
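To make the index-related requirements above (URL + timestamp keys, repeated keys, combining bundles) more concrete, here is a purely illustrative Python sketch. It is not part of the proposal, WARC, or any existing tool: the names `IndexEntry`, `UrlTimestampIndex`, `merge_indexes`, the record fields, and the bundle file names are all hypothetical. It keeps entries sorted by (URL, timestamp), finds every capture of a URL by binary search, picks the capture nearest a requested time, and merges per-bundle indexes. Keeping entries sorted is what would, in principle, let the same binary search run over a sorted on-disk index loaded piecewise via random access; this sketch simply holds everything in memory.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from datetime import datetime
from heapq import merge
from typing import Iterable, List, Optional


# Hypothetical index record; field names are illustrative, not from WARC or any spec.
@dataclass(frozen=True, order=True)
class IndexEntry:
    url: str        # URL key (no canonicalization applied in this sketch)
    timestamp: str  # 14-digit capture time, YYYYMMDDhhmmss
    bundle: str     # which bundle file holds the record
    offset: int     # byte offset of the WARC record within that bundle
    length: int     # length of the record in bytes


def _parse(ts: str) -> datetime:
    return datetime.strptime(ts, "%Y%m%d%H%M%S")


class UrlTimestampIndex:
    """Entries sorted by (url, timestamp); repeated keys are allowed."""

    def __init__(self, entries: Iterable[IndexEntry]):
        self.entries: List[IndexEntry] = sorted(entries)
        self._keys = [(e.url, e.timestamp) for e in self.entries]

    def captures(self, url: str) -> List[IndexEntry]:
        """All entries for `url`, oldest first, found by binary search."""
        lo = bisect_left(self._keys, (url, ""))
        # Timestamps are digits, so "\uffff" sorts after every one of them.
        hi = bisect_right(self._keys, (url, "\uffff"))
        return self.entries[lo:hi]

    def closest(self, url: str, timestamp: str) -> Optional[IndexEntry]:
        """Entry for `url` nearest in time to `timestamp`, or None."""
        caps = self.captures(url)
        if not caps:
            return None
        target = _parse(timestamp)
        return min(caps, key=lambda e: abs(_parse(e.timestamp) - target))


def merge_indexes(*indexes: UrlTimestampIndex) -> UrlTimestampIndex:
    """Combine per-bundle indexes; heapq.merge streams the sorted inputs."""
    return UrlTimestampIndex(merge(*(ix.entries for ix in indexes)))


# Usage: two records share the same URL and timestamp in bundle "a".
a = UrlTimestampIndex([
    IndexEntry("https://example.com/", "20211005120000", "a.warc.gz", 0, 512),
    IndexEntry("https://example.com/", "20211005120000", "a.warc.gz", 512, 300),
])
b = UrlTimestampIndex([
    IndexEntry("https://example.com/", "20211006090000", "b.warc.gz", 0, 480),
    IndexEntry("https://example.org/", "20211006090500", "b.warc.gz", 480, 256),
])
combined = merge_indexes(a, b)
print(len(combined.captures("https://example.com/")))  # 3
print(combined.closest("https://example.com/", "20211006000000").bundle)  # b.warc.gz
```

Sorting on the full (URL, timestamp) pair, rather than keying a dictionary by URL alone, is what makes repeated keys and cheap merging of several bundles' indexes straightforward in this sketch.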