Re: JPEG-XL as Content-Encoding?

Alex Deymo <deymo@google.com> Fri, 21 August 2020 14:40 UTC

References: <CACj=BEjdwH1OtS=uQXsgPN3XVJvVEUeisjeF5_iro1vg0omqWQ@mail.gmail.com> <20200820151401.GB21689@1wt.eu> <20200820183008.GA8086@lubuntu> <18159.1597960275@critter.freebsd.dk> <CADR0UcWkxb4ZtMgjqDeAv=m6G3ks2P75L-5pvt-ctz8WyJF29g@mail.gmail.com> <CAGd9gwhR5zTjsCugrZeSr7Yt_N6wxv7k5evrLBGW=dkKt257ZA@mail.gmail.com> <8f2e15c4-1b49-2ef9-3346-baf9bc610d14@gmx.de>
In-Reply-To: <8f2e15c4-1b49-2ef9-3346-baf9bc610d14@gmx.de>
From: Alex Deymo <deymo@google.com>
Date: Fri, 21 Aug 2020 16:36:46 +0200
Message-ID: <CAGd9gwjWriKCRNNDkjfx0ME0L8v5qT3mO=6X1U2tDNYyXLtZ9w@mail.gmail.com>
Cc: HTTP Working Group <ietf-http-wg@w3.org>
Subject: Re: JPEG-XL as Content-Encoding?
Archived-At: <https://www.w3.org/mid/CAGd9gwjWriKCRNNDkjfx0ME0L8v5qT3mO=6X1U2tDNYyXLtZ9w@mail.gmail.com>
To: ietf-http-wg@w3.org

On Fri, 21 Aug 2020 at 14:27, Julian Reschke <julian.reschke@gmx.de> wrote:

> > However, on top of that, the lossless recompression of JPEG files allows
> > you to get this ~20% gain for existing files. When you deploy a new
> > lossy codec there is the question of what to do with the existing
> > images. If you have a website with photos and want to convert your
> > already lossy JPEG files to a new codec to save storage and bandwidth
> > and you decide to decode them to pixels and encode them back to the new
> > format you will end up with more artifacts or worse compression density
> > trying to accurately represent the JPEG artifacts in the new codec,
> > whatever the codec is. It's impractical to do this lossy transcoding to
> > a new codec at large scale on existing images, each application would
> > need to evaluate whether they want to do this for existing images. This
> > story is different if you start with a large and high quality image
> > (like a JPEG from a camera) and want to encode in a smaller form for the
> > web, since there you already have a high quality file.
>
> That makes it sound a bit as if a losslessly-re-encoded JPG file is not
> a valid JXL file. Is that the case?
>

A losslessly recompressed JPEG is a valid JXL file. There's also value in
keeping your JPEG files as losslessly recompressed versions outside the
Content-Encoding world (for example, converting the existing library on
your hard drive).

What I meant here is that if you start with a large, high-quality image
(JPEG or RAW) and long ago encoded it to a lower-resolution or
lower-quality JPEG for the web application, you introduced specific JPEG
artifacts and discarded information about the original file. In some
sense, the damage to the image is done. If you already did this, your
options for further compressing this file are limited: you don't know what
the original file looked like, so you might end up trying to accurately
reproduce JPEG artifacts with a new codec instead of accurately
reproducing the original image features. This is where lossless
recompression is a good idea.

If instead you still have the original file, you can produce a
lower-quality or lower-resolution JXL that's visually similar to the
original file (not visually similar to the low-res JPEG of the previous
case). This would give you a better compression ratio for the same visual
quality (but it would not give you a JPEG file right away).

What's not true is the converse, and maybe that's where the confusion is.
Not every JXL is a losslessly re-encoded JPEG. You can always decode any
JXL to pixels and encode it back to JPEG, but which file you end up with
depends largely on how you do that encoding. The lossless recompression
feature limits the options when encoding the JXL and adds extra
information so that a specific JPEG file can be reproduced
deterministically.


> ...
> > I think the only shocking thing about a content-encoding for JPEGs is
> > that it can't encode any arbitrary file only JPEGs, but if you look at
> > "general purpose" compressors like Brotli they still can't compress to a
> > smaller file every file; many binary files that are already compressed
> > like .zip or even a JPEG files (unless they have a large ICC) won't
> > compress to a smaller file so you just don't do it even if Brotli is
> > able to compress them to a ~similar size file.
> > ...
>
> That's indeed a concern. For the other currently registered encodings,
> you *can* apply them, but they do not necessarily help.
>
> This one can't be applied to any file type. One way to address this
> would be to tune the format that it *can* handle any file type (by just
> adding a tiny wrapper around it and preserving the actual octet stream
> within).


Yes, you could add a tiny frame around it to tell whether the payload was
losslessly recompressed or not (maybe paying ~1 more byte), but isn't this
basically what the Content-Encoding header in the response is for anyway?
I don't see an application where this frame would help. The server side is
not forced to use the content-encoding, and sending a file wrapped in
another format that adds no benefit would be a bit of a waste:
1. If we don't have this frame, you call the function that does the
lossless encoding; if it returns an error (for example, because the file
is not a JPEG), you don't set Content-Encoding to jxl.
2. If we do have this frame, you always set the Content-Encoding to jxl,
and the function that does the encoding follows exactly the same logic but
stores the "jxl or raw" bit of information in the first byte, depending on
whether it was able to re-encode the file.
There's very little difference between the two cases in how much you can
already send to the client before the encoding is done, and in general you
know very quickly whether the file can be encoded or not. Maybe all we
need is a function that tells us very quickly whether we *can* encode it.
I think this is possible and relatively easy. I understand, though, that
this limitation may require changes in how your server integrates a new
content encoding, since it would not be integrated the same way that
Brotli was, for example; this can be addressed when implementing support
for this content encoding in your server-side software.
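
To make the comparison between options 1 and 2 concrete, here is a rough
Python sketch of both. Everything in it is illustrative: fake_recompress
is a stand-in for a real lossless JPEG recompressor (it is not an actual
JPEG XL API), and the one-byte frame layout is only an assumption about
what such a wrapper could look like.

```python
JPEG_SOI = b"\xff\xd8"  # a JPEG stream starts with the Start Of Image marker


class NotAJpeg(Exception):
    """Raised by the stand-in encoder when the input cannot be recompressed."""


def fake_recompress(data: bytes) -> bytes:
    # Hypothetical stand-in for a lossless JPEG -> JXL recompressor.
    # It only checks the JPEG marker and returns a placeholder payload.
    if not data.startswith(JPEG_SOI):
        raise NotAJpeg("input does not look like a JPEG")
    return b"JXL!" + data[2:]


def respond_without_frame(body: bytes, recompress=fake_recompress):
    """Option 1: set Content-Encoding only when recompression succeeded."""
    try:
        return {"Content-Encoding": "jxl"}, recompress(body)
    except NotAJpeg:
        return {}, body  # send the original bytes, unlabelled


def respond_with_frame(body: bytes, recompress=fake_recompress):
    """Option 2: always set Content-Encoding; byte 0 says jxl (1) or raw (0)."""
    try:
        payload = b"\x01" + recompress(body)
    except NotAJpeg:
        payload = b"\x00" + body  # raw passthrough costs one extra byte
    return {"Content-Encoding": "jxl"}, payload
```

In both variants the same try/except decides what gets sent; the frame
only moves the "jxl or raw" bit from the response header into the body.
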

My idea of how this would be implemented is more along the lines of
already having the losslessly recompressed jxl file for static content and
either serving it as-is on request or decoding it before serving to
clients that don't support it, given that you get a significant saving in
storage size for static content (similar to the brotli_static setting in
the Nginx brotli module).
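
A minimal sketch of that static-serving idea, assuming the server keeps
only the recompressed file and looks at the request's Accept-Encoding
header. The function names and the decode_to_jpeg callback are
hypothetical; the "jxl" token is the one discussed in this thread, not a
registered content coding.

```python
def accepts_encoding(accept_encoding: str, token: str) -> bool:
    """Return True if `token` is listed in Accept-Encoding with q > 0."""
    for item in accept_encoding.split(","):
        name, _, params = item.strip().partition(";")
        if name.strip().lower() != token:
            continue
        q = 1.0  # default weight when no q parameter is given
        for param in params.split(";"):
            key, _, value = param.strip().partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        return q > 0
    return False


def serve_static(stored_jxl: bytes, accept_encoding: str, decode_to_jpeg):
    """Serve the stored recompressed file, or the decoded JPEG as a fallback."""
    headers = {"Content-Type": "image/jpeg", "Vary": "Accept-Encoding"}
    if accepts_encoding(accept_encoding, "jxl"):
        headers["Content-Encoding"] = "jxl"
        return headers, stored_jxl
    # Client did not offer jxl: reconstruct the original JPEG bytes.
    return headers, decode_to_jpeg(stored_jxl)
```

The resource stays an image/jpeg either way; only the coding on the wire
changes, which is why the response varies on Accept-Encoding.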

That said, I should probably mention that according to the spec draft a
valid JPEG-1 file is also a valid JXL file, so it is really only the
non-JPEG files that you can't re-encode, and you can tell those apart by
looking at the first few bytes. In that sense we already have this frame
information for distinguishing an old JPEG-1 file from a JXL file.
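
As an illustration of telling the cases apart from the first few bytes,
here is a small Python sketch. The JPEG marker (FF D8) is standard; the
two JXL signatures are the ones used by the JPEG XL format as later
published, so treat them as an assumption relative to the draft discussed
here. The function name is made up for this example.

```python
JPEG_SOI = b"\xff\xd8"                       # JPEG Start Of Image marker
JXL_CODESTREAM = b"\xff\x0a"                 # bare JPEG XL codestream
JXL_CONTAINER = b"\x00\x00\x00\x0cJXL \x0d\x0a\x87\x0a"  # boxed JPEG XL file


def sniff_image_format(data: bytes) -> str:
    """Classify a payload by its leading bytes only (no full validation)."""
    if data.startswith(JPEG_SOI):
        return "jpeg"
    if data.startswith(JXL_CODESTREAM):
        return "jxl-codestream"
    if data.startswith(JXL_CONTAINER):
        return "jxl-container"
    return "other"  # not something a JPEG recompressor could handle
```

A server could run such a check before attempting recompression and skip
the content encoding entirely for anything classified as "other".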