Re: ETag specification: load balance friendly and merge with Digest header from

Roberto Polli <> Mon, 13 July 2020 14:52 UTC

From: Roberto Polli <>
Date: Mon, 13 Jul 2020 16:49:22 +0200
To: Sergey Ponomarev <>
Cc: HTTP Working Group <>
Subject: Re: ETag specification: load balance friendly and merge with Digest header from

Hi Sergey,

The Digest header was introduced long ago in RFC 3230. We are just updating it.

Its goal is different from ETag's, but you can use digest algorithms
to compute strong ETags. Bear in mind, though, that the digest changes when
a resource is localised (e.g. via Content-Language), while weak ETags
probably do not.

If someone thinks we should describe the relationship between Digest and
ETags in the new spec, we can do it.

Have a nice day,

On Mon, 13 Jul 2020, 02:28 Sergey Ponomarev <> wrote:

> Hi,
> I just implemented ETag caching for BusyBox httpd, an HTTP server for
> embedded devices such as WiFi routers.
> While implementing it I had to choose what exactly should be generated
> as the ETag.
> ETag is specified as an opaque value, and a server is free to generate
> it as it needs.
> Strategies for generating and comparing ETags are explained better in
> the Conditional Requests specification.
> But even the upcoming HTTP Caching draft, draft-ietf-httpbis-cache-09,
> gives no practical details about ETag generation.
> I did some research and found that every web server does it in its own
> way, which causes several problems:
> 1. An ETag may be generated badly or even wrongly.
> 2. When two different servers, e.g. Apache and Nginx, are behind a load
> balancer, their ETags will always be discarded because they are
> generated differently. That's why some sysadmins disable ETag on one of
> the servers.
> These problems can easily be fixed if the HTTP specification provides a
> recommended way to generate ETags while keeping freedom of choice.
> A typical ETag is based on the file's last modification time and size,
> which can easily be retrieved from the file system, but it can also be a
> stricter hash or checksum, and sometimes a semantic version.
> Here is a quick overview of typical algorithms used in web servers.
> Suppose we have a file with:
> * Size 1047, i.e. 417 in hex.
> * MTime, i.e. last modification, on Mon, 06 Jan 2020 12:54:56 GMT, which
> is 1578315296 seconds in Unix time, or 1578315296666771000 nanoseconds.
> * Inode, the physical file number, 66, i.e. 42 in hex.
> Different web servers return ETags like:
> Nginx: "5e132e20-417", i.e. "hex(MTime)-hex(Size)". Not configurable.
> Apache/2.2: "42-417-59b782a99f493", i.e.
> "hex(INode)-hex(Size)-hex(MTime in microseconds)". Can be configured, but
> MTime will be in microseconds anyway.
> Apache/2.4: "417-59b782a99f493", i.e. "hex(Size)-hex(MTime in
> microseconds)", i.e. without the INode, which is friendly to load
> balancing when an identical file has different INodes on different
> servers.
> OpenWrt uhttpd: "42-417-5e132e20", i.e.
> "hex(INode)-hex(Size)-hex(MTime)". Not configurable.
> Tomcat 9: W/"1047-1578315296666", i.e. Weak "Size-MTime in
> milliseconds". This is an incorrect ETag because for a static file it
> should be strong, i.e. octet-for-octet compatible.
> LightHTTPD: the weirdest: "hashcode(42-1047-1578315296666771000)", i.e.
> INode-Size-MTime, but then reduced to a simple integer by a hash code.
> Can be configured, but you can only disable one part (etag.use-inode =
> "disabled").
> Hex numbers are used so often here because it's cheap to convert a
> decimal number to a shorter hex string.
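
To make the schemes in the list concrete, here is a minimal C sketch that formats the same example file attributes in three of the styles. The function names are hypothetical and are not taken from any server's source. Note that the Apache value 59b782a99f493 is 1578315296666771 in decimal, i.e. the MTime in microseconds.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Nginx-style: "hex(mtime in seconds)-hex(size)". */
static int etag_nginx_style(char *buf, size_t len,
                            uint64_t mtime_sec, uint64_t size)
{
    return snprintf(buf, len, "\"%" PRIx64 "-%" PRIx64 "\"",
                    mtime_sec, size);
}

/* Apache 2.4-style: "hex(size)-hex(mtime in microseconds)". */
static int etag_apache24_style(char *buf, size_t len,
                               uint64_t mtime_usec, uint64_t size)
{
    return snprintf(buf, len, "\"%" PRIx64 "-%" PRIx64 "\"",
                    size, mtime_usec);
}

/* uhttpd-style: "hex(inode)-hex(size)-hex(mtime in seconds)". */
static int etag_uhttpd_style(char *buf, size_t len, uint64_t inode,
                             uint64_t mtime_sec, uint64_t size)
{
    return snprintf(buf, len, "\"%" PRIx64 "-%" PRIx64 "-%" PRIx64 "\"",
                    inode, size, mtime_sec);
}
```

With the values above, etag_nginx_style(buf, sizeof buf, 1578315296, 1047) produces "5e132e20-417", and the other two reproduce the Apache/2.4 and uhttpd strings from the list.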
> The inode, while adding more guarantees, makes load balancing impossible
> and is very fragile if you simply copy the file during an application
> redeploy.
> MTime in nanoseconds is not available on all platforms, and we don't
> need such granularity. Apache has bugs reported about this.
> The order, MTime-Size or Size-MTime, also matters: MTime is more likely
> to change, so comparing ETag strings may be faster by a dozen CPU cycles
> when it comes first.
> Even though this is not a full checksum or hash, it is definitely not a
> weak ETag.
> This is enough to show that we expect octet compatibility for Range
> requests.
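
The weak/strong distinction matters because RFC 7232 defines two comparison functions, and Range handling relies on the strong one. A minimal C sketch of both follows, assuming each argument is a complete entity-tag string including its quotes; the function names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* A weak entity-tag starts with the case-sensitive prefix W/. */
static bool etag_is_weak(const char *etag)
{
    return strncmp(etag, "W/", 2) == 0;
}

/* Return the quoted opaque-tag, skipping a W/ prefix if present. */
static const char *etag_opaque(const char *etag)
{
    return etag_is_weak(etag) ? etag + 2 : etag;
}

/* Strong comparison: neither tag is weak and the opaque-tags are identical. */
static bool etag_strong_match(const char *a, const char *b)
{
    return !etag_is_weak(a) && !etag_is_weak(b) && strcmp(a, b) == 0;
}

/* Weak comparison: opaque-tags are identical, weakness is ignored. */
static bool etag_weak_match(const char *a, const char *b)
{
    return strcmp(etag_opaque(a), etag_opaque(b)) == 0;
}
```

Under these definitions a Tomcat-style W/"1047-1578315296666" can never satisfy a strong comparison, even against itself, which is why a weak ETag is a poor fit for static files served with Range support.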
> Apache and Nginx together serve almost all traffic on the Internet, but
> most static files are served via Nginx, and it is not configurable.
> If I am not missing anything, Nginx appears to use the most reasonable
> scheme, and I used it for BusyBox httpd.
> The whole ETag is generated by printf("\"%" PRIx64 "-%" PRIx64 "\"",
> last_mod, file_size)
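
As an illustration of where last_mod and file_size could come from, here is a minimal sketch that builds the same MTime-Size string from POSIX stat(). The helper name file_etag is hypothetical and this is not the BusyBox code, only an assumption about how such a helper might look.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Write "hex(mtime)-hex(size)" (with surrounding quotes) for the file at
 * path into buf. Returns 0 on success, -1 if stat() fails or buf is too
 * small.
 */
static int file_etag(const char *path, char *buf, size_t len)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    int n = snprintf(buf, len, "\"%" PRIx64 "-%" PRIx64 "\"",
                     (uint64_t)st.st_mtime, (uint64_t)st.st_size);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because only st_mtime (whole seconds) and st_size are used, two servers holding identical copies of a file with the same timestamp would emit the same value regardless of inode.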
> My proposal is to take the Nginx scheme and make it the recommended
> ETag algorithm, or at least to mention it in rfc7232 as an example.
> Other servers should at least make it possible to configure this ETag
> form.
> I'll try to engage other web server teams in the discussion and to
> create patches for them.
> While the simple MTime-Size ETag algorithm solves a bunch of problems,
> some systems want more guarantees and need hash-based ETags.
> Any hash, even MD5 or CRC32, is great to use as an ETag.
> There is a draft of Digest Headers.
> Its idea is similar to Subresource Integrity (SRI).
> In fact, instead of introducing the new Digest header we can just reuse
> the ETag header with a prefix.
> That is, instead of:
>     Digest: sha-256=4REjxQ4yrqUVicfSKYNO/cF9zNj5ANbzgDZt3/h3Qxo=
> We can use
>     ETag: "sha-256=4REjxQ4yrqUVicfSKYNO/cF9zNj5ANbzgDZt3/h3Qxo="
> A client can easily parse the ETag header and determine from the prefix
> how to validate.
> We'll have "structured ETags", and they are already supported by proxies.
> For the same file, a server can send two comma-separated ETags: one
> MTime-Size based and an additional digest-based one. Old clients just
> resend them via If-None-Match. If a server like BusyBox can only validate
> the MTime-Size ETag, it will validate that one and ignore the sha-256
> based ETag.
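
A minimal sketch of the server side of this idea: scan an If-None-Match list for the one entity-tag the server knows how to validate and skip everything else, including any digest-prefixed tags. The function name is hypothetical; the sketch assumes current is a strong, quoted ETag and uses the weak comparison that RFC 7232 prescribes for If-None-Match. It illustrates the proposal only, since the ETag response header is currently defined to carry a single entity-tag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Return true if the If-None-Match header value lists an entity-tag equal
 * to current (a strong, quoted ETag), or is "*". Unknown tags are skipped.
 */
static bool if_none_match_has(const char *header, const char *current)
{
    size_t cur_len = strlen(current);
    const char *p = header;

    while (*p != '\0') {
        /* Skip list separators and optional whitespace. */
        while (*p == ' ' || *p == '\t' || *p == ',')
            p++;
        if (*p == '\0')
            break;
        if (*p == '*')
            return true;

        /* Weak comparison: ignore a W/ prefix sent by the client. */
        if (p[0] == 'W' && p[1] == '/')
            p += 2;
        if (*p != '"')
            return false;               /* malformed list */

        /* Commas may appear inside the quotes, so find the closing quote. */
        const char *end = strchr(p + 1, '"');
        if (end == NULL)
            return false;               /* unterminated entity-tag */

        size_t len = (size_t)(end - p) + 1;   /* includes both quotes */
        if (len == cur_len && memcmp(p, current, len) == 0)
            return true;

        p = end + 1;
    }
    return false;
}
```

A server that only understands the MTime-Size form would answer 304 Not Modified when this returns true, and would never need to interpret the sha-256 entry.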
> BTW, file hashes can be stored in ext4 extended attributes to avoid
> recalculating them.
> Please share your thoughts, opinions, and best practices for ETags.
> See also:
> Apache code to generate ETag
> LightHTTPD
> --
> Sergey Ponomarev <>, skype:stokito