Re: draft-ietf-httpbis-cache-digest-00 comments

Mark Nottingham <mnot@mnot.net> Wed, 24 August 2016 06:55 UTC

From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <CANatvzyJ54sYRCFZHnAEriNnH-fA7wN8MAHqx3hVkYK8V0a_zg@mail.gmail.com>
Date: Wed, 24 Aug 2016 16:50:46 +1000
Cc: Martin Thomson <martin.thomson@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <A913EDAA-16C0-41CB-8C4D-729C468E7889@mnot.net>
References: <CABkgnnU3Vf5c6NjDXgUtYKRikhQcJZDdFhKUhRBr8nXC-+8XyA@mail.gmail.com> <CANatvzyJ54sYRCFZHnAEriNnH-fA7wN8MAHqx3hVkYK8V0a_zg@mail.gmail.com>
To: Kazuho Oku <kazuhooku@gmail.com>
Subject: Re: draft-ietf-httpbis-cache-digest-00 comments
Archived-At: <http://www.w3.org/mid/A913EDAA-16C0-41CB-8C4D-729C468E7889@mnot.net>

Sorry for the delay, been travelling, then stuck, then sick, then catching up.


> On 14 Jul 2016, at 5:24 PM, Kazuho Oku <kazuhooku@gmail.com> wrote:
> 
> Hi,
> 
> Thank you for your comments.
> 
> The comments below are mine, and Mark might have different opinions.
> 
> 2016-07-13 11:18 GMT+09:00 Martin Thomson <martin.thomson@gmail.com>:
>> As I've said before, this is really interesting work; I'm very much
>> interested in seeing this progress.  However, I found a lot of issues
>> with the current draft.
>> 
>> The latest version seems to be a bit of a regression.  In particular,
>> the addition of all the flags makes it a lot more complicated, and I'm
>> already concerned about managing complexity here, especially since
>> this is an optimization.
>> 
>> The draft doesn't actually say where this frame should be sent - on a
>> stream that carries a request, or on stream 0.
> 
> In section 2.1 the draft states: "A CACHE_DIGEST frame can be sent from
> a client to a server on any stream in the “open” state." My
> understanding is that this is enough to indicate that the frame
> should be sent on a stream that carries a request, as well as when it
> should be sent.

Right after that, it says:

> ... and conveys a digest of the contents of the client’s cache for associated stream.

That probably should say "contents of the client's cache for the *origin* of the associated stream".

The other obvious design would be to put them on stream 0 and then have an explicit Origin field.

Do we anticipate C_D being sent before a stream is opened for a given origin?	


>> This is important
>> because there are several mentions of origin.  For instance, the new
>> RESET flag talks about clearing digests for the "applicable origin".
>> That establishes a large point of confusion about the scope that a
>> digest is assumed to apply to; by their nature, this isn't necessarily
>> fatal, until you want to talk about RESET and COMPLETE.
>> 
>> To up-level a bit on this general issue, I'd like to see a better
>> formulated description of the information that clients and servers are
>> expected to maintain.  There seem to be multiple objects that are
>> stored, but I'm not sure what scope they are maintained in; is the
>> scope an origin?
> 
> Yes.

+1. We should rewrite to clarify this. See also <https://github.com/httpwg/http-extensions/issues/216>.


>> Assuming a particular scope, are there two objects, or four?  That is,
>> could there be four stores:
>> 
>> 1. assumed fresh by URL
>> 2. assumed fresh by URL and etag
>> 3. assumed stale by URL
>> 4. assumed stale by URL and etag
>> 
>> Or are 1+2 and 3+4 combined?  The definition of RESET implies that all
>> four stores are cleared.  The definition of COMPLETE implies that only
>> 1+2 or 3+4 are affected.
> 
> There are four objects, which are grouped into two.
> 
> Your reading is correct that the RESET flag clears all of them, and
> that the COMPLETE flag applies only to either 1+2 or 3+4.

+1
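
To make that concrete, here's roughly the per-origin state I think we're describing. This is a sketch only; the names and structure are mine, not the draft's:

    # Illustrative only: four stores, grouped into two, per origin.
    class OriginCacheDigest:
        def __init__(self):
            self.fresh_by_url = set()        # 1. assumed fresh by URL
            self.fresh_by_url_etag = set()   # 2. assumed fresh by URL and etag
            self.stale_by_url = set()        # 3. assumed stale by URL
            self.stale_by_url_etag = set()   # 4. assumed stale by URL and etag
            self.fresh_complete = False      # COMPLETE received for 1+2
            self.stale_complete = False      # COMPLETE received for 3+4

        def reset(self):
            # RESET clears all four stores (and, presumably, both COMPLETE bits)
            self.__init__()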


>> The draft doesn't talk about URL normalization.  That is a risk to the
>> feasibility of this; fail to do something sensible here and you could
>> get a lot of spurious misses.  Given that it is just an optimization,
>> we don't need 100% agreement for this to work, but saying something is
>> probably wise.  We can probably get away with making some lame
>> recommendations about how to encode a URL.  Here's a rough cut of
>> something that didn't make the draft deadline this time around:
>> https://martinthomson.github.io/http-miser/#rfc.section.2.1
> 
> Thank you for the suggestion.
> 
> I have mixed feelings about this; in section 2.2.1 the current draft
> says "Effective Request URI of a cached response" should be used.
> 
> So the cache digest would work without URL normalization if both of
> the following conditions are met:
> * the client caches a response WITHOUT normalizing the request URI into some canonical form
> * the server looks up the cache digest using the URI exactly as the client would send it
> 
> For example, if an HTML page with a script tag specifying /%7Efoo/script.js
> is served to the client, then the draft expects the client to use that
> value (including %7E) as the key, and the server to test the digest
> using the exact same form.
> 
> The pro of this approach is that it would be easier to implement.
> The con is that it would be fragile due to the lack of
> normalization.
> 
> And I agree with you that, in case we go without normalization, we
> should warn users that the paths have to be the same in terms of
> octets.

My inclination would be to do no more normalisation than caches are normally doing, at least to start with.
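
To illustrate the exact-octets point, here's a rough sketch of how I'd expect a key to be formed if we do no normalisation at all. This follows my reading of 2.2.1 and is illustrative rather than normative; the function names are mine:

    import hashlib

    def digest_key(effective_request_uri, etag=None):
        # Use the URI octets exactly as the client requested them, so
        # "/%7Efoo/script.js" stays percent-encoded: no case folding,
        # no %-decoding, no dot-segment removal.
        key = effective_request_uri
        if etag is not None:
            key += etag
        return key

    def digest_value(key, n, p):
        # SHA-256 the key, then reduce it into the range [0, N*P); the draft
        # truncates the hash to log2(N*P) bits, which is the same idea when
        # N*P is a power of two.
        h = int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest(), "big")
        return h % (n * p)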


>> I don't see any value in COMPLETE.  Even if we accept that there is
>> only one connection from this client to this server, the benefit in
>> knowing that the digest is complete is marginal at best.  Is there
>> just one extra resource missing, or thousands?  As such, it changes
>> the probability by some unknown quantity, which isn't actionable.
> 
> I do find value in COMPLETE.
> 
> For a server whose primary goal is to minimize B/W consumption and
> whose secondary goal is to minimize latency, it is wise to push
> responses that are known NOT to be cached by a client.
> 
> That's what the COMPLETE flag can be used for. Without the flag, a
> server can only tell if a response is already cached or _might_ be
> cached.
> 
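
That matches my reading. Roughly, as an illustrative sketch (the helper names are mine, not the draft's):

    def push_decision(url, digest, complete):
        # A GCS membership test can false-positive, never false-negative.
        if digest.might_contain(url):
            return "don't push"   # cached, or a false positive
        if complete:
            return "push"         # digest covers everything, so it isn't cached
        return "unknown"          # absent from this digest, but the client
                                  # may still have it cached
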
>> Can a frame with the RESET flag include a digest as well?
> 
> Yes. That is the intention of the draft.
> 
>> N and P could fit into a single octet.  Since you are into the flags
>> on the frame anyway, reduce N and P to 4 bits apiece and use flags to
>> fill the upper bits as needed.  But I don't think that they will be
>> needed.  At the point that you have more than 2^16 entries in the
>> digest, you are probably not going to want to use this.  Even with a
>> tiny P=3 - which is too high a false positive probability to be useful
>> - with N=2^16 you still need 32K to send the digest.  You could safely
>> increase the floor for P before you might need or want higher bits
>> (and make the minimum higher than 2^0, which is probably too high a
>> false-positive probability in any case).
> 
> I would argue that P=1 would still be useful in some cases. For
> example, if 10 resources are missing on the client side, it would mean
> that a server can on average detect 5 of them as missing and push them
> when P=1 is used.
> 
> And considering the fact that we would nevertheless need a
> read-n-bits operation while decoding the Golomb-encoded values, I do
> not see a strong reason to squash N and P into a single octet.

+1
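
For reference, my back-of-envelope version of that size estimate (a rough sketch, not anything from the draft):

    def approx_digest_bytes(num_entries, p):
        # A Golomb-coded set costs roughly P bits of binary remainder plus a
        # bit or two of unary quotient per entry; P+1 is an optimistic floor.
        bits_per_entry = p + 1
        return num_entries * bits_per_entry // 8

    print(approx_digest_bytes(2**16, 3))   # 32768 bytes, i.e. the ~32K above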

>> Is the calculation of N really round(log2(urls.length))?  I thought
>> that you would want to use ceil() instead.  Is the word "up" missing
>> from step 1 in Section 2.1.1?
> 
> The draft has intentionally been written to use round.
> 
> The numbers that matter when using Golomb-coded sets are:
> 
>    P: divisor used to split each value into unary-encoded and binary-encoded bits
>    N*P: range of the encoded values
> 
> For efficiency, both P and N*P must be powers of 2.
> 
> To encode efficiently, the real probability should be near the
> value of P. And that in turn means that N*P should be
> round_to_power_of_two(urls.length * P) rather than
> round_up_to_power_of_two(urls.length * P).
> 
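
To make the round-vs-ceil distinction concrete, a small sketch of the parameter choice under that reading (mine, not the draft's text):

    import math

    def choose_n(num_urls):
        # Round to the NEAREST power of two, so that N*P stays close to
        # urls.length * P and the Golomb coding stays efficient for the
        # actual density of entries.
        return 2 ** round(math.log2(num_urls))

    # e.g. 40 cached URLs: log2(40) ~= 5.32, so round gives N = 32,
    # whereas ceil would give N = 64 and roughly double the encoded range.
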
>> The draft never actually mentions that it uses [Rice-]Golomb-coding
>> until the appendix.  Including a reference to the Rice paper would
>> help people implementing this understand where this comes from, as
>> well as leading them to being able to find the relevant research.
>> (nit: Spelling Golomb correctly would help here.)
> 
> I agree. Thank you for noticing that!

+1, see <https://github.com/httpwg/http-extensions/issues/230>

> 
> -- 
> Kazuho Oku

--
Mark Nottingham   https://www.mnot.net/