Re: draft-ietf-httpbis-cache-digest-00 comments
Mark Nottingham <mnot@mnot.net> Wed, 24 August 2016 06:55 UTC
From: Mark Nottingham <mnot@mnot.net>
In-Reply-To: <CANatvzyJ54sYRCFZHnAEriNnH-fA7wN8MAHqx3hVkYK8V0a_zg@mail.gmail.com>
Date: Wed, 24 Aug 2016 16:50:46 +1000
Cc: Martin Thomson <martin.thomson@gmail.com>, HTTP Working Group <ietf-http-wg@w3.org>
Message-Id: <A913EDAA-16C0-41CB-8C4D-729C468E7889@mnot.net>
References: <CABkgnnU3Vf5c6NjDXgUtYKRikhQcJZDdFhKUhRBr8nXC-+8XyA@mail.gmail.com> <CANatvzyJ54sYRCFZHnAEriNnH-fA7wN8MAHqx3hVkYK8V0a_zg@mail.gmail.com>
To: Kazuho Oku <kazuhooku@gmail.com>
Subject: Re: draft-ietf-httpbis-cache-digest-00 comments
Archived-At: <http://www.w3.org/mid/A913EDAA-16C0-41CB-8C4D-729C468E7889@mnot.net>

Sorry for the delay; I've been travelling, then stuck, then sick, then
catching up.

> On 14 Jul 2016, at 5:24 PM, Kazuho Oku <kazuhooku@gmail.com> wrote:
>
> Hi,
>
> Thank you for your comments.
>
> The comments below are mine, and Mark might have different opinions.
>
> 2016-07-13 11:18 GMT+09:00 Martin Thomson <martin.thomson@gmail.com>:
>> As I've said before, this is really interesting work, and I'm very much
>> interested in seeing it progress. However, I found a lot of issues with
>> the current draft.
>>
>> The latest version seems to be a bit of a regression. In particular, the
>> addition of all the flags makes it a lot more complicated, and I'm
>> already concerned about managing complexity here, especially since this
>> is an optimization.
>>
>> The draft doesn't actually say where this frame should be sent - on a
>> stream that carries a request, or on stream 0.
>
> In section 2.1, the draft states: A CACHE_DIGEST frame can be sent from a
> client to a server on any stream in the "open" state. My understanding is
> that it would be enough to indicate that the frame should be sent on a
> stream that carries a request, as well as when it should be sent.

Right after that, it says:

  ... and conveys a digest of the contents of the client's cache for
  associated stream.

That probably should say "contents of the client's cache for the *origin*
of the associated stream."

The other obvious design would be to put them on stream 0 and then have an
explicit Origin field. Do we anticipate C_D being sent before a stream is
opened for a given origin?

>> This is important because there are several mentions of origin. For
>> instance, the new RESET flag talks about clearing digests for the
>> "applicable origin". That establishes a large point of confusion about
>> the scope that a digest is assumed to apply to; by their nature, this
>> isn't necessarily fatal, until you want to talk about RESET and
>> COMPLETE.
>>
>> To up-level a bit on this general issue, I'd like to see a better
>> formulated description of the information that clients and servers are
>> expected to maintain. There seem to be multiple objects that are stored,
>> but I'm not sure what scope they are maintained in; is the scope an
>> origin?
>
> Yes.

+1. We should rewrite to clarify this. See also
<https://github.com/httpwg/http-extensions/issues/216>.

>> Assuming a particular scope, are there two objects, or four? That is,
>> could there be four stores:
>>
>> 1. assumed fresh by URL
>> 2. assumed fresh by URL and etag
>> 3. assumed stale by URL
>> 4. assumed stale by URL and etag
>>
>> Or are 1+2 and 3+4 combined? The definition of RESET implies that all
>> four stores are cleared. The definition of COMPLETE implies that only
>> 1+2 or 3+4 are affected.
>
> There are four objects, which are grouped into two.
>
> Your reading is correct: the RESET flag clears all of them, and the
> COMPLETE flag applies to either 1+2 or 3+4.

+1
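
To make that concrete, here is a rough sketch of the per-origin state this
reading implies a server would keep. It is illustrative only; the class and
field names are mine, not the draft's.

    # Hypothetical per-origin server-side state; names are illustrative,
    # not taken from the draft.
    from dataclasses import dataclass, field

    @dataclass
    class OriginDigestState:
        # Group 1: responses the client claims to hold fresh copies of.
        fresh_by_url: set = field(default_factory=set)       # keyed by URL
        fresh_by_url_etag: set = field(default_factory=set)  # keyed by (URL, ETag)
        # Group 2: responses the client holds, but which are stale.
        stale_by_url: set = field(default_factory=set)
        stale_by_url_etag: set = field(default_factory=set)
        # COMPLETE is tracked per group; RESET clears everything.
        fresh_complete: bool = False
        stale_complete: bool = False

        def reset(self):
            """RESET flag: clear all four stores for this origin."""
            for s in (self.fresh_by_url, self.fresh_by_url_etag,
                      self.stale_by_url, self.stale_by_url_etag):
                s.clear()
            self.fresh_complete = self.stale_complete = False

    # One such object per origin, e.g.:
    #   state = digests.setdefault(origin, OriginDigestState())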
>> The draft doesn't talk about URL normalization. That is a risk to the
>> feasibility of this; fail to do something sensible here and you could
>> get a lot of spurious misses. Given that it is just an optimization, we
>> don't need 100% agreement for this to work, but saying something is
>> probably wise. We can probably get away with making some lame
>> recommendations about how to encode a URL. Here's a rough cut of
>> something that didn't make the draft deadline this time around:
>> https://martinthomson.github.io/http-miser/#rfc.section.2.1
>
> Thank you for the suggestion.
>
> I have mixed feelings about this; in section 2.2.1 the current draft says
> the "Effective Request URI of a cached response" should be used.
>
> So the cache digest would work without URL normalization if both of the
> following conditions are met:
> * the client caches a response WITHOUT normalizing the request URI into
>   some other form
> * the server looks up the cache digest using the URI that the client
>   would send
>
> For example, if an HTML page with a script tag specifying
> /%7Efoo/script.js is served to the client, the draft expects the client
> to use that value (including %7E) as the key, and the server to test the
> digest using the exact same form.
>
> The pro of this approach is that it is easier to implement. The con is
> that it is fragile, because nothing is normalized.
>
> And I agree with you that, if we go without normalization, we should warn
> users that the paths need to be identical in terms of octets.

My inclination would be to do no more normalisation than caches are
normally doing, at least to start with.

>> I don't see any value in COMPLETE. Even if we accept that there is only
>> one connection from this client to this server, the benefit in knowing
>> that the digest is complete is marginal at best. Is there just one extra
>> resource missing, or thousands? As such, it changes the probability by
>> some unknown quantity, which isn't actionable.
>
> I do find value in COMPLETE.
>
> For a server whose primary goal is to minimize bandwidth consumption and
> whose secondary goal is to minimize latency, it is wise to push responses
> that are known NOT to be cached by the client.
>
> That's what the COMPLETE flag can be used for. Without the flag, a server
> can only tell whether a response is already cached or _might_ be cached.
>
>> Can a frame with the RESET flag include a digest as well?
>
> Yes. That is the intention of the draft.
>
>> N and P could fit into a single octet. Since you are into the flags on
>> the frame anyway, reduce N and P to 4 bits apiece and use flags to fill
>> the upper bits as needed. But I don't think that they will be needed. At
>> the point that you have more than 2^16 entries in the digest, you are
>> probably not going to want to use this. Even with a tiny P=3 - which is
>> too high a false-positive probability to be useful - with N=2^16 you
>> still need 32K to send the digest. You could safely increase the floor
>> for P before you might need or want higher bits (and make the minimum
>> higher than 2^0, which is probably too high a false-positive probability
>> in any case).
>
> I would argue that P=1 would still be useful in some cases. For example,
> if 10 resources are missing on the client side, a server using P=1 could
> still detect about 5 of them as missing and push them.
>
> And considering that we would nevertheless need a read-n-bits operation
> while decoding the Golomb-encoded values, I do not see a strong reason to
> squash N and P into a single octet.

+1

>> Is the calculation of N really round(log2(urls.length))? I thought that
>> you would want to use ceil() instead. Is the word "up" missing from step
>> 1 in Section 2.1.1?
>
> The draft has intentionally been written to use round().
>
> The numbers that matter when using Golomb-coded sets are:
>
>   P: the divisor used to split each value into a unary-coded quotient and
>      a binary-coded remainder
>   N*P: the range of the encoded values
>
> For efficiency, both P and N*P must be powers of 2.
>
> To encode efficiently, the real probability should be close to the value
> of P, and that in turn means that N*P should be
> round_to_power_of_two(urls.length * P) rather than
> round_up_to_power_of_two(urls.length * P).
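
For reference, a non-normative sketch of that parameter choice and of
Golomb-Rice-coding the resulting set might look like the following; the
helper names and the hash construction are mine, not the draft's exact
algorithm.

    # Sketch only: illustrates round() for N and Golomb-Rice coding with
    # divisor P; not the draft's normative procedure.
    import hashlib
    import math

    def pick_log_n(num_urls):
        # round(), not ceil(): keeps N*P close to num_urls * P, so the
        # actual density of hashed values stays near 1 in P.
        return max(0, round(math.log2(num_urls)))

    def gcs_encode(keys, log_n, log_p):
        """Golomb-Rice-code the hashed keys with divisor P = 2**log_p."""
        n, p = 1 << log_n, 1 << log_p
        # Hash each key into [0, N*P), de-duplicate, and sort.
        hashes = sorted({
            int.from_bytes(hashlib.sha256(k.encode()).digest()[:8], 'big')
            % (n * p)
            for k in keys})
        out, prev = [], 0
        for h in hashes:
            q, r = divmod(h - prev, p)
            prev = h
            out.append('1' * q + '0')                  # quotient, unary-coded
            if log_p:
                out.append(format(r, '0%db' % log_p))  # remainder, log_p bits
        return ''.join(out)

    # e.g.:
    #   digest_bits = gcs_encode(urls, pick_log_n(len(urls)), log_p=5)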
>> The draft never actually mentions that it uses [Rice-]Golomb coding
>> until the appendix. Including a reference to the Rice paper would help
>> people implementing this understand where it comes from, as well as
>> leading them to the relevant research. (nit: Spelling Golomb correctly
>> would help here.)
>
> I agree. Thank you for noticing that!

+1, see <https://github.com/httpwg/http-extensions/issues/230>.

> --
> Kazuho Oku

--
Mark Nottingham   https://www.mnot.net/