draft-ietf-httpbis-cache-digest-00 comments

Martin Thomson <martin.thomson@gmail.com> Wed, 13 July 2016 02:24 UTC

From: Martin Thomson <martin.thomson@gmail.com>
Date: Wed, 13 Jul 2016 12:18:55 +1000
Message-ID: <CABkgnnU3Vf5c6NjDXgUtYKRikhQcJZDdFhKUhRBr8nXC-+8XyA@mail.gmail.com>
To: HTTP Working Group <ietf-http-wg@w3.org>, Mark Nottingham <mnot@mnot.net>, Kazuho Oku <kazuhooku@gmail.com>
Subject: draft-ietf-httpbis-cache-digest-00 comments
Archived-At: <http://www.w3.org/mid/CABkgnnU3Vf5c6NjDXgUtYKRikhQcJZDdFhKUhRBr8nXC-+8XyA@mail.gmail.com>

As I've said before, this is really interesting work, and I'm very
much interested in seeing it progress.  However, I found a lot of
issues with the current draft.

The latest version seems to be a bit of a regression.  In particular,
the addition of all the flags makes it a lot more complicated, and I'm
already concerned about managing complexity here, especially since
this is an optimization.

The draft doesn't actually say where this frame should be sent - on a
stream that carries a request, or on stream 0.  This is important
because there are several mentions of origin.  For instance, the new
RESET flag talks about clearing digests for the "applicable origin".
That creates real confusion about the scope that a digest is assumed
to apply to.  Given the nature of digests, this isn't necessarily
fatal, until you want to talk about RESET and COMPLETE.

To step back a bit on this general issue, I'd like to see a
better-formulated description of the information that clients and
servers are expected to maintain.  There seem to be multiple objects
that are stored, but I'm not sure what scope they are maintained in;
is the scope an origin?

Assuming a particular scope, are there two objects, or four?  That is,
there could be four stores:

1. assumed fresh by URL
2. assumed fresh by URL and etag
3. assumed stale by URL
4. assumed stale by URL and etag

Or are 1+2 and 3+4 combined?  The definition of RESET implies that all
four stores are cleared.  The definition of COMPLETE implies that only
1+2 or 3+4 are affected.
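
To make the question concrete, here is a minimal sketch of the
four-store reading.  The class name, field names, and the
complete_targets() helper are hypothetical and do not come from the
draft; the sketch only illustrates that RESET would touch all four
stores while COMPLETE would touch one pair.

```python
from dataclasses import dataclass, field


@dataclass
class DigestState:
    """Hypothetical per-scope state; whether the scope is an origin is the open question."""
    fresh_by_url: set = field(default_factory=set)
    fresh_by_url_etag: set = field(default_factory=set)
    stale_by_url: set = field(default_factory=set)
    stale_by_url_etag: set = field(default_factory=set)

    def reset(self) -> None:
        # RESET, under this reading: every store for the scope is cleared.
        for store in (self.fresh_by_url, self.fresh_by_url_etag,
                      self.stale_by_url, self.stale_by_url_etag):
            store.clear()

    def complete_targets(self, stale: bool) -> tuple:
        # COMPLETE, under this reading: only the fresh pair or the stale pair is affected.
        if stale:
            return (self.stale_by_url, self.stale_by_url_etag)
        return (self.fresh_by_url, self.fresh_by_url_etag)
```

If 1+2 and 3+4 are in fact combined, the four sets collapse to two and
complete_targets() returns a single store.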

The draft doesn't talk about URL normalization.  That is a risk to the
feasibility of this; fail to do something sensible here and you could
get a lot of spurious misses.  Given that it is just an optimization,
we don't need 100% agreement for this to work, but saying something is
probably wise.  We can probably get away with making some lame
recommendations about how to encode a URL.  Here's a rough cut of
something that didn't make the draft deadline this time around:
https://martinthomson.github.io/http-miser/#rfc.section.2.1
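
As an illustration of the kind of recommendation meant here, the
following sketch applies a few generic syntax-based normalizations
before a URL is hashed into a digest.  The function name and the
particular set of rules are assumptions for illustration; they are not
taken from the draft or from the linked document.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports that can be dropped without changing the resource identified.
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Hypothetical normalization: lowercase scheme and host, drop the
    default port, use "/" for an empty path, and drop the fragment.
    Userinfo and IPv6 literals are not handled in this sketch."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))
```

With rules like these, "HTTPS://Example.COM:443/a?b=1#frag" and
"https://example.com/a?b=1" produce the same digest key, which is the
sort of spurious miss that normalization is meant to avoid.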

I don't see any value in COMPLETE.  Even if we accept that there is
only one connection from this client to this server, the benefit in
knowing that the digest is complete is marginal at best.  Is there
just one extra resource missing, or thousands?  As such, it changes
the probability by some unknown quantity, which isn't actionable.

Can a frame with the RESET flag include a digest as well?

N and P could fit into a single octet.  Since you are already using
the flags on the frame, reduce N and P to 4 bits apiece and use flags
to fill the upper bits as needed.  But I don't think that they will be
needed.  At the point that you have more than 2^16 entries in the
digest, you are probably not going to want to use this.  Even with a
tiny P=3 - which is too high a false positive probability to be useful
- with N=2^16 you still need about 32K octets to send the digest.  You
could safely increase the floor for P before you might need or want
higher bits (and make the minimum higher than 2^0, which is probably
too high a false-positive probability in any case).
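
The arithmetic above can be sketched as follows.  Both helpers are
hypothetical.  The size estimate assumes roughly P + 1 bits per entry
(a P-bit remainder plus at least one bit of unary quotient), an
approximation chosen because it reproduces the 32K figure; the packing
function assumes N and P are each carried as 4-bit log2 values in one
octet, which is the suggestion being made, not the draft's format.

```python
def estimated_digest_octets(entries: int, p_bits: int) -> int:
    """Rough digest size, assuming about (p_bits + 1) bits per entry."""
    total_bits = entries * (p_bits + 1)
    return (total_bits + 7) // 8  # round up to whole octets


def pack_n_p(n_log2: int, p_log2: int) -> int:
    """Pack two 4-bit values into one octet: N in the high nibble, P in the low."""
    if not (0 <= n_log2 < 16 and 0 <= p_log2 < 16):
        raise ValueError("N and P must each fit in 4 bits")
    return (n_log2 << 4) | p_log2


def unpack_n_p(octet: int) -> tuple:
    """Inverse of pack_n_p: return (N, P) from a single octet."""
    return (octet >> 4) & 0x0F, octet & 0x0F


print(estimated_digest_octets(2**16, 3))  # 32768
print(hex(pack_n_p(12, 7)))               # 0xc7
```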

Is the calculation of N really round(log2(urls.length))?  I thought
that you would want to use ceil() instead.  Is the word "up" missing
from step 1 in Section 2.1.1?
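
A small sketch of why the distinction matters: with round(), 2^N can
end up smaller than the number of URLs, whereas ceil() never does.
The function names are illustrative only.

```python
import math


def n_rounded(url_count: int) -> int:
    """N computed as round(log2(count)), the reading questioned above."""
    return round(math.log2(url_count))


def n_ceil(url_count: int) -> int:
    """N computed as ceil(log2(count)), so that 2**N >= count always holds."""
    return math.ceil(math.log2(url_count))


for count in (5, 6, 1000):
    # Show the capacity 2**N implied by each choice next to the real count.
    print(count, 2 ** n_rounded(count), 2 ** n_ceil(count))
```

For 5 URLs, round() gives N=2 (capacity 4, less than 5) while ceil()
gives N=3 (capacity 8).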

The draft never actually mentions that it uses [Rice-]Golomb-coding
until the appendix.  Including a reference to the Rice paper would
help people implementing this understand where it comes from, and
would lead them to the relevant research.
(nit: Spelling Golomb correctly would help here.)
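
For readers unfamiliar with the technique, here is a minimal sketch of
Golomb-Rice coding applied to a sorted list of integers.  It is not
the draft's exact algorithm: the bit conventions (quotient as zeros
terminated by a one, deltas taken from a starting value of 0) and the
use of a text bit string are assumptions made for readability.

```python
def rice_encode(sorted_values, p_bits: int) -> str:
    """Encode ascending integers as deltas: unary quotient, then p_bits of remainder."""
    if p_bits < 1:
        raise ValueError("this sketch requires p_bits >= 1")
    out = []
    previous = 0
    for value in sorted_values:
        delta = value - previous
        quotient = delta >> p_bits
        remainder = delta & ((1 << p_bits) - 1)
        out.append("0" * quotient + "1")               # unary quotient with terminator
        out.append(format(remainder, f"0{p_bits}b"))   # fixed-width remainder
        previous = value
    return "".join(out)


def rice_decode(bits: str, p_bits: int) -> list:
    """Inverse of rice_encode for the same p_bits."""
    values, previous, pos = [], 0, 0
    while pos < len(bits):
        quotient = 0
        while bits[pos] == "0":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating "1"
        remainder = int(bits[pos:pos + p_bits], 2)
        pos += p_bits
        previous += (quotient << p_bits) | remainder
        values.append(previous)
    return values
```

Each entry costs p_bits plus the length of its unary quotient, which
is why the digest size grows with both the entry count and P.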