[Ppm] Batch selection and use cases for DAP

Christopher Patton <cpatton@cloudflare.com> Wed, 29 June 2022 23:22 UTC

From: Christopher Patton <cpatton@cloudflare.com>
Date: Wed, 29 Jun 2022 16:21:47 -0700
Message-ID: <CAG2Zi212sWmk3Piuu4Q0YE+wcqhgObx9F7r=SJV5d3Xqy8tFkQ@mail.gmail.com>
To: ppm <ppm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ppm/NKPIIm5HvZ1p8EkS3tmwj6DBXDs>
Subject: [Ppm] Batch selection and use cases for DAP

Hi all,

The current version of DAP prescribes a particular method of sorting
reports into batches for aggregation. There are a couple of GitHub issues
that describe use cases for which this method is not well-suited.
First-class support for these use cases would require protocol changes.
While considering if/how to change it, I think it would be helpful to take
a step back and ask ourselves if there are any additional use cases to
consider.

First, a quick recap of how batches are currently defined:
* Reports are generated by Clients and uploaded to the Leader. Each
`Report` has a timestamp.
* In its collect request (i.e., an HTTP request carrying a `CollectReq`), the
Collector specifies a "batch interval", which defines the start and end
time for reports that will be aggregated.
* Upon receiving this request from the Collector, the Leader picks a batch
of reports with timestamps that fall in the batch interval.
* The Leader and Helper aggregate the batch. (Aggregation manifests as a
sequence of `AggregateInitializeReq` / `AggregateContinueReq` flows,
followed by a single `AggregateShareReq` in order to get the Helper's
aggregate share for the entire batch.)
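
To make the recap concrete, here is a rough Python sketch of the Leader's
selection step. It is only an illustration, not the protocol's wire format:
the `Report` class, the `select_by_interval` helper, and the choice of a
half-open interval expressed as (start, duration) are all assumptions made
for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    report_id: int  # hypothetical identifier, for illustration only
    timestamp: int  # seconds since the epoch


def select_by_interval(reports, start, duration):
    """Return the reports whose timestamp lies in [start, start + duration)."""
    end = start + duration
    return [r for r in reports if start <= r.timestamp < end]


reports = [Report(0, 1000), Report(1, 1600), Report(2, 4600)]
batch = select_by_interval(reports, start=0, duration=3600)
print([r.report_id for r in batch])  # prints [0, 1]
```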

Observe that the batch itself is chosen by the Leader; the Collector merely
specifies criteria for reports that are "valid" for that batch -- namely,
that the report timestamp falls in the batch interval. Other criteria for
selecting batches are possible, as I'll explain below.

Use case #1
The current "batch selector" is well-suited for telemetry use cases where
DAP is used to aggregate long-running time-series data. (For those familiar
with Prometheus (https://prometheus.io), think of a dashboard you would
build to monitor how long it takes browsers to download and render a page
of your website.) However, it has some limitations that make other use cases
much more difficult.

Use case #2
As EKR points out in issue183 (
https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/issues/183), it would
also be useful to be able to "filter" aggregate results based on metadata
associated with reports. For example, one might need to break down results
by user agent, geographical region, software version, etc. Supporting this
functionality requires a different "batch selector", one that also accounts
for additional dimensions along which batches can be sliced.
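
As a rough sketch of what such a selector might look like on the Leader, the
Python below extends interval selection with a metadata filter. Everything
here is hypothetical: the `Report` class, the dictionary representation of
metadata, and the key names are assumptions for illustration, and nothing in
the current draft defines them.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    report_id: int  # hypothetical identifier
    timestamp: int  # seconds since the epoch
    metadata: dict = field(default_factory=dict)  # hypothetical key/value tags


def select_by_interval_and_metadata(reports, start, end, wanted):
    """Reports in [start, end) whose metadata matches every entry in `wanted`."""
    return [
        r
        for r in reports
        if start <= r.timestamp < end
        and all(r.metadata.get(k) == v for k, v in wanted.items())
    ]


reports = [
    Report(0, 100, {"user_agent": "Firefox", "region": "EU"}),
    Report(1, 200, {"user_agent": "Chrome", "region": "EU"}),
    Report(2, 5000, {"user_agent": "Firefox", "region": "EU"}),
]
batch = select_by_interval_and_metadata(reports, 0, 3600, {"user_agent": "Firefox"})
print([r.report_id for r in batch])  # prints [0]
```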

Use case #3
In issue273 (https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/issues/273),
Shan Wang points to an altogether different (and, arguably, much simpler)
batch selection criterion: Instead of sorting reports into batch intervals,
we may simply want to ensure that batches are pairwise disjoint. Moreover,
our application might require that each batch is exactly the same size (or
at least within some small threshold).
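
One way to picture this criterion is the Python sketch below, which
partitions reports into disjoint batches of exactly the same size. The
`assign_fixed_batches` helper and the use of sequential integers as batch
IDs are assumptions made for the example; issue273 may define batch IDs
differently.

```python
def assign_fixed_batches(report_ids, batch_size):
    """Partition report IDs, in arrival order, into disjoint batches.

    Returns (batches, pending): `batches` maps a batch ID to exactly
    `batch_size` report IDs, and `pending` holds the leftover reports that
    do not yet fill a batch.
    """
    batches = {}
    full = len(report_ids) - len(report_ids) % batch_size
    for i in range(0, full, batch_size):
        batches[i // batch_size] = report_ids[i:i + batch_size]
    return batches, report_ids[full:]


batches, pending = assign_fixed_batches(list(range(7)), batch_size=3)
print(batches)  # prints {0: [0, 1, 2], 1: [3, 4, 5]}
print(pending)  # prints [6]
```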

Today, DAP has first-class support only for #1. Use cases #2 and #3 can be
approximated with the current mechanism, but doing so would be painful. For my part, I would be in
favor of adding protocol mechanisms in order to provide first-class support
for additional use cases that are likely to be common. As a straw man,
consider the following revised `CollectReq`:

```
+ enum {
+   reserved(0),
+   interval(1), // For use case #1
+   interval-metadata(2), // Use case #2
+   fixed(3), // Use case #3
+ } BatchSelector;

struct {
  TaskID task_id;
- Interval batch_interval;
+ BatchSelector batch_selector;
+ select (batch_selector) {
+   case interval:
+      Interval batch_interval;
+   case interval-metadata:
+      Interval batch_interval;
+      opaque metadata<0..2^8-1>; // "User-Agent", etc.
+   case fixed:
+      uint64 batch_id;
+ };
  opaque agg_param<0..2^16-1>; // VDAF aggregation parameter
} CollectReq;
```
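
For a feel of how the straw man might look on the wire, here is a Python
sketch that serializes it. It assumes the usual TLS-style encoding
(big-endian integers, length-prefixed opaque fields, a one-byte enum), a
32-byte `TaskID`, and an `Interval` made of two `uint64` values (start and
duration). These are assumptions about the surrounding draft, and
`encode_collect_req` is a hypothetical helper, not part of any
implementation.

```python
import struct

# Selector code points from the straw man above.
INTERVAL, INTERVAL_METADATA, FIXED = 1, 2, 3


def encode_collect_req(task_id, selector, agg_param, *,
                       interval=None, metadata=b"", batch_id=None):
    """Serialize the straw-man CollectReq (assumed TLS-style encoding)."""
    if len(task_id) != 32:  # assumption: TaskID is 32 opaque bytes
        raise ValueError("task_id must be 32 bytes")
    out = task_id + struct.pack("!B", selector)
    if selector == INTERVAL:
        out += struct.pack("!QQ", *interval)  # (start, duration)
    elif selector == INTERVAL_METADATA:
        if len(metadata) > 0xFF:
            raise ValueError("metadata too long")
        out += struct.pack("!QQ", *interval)
        out += struct.pack("!B", len(metadata)) + metadata
    elif selector == FIXED:
        out += struct.pack("!Q", batch_id)
    else:
        raise ValueError("unknown batch selector")
    if len(agg_param) > 0xFFFF:
        raise ValueError("agg_param too long")
    return out + struct.pack("!H", len(agg_param)) + agg_param


msg = encode_collect_req(bytes(32), FIXED, b"", batch_id=7)
print(len(msg))  # prints 43 (32 + 1 + 8 + 2)
```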

What this expresses is that the "batch interval" has been replaced by one
of several "batch selectors", each designed to support a different (set of)
use case(s). Each has some associated parameters used by the Leader to
guide report selection. For example, the `fixed` selector encodes the
"batch ID", as defined in issue273. It seems to me that something like this
could work. Things to consider:
* Both Aggregators need to be able to enforce that the batch meets the
criteria specified by the Collector.
* There are implications for storage requirements for the Aggregators.
* There is also issue195 (
https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/issues/195) in which
Chris Wood points out some privacy implications regarding the flexibility
afforded to the Collector in choosing the batch selection criteria. (This
issue needs to be addressed in any case.)
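
On the first point, the checks each Aggregator would need to run could look
roughly like the Python below: one for the interval criterion and one for
the pairwise-disjointness criterion of use case #3. Both helpers are
hypothetical and operate on bare timestamps and report IDs; how an
Aggregator actually tracks this state is left open.

```python
def batch_in_interval(timestamps, start, end):
    """True if every report timestamp falls in [start, end)."""
    return all(start <= t < end for t in timestamps)


def batches_are_disjoint(batches):
    """True if no report ID appears in more than one batch."""
    seen = set()
    for batch in batches:
        for report_id in batch:
            if report_id in seen:
                return False
            seen.add(report_id)
    return True


print(batch_in_interval([10, 20, 30], start=0, end=60))  # prints True
print(batches_are_disjoint([[1, 2], [3, 4]]))            # prints True
print(batches_are_disjoint([[1, 2], [2, 3]]))            # prints False
```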

Anyway ... thoughts? Specifically:
(a) Is there a use case we're missing here?
(b) What do you think of making changes to the protocol in order to support
additional use cases?

Cheers,
Chris P.
