Re: [Ppm] Batch selection and use cases for DAP

Christopher Patton <cpatton@cloudflare.com> Fri, 15 July 2022 01:00 UTC

References: <mailman.88.1656615603.44393.ppm@ietf.org> <4FAC07F7-85DA-498C-8A4A-880A08BB499C@apple.com>
In-Reply-To: <4FAC07F7-85DA-498C-8A4A-880A08BB499C@apple.com>
From: Christopher Patton <cpatton@cloudflare.com>
Date: Thu, 14 Jul 2022 18:00:38 -0700
Message-ID: <CAG2Zi227W9LhSbe3b0rm_bOe6MqyiJhpKT6ubC58ky_ub3QvJw@mail.gmail.com>
To: Shan Wang <shan_wang=40apple.com@dmarc.ietf.org>
Cc: ppm@ietf.org
Archived-At: <https://mailarchive.ietf.org/arch/msg/ppm/Mrb8o-F1YxMXESRS-GROBAYxW1A>
Subject: Re: [Ppm] Batch selection and use cases for DAP

Hi Shan (please forgive the late reply!),


> I definitely support adding first-class support for different batch
> selectors to the protocol, especially the size-based selector. I can see
> it benefiting the following use cases:
>
> 1). Tasks that need an aggregate result as soon as the privacy guarantee
> is met, rather than waiting for a fixed interval to elapse.
> 2). Tasks that need similarly sized batches (and therefore a similar
> signal-to-noise ratio) to compare results across batches.
> 3). Tasks that need a strong privacy guarantee based on batch size, e.g.
> central differential privacy.
>
> With batch-size-based collection, aggregators just need to make sure each
> client report falls in only one batch, and only emit a batch once its size
> has reached `min_batch_size`. This selection is easier to understand and
> arguably simpler to implement. It is also not subject to the privacy
> concerns associated with interval slicing, as described in issue 195 (
> https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/issues/195).
>

Thanks for pointing to these use cases. And I agree, this seems somewhat
conceptually simpler than what we have today.
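
To make the size-based rule concrete, here is a rough sketch of the
bookkeeping it implies. This is only an illustration under my own
assumptions, not text from the draft: `SizeBasedBatcher`, `add_report`, the
integer batch IDs, and the string report IDs are all hypothetical names
chosen for the example.

```python
from typing import Dict, List, Optional, Tuple


class SizeBasedBatcher:
    """Hypothetical helper: puts each report in only one batch and
    releases a batch only once it holds `min_batch_size` reports."""

    def __init__(self, min_batch_size: int) -> None:
        if min_batch_size < 1:
            raise ValueError("min_batch_size must be positive")
        self.min_batch_size = min_batch_size
        self._next_batch_id = 0                 # assumed: simple counter as batch ID
        self._open: List[str] = []              # reports in the batch being filled
        self._assigned: Dict[str, int] = {}     # report_id -> batch it belongs to

    def add_report(self, report_id: str) -> Optional[Tuple[int, List[str]]]:
        """Assign a report to the open batch; return the batch once it is full."""
        if report_id in self._assigned:
            # A report may fall in only one batch, so a replay is ignored.
            return None
        self._assigned[report_id] = self._next_batch_id
        self._open.append(report_id)
        if len(self._open) < self.min_batch_size:
            # Not enough reports yet: nothing is emitted.
            return None
        batch = (self._next_batch_id, self._open)
        self._next_batch_id += 1
        self._open = []
        return batch
```

With `min_batch_size = 3`, the first two calls to `add_report` return
nothing and the third returns the full batch. No interval appears anywhere,
which is the simplification being discussed.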


> I think Chris's straw man design is a good starting point. For the
> `BatchSelector.fixed` type, the collector creates a `batch_id` to identify
> a collection, so multiple requests for the same `batch_id` return the same
> result (if `max_batch_lifetime` > 1).
>
> Upon receiving a `CollectReq` of the fixed type, the leader performs
> {{batch-parameter-validation}} but skips any checks to do with the
> interval; instead, it checks whether the same `batch_id` has been collected
> before. Then the leader begins working with the helpers to prepare shares
> for this `batch_id` (or continues this process, depending on the VDAF).
>
> In {{aggregate-flow}}, instead of sorting each report share into
> interval-based buckets, the leader and helper can save them to an
> `aggregate_share` identified by `aggregate_job_id`. The leader can send
> `batch_id` along with `aggregate_job_id` to the helpers using
> `AggregateReq`, so a helper can associate a `batch_id` with multiple
> `aggregate_job_ids`. The same `batch_id` can then be used to collect the
> prepared `aggregate_shares` during the {{collect-aggregate}} flow.
>
> Alternatively, the leader can associate `batch_id` with `aggregate_job_ids`
> internally, then send the helpers the list of `aggregate_job_ids` for a
> `batch_id` during the {{collect-aggregate}} flow. This way `AggregateReq`
> can remain unchanged.
>
> Also, `batch_interval` is used as AAD in Aggregate Share Encryption (
> https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/blob/main/draft-ietf-ppm-dap.md#aggregate-share-encryption-aggregate-share-encrypt). For
> the `fixed` selector, `batch_id` or a hash of the `aggregate_job_ids` list
> should be used instead.
>

Yup, this all sounds reasonable to me. We can work out the details in a PR.
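
As a starting point for that discussion, the leader-side bookkeeping could
look roughly like the sketch below. Everything here is an assumption made
for illustration: `LeaderBatchIndex`, `job_list_digest`, and
`aggregate_share_aad` are hypothetical names, SHA-256 and the
length-prefixed, sorted encoding of the job IDs are my own choices, and the
`task_id || binding` layout for the AAD should be checked against the
draft rather than taken from here.

```python
import hashlib
from typing import Dict, List


class LeaderBatchIndex:
    """Hypothetical leader-side bookkeeping for the `fixed` selector."""

    def __init__(self, max_batch_lifetime: int) -> None:
        self.max_batch_lifetime = max_batch_lifetime
        self._jobs: Dict[bytes, List[bytes]] = {}   # batch_id -> aggregate_job_ids
        self._collects: Dict[bytes, int] = {}       # batch_id -> times collected

    def add_job(self, batch_id: bytes, aggregate_job_id: bytes) -> None:
        """Record that an aggregation job contributes to a batch."""
        self._jobs.setdefault(batch_id, []).append(aggregate_job_id)

    def jobs_for(self, batch_id: bytes) -> List[bytes]:
        """Job IDs to send to the helper during the collect flow."""
        return list(self._jobs.get(batch_id, []))

    def try_collect(self, batch_id: bytes) -> bool:
        """Replaces the interval checks: allow a collect only while the
        batch has been collected fewer than `max_batch_lifetime` times."""
        used = self._collects.get(batch_id, 0)
        if used >= self.max_batch_lifetime:
            return False
        self._collects[batch_id] = used + 1
        return True


def job_list_digest(aggregate_job_ids: List[bytes]) -> bytes:
    """Assumed encoding: hash the sorted, length-prefixed job IDs so the
    leader and helper agree regardless of the order they stored them in."""
    h = hashlib.sha256()
    for job_id in sorted(aggregate_job_ids):
        h.update(len(job_id).to_bytes(2, "big"))
        h.update(job_id)
    return h.digest()


def aggregate_share_aad(task_id: bytes, batch_binding: bytes) -> bytes:
    """Assumed AAD layout: the batch binding (a batch_id or a job-list
    digest) takes the place that `batch_interval` occupies today."""
    return task_id + batch_binding
```

Either `batch_id` itself or `job_list_digest(index.jobs_for(batch_id))`
could be passed as `batch_binding`; which of the two is the better binding
is exactly the kind of detail to settle in the PR.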

Chris P.