Re: [Ppm] Batch selection and use cases for DAP

Shan Wang <shan_wang@apple.com> Tue, 12 July 2022 23:58 UTC

From: Shan Wang <shan_wang@apple.com>
Date: Wed, 13 Jul 2022 00:57:59 +0100
References: <mailman.5473.1657038747.982.ppm@ietf.org>
To: ppm@ietf.org
In-reply-to: <mailman.5473.1657038747.982.ppm@ietf.org>
Message-id: <19B4EE55-A63D-4E25-8235-1AC63BA57F15@apple.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ppm/6gb0soUdqlBGWhjz8gklBjUn1H8>
Subject: Re: [Ppm] Batch selection and use cases for DAP

> With that in mind, I wonder if the existing `AggregateShareReq.batch_interval` sent from leader to helper suffices for #3 (this is an alternative to Shan Wang's idea of surfacing the aggregation job ID <=> batch ID mapping in the protocol messages). We can impose a total ordering on reports, so once the leader has identified a set of reports that satisfy the desired batch size, it should be able to devise an interval that captures at least (or I think exactly) those reports and send that to the helper. My thinking is that we can minimize the protocol text and code needed for the helper to support these different modes of aggregations.


There are a few problems with this: 1) The interval the leader devises must also satisfy the `min_batch_duration` requirement, so it cannot end at the moment `min_batch_size` is met; its end boundary has to align with `min_batch_duration`. This extra duration becomes a dead zone: any reports arriving during it will be dropped on the floor. 2) It complicates the collect logic, since the collector no longer knows what interval to include in `CollectReq`. Therefore, I think it's better to surface this requirement at the protocol level.
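
To make the dead zone in 1) concrete, here is a rough sketch, not taken from the draft. It assumes timestamps are integer seconds, the interval is half-open, and both ends must be multiples of `min_batch_duration`. The function name `devise_interval` and the sample numbers are made up for illustration.

```python
def devise_interval(timestamps, min_batch_size, min_batch_duration):
    """Sketch of a leader devising an aligned interval for a fixed-size batch.

    Assumptions (not from the draft text): timestamps are integer seconds,
    the interval is half-open [start, end), and both start and end must be
    multiples of min_batch_duration.
    Returns (start, end, dead_zone), where dead_zone lists the timestamps
    of reports that fall inside the interval but outside the intended batch.
    """
    ordered = sorted(timestamps)
    if len(ordered) < min_batch_size:
        raise ValueError("not enough reports to form a batch")

    batch = ordered[:min_batch_size]
    # Round the start down and the end up to the alignment boundary.
    start = (batch[0] // min_batch_duration) * min_batch_duration
    end = (batch[-1] // min_batch_duration + 1) * min_batch_duration

    # Reports after the batch was filled but before the aligned end.
    dead_zone = [t for t in ordered[min_batch_size:] if t < end]
    return start, end, dead_zone


# Hypothetical numbers: batch size 3, one-hour alignment.
reports = [3610, 3620, 3630, 3640, 3650, 7000, 7300]
start, end, dead_zone = devise_interval(reports, 3, 3600)
print(start, end, dead_zone)
```

With these numbers the batch is full at t=3630, but the interval cannot end before t=7200, so the three reports at 3640, 3650 and 7000 land in the dead zone.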

I agree the leader shouldn't send the helper a long list of report IDs. In theory the leader could also include just one report in each aggregation job, which would make the aggregation job ID list very long too, but perhaps that should be guarded by a "SHOULD" in the protocol text?

For interval-metadata, one concern I have is that the space of possible metadata is practically endless, so guaranteeing any form of privacy for every combination of query metadata will be extremely hard. I think designing separate tasks or encoding the metadata in the measurement are better ways to provide such utility.
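
As a rough illustration of what I mean by encoding metadata in the measurement: each client could submit a vector with one slot per metadata value, so a single aggregate already carries the per-value breakdown. The sketch below works on plaintext vectors purely for illustration (in DAP the measurement would be secret-shared under the task's VDAF), and the `COUNTRIES` list and function names are hypothetical.

```python
# Hypothetical, fixed set of metadata values agreed on in the task config.
COUNTRIES = ["US", "DE", "JP"]


def encode(country, value):
    """Place the value in the slot for the client's country, zeros elsewhere."""
    vec = [0] * len(COUNTRIES)
    vec[COUNTRIES.index(country)] = value
    return vec


def aggregate(measurements):
    """Element-wise sum; slot i is the total for COUNTRIES[i]."""
    return [sum(column) for column in zip(*measurements)]


batch = [encode("US", 2), encode("DE", 5), encode("US", 1)]
print(dict(zip(COUNTRIES, aggregate(batch))))
```

The set of values is fixed up front, so the aggregators never need a query over report metadata, and the same minimum batch size applies to the whole vector.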

> 
> From: Tim Geoghegan <timg@letsencrypt.org>
> Subject: Re: [Ppm] Batch selection and use cases for DAP
> Date: 5 July 2022 at 17:32:09 BST
> To: Simon Friedberger <simon@mozilla.com>
> Cc: Christopher Patton <cpatton=40cloudflare.com@dmarc.ietf.org>, ppm <ppm@ietf.org>
> 
> 
> I have one hair to split with Chris' summary of the current state of play:
> 
> > Upon receiving this request from the Collector, the Leader picks a batch of reports with timestamps that fall in the batch interval.
> 
> The important nuance is that the leader and helper independently figure out what set of reports fall within the collector's chosen batch interval. Obviously, the helper's view of what reports exist is determined by what reports the leader forwards to it (that is, until we move to a split upload model[1]), but the leader doesn't send the helper a list of report IDs (I believe this is an important property because that list would be quite large). I bring this up so that everyone remembers that the protocol must guarantee that leader and helper agree on the set of reports included in an aggregation.
> 
> With that in mind, I wonder if the existing `AggregateShareReq.batch_interval` sent from leader to helper suffices for #3 (this is an alternative to Shan Wang's idea of surfacing the aggregation job ID <=> batch ID mapping in the protocol messages). We can impose a total ordering on reports, so once the leader has identified a set of reports that satisfy the desired batch size, it should be able to devise an interval that captures at least (or I think exactly) those reports and send that to the helper. My thinking is that we can minimize the protocol text and code needed for the helper to support these different modes of aggregations.
> 
> I have two thoughts on use case #2: First, if we want the leader to be able to select reports based on metadata provided by the collector, then I think the `Report`[2] message needs to include some metadata that can be matched against `CollectReq.interval-metadata.metadata`. Would we then also need DAP to define some kind of query language, so that a collector could express something like "aggregate over the reports where `report.foo = bar && 0 <= report.qux <= 100`"?
> 
> Second, I worry that this means aggregators can't begin accumulating prepared output shares into aggregates until they get the `CollectReq.interval-metadata.metadata` value. If a deployment knows it wants to break out aggregations by something like the client's country, could it not define distinct tasks for each value? I'm perfectly willing to be convinced that this is not practical, but I feel it's important to consider doing nothing, and also I think it'd be nice to have potential deployers of DAP spell out more explicitly what kinds of aggregation use cases they have.
> 
> Thanks,
> Tim
> 
> [1] https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/issues/130
> [2] https://datatracker.ietf.org/doc/html/draft-ietf-ppm-dap#section-4.2.2
>