Re: [Ppm] taskprov: In-band Task Provision Extension

Interesting! I think that cross-protocol attack can also be ruled out by
having an aggregator require that task IDs are _always_ generated from a
TaskConfig and never directly chosen by any task author. Further,
aggregators could provide an endpoint where they advertise parameters for
the tasks they have configured, much like they already advertise HPKE
configs. Then peer aggregators and even clients could check what parameters
an aggregator is using for some task ID before they consent to upload or
run aggregation jobs.
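
To make that concrete, here is a rough sketch in Python. It is not taken
from Janus or from the taskprov draft: the `Aggregator` class, its method
names, and the use of a SHA-256 digest over the serialized `TaskConfig`
bytes as the task ID are all assumptions made for illustration. The point
is only that the aggregator exposes no way to provision a task under a
caller-chosen ID, and that a peer or client can compare the advertised
parameters against the ones it expects before taking part.

```python
import hashlib
from typing import Dict, Optional


def derive_task_id(task_config: bytes) -> bytes:
    """Assumed derivation: SHA-256 over the serialized TaskConfig bytes."""
    return hashlib.sha256(task_config).digest()


class Aggregator:
    """Hypothetical aggregator that only accepts derived task IDs."""

    def __init__(self) -> None:
        # Maps derived task ID -> serialized TaskConfig (opaque bytes here).
        self._tasks: Dict[bytes, bytes] = {}

    def provision(self, task_config: bytes) -> bytes:
        # The task author supplies parameters only; the ID is never chosen.
        task_id = derive_task_id(task_config)
        self._tasks[task_id] = task_config
        return task_id

    def advertised_config(self, task_id: bytes) -> Optional[bytes]:
        # Stand-in for an endpoint that advertises a task's parameters.
        return self._tasks.get(task_id)


def peer_consents(aggregator: Aggregator, task_id: bytes,
                  expected_config: bytes) -> bool:
    """Check the advertised parameters before uploading or aggregating."""
    advertised = aggregator.advertised_config(task_id)
    return (
        advertised is not None
        and advertised == expected_config
        and derive_task_id(advertised) == task_id
    )
```

A client holding a `TaskConfig` would call `peer_consents` against each
aggregator and refuse to upload if either check fails.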

Tim

On Wed, Oct 25, 2023 at 3:40 PM David Cook <dcook@divviup.org> wrote:

> Tim,
> There is a cross-protocol attack on the parameter binding version of the
> strawman protocol. The client cannot know whether each aggregator derived
> its task ID from the task's parameters in TaskConfig form, or if it
> accepted a task ID verbatim and a different set of parameters from a
> dishonest task author, using a different task provisioning protocol. This
> can be addressed by adding a DAP extension to reports (with a payload of
> zero length) that serves as an indication that the corresponding task
> should have been set up with this particular provisioning protocol. If an
> aggregator that doesn't support the protocol receives such a report, it
> must refuse to process the report, since it contains an unsupported
> extension. If an aggregator that does support the provisioning protocol
> receives such a report, it should check that the task was set up with
> that provisioning protocol, or alternatively re-derive the task ID from the
> task's parameters and confirm that it matches. In effect, the presence of
> the report extension binds the report to the intent that the task ID should
> be bound to the task parameters.
>
> See https://github.com/wangshan/draft-wang-ppm-dap-taskprov/issues/39 for
> a similar conversation as applied to Taskprov. The same zero-length report
> extension fix has been applied to the taskprov editor's copy.
>
> Beyond this issue, the strawman provisioning flow makes sense. I think
> separating out the commitment to/transparency of task parameters from the
> actual provisioning of tasks is a useful distinction, and the commitment
> property could be useful to a broader set of deployment scenarios.
>
> --David
>
> On Wed, Oct 25, 2023 at 5:03 PM Tim Geoghegan <timgeog+ietf@gmail.com>
> wrote:
>
>> Hi all,
>>
>> Thanks for sharing this updated draft. When a previous iteration of
>> taskprov was discussed on the list ([1]), I noted two drawbacks to the
>> design:
>>
>> 1. There is a significant amount of bandwidth wasted, because a
>> `TaskConfig` structure will be transmitted for every single report in the
>> task, although it's only acted upon once in each aggregator (the first time
>> the leader sees a report with a given `TaskConfig` and the first time the
>> helper sees an `AggregationJobInitReq` with the `TaskConfig`).
>>
>> 2. Aggregators are forced to allow data plane components to configure
>> themselves (and in the leader case, this may be based on _untrusted_
>> inputs). Generally, I believe it is a design error to put control plane
>> concerns into a system's data plane, as this ends up introducing awkward
>> constraints for implementations. For example, with taskprov, it's no longer
>> possible to run an aggregator that has a read-only view of what tasks it
>> runs because it might need to expand that list in response to data plane
>> traffic.
>>
>> (1) has gotten much better since November 2022: the `TaskConfig`
>> structure is no longer replicated as an extension in each report share in
>> an `AggregationJobInit`. But there's still waste there: a single task could
>> see many aggregation jobs over its lifetime, and the `TaskConfig` is
>> useless every time past the first. And of course there's still the
>> per-report waste of including `TaskConfig` in uploads. (2) remains a
>> problem in the current draft.
>>
>> Everything in engineering is a tradeoff, so these problems aren't
>> necessarily disqualifying. Maybe it's worth eating the bandwidth overhead
>> and security risk in exchange for taskprov's upsides. But it's been
>> difficult thus far to weigh the tradeoffs because taskprov's goals have not
>> been clearly articulated.
>>
>> Exactly what threats does taskprov mitigate? In what ways does taskprov
>> simplify the operation of DAP? Most importantly: can we achieve taskprov's
>> wins without incurring its downsides?
>>
>> I've been thinking about taskprov for a while now, have talked to its
>> authors, have reviewed its implementation in Janus ([2]) and have built and
>> deployed an alternate, OOB task provisioning scheme that has been working
>> quite well for us so far ([3], [4]), so I'm going to try to produce a
>> list of taskprov's goals as I understand them. I hope taskprov's editors
>> will let me know whether I have stated them correctly and completely.
>>
>> But first, to contextualize this discussion, I want to propose a strawman
>> for an out-of-band task provisioning system, so that we can compare
>> taskprov to it.
>>
>> # Strawman: naive OOB task provisioning
>>
>> Aggregators are required to expose an HTTP endpoint `PUT
>> {aggregator}/tasks/{task-id}`. The body of that request is taskprov's
>> `TaskConfig` message, with one extra field in it to indicate whether the
>> recipient is meant to act as the task's Leader or Helper. Upon receipt, the
>> aggregator provisions the task into itself and responds with 201 Created.
>> Subsequently, DAP API requests may be made that reference the task ID.
>>
>> In this setting, creating a new task looks like:
>>
>> 1. Task author chooses task parameters and `task-id` and constructs a
>> `TaskConfig`.
>> 2. Task author does `PUT {leader}/tasks/{task-id}` to send `TaskConfig`
>> to the leader.
>> 3. Task author does `PUT {helper}/tasks/{task-id}` to send `TaskConfig`
>> to the helper.
>> 4. Having verified that the task is provisioned into both aggregators,
>> the task author distributes task ID and `TaskConfig` to many clients.
>> 5. (DAP protocol begins) Clients begin uploading reports by doing `PUT
>> {leader}/tasks/{task-id}/reports`.
>> 6. Aggregation and collection protocol continues.
>>
>> Now, let's use that baseline to consider taskprov's goals as I understand
>> them.
>>
>> # In-band task provisioning
>>
>> draft-wang-ppm-dap-taskprov-05's introduction states: "[taskprov's] key
>> feature is that task configuration is performed completely in-band, via
>> HTTP request headers." ([5]). And in prior messages to the list, taskprov's
>> editors have stressed that taskprov is in-band. But it's not explained why
>> in-band task provisioning is inherently desirable. Maybe it'd be
>> inappropriate for a standards document which ought to be concerned mainly
>> with "how" to discuss "why". But at least in this side discussion, could
>> the editors please clarify what's so bad about out-of-band mechanisms?
>> Especially since the introduction soon after states: "There is no need for
>> out-of-band task orchestration between Leader and Helpers, therefore making
>> adoption of DAP easier." ([6])
>>
>> ... and that is not true! taskprov itself tells us that "[t]he
>> Aggregators are presumed to have securely exchanged a pre-shared secret
>> out-of-band" ([7]; this is the `verify_key_init` used to derive VDAF
>> verification keys). Further, in any real deployment, I'd expect the leader
>> and helper to have to coordinate so that the leader can authenticate
>> the requests it makes to the helper. This would involve exchanging bearer
>> tokens, or deploying a PKI together, or something. So even with taskprov,
>> you absolutely need to coordinate between aggregators, and you'll need
>> _ongoing_ coordination to do things like rotate credentials and bill each
>> other.
>>
>> I suspect that what the introduction meant to say is that "there is no
>> need for out-of-band *per-*task orchestration between Leader and Helpers".
>> That is, you coordinate once and past that you can create however many
>> tasks you want via taskprov. Having conceded that, let's dig into what does
>> have to happen for each new task in taskprov. Assuming that
>> `verify_key_init` has already been distributed to the aggregators:
>>
>> 1. Task author chooses task parameters and constructs a `TaskConfig`.
>> 2. Task author distributes `TaskConfig` to many clients.
>> 3. (DAP protocol begins) Clients begin uploading reports by doing `PUT
>> {leader}/tasks/{task-id}/reports`; `TaskConfig` is in an HTTP header.
>> 4. Leader provisions itself with a task derived from the `TaskConfig`.
>> 5. Leader begins aggregation sub-protocol with helper; transmits
>> `TaskConfig` in an `AggregationJobInitReq`.
>> 6. Helper provisions itself with a task derived from the `TaskConfig`.
>> 7. Aggregation and collection protocol continues.
>>
>> The difference between this and the strawman OOB task provisioning is
>> that the task author is spared two HTTP requests to provision the task into
>> the aggregators. Why is that desirable? In taskprov, the task author has to
>> initiate the provisioning of each task anyway, so what difference do those
>> two requests make to the automation-friendliness of task provisioning?
>>
>> (n.b. This has been a long-winded restatement of Watson Ladd's succinct
>> question upthread, but I hope laboriously spelling all this out makes the
>> comparison between taskprov and other approaches to task provisioning more
>> clear.)
>>
>> # Cryptographic binding to task parameters
>>
>> The idea is that since the task ID is derived from the task parameters,
>> then participation in a task is evidence that a client or aggregator has
>> seen the task parameters and consented to them. That's a desirable
>> property, but it's not clear to me that an in-band mechanism for task
>> provisioning is required to achieve this. To be clear, I don't think the
>> taskprov editors have ever claimed that it is; it's just that this binding
>> scheme has always been discussed in the context of the taskprov draft.
>>
>> Let's return to the OOB strawman task provisioning flow from above and
>> see if we can tack on parameter binding:
>>
>> 1. Task author chooses task parameters and constructs a `TaskConfig`.
>> 2. Task author does `PUT {leader}/tasks` to send `TaskConfig` to the
>> leader.
>> 3. Leader derives a task ID from `TaskConfig` and provisions that task
>> into itself.
>> 4. Task author does `PUT {helper}/tasks` to send `TaskConfig` to helper.
>> 5. Helper derives a task ID from `TaskConfig` and provisions that task
>> into itself.
>> 6. Having verified that the task is provisioned into both aggregators,
>> the task author distributes `TaskConfig` to many clients.
>> 7. Clients derive task ID from `TaskConfig` and begin uploading reports
>> by doing `PUT {leader}/tasks/{task-id}/reports`. No `TaskConfig` is
>> included in the upload.
>> 8. Aggregation and collection protocol continues.
>>
>> Now, just as in taskprov, we are assured that all participants are using
>> the same task parameters, because otherwise they wouldn't have converged on
>> the same task ID. And we can infer that all participants consent to those
>> task parameters because otherwise they wouldn't have uploaded, or sent an
>> aggregation job request (depending on which participant we're talking
>> about).
>>
>> In summary: it seems to me that if we take the parameter
>> commitment/transparency bits from taskprov and combine them with an OOB
>> task provisioning mechanism, we achieve all the wins of taskprov without
>> its downsides. What am I wrong about? Are there attacks I'm not seeing that
>> are mitigated by taskprov? Operational considerations that I have missed?
>>
>> Thanks,
>> Tim
>>
>> [1]:
>> https://mailarchive.ietf.org/arch/msg/ppm/4nceGeK6noA1a9eRR_xK6aQr_Ys/
>> [2]: https://github.com/divviup/janus
>> [3]: https://github.com/divviup/janus/issues/1486
>> [4]: https://github.com/divviup/janus/tree/main/aggregator_api
>> [5]:
>> https://datatracker.ietf.org/doc/html/draft-wang-ppm-dap-taskprov-05#section-1-2
>> [6]:
>> https://datatracker.ietf.org/doc/html/draft-wang-ppm-dap-taskprov-05#section-1-3
>> [7]:
>> https://datatracker.ietf.org/doc/html/draft-wang-ppm-dap-taskprov-05#section-3.2
>>
>> On Mon, Oct 23, 2023 at 4:36 PM Christopher Patton
>> <cpatton=40cloudflare.com@dmarc.ietf.org> wrote:
>>
>>> Hi Watson, that's not exactly the problem we're solving here. The main
>>> goal is an automated mechanism by which all parties can be configured
>>> in-band. There are many ways to do this, this draft being one of them.
>>>
>>> A nice feature of this draft is the manner in which the task ID is
>>> derived. Basically, agreement on the task parameters is enforced by the
>>> cryptography, not an out-of-band mechanism. See
>>> https://github.com/ietf-wg-ppm/draft-ietf-ppm-dap/issues/500 for
>>> additional discussion.
>>>
>>> Chris P.
>>>
>>> On Fri, Oct 20, 2023 at 1:34 PM Watson Ladd <watsonbladd@gmail.com>
>>> wrote:
>>>
>>>> Dear Shan,
>>>>
>>>> I read the draft and I don't understand why you need this. Why is it
>>>> hard to update the Collector and Leader with the tasks before handing
>>>> them out to the clients?
>>>>
>>>> Sincerely,
>>>> Watson
>>>>
>>>>
>>>>
>>>> --
>>>> Astra mortemque praestare gradatim
>>>>
>>>> --
>>>> Ppm mailing list
>>>> Ppm@ietf.org
>>>> https://www.ietf.org/mailman/listinfo/ppm
>>>>
>