Re: [Ohai] DoubleCheck and CONNECT-UDP

Matthew Finkel <sysrqb@apple.com> Thu, 28 July 2022 22:51 UTC

From: Matthew Finkel <sysrqb@apple.com>
Message-id: <7988FB58-4909-40FE-9C7A-7BDE0C9DE554@apple.com>
Date: Thu, 28 Jul 2022 18:51:28 -0400
In-reply-to: <CAHbrMsA-EAU9aTce8piHhx2CCyH7Y5BYPxyXstr2ZKGt_MSBWQ@mail.gmail.com>
Cc: Matthew Finkel <sysrqb=40apple.com@dmarc.ietf.org>, ohai@ietf.org
To: Ben Schwartz <bemasc=40google.com@dmarc.ietf.org>
References: <20220727040122.bpumth5yamgmlgbb@localhost> <CAPJ=pvS3sOkttASUjcLvXBXxky=Ob903qb92c-jQii0cEKRwoA@mail.gmail.com> <CAPJ=pvTsEj6zX2Tk92jTV88mZpskTL-iHhc+6uq1CEPCPUG+iQ@mail.gmail.com> <CAPJ=pvSfCc3X4+nqwbcuNohO-Z2ZBsf6MGHAyoJjDxcpDqaxdQ@mail.gmail.com> <EF9174FA-7D60-4F80-9C5F-605E2BD70FB4@apple.com> <CAHbrMsA-EAU9aTce8piHhx2CCyH7Y5BYPxyXstr2ZKGt_MSBWQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/ohai/VQUzh6I55y10QhnKFYEfzvUQ_VY>
Subject: Re: [Ohai] DoubleCheck and CONNECT-UDP
Precedence: list


> On Jul 27, 2022, at 7:35 PM, Ben Schwartz <bemasc=40google.com@dmarc.ietf.org> wrote:
> 
> On Wed, Jul 27, 2022 at 3:42 PM Matthew Finkel <sysrqb=40apple.com@dmarc.ietf.org> wrote:
>> On Tue, Jul 26, 2022 at 10:57:33PM -0400, Ben Schwartz wrote:
>> > During my presentation today, I suggested that the DoubleCheck procedure might not need CONNECT-UDP if it's acceptable for the Service Description Host (SDH), which is presumed to be affiliated with the gateway, to learn the pool of client IPs.  On second thought, I believe the CONNECT-UDP tunnel is indeed necessary.
>> >
>> > Without the CONNECT-UDP tunnel, clients would authenticate the KeyConfig by fetching it directly from the SDH.   In this arrangement, it is often possible for the gateway to identify the precise client IP that issued a particular OHTTP request, due to timing correlation.  If the rate of queries to this gateway (through this relay) is relatively low (<1 QPS), correlation is trivial.  The gateway can constantly rotate its KeyConfig, forcing clients to reauthenticate it before each OHTTP request.
>> >
>> > At higher query rates, other tricks are possible, such as rotating the KeyConfig and then slow-walking the authentication responses.  By sending authentication responses to each client "one at a time", the SDH and Gateway can observe the OHTTP request that follows each authentication response.
>> >
>> > In other words, OHTTP requests are linkable (via timing) to the corresponding KeyConfig authentication requests, so these authentication requests must not be linkable to anything else.  This means authentication requests from all clients on the relay must be indistinguishable, so they must all go through a single proxy.
>> 
>> I agree, thanks for thinking through this attack and writing a
>> concrete proposal for achieving key consistency and correctness. We
>> should probably update the referenced informational KCCS document with
>> some of these insights, as well. I also appreciate how this design
>> leverages multiple discovery methods.
>> 
>> As for feedback on the draft:
>> 1) Is there a specific reason why this depends on the Access Service
>> Descriptions? I see why they are useful, but they seem more like a
>> convenience than a required dependency, is this true?
> 
> I would describe it as an important convenience.
> 
> This draft uses Access Service Descriptions in two ways:
> 
> 1. To describe the Target service.  This is important because we need to apply our consistency rules to three separate pieces of information:
> - The Gateway URL
> - The KeyConfig
> - The Target service (however it is described)
> 
> If these aren't described by a single resource, we need to apply the DoubleCheck procedure independently to each resource.  I think this is probably possible to do safely, but it seems inefficient.  However, I have heard arguments for splitting up this information (e.g. adding a layer of indirection between the Access Description and the KeyConfig), and that may be an OK solution.
> 
> 2. To describe the Relay.  This is important because the client needs to be configured with the Relay/cache, an affiliated transport proxy (e.g. CONNECT-UDP), and ideally an affiliated DoH server.  I think we can reasonably ask users to paste one magic URL into a settings field somewhere, but I don't think we can ask them to copy-paste three magic URLs.

Yes, that’s a reasonable use case. However, as a result, this proposal combines two goals: “(gateway) key discovery and consistency” and “relay service discovery”. That potentially adds unnecessary overhead if the client has sufficient hard-coded (or out-of-band) relay information and only needs to learn the gateway information. This may be acceptable in some implementations.

> 
>> 2) The concern described at the end of the introduction around not
>> using the same update mechanism for key rotation as the one used for
>> configuring the "Service Description" seems like a bit of a stretch.
>> First, key rotation should not be occurring at such a high frequency
>> that a slower (software release process) update mechanism is
>> inadequate.
> 
> I'm not sure I agree with this.  Would you rely on the Android OS update system to rotate a compromised key?  Are you sure your Windows updates take effect without a reboot?
> 
> I agree that the example is artificial.  Perhaps a more compelling example would be a public ODoH server, represented by a string (e.g. a URL) copy-pasted from the public documentation into a settings panel.  The string comes from a trusted source, but there is no live channel to update it, so it can't contain the KeyConfig.

I agree we should aim to minimize the time between key rotation and when a client begins using the new key. However, ~monthly security updates may be sufficient in some cases (depending on timing), and some software systems have processes for faster emergency releases. These are probably far more complicated and slower than updating a single resource on a web site, but software updates offer a simpler and stronger mechanism for key rotation and consistency (if you trust that the update process won’t target specific users). That said, I agree this proposal is reasonable for users who want to configure (additional) services that aren’t already provided by the software.


>  
>> Second, as mentioned in the KCCS document, we recommend
>> specifying a "minimum validity period" for these keys such that a
>> service can't easily partition users based on which key they know/use.
>> While we didn't recommend any specific minimum period length, I would
>> certainly recommend a lower-bound on the order of days (with typical
>> rotation on the order of weeks, at a minimum). Providing more
>> flexibility and agility opens more avenues for abuse, but I am
>> interested in hearing any trade-offs you thought about for this, too.
> 
> Counterintuitively (at least to me), I believe DoubleCheck guarantees authenticity and consistency regardless of validity period.  However, extremely short validity periods might generate excessive revalidation overhead.

Yes, that’s a fair point, I agree with that.

> 
>> 3) Not directly feedback on this draft, but most obviously, the
>> consistency guarantees are only limited to users of the same relay (I
>> believe you mentioned this, but I can't find it now), so we may be
>> missing "global" consistency (if the service is used via
>> multiple/arbitrary relays). As discussed in other contexts, "global
>> consistency" really reduces to ensuring each client request originates
>> from a sufficiently large client anonymity set. Defining "sufficiently
>> large" is likely context and application-specific, but it should be
>> explicitly considered in deployments. This is related to "precondition
>> (1)", but it is not the same. Do you have thoughts you can share on
>> this?
> 
> I've been assuming that each client only accesses the service via one relay (at least for an extended period of time).  If that's true, it follows that the user's anonymity set is already reduced to the number of users on the relay, and our challenge is to avoid reducing it further.

Yes, but I find this difficult to reason about because it does depend heavily on the deployment scenario. I do believe this proposal is useful in the proposed use case, so I’m not trying to debate that.

> 
> Spreading requests across multiple distinct relays is an intriguing idea, but I haven't given it much thought.

I suppose to some extent you already touched on this with:

   If the cache must be partitioned for architectural or performance
   reasons, operators SHOULD keep the number of users in each partition
   as large as possible.

But that’s an interesting problem that doesn’t necessarily reduce to the single-relay case, depending on how users are distributed.
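
As a rough illustration of what "as large as possible" could mean operationally, here is a minimal sketch that flags cache partitions whose user count falls below a threshold. The function name, the `assignments` mapping, and the `min_users` threshold are all hypothetical; the draft does not define a threshold, and "sufficiently large" remains deployment-specific.

```python
from collections import Counter
from typing import Dict, List


def small_partitions(assignments: Dict[str, str], min_users: int) -> List[str]:
    """Return the partitions that hold fewer than min_users users.

    assignments maps a user identifier to the cache partition serving
    that user. Both the mapping and min_users are assumptions made for
    this sketch; nothing here comes from the draft itself.
    """
    sizes = Counter(assignments.values())
    return sorted(p for p, n in sizes.items() if n < min_users)
```

A user in a flagged partition has an anonymity set no larger than that partition, regardless of how many users the relay serves overall.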

> 
>> 4) A possible TODO: Regarding the response by a Service when the
>> Service Description changes but a request's If-Match header identifies
>> a previous version of the resource: Should the Service cache old
>> versions for a certain time? Maybe something like an additional
>> max-age length? The draft doesn't seem to specify when the Service
>> should stop returning a success response for old resources.
> 
> The draft says
> 
>    If the Service Description changes, and the resource receives a
>    request whose "If-Match" header identifies a previously served
>    version that has not yet expired, it MUST return a success response
>    containing the previous version.
> 
> So basically it's just the max-age.

Ah, I see now. I did read that sentence, but I guess the “has not yet expired” part did not register. Thanks.
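
For concreteness, the quoted rule can be modeled roughly as follows. This is only a sketch under stated assumptions: the class and method names are hypothetical, "not yet expired" is modeled as "replaced less than max-age ago", and the 412 status for an expired or unknown version is the generic HTTP answer to a failed If-Match rather than something this thread confirms the draft requires.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Version:
    etag: str
    body: bytes
    # None while this is the current version; set once it is replaced.
    expires_at: Optional[float] = None


class ServiceDescriptionResource:
    """Hypothetical in-memory model of the resource; not from the draft."""

    def __init__(self, max_age: float) -> None:
        self.max_age = max_age
        self.versions: Dict[str, Version] = {}
        self.current: Optional[Version] = None

    def publish(self, etag: str, body: bytes, now: float) -> None:
        if self.current is not None:
            # Assumption: a copy served just before replacement stays fresh
            # for max_age, so the old version is unexpired until then.
            self.current.expires_at = now + self.max_age
        self.current = Version(etag, body)
        self.versions[etag] = self.current

    def get(self, if_match: Optional[str], now: float) -> Tuple[int, bytes]:
        if self.current is None:
            return 404, b""
        if if_match is None or if_match == self.current.etag:
            return 200, self.current.body
        old = self.versions.get(if_match)
        if old is not None and old.expires_at is not None and now < old.expires_at:
            # Previously served version that has not yet expired.
            return 200, old.body
        return 412, b""
```

With `max_age=60` and a replacement at time 100, a request carrying the old ETag succeeds until time 160 and fails afterwards, which matches the "so basically it's just the max-age" reading.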

> 
>> 5) A possible TODO: What should be the client's behavior if the
>> response from the relay is rejected or if establishing the CONNECT-UDP
>> tunnel fails? Fail-open? Fail-closed? Attempt check via alternative
>> relay and/or choose different service? This could be
>> application-specific logic, but it could be worth describing
>> trade-offs of different behaviors in this situation.
> 
> Agreed!  I haven't thought much about what to do in this situation.
>  
>> In particular,
>> the relay is in a position to maliciously perform a denial of service
>> during this check and it could force the client to choose a different
>> service (possibly one that will collude with the relay).
> 
> Interesting.
> 
>> On a more nitty-note, I'm not sure how much you like the "DoubleCheck"
>> name, but as another option we've referred to similar behavior as
>> "multi-path verification".
> 
> I'm happy to adjust the name, but I do think we should try to make it clear that this is a specific interoperable protocol, not just a strategy. 
> 

I agree.

> "Multi-path validation" attempts to solve more or less the same problem, but it relies on different assumptions and achieves a different consistency guarantee.

Yes, that is a good clarification/correction - the fact that one fetch is from a cache and the other is from the service is important. No need to bikeshed the name :)
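
To illustrate the cache-versus-service distinction, and the open question in (5), here is a minimal sketch of the comparison step. It is not the draft's normative procedure: the two fetch callables are hypothetical stand-ins (one for the relay's cache, one for the fetch from the service through the proxy), and the fail-closed behavior is just one of the options raised above, chosen here for illustration.

```python
from typing import Callable


class ConsistencyError(Exception):
    """Raised when the two copies cannot be confirmed identical."""


def double_check(
    fetch_via_cache: Callable[[], bytes],
    fetch_via_proxy: Callable[[], bytes],
) -> bytes:
    """Return the description only if both paths yield the same bytes.

    fetch_via_cache and fetch_via_proxy are assumed helpers supplied by
    the caller; how they perform the requests is out of scope here.
    """
    try:
        cached = fetch_via_cache()
        direct = fetch_via_proxy()
    except OSError as exc:
        # Fail closed: a failed fetch (e.g. the tunnel could not be
        # established) is treated the same as a mismatch.
        raise ConsistencyError("fetch failed; refusing to use the service") from exc
    if cached != direct:
        raise ConsistencyError("cache and service returned different copies")
    return cached
```

Failing closed keeps a mismatched description from being used, but it also gives the relay the denial-of-service lever described in (5), so a real client would need a policy for what to do next.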
