Re: [Masque] Encrypting between client and proxy

Hi Ben,    (sorry for the delay, I was on vacation)

I like your idea. Since the first 16 bytes after the CID are already
encrypted, they are pseudo-random, so the probability of a 128-bit collision
is virtually zero. Using the re-encrypted output of those bytes (using the
input would also work; we'll need our cryptography enthusiasts to tell us
which is best) to seed AES-CTR allows encrypting the rest of the packet and
the bits in the first byte. This should give us the obfuscation properties
we want with the smallest possible performance impact.

David

On Mon, Jul 3, 2023 at 12:15 PM Ben Schwartz <bemasc@meta.com> wrote:

> If the goal is only to withstand a purely passive attacker, then a single
> pass is sufficient.  For example, we can (re-)encrypt the first 16 bytes
> using a block cipher (AES), and also use those 16 output bytes to
> initialize a stream cipher (AES-CTR) for the remainder*.  Single-pass (i.e.
> bounded latency) encryption can also be done using only stream ciphers (though
> less efficiently)**.
>
> If the goal is to withstand an active attacker (ignoring statistical and
> timing attacks), then I believe a Pseudorandom Permutation is inevitably
> required.  However, an active attacker has many other powerful attacks,
> such as simply reissuing an observed input packet many times and checking
> which output packet is similarly duplicated.  For this reason, I think our
> best option is to exclude active attackers from the threat model.
>
> --Ben Schwartz
>
> * With some fiddliness around re-encryption of certain header bits.
>
> ** For payload lengths >32, this is straightforward: use the first block
> (A) to initialize a stream cipher over the remainder, and use the first
> block of this output (B) to initialize a stream cipher that re-encrypts A
> (resulting in 32 bytes of latency).  For shorter lengths, the input can be
> rotated by (length-16) bytes, repeating until everything is re-encrypted.
> For length=20 (worst case?), this requires 5 AES-CTR cycles.  A streaming
> implementation (presumably in hardware) can thus operate on large packets
> with O(1) delay.
> ------------------------------
> *From:* David Schinazi <dschinazi.ietf@gmail.com>
> *Sent:* Thursday, June 29, 2023 8:08 PM
> *To:* masque@ietf.org <masque@ietf.org>
> *Cc:* Martin Thomson <mt@lowentropy.net>; Eric Rosenberg <
> eric_rosenberg@apple.com>; Alex Chernyakhovsky <achernya@google.com>;
> Benjamin Schwartz <ietf@bemasc.net>; Tommy Pauly <tpauly@apple.com>
> *Subject:* Re: [Masque] Encrypting between client and proxy
>
> Resurrecting this thread as we prepare to discuss this at IETF 117.
>
> I was thinking about MT's proposal using AES-CTR above and I think we can
> make this work. The main concern was for short packets but I'm not too
> worried, I have a proposal below that could resolve it. To recap the issue,
> we need to run two passes of a header-protection-like algorithm that both
> need a sample that can be fed to AES-CTR as IV. In an ideal world, AES-CTR
> expects a 16-byte IV, which is the sample size we use for header
> protection. We can however use a smaller sample; the downside is that this
> increases the risk of collision, and if there is a collision then an
> attacker can XOR two ciphertexts to get the XOR of two plaintexts.
>
> The important first point is that this attack isn't horrible: our
> plaintexts are encrypted with regular encryption so learning the XOR of two
> plaintexts provides limited information. Second, for the attacker to pull
> off this attack, they need to know that there is a collision. We can use
> our two-pass design to our advantage here. Let's say we do the weaker
> sample on the inside and the stronger one on the outside. In other words,
> we reduce the size of the first sample to min(16,
> protected_payload.length() - 16). We then only use the first sample to
> encrypt the last 16 bytes of the packet. Those bytes then become our sample
> to encrypt everything else. That way, if there's a collision on the first
> sample, an attacker won't notice it because the first sample will be itself
> encrypted.
>
> Regarding the keying material, I'd suggest generating two new keys k1 and
> k2 per connection and then using them regardless of which CID is in use,
> and just place those keys in the CID entry next to the VCID.
>
> When we discussed this at 116 we thought it might be worth forming a
> design team on this topic. I propose we ask this question formally at 117
> if the adoption call looks positive.
>
> David
>
> On Wed, Nov 16, 2022 at 11:16 AM Eric Rosenberg <eric_rosenberg=
> 40apple.com@dmarc.ietf.org> wrote:
>
> The two main advantages from forwarding mode are avoiding cumulative MTU
> loss and making it easy for the proxy to skip parts of the send and receive
> path. If we can employ per-hop encryption for forwarded mode packets in a
> way that doesn’t suffer from cumulative MTU loss, that sounds like a pretty
> compelling way to solve the “I want to chain many proxies” problem while
> removing the “observers on both sides of the proxy” caveat that’s included
> in the current draft’s security considerations.
>
> Assuming encryption can be done in a way that doesn’t incur cumulative MTU
> loss, it’s worth considering how it affects the efficiency of
> implementations. Taking our current implementation as an example, we’ve
> opted to perform all forwarding at Linux’s XDP hook with an eBPF program.
> This allows us to run a custom program for each packet we receive. In this
> program, we only actually need to process up to the QUIC packet header in
> order to perform a rewrite and immediately send the packet out. How this
> program is actually executed is dependent on the network card and driver,
> but it can avoid copies, be executed before the Linux networking stack, and
> even be completely offloaded to the NIC. The numbers presented were not
> from full NIC offload, but rather “Native XDP” (where the program runs on
> the main CPU, but may take advantage of direct memory access, avoid copies,
> etc.). Continuing with our implementation as an example, introducing
> encryption would likely require APIs (bpf helper functions) that aren’t
> available from mainline Linux kernels. We would likely have to patch our
> kernel to expose cryptographic operations to eBPF programs or we’d have to
> bring our forwarding implementation into userspace - both of which seem
> possible, but not something we’ve explored and it’s hard to say what the
> impact on performance/efficiency would be without testing. When considering
> full NIC offload, I wonder if network cards are less likely to expose
> cryptographic operations than they are to expose the very basic packet
> inspection and byte replacement for header rewriting required by the
> non-encrypting approach.
>
> Thanks,
> Eric
>
> On Nov 16, 2022, at 2:18 AM, Alex Chernyakhovsky <
> achernya=40google.com@dmarc.ietf.org> wrote:
>
>
>
> On Tue, Nov 15, 2022 at 8:57 PM Ben Schwartz <bemasc=
> 40google.com@dmarc.ietf.org> wrote:
>
> On Tue, Nov 15, 2022 at 6:44 PM Martin Thomson <mt@lowentropy.net> wrote:
> ...
>
> The protected payload is high entropy, being comprised of ciphertext, and
> is anywhere from 20 to 65506 bytes.  This payload can be split into two,
> with a 12 byte sample being used as a nonce for AES-CTR.  The remainder of
> the payload, plus part of the first byte, could then be protected using
> AES-CTR.  The remaining 8+ bytes could then be sampled as a nonce to
> protect the first sample.
>
>
> The thing you are looking for is called a Pseudo-Random Permutation [1].
> I would encourage you to use HCTR2 [2].  (I learned a bit about these while
> working on [3].)
>
> [1] https://en.wikipedia.org/wiki/Pseudorandom_permutation
> [2] https://github.com/google/hctr2
> [3] https://datatracker.ietf.org/doc/draft-cpbs-pseudorandom-ctls/
>
> ...
>
> [1] A few people have noted that timing tends to be a dead giveaway here,
> as does packet size.  I have some ideas about how that might be managed,
> but let's start with the basics.
>
> I think our threat model has to treat side-channels like packet size as
> out of scope (although I believe we do have the opportunity to add chaff
> to less-than-MTU packets), but we can certainly do something about
> timing analysis. I would expect most interesting deployments of such a
> proxy to be very high throughput which would mean some amount of artificial
> queuing and mixing would go a long way towards increasing the search space
> for correlation.
>
>
> Personally, I think this is probably fatal, so the encryption is not
> really buying you anything.   However, I would still support adding
> encryption here, at least optionally, for the sake of making the attacker's
> job a bit harder.  It's not the easiest thing to specify in a threat model,
> but there is a real difference between "instant undeniable confirmation"
> and "strong statistical inference" for an attacker.
>
> I am similarly in favor of trying to protect the packet further, but my
> understanding from the presentation at the session was that a lot of the
> CPU wins came from NIC offloads. If we end up hitting the host CPU to do
> the decryption, we should compare the implementation to the full
> CONNECT-UDP tunneling mode and see if it's worthwhile.
>
>
> ...
> --
> Masque mailing list
> Masque@ietf.org
> https://www.ietf.org/mailman/listinfo/masque
>