Re: [kitten] Comments on draft-ietf-kitten-password-storage-04

steve@tobtu.com Fri, 02 April 2021 20:17 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/kitten/0Uj5lF3dzADDqXpXGxcTBBDXfoM>

> On 04/01/2021 5:57 PM Sam Whited <sam@samwhited.com> wrote:
> 
> 
> Out of curiosity, why the lower memory size and a single lane, as
> opposed to the m=2GiB, t=1, p=4 from
> https://tools.ietf.org/html/draft-irtf-cfrg-argon2-13#section-7.3
> 

My suggestion of "Argon2id m=37 MiB, t=1, p=1 or m=15 MiB, t=2, p=1" is based on facts. "Argon2id m=2 GiB, t=1, p=4" is based on feelings. Also, I think I gave four or five options and OWASP went with the first two. The third one, "m=10 MiB, t=3, p=1", is important because those settings can also be used with Argon2i (because t≥3).

https://tools.ietf.org/html/draft-irtf-cfrg-argon2-13#section-7.4:
"The Argon2id variant with t=1 and 2GiB memory is FIRST RECOMMENDED option and is suggested as a default setting for all environments. This setting is secure against side-channel attacks and maximizes adversarial costs on dedicated bruteforce hardware.  The Argon2id variant with t=3 and 64 MiB memory is SECOND RECOMMENDED option and is suggested as a default setting for memory-constrained environments."

Let's break that down. (Note that section 4 states p=4 for both.)

FIRST RECOMMENDED option: Argon2id m=2 GiB, t=1, p=4
SECOND RECOMMENDED option: Argon2id m=64 MiB, t=3, p=4

For an attacker with enough memory, these settings should be similarly hard. In reality the one with lower memory usage should be harder for an attacker with enough memory. *BUT* "m=2 GiB, t=1" reads/writes 4 GiB of memory and "m=64 MiB, t=3" reads/writes 0.5 GiB of memory. So "m=2 GiB, t=1" is 8x harder than "m=64 MiB, t=3" given an attacker with enough memory. To match memory reads/writes you need "m=64 MiB, t=22".
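The read/write totals above can be checked with a few lines of Python. This is only a sketch: the cost model (2×m of traffic for the first pass, 3×m for each later pass) is inferred from the figures in this message rather than taken from the Argon2 draft, and the function names are made up for illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def traffic_bytes(m_bytes, t):
    """Approximate bytes read plus written for one hash.

    Assumed model (inferred from the numbers in this message): the first
    pass moves 2*m, every later pass moves 3*m, so the total is m*(3t-1).
    """
    return m_bytes * (3 * t - 1)


def passes_to_match(m_bytes, target_bytes):
    """Smallest t whose traffic at memory size m reaches target_bytes."""
    t = 1
    while traffic_bytes(m_bytes, t) < target_bytes:
        t += 1
    return t


first = traffic_bytes(2 * GIB, 1)     # FIRST RECOMMENDED: m=2 GiB, t=1
second = traffic_bytes(64 * MIB, 3)   # SECOND RECOMMENDED: m=64 MiB, t=3
ratio = first / second
matching_t = passes_to_match(64 * MIB, first)

print(f"m=2 GiB, t=1:  {first / GIB} GiB")
print(f"m=64 MiB, t=3: {second / GIB} GiB")
print(f"ratio: {ratio}x, t needed at 64 MiB to match: {matching_t}")
```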

So their two recommended options don't match in quality.

"[Argon2id m=2 GiB, t=1, p=4] is suggested as a default setting for all environments." I looked at VPS services with 4 GiB of memory, and both sites I checked offer 4 GiB with 2 CPU cores (yes, there are 8 GiB, 4 CPU core plans, but I'm going with just enough memory to run it). That means you should have p=2. Also, unless you implement a queuing system, you should not run "m=2 GiB, t=1, p=4", since an attacker can easily DoS the server by exhausting its memory with only a few requests per second. P.S. This takes 1.4 seconds on my computer, which is a very long time for authentication.

Now "This setting is secure against side-channel attacks..." [citation needed]. Do side-channel attacks stop working when 4 threads are randomly reading in 1 GiB of memory? I'd say no. With a side-channel attack, Argon2id with memory m' drops to Argon2i with m=m'/4*(1+x/p), t=1, where x is 1 or 2, and x is 2 with probability ((p-1)/p)^p. Also, you'll never need to keep more than m/p/2 of memory. The actual amount is less and based on probabilities. So "Argon2id m=2 GiB, t=1, p=4" with a side-channel attack turns into "Argon2i m=640 MiB, t=1" with 68.4% probability or "Argon2i m=768 MiB, t=1" with 31.6% probability. I'm not aware of Argon2i t=1 attacks that just disappear when m is large-ish like 640 MiB or 768 MiB.
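The 640 MiB / 768 MiB figures and their probabilities follow directly from the formula in the paragraph above. Here is a small Python sketch that just evaluates it; the function name is hypothetical, and the constant 4 is taken literally from the formula as written.

```python
def side_channel_outcomes(m_mib, p):
    """Evaluate m = m'/4*(1+x/p) for x in {1, 2}, as stated in this message.

    Returns [(reduced_memory_mib, probability), ...], where x=2 occurs
    with probability ((p-1)/p)**p and x=1 otherwise.
    """
    prob_x2 = ((p - 1) / p) ** p

    def reduced(x):
        return m_mib / 4 * (1 + x / p)

    return [(reduced(1), 1 - prob_x2), (reduced(2), prob_x2)]


outcomes = side_channel_outcomes(2048, 4)  # Argon2id m=2 GiB, p=4
for mem, prob in outcomes:
    print(f"Argon2i m={mem:.0f} MiB, t=1 with probability {prob:.1%}")
```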

----

OK now why is "Argon2id m=37 MiB, t=1, p=1; m=15 MiB, t=2, p=1; or m=10 MiB, t=3, p=1" based on facts? Also these are minimum good settings. You can go higher.

After looking at CPUs, GPUs, and FPGAs, the best attacker hardware is GPUs. FPGAs have 1/2 to 1/3 of the memory bandwidth of GPUs. CPUs have 1/4 to 1/20 of the memory bandwidth of GPUs.

GPUs:
RTX 3080: 10 GiB, 760.0 GB/s
RTX 3090: 24 GiB, 935.8 GB/s (takes 3 "PCI slots" vs 2)
Radeon VII: 16 GiB, 1028 GB/s

There are also these GPUs. Their cost is unknown, but a system with 8 A100 cards is $200k:
A100 80GB: 80 GiB, 2039 GB/s
A100: 40 GiB, 1555 GB/s

Since the current best cost/performance GPU for password cracking is the RTX 3080, I'll go with that. The cracking speed of a memory-hard algorithm is limited by memory bandwidth if there's enough computing power, and at low memory usage we can assume it gets near max bandwidth. The goal for a good password hash/KDF for auth is <10 kH/s/GPU: <10 kH/s/GPU is good and ≥10 kH/s/GPU is not good. Therefore a memory-hard algorithm needs to read/write at least 72.5 MiB per hash (760,000,000,000/10,000/1024/1024). "m=37 MiB, t=1" does 74 MiB, "m=15 MiB, t=2" does 75 MiB, and "m=10 MiB, t=3" does 80 MiB. These are just higher than 72.5 MiB, which means the theoretical max is <10 kH/s/GPU. You could argue that an attacker could buy Radeon VIIs, which would push that to at least 98.1 MiB read/written per hash. I'd argue that people aren't building password cracking rigs specifically for memory-hard algorithms, but let's say you win:
m=50 MiB, t=1: does 100 MiB
m=20 MiB, t=2: does 100 MiB
m=13 MiB, t=3: does 104 MiB
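The minimum settings can be re-derived from the bandwidth figures. The sketch below assumes the same traffic model the numbers in this message imply (m*(3t-1) MiB moved per hash) and whole-MiB memory sizes; the function names are illustrative only.

```python
import math

MIB = 1024 * 1024


def min_traffic_mib(bandwidth_bytes_per_s, max_hashes_per_s=10_000):
    """MiB each hash must move so that bandwidth alone caps the hash rate."""
    return bandwidth_bytes_per_s / max_hashes_per_s / MIB


def traffic_mib(m_mib, t):
    """Assumed model: 2*m for the first pass, 3*m for each later pass."""
    return m_mib * (3 * t - 1)


def min_memory_mib(bound_mib, t):
    """Smallest whole-MiB m whose traffic is strictly above bound_mib."""
    return math.floor(bound_mib / (3 * t - 1)) + 1


rtx_3080 = min_traffic_mib(760e9)     # 760.0 GB/s
radeon_vii = min_traffic_mib(1028e9)  # 1028 GB/s

rtx_settings = [min_memory_mib(rtx_3080, t) for t in (1, 2, 3)]
radeon_settings = [min_memory_mib(radeon_vii, t) for t in (1, 2, 3)]

print(f"RTX 3080 bound: {rtx_3080:.2f} MiB, m for t=1,2,3: {rtx_settings}")
print(f"Radeon VII bound: {radeon_vii:.2f} MiB, m for t=1,2,3: {radeon_settings}")
```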


Now p=1. I've made this argument in a few places; here's the latest with a few minor edits:

Stealing a quote from myself: "For authentication you should use p=1 because a lot of people are running a VPS with a single CPU core. Even if not, one could benchmark this and think they can go higher on settings than they should because they are not thinking about throughput. Also with memory hard algorithms, it would be wise to limit the number of simultaneous instances. An attacker can likely send more requests per second than a server can handle, which will make the server exhaust all memory if there isn't a limit." There's another place where I mention that with scrypt, even though you can run it with multiple threads, don't. You can think of it as t in Argon2. But if you do set up a queue or mutex, you can use multiple threads (if you are mindful of throughput). This actually makes for a better experience, since it's just as strong but latency is lower during off-peak times. Using a queue is better than a mutex. Also have a max queue length and serve a 503 error if the queue is too long. Otherwise the connection could time out and you'll probably still end up doing the work.
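Here is a rough Python sketch of the "queue with a max length, 503 when full" idea. Everything in it is an assumption for illustration: the class name, the status-code return convention, and `hash_fn`, which stands in for whatever Argon2id call the server actually uses.

```python
import queue
import threading


class HashQueue:
    """Run password hashes on a fixed number of workers behind a bounded queue."""

    def __init__(self, hash_fn, workers=1, max_queue=8):
        # hash_fn is a placeholder for the real password hash (e.g. Argon2id).
        self._hash_fn = hash_fn
        # At most `workers` hashes run at once, so memory use is capped at
        # roughly workers * m; at most `max_queue` requests wait behind them.
        self._jobs = queue.Queue(maxsize=max_queue)
        for _ in range(workers):
            threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            password, done, result = self._jobs.get()
            try:
                result.append(self._hash_fn(password))
            except Exception:
                pass  # leave result empty; submit() reports 500
            finally:
                done.set()

    def submit(self, password):
        """Return (200, digest), (500, None) on hash failure, or (503, None)."""
        done, result = threading.Event(), []
        try:
            # Reject immediately instead of letting the request time out
            # while the work still gets done later.
            self._jobs.put_nowait((password, done, result))
        except queue.Full:
            return 503, None
        done.wait()
        return (200, result[0]) if result else (500, None)
```

A request handler would call `submit()` and map the first element of the result to the HTTP status. With `workers=1` this behaves like the mutex case, except that waiting requests are bounded.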

That's only for when the server is running the password hash. If the client is running a password KDF for a PAKE or some other auth scheme, then yes, increase p.