[Cfrg] On the (non-)randomness of the S-box of Streebog and Kuznyechik

Leo Perrin <leo.perrin@inria.fr> Mon, 05 August 2019 15:35 UTC

Return-Path: <leo.perrin@inria.fr>
X-Original-To: cfrg@ietfa.amsl.com
Delivered-To: cfrg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 50C0E120294 for <cfrg@ietfa.amsl.com>; Mon, 5 Aug 2019 08:35:12 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -6.898
X-Spam-Level:
X-Spam-Status: No, score=-6.898 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_HI=-5, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id t-7568cxA5Rd for <cfrg@ietfa.amsl.com>; Mon, 5 Aug 2019 08:35:08 -0700 (PDT)
Received: from mail2-relais-roc.national.inria.fr (mail2-relais-roc.national.inria.fr [192.134.164.83]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 3D66312028F for <cfrg@irtf.org>; Mon, 5 Aug 2019 08:35:08 -0700 (PDT)
X-IronPort-AV: E=Sophos;i="5.64,350,1559512800"; d="scan'208,217";a="394534064"
X-MGA-submission: MDEXnGgXTUE4aOdiFCRgaw42H2/g5fCRXTxWj9UJ/DcU2BPqlq1Y9U17FDnySQs3T/iHgEcebo7FfFTZuMAG0pLUhnPHg67LXg7eGzidLpBTF1/XTgCyhDZ9Q5rqVILJ1iCwFGikBVow5C2oTeRcH5JKOl18FpTktpqd3pes4u+oCA==
Received: from zcs-store2.inria.fr ([128.93.142.29]) by mail2-relais-roc.national.inria.fr with ESMTP; 05 Aug 2019 17:35:06 +0200
Date: Mon, 05 Aug 2019 17:35:06 +0200
From: Leo Perrin <leo.perrin@inria.fr>
To: cfrg <cfrg@irtf.org>
Message-ID: <1327417226.25659372.1565019306532.JavaMail.zimbra@inria.fr>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="=_05563046-1904-4228-8cbd-92d02b8529d3"
X-Originating-IP: [128.93.64.216]
X-Mailer: Zimbra 8.7.11_GA_3800 (ZimbraWebClient - FF68 (Linux)/8.7.11_GA_3800)
Thread-Index: bDYedNTOJoRP1dYtcL4yJJBpuMFb1g==
Thread-Topic: On the (non-)randomness of the S-box of Streebog and Kuznyechik
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/nwQ3ZcPlCwaWTGRw0ozWHiB0vjM>
Subject: [Cfrg] On the (non-)randomness of the S-box of Streebog and Kuznyechik
X-BeenThere: cfrg@irtf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Crypto Forum Research Group <cfrg.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/cfrg>, <mailto:cfrg-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/cfrg/>
List-Post: <mailto:cfrg@irtf.org>
List-Help: <mailto:cfrg-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/cfrg>, <mailto:cfrg-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 05 Aug 2019 15:35:17 -0000

Dear CFRG members, 

I would like to follow up on the short talk I gave several days ago. That talk was itself a follow-up to a previous e-mail [0] which presented a strange structure I found in pi, a component (S-box) which is shared by both Streebog (RFC 6986) and Kuznyechik (RFC 7801). 
At the time of writing this previous e-mail, I did not know that the designers of these algorithms were still claiming that they had generated their S-box randomly. However, they do! Here are links to relevant ISO documents that were leaked: 
- [ https://cdn.virgilsecurity.com/assets/docs/memo-on-kuznyechik-s-box.pdf | https://cdn.virgilsecurity.com/assets/docs/memo-on-kuznyechik-s-box.pdf ] 
- [ https://cdn.virgilsecurity.com/assets/docs/meeting-report-for-the-discussion-on-kuznyechik-and-streebog.pdf | https://cdn.virgilsecurity.com/assets/docs/meeting-report-for-the-discussion-on-kuznyechik-and-streebog.pdf ] 
As we can see in the first one, the designers of pi did claim that they obtained this component by generating many random S-boxes and then choosing the best according to some (fair and usual) criteria. In fact, as we can see in the second, they insisted that they used this method even *after* the publication of my latest results [1]. 

That is hardly possible. Let us see why. 

There are 2^{L+1}-1 bitstrings of length at most L. Thus, the total number of permutations with an implementation fitting in at most L bits in a given language (e.g. in C) is upper bounded by 2^{L+1}. The shortest implementation of pi in C we are aware of was found by ceilingcat [2] who improved a previous implementation by Xavier Bonnetain. That implementation is 139 ASCII characters long, i.e. 973 bits long. Thus, there are fewer than 2^{974} permutations with an implementation this simple. As there are 256! = 2^{1684} permutations operating on 8 bits, we easily conclude that the probability that a random permutation has a C implementation at most 973 bits long is smaller than 2^{974-1684} = 2^{-710}, which is beyond negligible. If we look at languages more compact than C (see [2]) then we can deduce even lower probabilities. In other words, the set of the permutations as simple as pi for a very broad definition of simplicity is extremely small, and thus the probability that we fall into it by chance is negligible. More detailed explanations are available at [3]. 

In light of this argument, only two possibilities are left: 
1. the designers of pi really used a random process to generate their S-box, they were just lucky enough to find one of the very (very very very) few permutations with a short implementation, or 
2. they used a TKlog deliberately but decided to hide it, the TKlog being the structure I identified in [1]. 
The probability that the first one is true can be upper bounded using the discussion above: it is at most 2^{-710}. For comparison, there are about 10^80 = 2^266 atoms in the universe. The only reasonable conclusion is then that we are in the second case: the use of the TKlog is a deliberate choice by its designers. Note that this reasoning is based purely on the simplicity of the TKlog structure. It does not take into account its very peculiar interactions with the finite field, nor does it consider the fact that the finite field used to define it is the same as the one used in the linear layer of Streebog. It is also unlikely that the differential and linear properties of pi can be encountered in a feasibly large set of random S-boxes [4]. 

Because of these observations, I insist that the designers of these algorithms have provided cryptanalysts with misleading information. There cannot be any good reason for this, which is why I think these algorithms cannot be trusted, and thus that RFC 6986 and 7801 should be deprecated. 

Cheers, 

/Léo 

PS: Remember that the golfed implementations listed in [2] are only intended to be small. They are *not* intended to be secure! Their running time in CPU cycles is directly proportionnal to the discrete logarithm of their input, meaning that they are the opposite of constant time. An implementation using them would be trivially vulnerable to some side channel attacks. 

[0] https://mailarchive.ietf.org/arch/msg/cfrg/4PmssKzCBsxTmLCieDgqD7Nynwg?fbclid=IwAR3JIGXt1_6aFwNOjDb-p7oC3ezHRuEEreznt-atyY7Clah2QZU_uU7FKS8 
[1] [ https://tosc.iacr.org/index.php/ToSC/article/view/7405 | https://tosc.iacr.org/index.php/ToSC/article/view/7405 ] 
[2] [ https://codegolf.stackexchange.com/questions/186498/proving-that-a-russian-cryptographic-standard-is-too-structured | https://codegolf.stackexchange.com/questions/186498/proving-that-a-russian-cryptographic-standard-is-too-structured ] 
[3] [ https://who.rocq.inria.fr/Leo.Perrin/code-golf.html | https://who.rocq.inria.fr/Leo.Perrin/code-golf.html ] 
[4] https://eprint.iacr.org/2019/528