Re: [CFRG] Threshold Sig required - Random bit flip hits Cert Transparency Log

Phillip Hallam-Baker <phill@hallambaker.com> Thu, 08 July 2021 04:54 UTC

From: Phillip Hallam-Baker <phill@hallambaker.com>
Date: Thu, 08 Jul 2021 00:54:09 -0400
Message-ID: <CAMm+LwgfmOh7WTcmTv9FVkAsNz8SLH8ufgNtyRQ6Hgd+TfZGWQ@mail.gmail.com>
To: Tim Dierks <tim@dierks.org>
Cc: IRTF CFRG <cfrg@irtf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/cfrg/FDPj7V9yCgybEY7zkw-QrLrM1PA>
Subject: Re: [CFRG] Threshold Sig required - Random bit flip hits Cert Transparency Log

If you have a single CPU, you will always have the possibility of an error.
Not all faults are transient. If the data was corrupted in the cache, it is
going to be corrupted both times it is hashed. And optimizing compilers can
screw you in really imaginative ways.

I have had RAID 6 arrays die in production. These things really do happen.
If the controller card goes, all your copies end up corrupted.

Three copies, two different media, one offsite.

Threshold sigs might not be the only way to do it, but they are an elegant
way to get to six nines.


On Wed, Jul 7, 2021 at 10:36 PM Tim Dierks <tim@dierks.org> wrote:

> I haven't looked at the code in question for the Merkle tree in the
> certificate transparency logs, but we have solved similar problems [1]
> through the simpler mechanism of just validating one's answers before one
> releases them. If, in this case, the log construction system had signed the
> tree, then validated the correctness of the signature before releasing the
> signature, it could have just done it again when it discovered that the
> tree correctness could no longer be validated.
>
> If I understand correctly [2] the hash of a leaf node in the Merkle tree
> had one bit flipped before the hash was incorporated into the parent node.
> The best way to avoid this failure depends on the structure of the
> distributed systems, but double-checking that the data in memory hasn't
> changed (through keeping a simple CRC32 or similar), and validating that
> cryptographic outputs are reproducible should be sufficient; e.g.:
>
>  * when producing leaf nodes:
>   1. first hash the source data with SHA-256
>   2. then validate it for inclusion in the tree
>   3. then make a crc32 hash of the SHA-256 hash
>   4. then rehash the source data with SHA-256
>   5. validate that the hashes in step 1 and 4 are the same
>   6. retain in memory the hash from step 1 or 4 along with the crc from
> step 3
>  * when incorporating leaf nodes into tree nodes:
>   7. calculate the SHA-256 over the child nodes
>   8. go through each of the child nodes and validate that you can
> reproduce the crc for their hashes
>   9. calculate the crc32 for the hash from step 7
>   10. rehash the child nodes
>   11. validate that the hashes from step 7 and 10 are the same
>   12. retain the hash from step 7 or 10 along with the crc from step 9
>  * when doing a signature over a tree
>   13. create a signature over the tree hash
>   14. take a crc32 of the signature
>   15. validate that the crc32 on the tree hash can be computed from the
> tree hash used as input to step 13
>   16. validate that the signature from step 13 can be validated as signing
> the tree hash
> Then send the pair of (signature, crc32) to the requesting system.
> (Sending the crc32 can be avoided if the requesting system can validate the
> correctness of the signature in other ways.)
>
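A minimal sketch of the leaf and tree-node steps above (1 to 12), assuming Python's standard hashlib and zlib. The names checked_sha256, verify_crc, checked_parent and IntegrityError are hypothetical, and child hashes are simply concatenated; a real CT log would also apply the RFC 6962 leaf/node prefixes, which are omitted here.

```python
import hashlib
import zlib


class IntegrityError(Exception):
    """Raised when a recomputed value disagrees with the retained one."""


def checked_sha256(data: bytes) -> tuple[bytes, int]:
    """Hash, CRC the hash, rehash, compare; return (digest, crc32)."""
    first = hashlib.sha256(data).digest()       # steps 1 / 7
    crc = zlib.crc32(first)                     # steps 3 / 9
    second = hashlib.sha256(data).digest()      # steps 4 / 10
    if first != second:                         # steps 5 / 11
        raise IntegrityError("SHA-256 result was not reproducible")
    return first, crc                           # steps 6 / 12


def verify_crc(digest: bytes, crc: int) -> None:
    """Check that a retained digest still matches its retained CRC."""
    if zlib.crc32(digest) != crc:
        raise IntegrityError("digest changed while held in memory")


def checked_parent(children: list[tuple[bytes, int]]) -> tuple[bytes, int]:
    """Combine child (digest, crc32) pairs into a parent pair."""
    joined = b"".join(digest for digest, _ in children)
    first = hashlib.sha256(joined).digest()     # step 7
    for digest, crc in children:                # step 8: after hashing, so a
        verify_crc(digest, crc)                 # flip before step 7 is caught
    crc = zlib.crc32(first)                     # step 9
    rejoined = b"".join(digest for digest, _ in children)
    second = hashlib.sha256(rejoined).digest()  # step 10
    if first != second:                         # step 11
        raise IntegrityError("parent hash was not reproducible")
    return first, crc                           # step 12
```

Note that rehashing only catches transient faults; a fault that corrupts the input itself is caught by the CRC check, not by the second hash.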
> I think that by having a validator which covers every step and which can
> itself be checked, you can avoid ever being exposed to an undetectable bit
> error.
> The cost of the crc32s and duplication of all cryptographic operations is a
> small price to pay, and much cheaper and simpler than fully-redundant
> notaries.
>
> If you need protection of storage or across networks, it can be done by
> storing or sending crc32s along with data. While disks and networks have
> correctness checks of their own (or error recovery), and with TLS, network
> integrity is cryptographic, there's no protection for data as it moves
> through API stacks before it gets protected; for this reason, managing the
> redundancy checks as a part of the full data lifecycle is necessary to
> provide full protection against memory errors or processor flaws.
>
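As a rough illustration of carrying a crc32 alongside data through storage or a network hop, the sketch below prepends a 4-byte checksum to a payload and checks it on the way back. The frame/unframe names and the big-endian layout are assumptions made for the example, not anything the thread specifies.

```python
import struct
import zlib


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 as a 4-byte big-endian integer."""
    return struct.pack(">I", zlib.crc32(payload)) + payload


def unframe(blob: bytes) -> bytes:
    """Return the payload, or raise ValueError if the CRC does not match."""
    if len(blob) < 4:
        raise ValueError("blob too short to contain a CRC32")
    (expected,) = struct.unpack(">I", blob[:4])
    payload = blob[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("CRC32 mismatch: payload was corrupted")
    return payload
```

The point is that the checksum is computed where the data is produced and checked where it is consumed, so the API layers in between are covered as well.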
> Note: for processor flaws we were concerned about reproducible error (in
> this case, hypothetically that calculating the SHA-256 would produce the
> wrong value repeatedly), so for cryptographic operations we have considered
> using two independent implementations with independent intermediate data
> structures (such as key schedules) to mitigate this risk.
>
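The two-implementation idea can be sketched as a cross-check between two hash functions that are supposed to agree. Everything named here (cross_checked_hash, impl_a, impl_b) is hypothetical, and both stand-ins go through Python's hashlib, so they are not truly independent; a real deployment would plug in separately written implementations.

```python
import hashlib
from typing import Callable

HashFn = Callable[[bytes], bytes]


def cross_checked_hash(data: bytes, primary: HashFn, secondary: HashFn) -> bytes:
    """Return the digest only if both implementations agree on it."""
    a = primary(data)
    b = secondary(data)
    if a != b:
        raise RuntimeError("hash implementations disagree")
    return a


def impl_a(data: bytes) -> bytes:
    """Stand-in for the first implementation."""
    return hashlib.sha256(data).digest()


def impl_b(data: bytes) -> bytes:
    """Stand-in for the second implementation (same library, for the demo)."""
    return hashlib.new("sha256", data).digest()
```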
>  - Tim
>
> [1]
> https://sigops.org/s/conferences/hotos/2021/papers/hotos21-s01-hochschild.pdf
> [2]
> https://groups.google.com/a/chromium.org/g/ct-policy/c/PCkKU357M2Q/m/xbxgEXWbAQAJ
>
> On Wed, Jul 7, 2021 at 12:04 PM Phillip Hallam-Baker <
> phill@hallambaker.com> wrote:
>
>> So it has actually happened: a one-in-a-billion computing error, a bit
>> flip, has caused a cert transparency log to become corrupted.
>> There is a discussion of the issue here:
>>
>> Single random bit flip causes error in certificate transparency log |
>> Hacker News (ycombinator.com)
>> <https://news.ycombinator.com/item?id=27728287>
>>
>> The solution they obsess over (ECC RAM) is actually irrelevant to the
>> error in that case, as it was an even rarer CPU error. That means that what
>> I considered to be a more or less theoretical concern about signing
>> append-only logs turns out to have actually occurred. Do things billions of
>> times and billion-to-one chances will happen.
>>
>>
>> The only robust solution to this issue is for redundant notaries to sign
>> the log.
>>
>> Consider the case where we have an append only log that is authenticated
>> by means of a Merkle tree with the apex of the tree being signed at 10
>> minute intervals. If we have a single server doing the signing, any error
>> that occurs will lead to the log becoming invalid. This condition cannot be
>> distinguished from a notary default.
>>
>> But consider the case where there are three notaries each signing the
>> log. Which private key should they use?
>>
>> If all three signers use the same key, then when an error occurs we risk
>> having both a correct and an incorrect version of the same log signed. That
>> means there is a real risk of the incorrect log and signature leaking
>> somehow.
>>
>> All three signers using different keys is also bad because now we have
>> three independent notaries and the relying party has to do the job of
>> deciding which one to trust. There is an even greater risk of the wrong log
>> being relied on at some point.
>>
>> A threshold scheme with three shares and a quorum of 2 solves the problem
>> very neatly. The possibility of an undetected error is now much smaller as
>> two signers must be hit with an error having exactly the same effect at the
>> same time. That is a very remote possibility unless the error is somehow
>> caused by an architectural defect in the CPU. So we should probably choose
>> separate chipset architectures (Intel, AMD, ARM?) if we want to get to
>> maximum robustness.
>>
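To make the 2-of-3 agreement rule concrete, here is a small sketch that only models the quorum decision on the tree head. It is not a threshold signature scheme: in something like FROST the signers would produce signature shares, which this example does not attempt. The function name quorum_head and the use of a plain SHA-256 digest as the "tree head" are assumptions for illustration.

```python
import hashlib
from collections import Counter


def quorum_head(heads: list[bytes], threshold: int = 2) -> bytes:
    """Return the tree head reported by at least `threshold` notaries."""
    if not heads:
        raise RuntimeError("no notary reported a tree head")
    value, count = Counter(heads).most_common(1)[0]
    if count < threshold:
        raise RuntimeError("no quorum: notaries disagree on the tree head")
    return value


# Three notaries compute the head; one of them suffers a single bit flip.
good = hashlib.sha256(b"log contents").digest()
flipped = bytes([good[0] ^ 0x01]) + good[1:]
agreed = quorum_head([good, flipped, good])
```

An undetected error would require two notaries to produce the same wrong head at the same time, which is the remote case described above.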
>>
>> So how is the FROST draft coming?
>>
>>
>>
>