Re: [CFRG] Threshold Sig required - Random bit flip hits Cert Transparency Log

Tim Dierks <> Thu, 08 July 2021 02:36 UTC

From: Tim Dierks <>
Date: Wed, 07 Jul 2021 22:36:21 -0400
To: Phillip Hallam-Baker <>
Subject: Re: [CFRG] Threshold Sig required - Random bit flip hits Cert Transparency Log

I haven't looked at the code in question for the Merkle tree in the
certificate transparency logs, but we have solved similar problems [1]
through the simpler mechanism of validating one's answers before
releasing them. If, in this case, the log construction system had signed
the tree and then validated the correctness of the signature before
releasing it, it could simply have signed again when it discovered that
the tree's correctness could no longer be validated.
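
For illustration, a minimal sketch of that sign-then-self-check pattern
(Python; HMAC-SHA256 stands in here for the log's actual signature scheme,
and the function name is made up):

```python
import hashlib
import hmac

def sign_and_release(key: bytes, tree_head: bytes) -> bytes:
    """Sign, then validate the signature before releasing it.

    HMAC-SHA256 is a stand-in for the real (public-key) signature
    scheme a log would use; the pattern is what matters.
    """
    sig = hmac.new(key, tree_head, hashlib.sha256).digest()
    # Re-derive and compare before release: if a transient bit flip
    # corrupted the first computation, the self-check fails and the
    # caller can simply sign again instead of publishing a bad value.
    check = hmac.new(key, tree_head, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, check):
        raise RuntimeError("signature failed self-check; sign again")
    return sig
```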

If I understand correctly [2], the hash of a leaf node in the Merkle tree
had one bit flipped before the hash was incorporated into the parent node.
The best way to avoid this failure depends on the structure of the
distributed system, but double-checking that the data in memory hasn't
changed (by keeping a simple CRC32 or similar) and validating that
cryptographic outputs are reproducible should be sufficient; e.g.:

 * when producing leaf nodes:
  1. first hash the source data with SHA-256
  2. then validate it for inclusion in the tree
  3. then compute a crc32 checksum of the SHA-256 hash
  4. then rehash the source data with SHA-256
  5. validate that the hashes from steps 1 and 4 are the same
  6. retain in memory the hash from step 1 or 4 along with the crc from
step 3
 * when incorporating leaf nodes into tree nodes:
  7. calculate the SHA-256 over the child nodes
  8. go through each of the child nodes and validate that you can reproduce
the crc for their hashes
  9. calculate the crc32 for the hash from step 7
  10. rehash the child nodes
  11. validate that the hashes from steps 7 and 10 are the same
  12. retain the hash from step 7 or 10 along with the crc from step 9
 * when doing a signature over a tree:
  13. create a signature over the tree hash
  14. take a crc32 of the signature
  15. validate that the crc32 of the tree hash can be recomputed from the
tree hash used as input to step 13
  16. validate that the signature from step 13 verifies against the tree
hash
Then send the pair of (signature, crc32) to the requesting system. (Sending
the crc32 can be avoided if the requesting system can validate the
correctness of the signature in other ways.)
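
The leaf and tree-node steps above could be sketched roughly like this
(Python; function names and structure are illustrative, not the actual CT
log code):

```python
import hashlib
import zlib

def make_leaf(source: bytes):
    """Steps 1-6: hash a leaf, crc its hash, and confirm the hash is
    reproducible before retaining it."""
    h1 = hashlib.sha256(source).digest()        # step 1
    crc = zlib.crc32(h1)                        # step 3
    h2 = hashlib.sha256(source).digest()        # step 4: rehash
    if h1 != h2:                                # step 5
        raise RuntimeError("transient hash error; recompute the leaf")
    return h1, crc                              # step 6

def combine(left, right):
    """Steps 7-12: parent hash over two (hash, crc) children."""
    for child_hash, child_crc in (left, right):
        if zlib.crc32(child_hash) != child_crc:  # step 8
            raise RuntimeError("child hash corrupted in memory")
    data = left[0] + right[0]
    h = hashlib.sha256(data).digest()           # step 7
    crc = zlib.crc32(h)                         # step 9
    if hashlib.sha256(data).digest() != h:      # steps 10-11
        raise RuntimeError("transient hash error; recompute the node")
    return h, crc                               # step 12
```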

I think that by having a validator which covers every step, and which can
itself be checked, you can avoid ever being exposed to an undetectable bit
error. The cost of the crc32s and of duplicating all cryptographic
operations is a small price to pay, and much cheaper and simpler than fully
redundant computation.

If you need protection for data in storage or in transit across networks,
it can be provided by storing or sending crc32s along with the data. While
disks and networks have correctness checks of their own (or error
recovery), and with TLS network integrity is cryptographic, there's no
protection for data as it moves through API stacks before it gets
protected; for this reason, managing the redundancy checks as part of the
full data lifecycle is necessary to provide full protection against memory
errors or processor flaws.
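
A sketch of that kind of checksummed envelope (Python; names are
illustrative):

```python
import zlib

def wrap(data: bytes):
    # Attach a crc32 before the bytes cross an unprotected API boundary.
    return data, zlib.crc32(data)

def unwrap(data: bytes, crc: int) -> bytes:
    # Verify on receipt: a mismatch means the bytes changed somewhere
    # in the layers that disk and TLS checksums do not cover.
    if zlib.crc32(data) != crc:
        raise ValueError("data corrupted between API layers")
    return data
```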

Note: for processor flaws we were concerned about reproducible errors (in
this case, hypothetically, that calculating the SHA-256 would produce the
wrong value repeatedly), so for cryptographic operations we have considered
using two independent implementations with independent intermediate data
structures (such as key schedules) to mitigate this risk.
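
As a rough illustration of that cross-check pattern (here two code paths
through the same hashlib implementation stand in for what would really need
to be independent implementations, ideally on different CPU architectures):

```python
import hashlib

def sha256_oneshot(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def sha256_chunked(data: bytes, chunk: int = 64) -> bytes:
    # Second code path with different intermediate state; a weak
    # stand-in for a genuinely independent implementation.
    h = hashlib.sha256()
    for i in range(0, len(data), chunk):
        h.update(data[i:i + chunk])
    return h.digest()

def checked_sha256(data: bytes) -> bytes:
    a = sha256_oneshot(data)
    b = sha256_chunked(data)
    if a != b:
        raise RuntimeError("implementations disagree; possible fault")
    return a
```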

 - Tim


On Wed, Jul 7, 2021 at 12:04 PM Phillip Hallam-Baker <> wrote:

> So it has actually happened, a one in a billion computing error has caused
> a cert transparency log to become corrupted due to a bit flip error. There
> is a discussion of the issue here:
> Single random bit flip causes error in certificate transparency log |
> Hacker News <>
> The solution they obsess over (ECC RAM) is actually irrelevant to the
> error in that case, as it was an even rarer CPU error. This means that
> what I considered to be a more or less theoretical concern about signing
> append-only logs turns out to have actually occurred. Do things billions
> of times and billion-to-one chances will happen.
> The only robust solution to this issue is for redundant notaries to sign
> the log.
> Consider the case where we have an append only log that is authenticated
> by means of a Merkle tree with the apex of the tree being signed at 10
> minute intervals. If we have a single server doing the signing, any error
> that occurs will lead to the log becoming invalid. This condition cannot be
> distinguished from a notary default.
> But consider the case where there are three notaries each signing the
> log: which private key should they use?
> All three signers using the same key means that if an error occurs, we
> risk having a correct and an incorrect version of the same log being
> signed. That means there is a real risk of the incorrect log and
> signature leaking somehow.
> All three signers using different keys is also bad because now we have
> three independent notaries and the relying party has to do the job of
> deciding which one to trust. There is an even greater risk of the wrong log
> being relied on at some point.
> A threshold scheme with three shares and a quorum of 2 solves the problem
> very neatly. The possibility of an undetected error is now much smaller as
> two signers must be hit with an error having exactly the same effect at the
> same time. That is a very remote possibility unless the error is somehow
> caused by an architectural defect in the CPU. So we should probably choose
> separate chipset architectures (Intel, AMD, ARM?) if we want to get to
> maximum robustness.
> So how is the FROST draft coming?
> _______________________________________________
> CFRG mailing list