I haven't looked at the code in question for the Merkle tree in the certificate transparency logs, but we have solved similar problems [1] through the simpler mechanism of just validating one's answers before one releases them. If, in this case, the log construction system had signed the tree, then validated the correctness of the signature before releasing the signature, it could have just done it again when it discovered that the tree correctness could no longer be validated.

If I understand correctly [2], the hash of a leaf node in the Merkle tree had one bit flipped before the hash was incorporated into the parent node. The best way to avoid this failure depends on the structure of the distributed system, but double-checking that the data in memory hasn't changed (by keeping a simple CRC32 or similar), and validating that cryptographic outputs are reproducible, should be sufficient; e.g.:

* when producing leaf nodes:

  1. first hash the source data with SHA-256
  2. then validate it for inclusion in the tree
  3. then take a CRC32 of the SHA-256 hash
  4. then rehash the source data with SHA-256
  5. validate that the hashes from steps 1 and 4 are the same
  6. retain in memory the hash from step 1 or 4 along with the CRC from step 3

* when incorporating leaf nodes into tree nodes:

  7. calculate the SHA-256 over the child nodes
  8. go through each of the child nodes and validate that you can reproduce the CRC for their hashes
  9. calculate the CRC32 for the hash from step 7
  10. rehash the child nodes
  11. validate that the hashes from steps 7 and 10 are the same
  12. retain the hash from step 7 or 10 along with the CRC from step 9

* when doing a signature over a tree:

  13. create a signature over the tree hash
  14. take a CRC32 of the signature
  15. validate that the CRC32 on the tree hash can be computed from the tree hash used as input to step 13
  16. validate that the signature from step 13 verifies as signing the tree hash

Then send the pair of (signature, crc32) to the requesting system. (Sending the crc32 can be avoided if the requesting system can validate the correctness of the signature in other ways.)
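A minimal sketch of the steps above might look like the following Python. The function names are my own, the key and leaf data are made up, and HMAC stands in for a real asymmetric signature scheme (which the standard library does not provide) purely so the sketch is self-contained:

```python
import hashlib
import hmac
import zlib

def checked_sha256(data: bytes) -> tuple[bytes, int]:
    """Steps 1-6: hash, keep a CRC32 of the digest, rehash, compare."""
    h1 = hashlib.sha256(data).digest()           # step 1
    crc = zlib.crc32(h1)                         # step 3
    h2 = hashlib.sha256(data).digest()           # step 4: recompute
    if h1 != h2:                                 # step 5
        raise RuntimeError("SHA-256 not reproducible: possible memory/CPU fault")
    return h1, crc                               # step 6

def checked_parent(children: list[tuple[bytes, int]]) -> tuple[bytes, int]:
    """Steps 7-12: fold child hashes into a parent node, re-checking CRCs."""
    for digest, crc in children:                 # step 8: child CRCs must reproduce
        if zlib.crc32(digest) != crc:
            raise RuntimeError("child hash corrupted in memory")
    concat = b"".join(d for d, _ in children)
    return checked_sha256(concat)                # steps 7, 9-12

def checked_sign(key: bytes, tree_hash: bytes, tree_crc: int) -> tuple[bytes, int]:
    """Steps 13-16: sign the tree hash; HMAC is a stand-in for a signature."""
    if zlib.crc32(tree_hash) != tree_crc:        # step 15: input CRC must reproduce
        raise RuntimeError("tree hash corrupted before signing")
    sig = hmac.new(key, tree_hash, hashlib.sha256).digest()   # step 13
    sig_crc = zlib.crc32(sig)                                 # step 14
    # step 16: validate the signature against the same tree hash
    if not hmac.compare_digest(sig, hmac.new(key, tree_hash, hashlib.sha256).digest()):
        raise RuntimeError("signature does not verify")
    return sig, sig_crc

leaves = [checked_sha256(d) for d in (b"cert-1", b"cert-2")]
root, root_crc = checked_parent(leaves)
signature, signature_crc = checked_sign(b"demo-key", root, root_crc)
```

The point is not the specific framing but that every cryptographic output is recomputed and every retained value carries a redundancy check that is re-verified at the next use.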

[2] https://groups.google.com/a/chromium.org/g/ct-policy/c/PCkKU357M2Q/m/xbxgEXWbAQAJ

I think that by having a validator which covers every step and which can itself be checked, you can ensure that exposure to a bit error is never undetectable. The cost of the CRC32s and the duplication of all cryptographic operations is a small price to pay, and much cheaper and simpler than fully redundant notaries.

If you need protection for storage or across networks, it can be done by storing or sending CRC32s along with the data. Disks and networks have correctness checks (or error recovery) of their own, and with TLS, network integrity is cryptographic, but there is no protection for data as it moves through API stacks before it gets protected; for this reason, managing the redundancy checks as part of the full data lifecycle is necessary to provide full protection against memory errors or processor flaws.
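One way to carry a CRC32 alongside the data through an API stack might look like this; the JSON envelope and the function names are just illustrative framing, not a proposed wire format:

```python
import json
import zlib

def wrap(payload: bytes) -> bytes:
    """Attach a CRC32 before the payload crosses an API boundary."""
    return json.dumps({"crc32": zlib.crc32(payload), "data": payload.hex()}).encode()

def unwrap(message: bytes) -> bytes:
    """Re-check the CRC32 on the far side of the boundary."""
    obj = json.loads(message)
    payload = bytes.fromhex(obj["data"])
    if zlib.crc32(payload) != obj["crc32"]:
        raise ValueError("payload corrupted in transit or within an API stack")
    return payload

round_tripped = unwrap(wrap(b"tree hash bytes"))
```

The check is re-run at each hop, so a flip anywhere between `wrap` and the final `unwrap` is caught rather than silently propagated.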

Note: for processor flaws we were concerned about reproducible errors (in this case, hypothetically, that calculating the SHA-256 would produce the wrong value repeatedly), so for cryptographic operations we have considered using two independent implementations with independent intermediate data structures (such as key schedules) to mitigate this risk.
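The cross-check pattern might be structured like this. Note that the two callables passed below are really the same underlying implementation (Python's `hashlib` both times), so this only shows the shape of the check; genuine mitigation requires two truly independent implementations:

```python
import hashlib

def cross_checked_digest(data: bytes, impl_a, impl_b) -> bytes:
    """Run two implementations and require agreement before trusting the result.

    In production, impl_a and impl_b would be independent SHA-256
    implementations with independent intermediate state; hashlib stands
    in for both here purely to illustrate the pattern.
    """
    a = impl_a(data)
    b = impl_b(data)
    if a != b:
        raise RuntimeError("implementations disagree: possible reproducible CPU fault")
    return a

digest = cross_checked_digest(
    b"leaf data",
    lambda d: hashlib.sha256(d).digest(),
    lambda d: hashlib.new("sha256", d).digest(),
)
```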

- Tim

On Wed, Jul 7, 2021 at 12:04 PM Phillip Hallam-Baker <phill@hallambaker.com> wrote:

So it has actually happened: a one in a billion computing error has caused a cert transparency log to become corrupted due to a bit flip error. There is a discussion of the issue here:

The solution they obsess over (ECC RAM) is actually irrelevant to the error in that case, as it was an even rarer CPU error. Which means that what I considered to be a more or less theoretical concern about signing append-only logs turns out to have actually occurred. Do things billions of times and billion-to-one chances will happen.

The only robust solution to this issue is for redundant notaries to sign the log.

Consider the case where we have an append-only log that is authenticated by means of a Merkle tree, with the apex of the tree being signed at 10 minute intervals. If we have a single server doing the signing, any error that occurs will lead to the log becoming invalid. This condition cannot be distinguished from a notary default.

But consider the case where there are three notaries each signing the log: which private key should they use?

Having all three signers use the same key means that if an error occurs, we risk having a correct and an incorrect version of the same log both being signed. That means there is a real risk of the incorrect log and signature leaking somehow.

All three signers using different keys is also bad, because now we have three independent notaries and the relying party has to do the job of deciding which one to trust. There is an even greater risk of the wrong log being relied on at some point.

A threshold scheme with three shares and a quorum of 2 solves the problem very neatly. The possibility of an undetected error is now much smaller, as two signers must be hit with an error having exactly the same effect at the same time. That is a very remote possibility unless the error is somehow caused by an architectural defect in the CPU. So we should probably choose separate chipset architectures (Intel, AMD, ARM?) if we want to get to maximum robustness.
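The building block behind a 2-of-3 quorum is k-of-n secret sharing. A toy Shamir split over a small prime field looks like the following; note that FROST and other threshold signature schemes never actually reconstruct the key in one place the way this sketch does, and the prime and values here are purely illustrative:

```python
import secrets

P = 2**127 - 1  # toy prime field; real schemes use a curve group order

def split(secret: int, n: int = 3) -> list[tuple[int, int]]:
    """2-of-n Shamir split: evaluate a random degree-1 polynomial at x=1..n."""
    coef = secrets.randbelow(P)
    return [(x, (secret + coef * x) % P) for x in range(1, n + 1)]

def recover(shares: list[tuple[int, int]]) -> int:
    """Recover the secret from any 2 shares via Lagrange interpolation at x=0."""
    (x1, y1), (x2, y2) = shares[:2]
    inv = pow(x2 - x1, -1, P)
    return (y1 * x2 - y2 * x1) * inv % P

shares = split(123456789)           # three shares; any two recover the secret
assert recover([shares[0], shares[2]]) == 123456789
```

One bad share produced by a faulty signer simply fails to agree with the quorum; it cannot silently produce a valid signature on its own.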

So how is the FROST draft coming?

CFRG mailing list
CFRG@irtf.org
https://www.irtf.org/mailman/listinfo/cfrg
