Re: [SCITT] Consistency of the registry append-only log.

Ray Lutz <raylutz@citizensoversight.org> Wed, 24 May 2023 16:09 UTC

Message-ID: <3bcd8717-5f59-42ce-4e73-71ac04b860d2@citizensoversight.org>
Date: Wed, 24 May 2023 09:09:48 -0700
To: dick@reliableenergyanalytics.com, scitt@ietf.org
References: <1dfae262-2802-6036-7382-3f5496a3e186@citizensoversight.org> <071401d98d70$3255fd10$9701f730$@reliableenergyanalytics.com>
From: Ray Lutz <raylutz@citizensoversight.org>
In-Reply-To: <071401d98d70$3255fd10$9701f730$@reliableenergyanalytics.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/scitt/sGwj3l2cPALPoGJ-i_LQB5sfTAI>
Subject: Re: [SCITT] Consistency of the registry append-only log.

I think this can be boiled down to a simple requirement.

1. First statement A is submitted to the log about artifact 1.
2. Second statement B is submitted to the log about artifact 1.

Statement B can mention A, so if you know about B, you can find A.
But if you only know A was submitted, then can you find B?

Yes, if the artifact is the same and it is exposed as metadata in the 
submission, then we can find both of them.

One solution is for the registry to provide an id when the first 
artifact is submitted. That id is then used for all later submissions.
This puts the responsibility for submitting with the same id on the 
submitter, and does not expose any metadata of the artifact, because 
the id is generated by the registry.
Anyone can find all the submissions with the same id.
This differs from using the hash of the content as the id, because a 
content hash won't work in the case where the product is the same but 
the artifact is different: for example, if the new record is not about 
artifact 1 (the weight) but about artifact 2 (the blood pressure).
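
The difference between a content-hash id and a registry-issued id can 
be sketched in a few lines of Python. This is only an illustration 
under assumptions: the record contents, the variable names, and the use 
of a random UUID as the registry-issued id are all made up for the 
example and are not taken from any SCITT draft.

```python
import hashlib
import uuid

# Two hypothetical records about the same person but different artifacts.
weight_record = b"artifact 1: weight = 180 lb"
blood_pressure_record = b"artifact 2: blood pressure = 120/80"

# Content-derived ids: each artifact hashes to a different value, so the
# ids alone do not show that both records concern the same person.
hash_id_1 = hashlib.sha256(weight_record).hexdigest()
hash_id_2 = hashlib.sha256(blood_pressure_record).hexdigest()

# Registry-issued id (assumed here to be a random UUID): generated once
# at the first submission, independent of content, then reused by the
# submitter for every later submission.
registry_id = uuid.uuid4().hex
submissions = [
    (registry_id, weight_record),
    (registry_id, blood_pressure_record),
]

# Anyone who knows the id can collect every submission made under it.
linked = [payload for rid, payload in submissions if rid == registry_id]
```

Because the registry-issued id is random, it reveals nothing about the 
artifact, yet it still ties the two records together.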

Alternatively, the ID can be provided by the submitter, and used for all 
similar submissions. This would be like the name of the product (or the 
person).

But let's extend your example, Dick, to have the blood pressure 
recorded as well. Both records are about the same person. So if you 
want to find all the statements about that person, it is handy to 
identify the person. If the id is returned by the registry, then the 
work of linking the records is done by the submitter, so they can be 
easily found later.

My goal is to reduce the ability of a given identified entity to 
submit false claims. To avoid false claims of any kind, we have to have 
at least one thing held constant. So if we choose that the id of the 
entity submitting the claims is held constant (i.e. they will always 
identify as the same submitting entity), then we can at least find all 
the entries submitted by that entity. And if the entity makes a claim, 
then we can compare that with prior claims by the same entity. And 
then, I also claim that we need to know the subject of the submission, 
to whatever degree makes sense.
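
As a rough sketch of what holding the submitter id constant buys us, 
the Python below finds every entry from one submitter and flags claims 
whose value changed between submissions. The tuple layout, the ids 
"issuer-1" and "product-X", and both function names are hypothetical. A 
changed value is not necessarily a false claim (a weight legitimately 
changes over time); the sketch only surfaces entries for review.

```python
from collections import defaultdict

# Hypothetical log entries: (submitter_id, subject, claim_name, value)
log = [
    ("issuer-1", "product-X", "test-suite-passed", True),
    ("issuer-2", "product-X", "test-suite-passed", True),
    ("issuer-1", "product-X", "test-suite-passed", False),
]


def claims_by(log, submitter_id):
    """Return, in log order, every entry submitted under the given id."""
    return [entry for entry in log if entry[0] == submitter_id]


def conflicting_claims(log, submitter_id):
    """Group one submitter's claims by (subject, claim name) and keep
    only those groups that contain more than one distinct value."""
    seen = defaultdict(list)
    for _, subject, name, value in claims_by(log, submitter_id):
        seen[(subject, name)].append(value)
    return {key: values for key, values in seen.items() if len(set(values)) > 1}


conflicts = conflicting_claims(log, "issuer-1")
```

This only works because both the submitter and the subject are visible 
to whoever runs the query, which is the point of exposing them.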

In the case of supply chains, I think we can assume we always need to 
know 1) the product and 2) who is responsible for it. There may also be 
a submitter of the record to the log.

These could be exposed in the scitt log so users of the log can 
understand them.

Let's consider black-box submissions that are managed by a returned ID, 
where users can submit a 'group-with' ID so that records are grouped 
together. When a request for a given ID is received, all the records 
with that group-with ID can be returned. The SCITT log then needs to 
know very little about what is being submitted, and the mapping from 
the group-with ID to the entity and product can be added in another 
layer.
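
A minimal sketch of that layering, assuming a toy in-memory log: the 
class `OpaqueLog`, its `submit` and `group` methods, the "entry-N" id 
format, and the `directory` mapping are all invented for illustration. 
The sketch does not check that a supplied group-with ID already exists, 
which a real service would presumably need to do.

```python
from typing import Optional


class OpaqueLog:
    """Toy append-only log; payloads are opaque bytes it never inspects."""

    def __init__(self):
        self._records = []  # list of (entry_id, group_id, payload)

    def submit(self, payload: bytes, group_with: Optional[str] = None) -> str:
        entry_id = f"entry-{len(self._records)}"  # the returned ID
        # With no group-with ID, the entry starts a new group of its own.
        group_id = group_with if group_with is not None else entry_id
        self._records.append((entry_id, group_id, payload))
        return entry_id

    def group(self, group_id: str) -> list:
        """Return every payload recorded under the given group-with ID."""
        return [payload for _, g, payload in self._records if g == group_id]


log = OpaqueLog()
first_id = log.submit(b"source release")
log.submit(b"test results", group_with=first_id)
log.submit(b"unrelated record")

# Separate layer, outside the log: maps (entity, product) to a group ID.
directory = {("Example Supplier", "Example Product"): first_id}
records = log.group(directory[("Example Supplier", "Example Product")])
```

The log itself never learns who the supplier is or what the product is; 
only the outer `directory` layer holds that mapping.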

The questions that need to be considered are how DID, PURL, etc. 
solve any of this, and whether this is sufficient to provide 
"consistency" of the log, such that an identified but dishonest 
submitter cannot lie about what is in the log.

--Ray





On 5/23/2023 5:14 AM, Dick Brooks wrote:
>
> Ray,
>
> IMO, we could examine the SCITT architectural model, especially the 
> Transparency Service, using a simple, easy to understand use case that 
> would serve as a starting point to flesh out the details we have been 
> discussing.
>
> I recommend we discuss the functionality needed from a SCITT 
> Transparency Service using “Person Information” as a simple use case 
> to validate and enhance design details. For example, we start simple 
> by evaluating how SCITT could be used to register “Person Weight” 
> information over time. This has many of the characteristics of other 
> temporal information which SCITT may be used to report on, such as 
> software products.
>
> The scenario I proposed goes like this:
>
> A Transparency Service is used to report “Person Information” as the 
> registration policy. Only information pertaining to a Person may be 
> registered in this Transparency Service.
>
> Each Day a person, the SUBJECT, is weighed by an authorized party, the 
> STATEMENT ISSUER, who submits this information to the Transparency 
> Service that registers Person Information. Log Records are never 
> deleted, the log is append only.
>
> The Transparency Service “logs” this information, for example:
>
> StatementType: PERSON WEIGHT
> Person: Dick Brooks
> UniqueID: SHA-256 Hash Value to identify Dick Brooks uniquely
> Weight: 180
> UnitOfMeasure: US Pounds
> EffectiveDateTimeUTC: 19790101T14:00:00Z
>
> StatementType: PERSON WEIGHT
> Person: Dick Brooks
> UniqueID: SHA-256 Hash Value to identify Dick Brooks uniquely
> Weight: 201
> UnitOfMeasure: US Pounds
> EffectiveDateTimeUTC: 19850101T14:00:00Z
>
> And this continues for the life of the SUBJECT, recording different 
> Statement Types at increasing dates/times, i.e., Height, Salary, Job 
> Title, etc. Each one is a different “STATEMENT TYPE”; in SAG-CTR we 
> call these “Trust Declaration Types”.
>
> A party interested in querying the “Person Information Transparency 
> Service” can query for a person's weight or other info, for example:
>
> https://person.transparencyservice/getPersonWeight?UniqueID=hashvalue
>
> returning the weight information for all weight records in the log 
> associated with the UniqueID.
>
> IMO, this abstraction makes it easier to think through how the model 
> works using a simple, easy to understand use case that correlates to 
> the software use case. A software product is identified by 3 data 
> elements, based on NTIA minimum elements: 
> SupplierName/ProductName/ProductVersion and a unique SHA-256 hash 
> value of the software product installation object, i.e. container, 
> install file, tar file, etc.
>
> This approach would help us define a process to evaluate other use 
> cases, which we would document and follow for all use cases in SCITT.
>
> Thanks,
>
> Dick Brooks
>
> /Active Member of the CISA Critical Manufacturing Sector, /
>
> /Sector Coordinating Council – A Public-Private Partnership/
>
> */Never trust software, always verify and report! 
> <https://reliableenergyanalytics.com/products>/*™
>
> http://www.reliableenergyanalytics.com 
> <http://www.reliableenergyanalytics.com/>
>
> Email: dick@reliableenergyanalytics.com 
> <mailto:dick@reliableenergyanalytics.com>
>
> Tel: +1 978-696-1788
>
> *From:*SCITT <scitt-bounces@ietf.org> *On Behalf Of *Ray Lutz
> *Sent:* Tuesday, May 23, 2023 1:20 AM
> *To:* scitt@ietf.org
> *Subject:* [SCITT] Consistency of the registry append-only log.
>
> We were talking today about how to deal with various related records 
> that may be submitted at various times in the log.
> I will outline some thoughts here with the caveat that I have to spend 
> a night thinking about it.
>
> I thought at first that the log could be quite brain-dead about this, 
> and just accept a payload and secure it with a receipt. If the user 
> wants to group them together, then the submitter can submit additional 
> entries that provide the grouping. But I don't think this is good 
> enough for most use cases.
>
> First, we must admit that records related to specific supply chain 
> products will not all be submitted at the same time. In the SW use 
> case, there are various points in the development and release and 
> testing cycle when submissions about the same product must be 
> submitted separately, but thought of as a set.
>
> For a simple sw product, there may be a release of the source code, 
> and also later several related submissions, such as testing results.
>
> For the election use case I am currently working to resolve, there 
> appear to be several release stages:
> 1. Release of a set of all the public keys of the devices that will be 
> deployed.
> 2. Incremental releases of cryptographic evidence related to scanning 
> of ballots (without releasing the ballots or the vote).
> 3. Release of all the evidence for auditing and review.
> 4. Final release of the result of the election upon certification.
> 5. Submission of auditing results based on all the evidence found.
>
> There are similar scenarios in the sw space, usually related to 
> critical code that must be tested to some agreed criteria. The example 
> I used in the call was of, say, self-driving vehicle software that must 
> be tested to a standard test suite or the mfr will be liable for 
> crashes. We can't trust them to tell the truth about prior 
> submissions; that is one of the great things about the append-only log.
>
> These are all logically part of the same election, so we need a 
> way to submit incrementally.
> But one threat scenario is a compromised official or hacker, who may 
> hope that we will forget about the original public keys, and 
> substitute a compromised set so the results can be modified. Thus, we 
> can't rely on the submitter to be honest about what was submitted, but 
> must build that in.
>
> This is a useful ChatGPT response:
>
> The attribute you are describing is called consistency in the context 
> of a registry built on an append-only log. Consistency ensures that 
> the registry maintains a reliable and accurate state, where all 
> related records are present and there are no additional or missing 
> entries.
>
> In an append-only log, new entries are added to the end of the log, 
> preserving the order in which they were appended. To achieve 
> consistency, the registry must adhere to certain principles:
>
>  1. Append-only: The registry strictly follows the append-only rule,
>     meaning that once a record is appended to the log, it cannot be
>     modified or deleted. This ensures that the log maintains a
>     reliable and immutable history of all entries.
>  2. Atomicity: Operations on the registry should be atomic, meaning
>     they are treated as indivisible units of work. This guarantees
>     that each operation either succeeds completely or fails,
>     preventing partial updates that could compromise consistency.
>  3. Strong consistency guarantees: The registry should provide strong
>     consistency guarantees to ensure that all related records are
>     found and there are no additional or missing entries. This means
>     that queries made to the registry will always reflect the most
>     up-to-date state of the log, and the registry will not return
>     stale or inconsistent results.
>  4. Indexing and querying: The registry should support efficient
>     indexing and querying mechanisms to allow users to find relevant
>     entries quickly and accurately. This can be achieved through
>     various techniques, such as maintaining indexes or using data
>     structures optimized for querying, like balanced search trees or
>     hash tables.
>
> ==========
> The other important aspect is that the data can be transparently 
> inspected. Others can check that the log isn't lying. So the 
> consistency guarantee will be important, especially because we need to 
> know for sure that there are no forgotten entries related to the 
> product of concern.
>
> Sure, the system could provide completely obscured usage, which would 
> be fine if there were no incremental submissions for the same product.
>
> Thus, we will also need to provide some exposed metadata with enough 
> structure to at least identify the submitter and the product, when 
> the consistency attribute is desired.
>
> This is not fully thought through but it is a start.
>
> --Ray
>
>
>
>
>
>
> -- 
> -------
> Ray Lutz
> Citizens' Oversight Projects (COPs)
> http://www.citizensoversight.org
> 619-820-5321

-- 
-------
Ray Lutz
Citizens' Oversight Projects (COPs)
http://www.citizensoversight.org
619-820-5321