Re: [SCITT] Consistency of the registry append-only log.

Isaac Hepworth <isaach@google.com> Wed, 24 May 2023 20:00 UTC

It may be valuable to map this discussion, and particularly the terms being
used, to the Terminology
<https://datatracker.ietf.org/doc/html/draft-ietf-scitt-architecture-01#name-terminology-7>
section of the architecture draft.

For instance, what the draft calls "Feed" ("an identifier chosen by the
Issuer for the Artifact") seems close to some of the concepts in this
thread — and I wonder if it's immediately substitutable as-is for Ray's "an
id when the first artifact is submitted" or if they're different things.

Similarly perhaps one should understand STATEMENT TYPE used here as the
(RFC6838) "media type" of the "Statement" defined in the draft? Or maybe
not!

And so on.

Isaac

On Wed, May 24, 2023 at 1:24 PM Dick Brooks <
dick@reliableenergyanalytics.com> wrote:

> Ray,
>
>
>
> I think we are close in our view of what a Transparency Service must
> provide, from the consumer's perspective.
>
>
>
> The Registration Policy must identify the type of statements that are
> recorded in the log/registry with a given Transparency Service offering.
>
> In the case of Person Information, you are exactly correct: Blood Pressure
> is another “STATEMENT TYPE” that can be found in the log, along with other
> statement types, e.g. Height, Weight, Salary, Eye Color, Hair Color, etc.
> The same will be true for software-related artifacts, e.g. SBOM, VDR, Trust
> Score, all for a given SUBJECT (identified by
> ProductSupplierName/ProductName/ProductVersion and a UNIQUEID, i.e. SHA-256
> Hash Value). The Statement is associated with a TIMESTAMP indicating the
> effective date of the statement and is submitted by an authorized ISSUER
> (e.g., an auditing firm) to the SCITT Transparency Service (e.g. RKVST),
> where the information is recorded.
>
>
>
> Consumers can query the SCITT Transparency Service for a specific
> “STATEMENT TYPE” for a known SUBJECT using a UNIQUEID to retrieve all
> logged Statements about the SUBJECT.
>
>
>
> Here is an example of how we do this consumer query in SAG-CTR:
>
> Consumer downloads the installation file for a Software Product, which
> comes with known information about the product
> (ProductSupplierName/ProductName/ProductVersion) and a UNIQUEID (SHA-256
> hash value for the file), then issues a query to retrieve a specific type of
> record using the UNIQUEID:
>
>
>
>
> https://softwareassuranceguardian.com/SAGCTR_inquiry/getSAGScore?FileHash=94EDB27E1995370E9003EE0A0A12D0A7DE2E8DA5EB663C31A97CB4215AB171B6
>
>
>
> The above query returns the SAGScore STATEMENT TYPE for SUBJECT “REA’s
> SAG-PM Version 1.2 software product” installation file.
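
A minimal Python sketch of the consumer side of the query above. The endpoint and the FileHash parameter are taken from the example URL; the function names are made up for illustration, and uppercase hex is assumed only because the example hash is written that way. The sketch builds the URL and does not send the request.

```python
import hashlib
from urllib.parse import urlencode

# Endpoint copied from the example query above.
SAGCTR_ENDPOINT = "https://softwareassuranceguardian.com/SAGCTR_inquiry/getSAGScore"


def file_sha256(path: str) -> str:
    """Return the SHA-256 digest of a file as uppercase hex (assumed format)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large installation file need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def sag_score_url(file_hash: str) -> str:
    """Build the query URL for a given UNIQUEID (hypothetical helper)."""
    return SAGCTR_ENDPOINT + "?" + urlencode({"FileHash": file_hash})
```

A consumer would call file_sha256 on the downloaded installation file and pass the result to sag_score_url.
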
>
>
>
>
>
> Thanks,
>
>
>
> Dick Brooks
>
>
>
> *Active Member of the CISA Critical Manufacturing Sector, *
>
> *Sector Coordinating Council – A Public-Private Partnership*
>
>
>
> *Never trust software, always verify and report!
> <https://reliableenergyanalytics.com/products>* ™
>
> http://www.reliableenergyanalytics.com
>
> Email: dick@reliableenergyanalytics.com
>
> Tel: +1 978-696-1788
>
>
>
>
>
> *From:* Ray Lutz <raylutz@citizensoversight.org>
> *Sent:* Wednesday, May 24, 2023 12:10 PM
> *To:* dick@reliableenergyanalytics.com; scitt@ietf.org
> *Subject:* Re: [SCITT] Consistency of the registry append-only log.
>
>
>
> I think this can be boiled down to a simple requirement.
>
> 1. First statement A is submitted to the log about artifact 1.
> 2. Second statement B is submitted to the log about artifact 1.
>
> Statement B can mention A, so if you know about B, you can find A.
> But if you only know A was submitted, then can you find B?
>
> Yes, if the artifact is the same and it is exposed as metadata in the
> submission, then we can find both of them.
>
> One solution is to provide an id when the first artifact is submitted.
> Then it is used for all later submissions.
> This puts the responsibility of submitting with the same id on the
> submitter, and does not expose any
> metadata of the artifact, because the id is generated by the registry.
> Anyone can find all the submissions with the same id.
> And it differs from using the hash of the content as the id, because that
> won't work in the case where the product is the same but the artifact is
> different: for example, if the new record is not about artifact 1 (the
> weight) but is about artifact 2 (the blood pressure).
>
> Alternatively, the ID can be provided by the submitter, and used for all
> similar submissions. This would be like the name of the product (or the
> person).
>
> But let's extend your example, Dick, to have the blood pressure recorded.
> These are both about the same person. So if you want to find all the
> statements about that person, it is handy to identify the person. If the id
> is returned by the registry, then the work to link them is done by the
> submitter so they can be easily found later.
>
> I have as a goal to reduce the ability of a given identified entity to
> submit false claims. To avoid false claims of any kind, we have to have at
> least one thing held constant. So if we choose that the id of the entity
> submitting the claims is held constant (i.e. they will always identify as
> the same submitting entity), then we can at least find all the entries
> submitted by that entity. And if the entity makes a claim, then we can
> compare that with prior claims by the same entity. And then, I also claim
> that we need to know the subject of the submission, to whatever degree
> makes sense.
>
> In the case of supply chains, I think we can assume we always need to know
> 1) the product and 2) who is responsible for it. There may also be a
> submitter of the record to the log.
>
> These could be exposed in the scitt log so users of the log can understand
> them.
>
> Let's consider black-box submissions that are managed by a returned ID,
> and the users can submit a 'group-with' ID, so the records will be grouped
> together, and when a request for a given id is received, then all the
> records with that group-with ID can be returned. Then the SCITT log knows
> very little about what is being submitted, and the mapping of the group-with
> id to the entity and product can be added in another layer.
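
A minimal sketch of the black-box "group-with" idea, assuming an in-memory log. The class and method names are hypothetical, and a random UUID stands in for whatever ID a real registry would issue. The log never inspects the payload; it only remembers which group each entry belongs to.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class GroupingLog:
    """Toy log that stores opaque payloads and groups them by a registry-issued ID."""

    def __init__(self) -> None:
        # (group_id, payload) pairs; entries are only ever appended.
        self._entries: List[Tuple[str, bytes]] = []
        self._by_group: Dict[str, List[int]] = defaultdict(list)

    def submit(self, payload: bytes, group_with: Optional[str] = None) -> str:
        """Append a payload; return its group ID (newly issued if none was given)."""
        if group_with is None:
            group_id = uuid.uuid4().hex  # the registry, not the submitter, picks the ID
        elif group_with in self._by_group:
            group_id = group_with
        else:
            raise KeyError("unknown group-with ID")
        self._by_group[group_id].append(len(self._entries))
        self._entries.append((group_id, payload))
        return group_id

    def find(self, group_id: str) -> List[bytes]:
        """Return every payload recorded under a group ID, in submission order."""
        return [self._entries[i][1] for i in self._by_group.get(group_id, [])]
```

Submitting statement A returns an ID; submitting statement B with that ID as group_with means anyone who knows the ID can retrieve both.
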
>
> The questions that need to be considered are how DID, PURL, etc. solve
> any of this, and whether this is sufficient to provide "consistency" of the
> log, such that an identified but dishonest submitter cannot lie about what
> is in the log.
>
> --Ray
>
>
>
>
> On 5/23/2023 5:14 AM, Dick Brooks wrote:
>
> Ray,
>
>
>
> IMO, we could examine the SCITT architectural model, especially the
> Transparency Service using a simple, easy to understand use case that would
> serve as a starting point to flesh out the details we have been discussing.
>
>
>
> I recommend we discuss the functionality needed from a SCITT Transparency
> Service using “Person Information” as a simple use case to validate and
> enhance design details. For example, we start simple by evaluating how
> SCITT could be used to register “Person Weight” information over time. This
> has many of the characteristics of other temporal information which SCITT
> may be used to report on, such as software products.
>
>
>
> The scenario I proposed goes like this:
>
>
>
> A Transparency Service is used to report “Person Information” as the
> registration policy. Only information pertaining to a Person may be
> registered in this Transparency Service.
>
>
>
> Each Day a person, the SUBJECT, is weighed by an authorized party, the
> STATEMENT ISSUER, who submits this information to the Transparency Service
> that registers Person Information. Log Records are never deleted; the log
> is append-only.
>
>
>
> The Transparency Service “logs” this information, for example:
>
>
>
> StatementType: PERSON WEIGHT
>
> Person: Dick Brooks
>
> UniqueID: SHA-256 Hash Value to identify Dick Brooks uniquely
>
> Weight: 180
>
> UnitOfMeasure: US Pounds
>
> EffectiveDateTimeUTC: 1979-01-01T14:00:00Z
>
>
>
> StatementType: PERSON WEIGHT
>
> Person: Dick Brooks
>
> UniqueID: SHA-256 Hash Value to identify Dick Brooks uniquely
>
> Weight: 201
>
> UnitOfMeasure: US Pounds
>
> EffectiveDateTimeUTC: 1985-01-01T14:00:00Z
>
>
>
> And this continues for the life of the SUBJECT recording different
> Statement Types at increasing dates/times, i.e., Height, Salary, Job Title,
> etc. Each one is a different “STATEMENT TYPE”; in SAG-CTR we call these
> “Trust Declaration Types”.
>
>
>
> A party interested in querying the “Person Information Transparency
> Service” can query for a person's weight or other info, for example:
>
> https://person.transparencyservice/getPersonWeight?UniqueID=hashvalue
>
>
>
> returning the weight information for all weight records in the log
> associated with the UniqueID.
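
A small sketch of what a getPersonWeight lookup could do over the logged records, assuming the log is a plain list. The Statement fields mirror the example records above; the function name, the height record, and the extended ISO 8601 timestamp form are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Statement:
    statement_type: str
    unique_id: str
    value: float
    unit: str
    effective_utc: str  # ISO 8601 in one fixed format, so string order is time order


# Weights come from the example above; the height record is invented filler.
LOG = [
    Statement("PERSON WEIGHT", "hashvalue", 201, "US Pounds", "1985-01-01T14:00:00Z"),
    Statement("PERSON HEIGHT", "hashvalue", 70, "Inches", "1980-01-01T14:00:00Z"),
    Statement("PERSON WEIGHT", "hashvalue", 180, "US Pounds", "1979-01-01T14:00:00Z"),
]


def get_person_weight(log: List[Statement], unique_id: str) -> List[Statement]:
    """Return all PERSON WEIGHT statements for one subject, oldest first."""
    matches = [s for s in log
               if s.statement_type == "PERSON WEIGHT" and s.unique_id == unique_id]
    return sorted(matches, key=lambda s: s.effective_utc)
```

Filtering on both the statement type and the UniqueID is what keeps the height record out of the weight query.
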
>
>
>
> IMO, this abstraction makes it easier to think through how the model works
> using a simple, easy to understand use case that correlates to the software
> use case. A software product is identified by 3 data elements, based on
> NTIA minimum elements: SupplierName/ProductName/ProductVersion and a unique
> SHA-256 hash value of the software product installation object, i.e.
> container, install file, tar file, etc.
>
>
>
> This approach would help us define a process to evaluate other use cases,
> which we would document and follow for all use cases in SCITT.
>
>
>
> Thanks,
>
>
>
> Dick Brooks
>
>
> *From:* SCITT <scitt-bounces@ietf.org> *On
> Behalf Of *Ray Lutz
> *Sent:* Tuesday, May 23, 2023 1:20 AM
> *To:* scitt@ietf.org
> *Subject:* [SCITT] Consistency of the registry append-only log.
>
>
>
> We were talking today about how to deal with various related records that
> may be submitted at various times in the log.
> I will outline some thoughts here with the caveat that I have to spend a
> night thinking about it.
>
> I thought at first that the log could be quite brain-dead about this, and
> just accept a payload and secure it with a receipt. If the user wants to
> group them together, then the submitter can submit additional entries that
> provide the grouping. But I don't think this is good enough for most use
> cases.
>
> First, we must admit that records related to specific supply chain
> products will not all be submitted at the same time. In the SW use case,
> there are various points in the development and release and testing cycle
> when submissions about the same product must be submitted separately, but
> thought of as a set.
>
> For a simple sw product, there may be a release of the source code, and
> also later several related submissions, such as testing results.
>
> For the election use case I am currently working to resolve, there appear
> to be several release stages:
> 1. Release of a set of all the public keys of the devices that will be
> deployed.
> 2. Incremental releases of cryptographic evidence related to scanning of
> ballots (without releasing the ballots or the vote).
> 3. Release of all the evidence for auditing and review.
> 4. Final release of the result of the election upon certification.
> 5. Submission of auditing results based on all the evidence found.
>
> There are similar scenarios in the sw space, usually related to critical
> code that must be tested to some agreed criteria. The example I used in the
> call was of, say, self-driving vehicle software that must be tested to a
> standard test suite or the manufacturer will be liable for crashes. We can't
> trust them to tell the truth about prior submissions; not having to is one
> of the great things about the append-only log.
>
> These are all logically part of the same election, so we need a way to
> submit incrementally.
> But one threat scenario is a compromised official or hacker, who may
> hope that we will forget about the original public keys, and substitute
> a compromised set so the results can be modified. Thus, we can't rely on
> the submitter to be honest about what was submitted, but must build that
> assurance into the log.
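
A sketch of how a reader of the log could spot the key-substitution scenario, assuming each record exposes a group ID, a kind, and a content digest. All three field names are hypothetical. The first record of a kind within a group is treated as the original, and any later record of that kind with different content is flagged.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    group_id: str  # e.g. one election (hypothetical field)
    kind: str      # e.g. "public-keys" (hypothetical field)
    digest: str    # hash of the submitted content


def substitutions(log: List[Record], group_id: str, kind: str) -> List[Record]:
    """Later records of one kind whose content differs from the first one logged."""
    matching = [r for r in log if r.group_id == group_id and r.kind == kind]
    if not matching:
        return []
    original = matching[0]  # log order is submission order
    return [r for r in matching[1:] if r.digest != original.digest]
```

This only works because the log is append-only: the original key set is still there to compare against, whatever the submitter later claims.
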
>
> This is a useful ChatGPT response:
>
> The attribute you are describing is called consistency in the context of a
> registry built on an append-only log. Consistency ensures that the registry
> maintains a reliable and accurate state, where all related records are
> present and there are no additional or missing entries.
>
> In an append-only log, new entries are added to the end of the log,
> preserving the order in which they were appended. To achieve consistency,
> the registry must adhere to certain principles:
>
>    1. Append-only: The registry strictly follows the append-only rule,
>    meaning that once a record is appended to the log, it cannot be modified or
>    deleted. This ensures that the log maintains a reliable and immutable
>    history of all entries.
>    2. Atomicity: Operations on the registry should be atomic, meaning
>    they are treated as indivisible units of work. This guarantees that each
>    operation either succeeds completely or fails, preventing partial updates
>    that could compromise consistency.
>    3. Strong consistency guarantees: The registry should provide strong
>    consistency guarantees to ensure that all related records are found and
>    there are no additional or missing entries. This means that queries made to
>    the registry will always reflect the most up-to-date state of the log, and
>    the registry will not return stale or inconsistent results.
>    4. Indexing and querying: The registry should support efficient
>    indexing and querying mechanisms to allow users to find relevant entries
>    quickly and accurately. This can be achieved through various techniques,
>    such as maintaining indexes or using data structures optimized for
>    querying, like balanced search trees or hash tables.
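
One common way to make the append-only rule checkable is a hash chain, where each entry's hash covers the hash of the entry before it. This is an assumption of the sketch below; neither the thread nor the quoted response prescribes it, and the function names are illustrative.

```python
import hashlib
from typing import List, Tuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: bytes) -> str:
    """Hash a payload together with the hash of the entry before it."""
    return hashlib.sha256(bytes.fromhex(prev_hash) + payload).hexdigest()


def append(chain: List[Tuple[bytes, str]], payload: bytes) -> str:
    """Append a payload to the chain and return the new head hash."""
    prev = chain[-1][1] if chain else GENESIS
    head = entry_hash(prev, payload)
    chain.append((payload, head))
    return head


def verify(chain: List[Tuple[bytes, str]]) -> bool:
    """Recompute every link; an edited, reordered, or cut-out entry breaks one."""
    prev = GENESIS
    for payload, recorded in chain:
        if entry_hash(prev, payload) != recorded:
            return False
        prev = recorded
    return True
```

Note that verify alone does not catch entries removed from the tail of the chain. To detect that, an observer has to remember an earlier head hash and check that it still appears in the chain later.
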
>
> ==========
> The other important aspect is that the data can be transparently
> inspected. Others can check that the log isn't lying. So the consistency
> guarantee will be important, especially because we need to know for sure
> that there are no forgotten entries related to the product of concern.
>
> Sure, the system could provide completely obscured usage, which would be
> fine if there were no incremental submissions for the same product.
>
> Thus, we will need to also provide some exposed metadata with enough
> structure to at least identify the submitter and the product, when
> the consistency attribute is desired.
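
A sketch of the "no forgotten entries" check that exposed metadata would make possible, assuming an auditor can read the whole log and compare it with the answer someone was given. The Entry fields and the function name are hypothetical.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    index: int      # position in the log
    submitter: str  # exposed metadata (hypothetical field)
    product: str    # exposed metadata (hypothetical field)


def omitted(full_log: List[Entry], answer: List[Entry],
            submitter: str, product: str) -> List[Entry]:
    """Entries the full log holds for (submitter, product) that the answer left out."""
    returned = {e.index for e in answer}
    return [e for e in full_log
            if e.submitter == submitter and e.product == product
            and e.index not in returned]
```

Without the submitter and product exposed on each entry, this comparison has nothing to match on, which is the point made above.
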
>
> This is not fully thought through but it is a start.
>
> --Ray
>
>
>
>
>
>
>
> --
>
> -------
>
> Ray Lutz
>
> Citizens' Oversight Projects (COPs)
>
> http://www.citizensoversight.org
>
> 619-820-5321
>
>
> --
> SCITT mailing list
> SCITT@ietf.org
> https://www.ietf.org/mailman/listinfo/scitt
>