[sacm] Benjamin Kaduk's Discuss on draft-ietf-sacm-coswid-20: (with DISCUSS and COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Tue, 15 February 2022 07:54 UTC
From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-sacm-coswid@ietf.org, sacm-chairs@ietf.org, sacm@ietf.org, Christopher Inacio <inacio@cert.org>, Karen O'Donoghue <odonoghue@isoc.org>, inacio@cert.org
Auto-Submitted: auto-generated
Precedence: bulk
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <164491166777.25459.5492865802012578305@ietfa.amsl.com>
Date: Mon, 14 Feb 2022 23:54:27 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/sacm/WDA9Ok2PzxyS3EZ2uUR5Gc8ah1c>
Benjamin Kaduk has entered the following ballot position for
draft-ietf-sacm-coswid-20: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/blog/handling-iesg-ballot-positions/
for more information about how to handle DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-sacm-coswid/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

The volume of my comments notwithstanding, this document was actually
quite nice to read.  I think that these discuss points, at least, should
be fairly straightforward to resolve.

(1) In a number of places we have text roughly of the form:

    String values based on a <particular type of name> from <a particular
    IANA registry> MUST NOT be used, as these values are less concise than
    their index value equivalent.

This seems like it could have some nasty interactions with updates to the
IANA registry in question, especially if consumers attempt to enforce the
MUST NOT.  Consider a version scheme "foo", used by an implementation M to
emit CoSWID tags.  Implementation M is old and predates "foo"'s
registration, so it uses the text form.  Implementation N postdates "foo"'s
registration and knows to use the integer form for encoding it.  But if N
insists on the integer form for decoding, it will reject M's tags, and
needlessly so.  So I think we need a warning that the "MUST NOT" is only for
encoding, and that decoders MUST accept both forms (at least for names not
listed in this document).
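
A minimal Python sketch of the encode/decode asymmetry described in point (1).
The registry table, the index 5 assigned to "foo", and the function names are
all hypothetical and only illustrate the shape of the rule: encoders prefer
the integer form, decoders accept either form.

```python
# Hypothetical snapshot of a name registry: index value -> registered name.
# The entries and their index values are invented for illustration.
REGISTRY = {1: "multipartnumeric", 5: "foo"}
NAME_TO_INDEX = {name: index for index, name in REGISTRY.items()}


def encode_scheme(name):
    """Encoder side: emit the integer form whenever the name is registered."""
    return NAME_TO_INDEX.get(name, name)


def decode_scheme(value):
    """Decoder side: accept both the integer form and the text form."""
    if isinstance(value, int):
        # Unknown integers are passed through rather than rejected.
        return REGISTRY.get(value, value)
    if isinstance(value, str):
        # A text form is accepted even if the name has since been registered.
        return value
    raise TypeError("version scheme must be an integer or a text string")


# Old implementation M predates the registration and emits the text form;
# new implementation N emits the integer form. Both decode to the same name.
from_m = decode_scheme("foo")
from_n = decode_scheme(encode_scheme("foo"))
```

With this split, N never emits the text form for a registered name, yet it
still interoperates with tags produced by M.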

(2) Section 4.1 contains SHOULD-level guidance to use the "semver" version
scheme when the value matches the semantic versioning syntax.  That seems
like it would be highly problematic if the version number only happens to
match the syntax by accident and does not actually match the semantic
versioning semantics.  Shouldn't we be giving recommendations based on the
underlying (intended) semantics rather than just the syntax?
(A similar concern might apply to the recommendation to use any scheme other
than "alphanumeric", but there are not really well-known semantics for the
"alphanumeric" syntax such that expectations of semantics would fail to be
met if the wrong version scheme was assigned.)
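
To make the concern in point (2) concrete, here is a small Python sketch. The
regular expression is a simplified check for the MAJOR.MINOR.PATCH core of a
semantic version (pre-release and build-metadata suffixes are ignored), and
the date-based version string is a hypothetical example, not taken from the
draft.

```python
import re

# Simplified check for the MAJOR.MINOR.PATCH core of a semantic version;
# pre-release and build-metadata suffixes are deliberately ignored here.
SEMVER_CORE = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def looks_like_semver(version):
    """Return True if the string matches the semver core syntax."""
    return SEMVER_CORE.match(version) is not None


# A hypothetical date-based release label (year.month.day) passes the
# syntactic test although its producer never promised semver semantics.
date_based_release = "2022.2.15"
matches_by_accident = looks_like_semver(date_based_release)
```

A consumer that trusts the "semver" label would read 2022 as a major version
and infer compatibility guarantees that the producer never intended.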

(3) The integer values assigned to link ownership values disagree between
Table 5 and the CDDL.  (The IANA registry guidance matches Table 5.)
I did not attempt to obtain a copy of ISO/IEC 19770-2:2015 to confirm
whether it uses integer identifiers that we want to maintain compatibility
with -- the prose in §4.3 is a little unclear as to whether such
compatibility is relevant since it only talks about "values" that are to
match.

(4) It's quite possible that I'm just confused about one or both of the
statements in question, but it seems like there may be some inconsistency
between §2.7's "This specification does not define how to resolve an XPath
query in the context of CBOR" and §5.2's "This XPath is evaluated over SWID
or CoSWID tags found on a system" (with, IIRC, a couple other relevant
mentions elsewhere).  My understanding is that a CoSWID tag is intrinsically
represented in a CBOR form, so I'm not sure how one could cause an XPath
evaluation to match without having defined semantics for evaluating that
query in a CBOR context.

(5) There are a couple of references to first-come, first-served
allocations for SWID index value registrations (e.g., §2's "new constructs
are assigned a unique index value on a first-come, first-served basis" and
§6.2.1's "New index values will be provided on a First Come First Served as
defined by [BCP26]"), but I do not see any direction to IANA to create a
registry using such an allocation policy for any range of the registry in
question.  It seems like this indicates some internal inconsistency to be
resolved, but I'm not entirely sure what the proper resolution is.

(6) Section 6.2.2 attempts to provide a namespaced scheme for distributed
allocation of unique (collision-free) names for private-use index values,
but I do not think it admits a unique partition into "domain.prefix" and
"name" by treating U+002D HYPHEN-MINUS as a separator, since that character
is valid in both LDH hostnames and in NMTOKEN names.  This makes it
impossible to guarantee uniqueness, since we could have different
partitionings of the same consolidated name into the underlying components.
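
The ambiguity in point (6) can be sketched in a few lines of Python. The
helper name and the example string are hypothetical, and the sketch only
enumerates hyphen positions; it does not validate either half against the
hostname or NMTOKEN rules.

```python
def candidate_partitions(combined):
    """Return every (domain_prefix, name) split at a hyphen in combined."""
    splits = []
    for position, character in enumerate(combined):
        if character == "-":
            prefix = combined[:position]
            name = combined[position + 1:]
            # Skip degenerate splits with an empty half.
            if prefix and name:
                splits.append((prefix, name))
    return splits


# Hypothetical private-use name built as "domain.prefix-name".
partitions = candidate_partitions("example.com-foo-bar")
# partitions == [("example.com", "foo-bar"), ("example.com-foo", "bar")]
```

Both results are syntactically plausible, since "com-foo" is a valid LDH
label and "foo-bar" is a valid NMTOKEN, so two different owners could arrive
at the same combined string.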

(7) We seem to have conflicting statements in §7 about how a signed CoSWID
tag is represented.  First we say that "[a] CoSWID tag MUST be wrapped in a
COSE Single Signer Data Object (COSE_Sign1) that contains a single signature
and MUST be signed by the tag creator", but just a few paragraphs later we
say that "[t]he COSE_Sign structure that allows for more than one signature to
be applied to a CoSWID tag MAY be used"; following that MAY would violate
the MUST.  Furthermore(!), the last paragraph of the section says only that
"[a] CoSWID SHOULD be signed, using the above mechanism", which again is in
conflict with the MUST.  (Section 8 goes on to admit the possibility of
unsigned tags as well as both forms of signed tag, and Section 9 includes "a
signature provided by the supplier if present in the CoSWID tag".)

(8) Table 1 seems to be missing an entry for
$$resource-collection-extension, defined in §2.9.2 and appearing in multiple
other locations.


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

I put the (hopefully!) editorial bits I had specific suggestions for in GitHub at
https://github.com/sacmwg/draft-ietf-sacm-coswid/pull/49

Section 1

   percent reduction of size in generic usage scenarios.  Additional
   size reduction is enabled with respect to the memory footprint of XML
   parsing/validation as well as the reduction of stack sizes where XML
   processing is now obsolete.

I suggest clarifying regarding "stack", i.e., if this is the memory
allocation chunk that is not the heap, or if this is the amount of
executable code on the system.

Section 2

   two formats.  In such cases, a manual mapping will need to be used.
   These cases are specifically noted in this and subsequent sections
   using an [W3C.REC-xpath20-20101214] where a manual mapping is needed.

I feel like there's some additional text needed around the W3C reference.

Section 2.1

   All names registered with IANA according to requirements in
   Section 6.2 also MUST be valid according to the XML Schema NMTOKEN
   data type (see [W3C.REC-xmlschema-2-20041028] Section 3.3.4) to
   ensure compatibility with the SWID specification where these names
   are used.

The secdir reviewer notes that NMTOKEN has a very large expansion space and
that some further restriction is likely merited (really, in all instances
where NMTOKEN is used, but I'll just mention it once).  What would motivate
allowing the full NMTOKEN space without such restriction?

Section 2.3

     ? payload-or-evidence,

The prose goes on to describe 'payload' and 'evidence' separately, which is
perhaps a slight bump in the road for the reader.

      identifier as defined by [RFC4122].  There are no strict
      guidelines on how this identifier is structured, but examples
      include a 16 byte GUID (e.g. class 4 UUID) [RFC4122], or a text
      string appended to a DNS domain name to ensure uniqueness across
      organizations.

Is there existing deployed reality w.r.t. "DNS domain name" vs "reversed DNS
domain name"?  The latter seems more attractive in theory for this use case.

   *  tag-version (index 12): An integer value that indicate the
      specific release revision of the tag.  Typically, the initial
      value of this field is set to 0 and the value is monotonically
      increased for subsequent tags produced for the same software
      component release.  [...]

I guess "typical" would be "increase by one" each time, which is not exactly
"monotonic" (or even "strictly monotonic" which is what we really need).

   *  media (index 10): This text value is a hint to the tag consumer to
      understand what target platform this tag applies to.  This item
      item MUST be formatted as a query as defined by the W3C Media
      Queries Recommendation (see [W3C.REC-css3-mediaqueries-20120619]).
      Support for media queries are included here for interoperability
      with [SWID], which does not provide any further requirements for
      media query use.  Thus, this specification does not clarify how a
      media query is to be used for a CoSWID.

Following the W3C reference, it's quite hard to see how the media query
would indicate a target platform, but I trust that the authors are
accurately portraying the situation w.r.t. SWID and providing the field for
compatibility purposes.

Section 2.4

Do we want to say what a consumer should do if it encounters a CoSWID tag
that violates the protocol constraints?

Section 2.6

   *  reg-id (index 32): The registration id value is intended to
      uniquely identify a naming authority in a given scope (e.g.
      global, organization, vendor, customer, administrative domain,
      etc.) for the referenced entity.  The value of a registration ID
      MUST be a RFC 3986 URI.  The scope will usually be the scope of an
      organization.

What scheme(s) might we expect to see for these URIs?

Section 2.7

   *  artifact (index: 37): To be used with rel="installation-media",
      this item's value provides the path to the installer executable or
      script that can be run to launch the referenced installation.

Is this a filesystem path, an arbitrary URI, or something else?
If a filesystem path, does it need to be an absolute path?  If relative
paths are allowed, how is the base determined?

      -  If no URI scheme is provided, then the URI-reference is a
         relative reference relative to the URI of the CoSWID tag.  For
         example, "./folder/supplemental.coswid".

What is "the URI of the CoSWID tag"?  I don't see a protocol element in
the concise-swid-tag root map that would obviously convey such a thing.
Would this necessarily be a "swid:" URI that leverages the tag-id of the tag
in question?

   *  media-type (index 41): A link can point to arbitrary resources on
      the endpoint, local network, or Internet using the href item.  Use
      of this item supplies the resource consumer with a hint of what
      type of resource to expect.  [...]

I know we already say "hint", but I'd consider explicitly stating that there
is no obligation for the server hosting the target of the URI to use the
indicated media type when the URI is dereferenced.

Section 2.8

      release.  This version is intended to be used for string
      comparison only and is not intended to be used to determine if a
      specific value is earlier or later in a sequence.

"string comparison" implies automated or mechanical use.  Is that the
intent, as opposed to just "interpretation by humans"?

   *  generator (index 50): The name (or tag-id) of the software
      component that created the CoSWID tag.  If the generating software
      component has a SWID or CoSWID tag, then the tag-id for the
      generating software component SHOULD be provided.

The 'generator' is only defined to hold "text", but the 'tag-id' could be
either text or "bstr .size 16".  How would a bstr tag-id be used here?

Section 2.9.1

   The number used as a value for hash-alg-id is an integer-based hash
   algorithm identifier who's value MUST refer to an ID in the IANA
   "Named Information Hash Algorithm Registry" [IANA.named-information]
   with a Status of "current"; other hash algorithms MUST NOT be used.

"current" as determined when?  Surely this is not introducing a requirement
to consult the live registry prior to any operation on the CoSWID tag...

   If the hash-alg-id is not known, then the integer value "0" MUST be
   used.  This ensures parity between the SWID tag specification [SWID],
   which does not allow an algorithm to be identified for this field.

I'm not sure that "ensures parity" is the best phrasing here; do we mean to
say that it "allows for conversion from" the SWID tags due to the latter
specification not indicating the hash algorithm used?

Section 2.9.2

Some elements of the resource-collection group include information about
filesystem paths.  But if CoSWIDs are immutable after creation, does that
force a certain filesystem hierarchy structure on a consumer of that
software (in effect, squatting on the filesystem namespace per BCP 190)?  Is
there any mechanism for supplemental tags to indicate that different
filesystem paths are to be used?

   The CDDL for the resource-collection group follows:

   path-elements-group = ( ? directory => one-or-more<directory-entry>,
                           ? file => one-or-more<file-entry>,
                         )

   resource-collection = (
     path-elements-group,

Is there a reason to not put the "resource-collection" definition first in
the CDDL listing?

Also, I'm not sure I understand the semantics of the array form for both
'directory' and 'file'.  I guess, in order for there to be a parallel
interpretation between the two, it would have to be that each list holds the
elements of the corresponding type that are children of the current
directory.  But an ordered list of "directory" elements might also be
interpreted as a path hierarchy, so some clarification seems in order.

   file-entry = {
     filesystem-item,
     ? size => uint,
     ? file-version => text,
     ? hash => hash-entry,
     * $$file-extension,
     global-attributes,
   }

How would we achieve algorithm agility (BCP 201) for the hash algorithm used
to verify the file's contents?

   *  file-version (index 21): The file's version as reported by
      querying information on the file from the operating system.  This
      item maps to '/SoftwareIdentity/(Payload|Evidence)/File/@version'
      in [SWID].

Not all file systems record file version information; do we want to mention
that limitation here?

   *  location (index 23): The filesystem path where a file is expected
      to be located when installed or copied.  The location MUST be
      either relative to the location of the parent directory item
      (preferred) or relative to the location of the CoSWID tag if no
      parent is defined.  [...]

Does the "location of the CoSWID tag" refer to a location embedded in the
tag or the location on disk where the tag is found?

   *  type (index 29): A string indicating the type of resource.

Is this human-readable, from a registry, ...?

Section 2.10

Thanks for automatically generating the consolidated CDDL block; it saves a
lot of time reviewing it for consistency.

Section 4.x

It looks like in the actual registries we are going to mark 0 as reserved;
should we indicate that in these sections as well?

Section 4.1

I suggest including some example multipart version numbers that include
multi-digit components (and confirming that they sort as integers by
component).
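
A short Python sketch of the kind of example meant here. The version strings
are hypothetical, and the key function assumes purely numeric, dot-separated
components.

```python
def multipart_key(version):
    """Split a dotted version into integers so components compare numerically."""
    return tuple(int(part) for part in version.split("."))


versions = ["1.10.0", "1.9.12", "1.2.30", "10.0.1"]

# Plain string comparison puts "1.10.0" before "1.2.30".
as_strings = sorted(versions)

# Component-wise integer comparison gives the intended order.
as_integers = sorted(versions, key=multipart_key)
# as_integers == ["1.2.30", "1.9.12", "1.10.0", "10.0.1"]
```

The two orderings differ as soon as any component has more than one digit,
which is why single-digit examples alone do not demonstrate the rule.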

Section 5.2

                Tags to be evaluated include all tags in the context of
   where the tag is referenced from.  For example, when a tag is
   installed on a given device, that tag can reference related tags on
   the same device using a URI with this scheme.

This definition of "the context" feels rather under-specified and prone to
conflicting interpretations.

   For URIs that use the "swidpath" scheme, the requirements apply.

Is there a word missing here (maybe "following")?

Section 6.1

Is index 30 available for registration or should it be marked as reserved?

Section 6.2.2

   domain.prefix-name

   Where "domain.prefix" MUST be a valid Internationalized Domain Name
   as defined by [RFC5892], and "name" MUST be a unique name within the

I would suggest using the keyword "U-label" from RFC 5890 here.  While using
RFC 5892 as the reference implicitly requires U-label form, it seems more
comfortable for the reader to lay it out explicitly.

   namespace defined by the "domain.prefix".  Use of a prefix in this
   way allows for a name to be used initially in the private use range,
   and to be registered at a future point in time.  This is consistent
   with the guidance in [BCP178].

Is the intention that just the "name" suffix part be what is registered, or
the whole "domain.prefix-name" construct?

Section 6.2.3

   Designated experts MUST ensure that new registration requests meet
   the following additional guidelines:

If they MUST be met, that sounds like "criteria" rather than "guidelines".

That said, in other protocol ecosystems managed by the IETF, we've been
shifting towards much more lenient registration policies, in some cases
essentially a "shall-issue" guidance to the experts.  This has been an
evolution in response to codepoint squatting and is an attempt to make
registration so easy that we can pretty reliably have all codepoints being
used be reflected in the registry.  Attempting to use the registry and the
experts as a gatekeeping function may not always have the desired effect.

   *  Index values and names outside the private use space MUST NOT be
      used without registration.  This is considered squatting and
      SHOULD be avoided.  Designated experts MUST ensure that reviewed
      specifications register all appropriate index values and names.

Why are we matching a SHOULD with a MUST NOT?  Those are different
requirement levels.

Section 6.2.4

                                         Guidelines on how to deconflict
   these value spaces are defined in Section 4.1.

I'm not sure which part of §4.1 we're trying to refer to, here -- I see
guidance on how to interpret values using the version schemes registered by
this document, but am not finding much about how to define new version
schemes in the future.

Section 6.3

   Fragment identifier considerations: Fragment identification for
   application/swid+cbor is supported by using fragment identifiers as
   specified by Section 9.5 of [RFC8949].

Hmm, that reference says:

%  Fragment Identifier Considerations:  The syntax and semantics of
%     fragment identifiers specified for +cbor SHOULD be as specified
%     for "application/cbor".  (At publication of RFC 8949, there is no
%     fragment identification syntax defined for "application/cbor".)

I think it might be more typical to repeat the "SHOULD be as specified ...
at publication of RFC-AAA, there is no fragment identification syntax
defined..." text in this document rather than to say that it's supported and
point to a reference that says it isn't supported.  (But I'm not really an
expert here, and it's a shame that the request for review on the media-types
list didn't get any responses back in October.)

   Magic number(s): first five bytes in hex: da 53 57 49 44

This magic number only holds if the toplevel concise-swid-tag is wrapped by
an explicit tag, but the use of the dedicated media type would normally
render the use of such a tag unnecessary.  (It also only holds if the
requested CBOR Tag value is allocated, of course.)  Do we need/want to
require the presence of this explicit tag somewhere in this document?
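
For reference, the five magic bytes decode as a CBOR tag head, which is why
they only appear when the explicit tag is present. A minimal Python sketch
(the helper name is hypothetical; the byte layout follows the CBOR encoding
of a tag with a 4-byte argument):

```python
MAGIC = bytes.fromhex("da53574944")


def decode_tag_head(data):
    """Decode a CBOR tag head that uses a 4-byte argument."""
    initial = data[0]
    major_type = initial >> 5      # 6 means "tag"
    additional = initial & 0x1F    # 26 means a 4-byte argument follows
    if major_type != 6 or additional != 26:
        raise ValueError("not a CBOR tag with a 4-byte argument")
    return int.from_bytes(data[1:5], "big")


tag_number = decode_tag_head(MAGIC)   # 0x53574944
tag_ascii = MAGIC[1:].decode("ascii")  # the argument bytes spell "SWID"
```

An untagged concise-swid-tag would instead begin with the initial byte of a
CBOR map, so a sniffer keyed on these five bytes would not recognize it.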

Section 6.7

   TAG_CREATOR_REGID "_" "_" UNIQUE_ID

I think this construction suffers a lack of unique decomposition akin to
what I describe in Discuss point (6) -- underscore, including double
underscore, seems to be permitted in both the reg-id ("any-uri") and in the
textual form of the tag-id.

Section 7

   Signing CoSWID tags follows the procedures defined in CBOR Object
   Signing and Encryption [RFC8152].  [...]

draft-ietf-cose-rfc8152bis-struct is in AUTH48 waiting for me to reply to
some RFC Editor questions on Jim Schaad's behalf.  I expect it to be
published before this document is ready, so updating the reference may be in
order.

   protected-signed-coswid-header = {
       1 => int,                      ; algorithm identifier
       3 => "application/swid+cbor",
       4 => bstr,                     ; key identifier

I note that the COSE spec does not require kid to be in the protected
headers, since it's not guaranteed to be unique and is just a hint as to
what key was used.

   Additionally, the COSE Header counter signature MAY be used as an
   attribute in the unprotected header map of the COSE envelope of a
   CoSWID.  The application of counter signing enables second parties to
   provide a signature on a signature allowing for a proof that a
   signature existed at a given time (i.e., a timestamp).

Mention of COSE countersignatures should reference
draft-ietf-cose-countersign since the RFC 8152 countersignature does not
provide the desired properties.

Section 9

A handful of additional potential security considerations to include:

We allow the use of hash values accompanied by a "hash algorithm not known"
identifier, which seems to expose some risk of cross-algorithm attacks
that's worth noting here.  I believe there only becomes practical impact if
some endpoints allow use of an insecure algorithm, but of course we may not
know immediately if an algorithm becomes insecure.

We might also note that an attacker might try to prevent a CoSWID consumer
from learning about new (tag-)versions of a given CoSWID.  CoSWID tags do
not have expiration dates or required freshness checks, so in principle a
powerful attacker could keep an endpoint stuck on an old (vulnerable)
tag-version for quite some time and leverage the vulnerable old tag in some
manner.

Does [SWID] have any discussion of security considerations that we want to
incorporate by reference?

It seems like entitlement-keys have the potential for actually being
confidential information in the CoSWID tag, that should not be distributed
widely.

The use of "persistent-id" for grouping components could allow an attacker
to maliciously claim that their software is a member of some other group.

Errors in setting the 'key' field of a filesystem-item could lead to bad
results from a check for whether a given component is installed.

We probably want to incorporate the security considerations of RFC 8949 by
reference, even if we do already mention that decoders need to exercise
caution in certain regards.

                                                  To support signature
   validation, there is the need associate the right key with the
   software provider or party originating the signature.  This operation
   is application specific and needs to be addressed by the application
   or a user of the application; a specific approach for which is out-
   of-scope for this document.

I feel like we should say something about the need to protect the integrity
of this database of which keys are authorized to sign which CoSWID tags.

   When an authoritative tag is signed, the originator of the signature
   can be verified.  A trustworthy association between the signature and
   the originator of the signature can be established via trust anchors.
   [...]

I feel like somewhere in here would be a good place to note that just
because there is a signature, and even that the signature can be validated,
does not mean that the CoSWID tag is suitable for a certain use.  The
trustworthy association mentioned here really is required, and it may not be
a simple matter of "all signatures that can be chained to a trust anchor are
trusted for all purposes".

                                                       Consumers of
   CoSWID tags would need to validate the tag using the new credentials
   and would also need to revoke certificates associated with the
   compromised credentials to avoid validating tags signed with them.

Consumers would not need to "revoke" certificates associated with
compromised credentials, but rather consume revocation information about
those certificates, with the revocation information being issued by the
original issuer of the certificates in question.

   CoSWID tags are intended to contain public information about software
   components and, as such, the contents of a CoSWID tag does not need
   to be protected against unintended disclosure on an endpoint.

I know we cover this later on, but we might mention here that the
association of a collection of CoSWID tags with a particular endpoint may
merit confidentiality protection.

   Since the tag-id of a CoSWID tag can be used as a global index value,
   failure to ensure the tag-id's uniqueness can cause collisions or
   ambiguity in CoSWID tags that are retrieved or processed using this
   identifier.  [...]

It seems that a malicious CoSWID issuer could trivially cause such
collisions.  It might be worth discussing this, and the potential
mitigations (don't use the issuer anymore!).

   applications that are vulnerable to certain types of attacks.  As
   noted earlier, CoSWID tags are designed to be easily discoverable by
   an endpoint, but this does not present a significant risk since an

I think we specifically said "*authorized* applications and users" earlier
(emphasis mine), so we might want to qualify the "easily discoverable" here
as well.

Section 11.1

Whether [SAM] or [SEMVER] needs to be classified as normative is not
entirely clear.

X.1520 probably would be fine as informative.

NITS

Section 2.9.1

   The hash-value MUST represent the raw hash value in byte
   representation (in contrast to, e.g., base64 encoded byte
   representation) of the byte string that represents the hashed
   resource generated using the hash algorithm indicated by the hash-
   alg-id.

I'm not sure that "hashed resource" is the most common phrasing for this
sentiment.  Maybe it's the "hash over the [representation of the] resource"?

Section 2.9.2

   *  root (index 25): A filesystem-specific name for the root of the
      filesystem.  The location item is considered relative to this

Is it a filesystem-specific name or a host-specific name?