[Anima] Benjamin Kaduk's No Objection on draft-ietf-anima-grasp-api-08: (with COMMENT)

Benjamin Kaduk via Datatracker <noreply@ietf.org> Thu, 03 December 2020 02:03 UTC

From: Benjamin Kaduk via Datatracker <noreply@ietf.org>
To: "The IESG" <iesg@ietf.org>
Cc: draft-ietf-anima-grasp-api@ietf.org, anima-chairs@ietf.org, anima@ietf.org, Sheng Jiang <jiangsheng@huawei.com>, jiangsheng@huawei.com
Reply-To: Benjamin Kaduk <kaduk@mit.edu>
Message-ID: <160696103273.15916.4258244407672253133@ietfa.amsl.com>
Date: Wed, 02 Dec 2020 18:03:53 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/anima/ekVGi7Xqyk2jddgL7ajIxVOsu9g>

Benjamin Kaduk has entered the following ballot position for
draft-ietf-anima-grasp-api-08: No Objection

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)

Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.

The document, along with other ballot positions, can be found here:


I have two comments in particular that I would like to call your
attention to: my comment on cache flushing in Section 2.3.4, and my
comment on the CBOR data model used for validation in Appendix A.

Section 1

   An ASA runs in an ACP node and therefore inherits all its security
   properties, i.e., message integrity, message confidentiality and the
   fact that unauthorized nodes cannot join the ACP.  All ASAs within a

I agree with Roman's comment that the "it" whose security properties are
inherited is the ACP *node*, not the ACP itself, and thus that some
rewording is appropriate.

   The GRASP API library would need to communicate with the GRASP core
   via an inter-process communication (IPC) mechanism.  The details of

Hmm, if the GRASP core is in kernel-space and the API library in
userspace, wouldn't we normally refer to that exchange as a system call
rather than IPC?  (Figure 1 also labels this interaction "IPC".)

Section 2.1

   *  Authorization of ASAs is not defined as part of GRASP and is not

Any chance I could interest you in s/not supported/a subject for future
work/?  It is looking somewhat likely since such a statement is already
present in the security considerations...

   *  User-supplied explicit locators for an objective are not
      supported.  The GRASP core will supply the locator, using the ACP
      address of the node concerned.

This would seem to prevent any non-ACP use of GRASP; I suggest adding
some language with a caveat about "for example" or similar, unless the
intent is to limit the API usage to ACP (or DULL) scenarios.

Section 2.2.1

I think that the possibility for a single outbound message to get a
sequence of incoming replies (at different times) further complicates
the design of an asynchronous mechanism, and we would do well to discuss
how such scenarios (e.g., broadcast discovery messages) would be handled
by the implementation and API.  (I see that we do end up using a timeout
in practice to resolve this topic, but would probably still mention it
as an issue that has been resolved, here.)

Section 2.2.2

   ports rather than a separate port per session.  Hence the GRASP
   design includes a session identifier.  Thus, when necessary, a
   'session_nonce' parameter is used in the API to distinguish
   simultaneous GRASP sessions from each other, so that any number of
   sessions may proceed asynchronously in parallel.

I do see that there was previous discussion on the 'nonce' terminology
here, and I am unsure why there is need to move away from the "session
ID" terminology used in GRASP itself.  In particular, the
"session_nonce" is not a number used *once*, rather, it is used only for
one session (but potentially multiple times within that session).  That,
to me, makes it a (short-lived) identifier, not a nonce.  Roman's
proposal of 'handle' would resolve this apparent disparity.

Section 2.2.3

   On the first call in a new GRASP session, the API returns a
   'session_nonce' value based on the GRASP session identifier.  This

What does "based on" mean?  Does there need to be a one-to-one
correspondence?  Or just in one direction?  Are we going to be
constrained by the (IMO, too limited) 32 bits of randomness limit of the
GRASP Session ID?


   -  Note 3: In a language such as C the preferred implementation
      may be to represent the Boolean flags as bits in a single byte,

Which aspect(s) of C are relevant for the "such as"?

   An essential requirement for all language mappings and all
   implementations is that, regardless of what other options exist
   for a language-specific representation of the value, there is
   always an option to use a raw CBOR data item as the value.  The
   API will then wrap this with CBOR Tag 24 as an encoded CBOR data
   item [RFC7049] for transmission via GRASP, and unwrap it after

I'm not sure I understand why the bstr wrapping is mandatory -- I would
have thought that the attraction of using a raw encoded CBOR data item
would be that it could be used directly, without additional wrapping.
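
For reference, the Tag 24 wrapping in the quoted text amounts to a short
prefix on the wire.  A minimal sketch, hand-encoding the bytes per the
RFC 7049 rules rather than using a CBOR library; the helper names are
hypothetical and not part of the draft's API:

```python
def cbor_bstr_header(length: int) -> bytes:
    """Encode the header of a CBOR byte string (major type 2)."""
    if length < 24:
        return bytes([0x40 | length])
    if length < 0x100:
        return bytes([0x58, length])
    if length < 0x10000:
        return bytes([0x59]) + length.to_bytes(2, "big")
    if length < 0x100000000:
        return bytes([0x5A]) + length.to_bytes(4, "big")
    return bytes([0x5B]) + length.to_bytes(8, "big")


def wrap_tag24(encoded_item: bytes) -> bytes:
    """Wrap an already-encoded CBOR item as Tag 24 (encoded CBOR data item)."""
    # 0xd8 0x18 is major type 6 (tag) with tag number 24.
    return b"\xd8\x18" + cbor_bstr_header(len(encoded_item)) + encoded_item


raw = bytes.fromhex("83010203")      # the CBOR array [1, 2, 3]
print(wrap_tag24(raw).hex())         # tag prefix, bstr header, then the raw item
```

The receiver has to strip the tag and byte-string header again before
it can decode the inner item, which is the extra step questioned above.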

    int loop_count;
    int value_size;           // size of value in bytes

Some people might argue for using unsigned types for at least sizes
(e.g., size_t), and often for things like loop counts that cannot be
negative (though the argument for an unsigned type there is somewhat

        self.value = 0      # Place holder; any valid Python object

Wouldn't None be a more conventional placeholder in Python?
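
A minimal sketch of why None may be the safer placeholder: 0 is itself
a plausible objective value, so it cannot signal "not yet set".  The
class below is a hypothetical stand-in with only two fields, not the
draft's actual class:

```python
class Objective:
    """Hypothetical, reduced stand-in for the draft's objective class."""

    def __init__(self, name):
        self.name = name
        self.value = None   # placeholder: no value has been set yet


def has_value(obj):
    """True once a real value, including 0, has been assigned."""
    return obj.value is not None


obj = Objective("EX1")
print(has_value(obj))   # unset placeholder
obj.value = 0           # 0 is a legitimate value, distinct from "unset"
print(has_value(obj))
```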


   *  The following cover all locator types currently supported by

      -  is_ipaddress (Boolean) - True if the locator is an IP address

      -  is_fqdn (Boolean) - True if the locator is an FQDN

      -  is_uri (Boolean) - True if the locator is a URI

Are these mutually exclusive?


As for the GRASP session ID, I think that a 32-bit cap is too
restrictive.  I think we should be in the habit of using 128-bit nonces
and needing to justify anything smaller.  (64 bits would *probably* be
fine here, FWIW, and might make it easier to represent in common
language bindings.)

   Section  Another possible implementation is to hash the
   name of the ASA with a locally defined secret key.

I recognize that this is a throwaway line, but the naive keyed hash
construction is subject to length-extension attacks (for certain hash
constructions such as the Merkle-Damgård family that includes SHA-2);
HMAC is more robust for this type of usage and can be phrased in a
similarly concise manner ("compute an HMAC of the name of the ASA under
a locally defined secret key").
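
A minimal sketch of the HMAC variant using the Python standard library.
The function name, the choice of SHA-256, and the key handling are
assumptions for illustration only; the draft does not specify any of
them:

```python
import hashlib
import hmac
import secrets


def derive_asa_token(asa_name: str, key: bytes) -> bytes:
    """Compute HMAC-SHA-256 of the ASA name under a local secret key."""
    return hmac.new(key, asa_name.encode("utf-8"), hashlib.sha256).digest()


# Hypothetical locally defined secret, generated once per GRASP instance.
local_key = secrets.token_bytes(32)

token = derive_asa_token("example_asa", local_key)

# Verification recomputes the tag and compares in constant time.
print(hmac.compare_digest(token, derive_asa_token("example_asa", local_key)))
```

Unlike a bare hash over key and name, the HMAC construction is not
affected by length extension.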

Section 2.3.3

   *  deregister_asa()
      -  Note - the ASA name is strictly speaking redundant in this
         call, but is present for clarity.

So what happens if the wrong name is passed?

         transmit to other ASAs.  It is not necessary to register an
         objective that is only received by GRASP synchronization or
         Registration is not needed for "read-only" operations, i.e.,
         the ASA only wants to receive synchronization or flooded data
         for the objective concerned.

These seem to have high overlap and thus be candidates for

      -  The 'ttl' parameter is the valid lifetime (time to live) in
         milliseconds of any discovery response for this objective.  The

(nit?) I'd suggest to add "generated", since it would not apply to any
hypothetical received discovery response for the objective in question.

      -  If the parameter 'overlap' is True, more than one ASA may
         register this objective in the same GRASP instance.

Do all ASAs registering this objective have to set it to True, or just
the first one, in order for the subsequent registrations to succeed?

Section 2.3.4

      -  If the parameter 'minimum_TTL' is greater than zero, any
         locally cached locators for the objective whose remaining time
         to live in milliseconds is less than or equal to 'minimum_TTL'
         are deleted first.  Thus 'minimum_TTL' = 0 will flush all

Why does one ASA's request flush entries from the cache shared with
other ASAs?  I am forced to infer the motivation for including the
minimum_TTL parameter in the first place, but it seems like it is useful
if the requesting ASA needs to find something that will remain active
for a given period of time, but different ASAs may have different needs
for the peer's stability, and so flushing the cache in this way could
hamper the operation of peer ASAs.
If the intent is only to not return those cached locators *for this
discovery operation*, then say that, not that they are flushed from the
cache entirely.
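
The "filter for this operation only" reading could look like the sketch
below.  The cache layout and all names are hypothetical; the point is
only that the shared cache is left untouched for other ASAs:

```python
from dataclasses import dataclass


@dataclass
class CachedLocator:
    locator: str
    expires_at_ms: int   # absolute expiry time in milliseconds


def usable_locators(cache, now_ms, minimum_ttl_ms):
    """Return entries whose remaining lifetime exceeds minimum_ttl_ms.

    The cache itself is not modified, so peer ASAs with looser
    requirements still see the short-lived entries.
    """
    return [e for e in cache if e.expires_at_ms - now_ms > minimum_ttl_ms]


shared_cache = [
    CachedLocator("locator-a", expires_at_ms=500),
    CachedLocator("locator-b", expires_at_ms=5000),
]
print([e.locator for e in usable_locators(shared_cache, 0, 1000)])
print(len(shared_cache))   # still holds both entries
```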

Section 2.3.5

Thanks for the figure (I probably should have put one into RFC 7546,
which is basically this section but for the GSS-API).

I suggest noting in the first paragraph that the negotiation occurs in
lockstep, with the initiator starting the negotiation and preparing a
message, the responder processing that message and generating a new
negotiation message in turn, with at most one negotiation message in
flight at any given time.  It seems particularly important to note
whether this also applies to negotiate_wait() calls/messages, or if
those can be made at any time by either entity.  (This probably relates
to some of the genart reviewer's comments.)

I note that the prospect of the loop count going up (and, thus, risk of
infinite looping) was pointed out by the genart review.  I share such
concerns and am happy to see that improved discussion of this topic (and
the related 'lifetime' extension) is planned.

         For this and any other error code, an exponential backoff is
         recommended before any retry.

Any guidance about whether this should be by doubling vs a different
exponent base?  I guess the security considerations do say that it's
dependent on the semantics of the objective in question, which may be
enough (though a pointer or mention here would be appreciated).
(Also, any reason to not use the 2119 RECOMMENDED?)
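
For concreteness, a doubling backoff with a cap might be sketched as
follows.  The base, factor, cap, and optional jitter are illustrative
assumptions; the draft does not fix any of these values:

```python
import random


def backoff_delays(base_s=1.0, factor=2.0, cap_s=60.0, attempts=6,
                   jitter=False):
    """Yield retry delays in seconds: base_s * factor**n, capped at cap_s."""
    delay = base_s
    for _ in range(attempts):
        wait = min(delay, cap_s)
        # Optional jitter: pick a random wait in [0, wait] to avoid
        # many ASAs retrying in step with each other.
        yield random.uniform(0.0, wait) if jitter else wait
        delay *= factor


print(list(backoff_delays(cap_s=10.0, attempts=5)))
```

A different exponent base only changes 'factor'; which value is
appropriate presumably depends on the objective, as the security
considerations suggest.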

      -  This function must be followed by calls to 'negotiate_step'
         and/or 'negotiate_wait' and/or 'end_negotiate' until the
         negotiation ends. 'listen_negotiate' may then be called again
         to await a new negotiation.

We just recommended a few paragraphs earlier that listen_negotiate()
should be called again *immediately* after the first listen_negotiate()
returns; I don't see why it's useful to also say that it might be called
again after a given negotiation ends.

      -  Executes the next negotiation step with the peer.  The
         'objective' parameter contains the next value being proffered
         by the ASA in this step.  It must also contain the latest
         'loop_count' value received from request_negotiate() or

This is interesting; negotiate_step() must preserve the loop count from
the previous call, so only the initial negotiation response (the
request_negotiate() 'proffered_objective' output) can increase the loop
count, not any arbitrary negotiation step?  That seems to limit concerns
about infinite looping (as raised by the genart reviewer and apparently
acknowledged in the response to the genart review).

         o  Threaded implementation: Called in the same thread as the
            preceding 'request_negotiate' or 'listen_negotiate', with
            the same value of 'session_nonce'.

IIUC it is *expected* to be called in the same thread as the previous
call, but is not strictly speaking *required* to do so, since the
session_nonce tracks the library state for the negotiation in question.
Or am I mistaken?

         'result' = True for accept (successful negotiation), False for
         decline (failed negotiation).

         'reason' = optional string describing reason for decline.

What happens if I pass a reason string with result of True?

Section 2.3.6

      -  If the 'peer' parameter is null, and the objective is already
         available in the local cache, the flooded objective is returned
         immediately in the 'result' parameter.  In this case, the
         'timeout' is ignored.

      -  Otherwise, synchronization with a discovered ASA is performed.
         If successful, the retrieved objective is returned in the
         'result' parameter.

From context this 'otherwise' seems to be the "'peer' parameter is null
but the objective is not available in the local cache" case (as opposed
to also covering the "'peer' parameter is not null" case).  It might be
possible to clarify this with formatting and/or rewording.

   *  synchronize()
      -  Since this is essentially a read operation, any ASA can do it,
         unless an authorization model is added to GRASP in future.
         Therefore the API checks that the ASA is registered, but the
         objective does not need to be registered by the calling ASA.
      -  Since this is essentially a read operation, any ASA can use it.
         Therefore GRASP checks that the calling ASA is registered but
         the objective doesn't need to be registered by the calling ASA.

These seem redundant and candidates for de-duplication.

      -  In the case of failure, an exponential backoff is recommended
         before retrying.

[same remark as previously]

Section 2.3.7

         'info' = optional diagnostic data.  May be raw bytes from the
         invalid message.

This means it does not have to be well-formed CBOR, and will be wrapped
in a bstr by the library?  (The GRASP spec suggests that a different
CBOR structure would be permitted, though of course the API need not be
required to expose such flexibility.)

Section 4

If we're going to keep the 32-bit nonce/handle/etc, it's probably worth
a mention of collision/guessing probability.
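
As a rough guide to the numbers involved, the standard birthday-bound
approximation can be sketched as below.  It assumes identifiers are
drawn uniformly at random, which an implementation may or may not do;
the function names are hypothetical:

```python
import math


def collision_probability(n_ids: int, bits: int) -> float:
    """Approximate chance that n_ids random values of 'bits' bits collide.

    Uses the birthday approximation 1 - exp(-n(n-1) / (2 * 2**bits)).
    """
    space = 2.0 ** bits
    return -math.expm1(-n_ids * (n_ids - 1) / (2.0 * space))


def guess_probability(bits: int) -> float:
    """Chance that a single blind guess hits one specific identifier."""
    return 1.0 / (2.0 ** bits)


for bits in (32, 64, 128):
    print(bits, collision_probability(65536, bits), guess_probability(bits))
```

With 32 bits, about 65,000 simultaneously live identifiers already give
a collision chance of roughly 0.39; at 64 bits the same population is
negligible.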

It might be worth a reference to the RFC 3986 security considerations
since we do allow URI locators.  This is not really any different than
for GRASP itself, but the URI is exposed to the API consumer and so
reminding them about it seems worthwhile.

The session_nonce is nominally opaque to (non-ACP, at least) ASAs, but
is likely to be implemented in a way that does preserve some state.  Is
there a risk if an ASA attempts to "peek through the abstraction
barrier"?  (I am not sure I see one, but you're the expert!)

   GRASP objective concerned.  These precautions are intended to assist
   the detection of malicious denial of service attacks.

I suggest to drop the word "malicious"; such denial of service
conditions need not be malicious and can occur by accident.

   As a general precaution, all ASAs able to handle multiple negotiation
   or synchronization requests in parallel may protect themselves
   against a denial of service attack by limiting the number of requests
   they can handle simultaneously and silently discarding excess

I think that best practices would also include some limit on the number
of objectives registered by a given ASA and possibly the number of ASAs
registered, to protect the core library/kernel resources.
(nit?) I suggest dropping 'can'.

Appendix A

There was some discussion with the genart reviewer about the CBORfail
error code as being particularly useful.  I note that
draft-ietf-cbor-7049bis is in AUTH48 and introduces a hierarchy of
"levels of validation" (in the form of different data models).  CBOR
that is valid in the generic data model might not be valid in the
extended data model or a data model specific to a given application.  I
strongly encourage this document to update to referencing 7049bis and
giving an indication of what data model is in use for processing both
information received from the peer and any CBOR-encoded data received
from the ASA.

   'noSecurity' error will be returned to most calls if GRASP is running
   in an insecure mode (no ACP), except for the specific DULL usage mode

My understanding of the text in the GRASP spec itself was that non-ACP
security services were allowed.  Is the API intended to be limited to
only ACP usage?

   ASAfull          4 "ASA registry full"  (register_asa)
   dupASA           5 "Duplicate ASA name" (register_asa)
   noASA            6 "ASA not registered"
   notYourASA       7 "ASA registered but not by you"

Giving this much detail is making things much easier for malicious ASAs
... but given that the deployment model basically assumes that such
things don't exist (even if we do give some small consideration to the
possibility in some places), I will not complain about retaining this
level of detail in the error messages.

   noDiscReply     17 "No reply to discovery"

There is perhaps some explanation to give about the distinction between
noReply and noDiscReply, i.e., in the body text.  Maybe it is
self-explanatory, though, provided that the author of the code notices
that noDiscReply exists at all.
Likewise for noNegReply, noSynchReply, noValidSynch, and, possibly,