Re: [Anima] AD review of draft-ietf-anima-grasp-api-06

Brian E Carpenter <brian.e.carpenter@gmail.com> Wed, 30 September 2020 22:28 UTC

Return-Path: <brian.e.carpenter@gmail.com>
X-Original-To: anima@ietfa.amsl.com
Delivered-To: anima@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 235C23A0CFA; Wed, 30 Sep 2020 15:28:04 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -2.311
X-Spam-Level:
X-Spam-Status: No, score=-2.311 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, NICE_REPLY_A=-0.213, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id mXHykG53N_G2; Wed, 30 Sep 2020 15:28:01 -0700 (PDT)
Received: from mail-pg1-x543.google.com (mail-pg1-x543.google.com [IPv6:2607:f8b0:4864:20::543]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id C2A7F3A0CF4; Wed, 30 Sep 2020 15:28:01 -0700 (PDT)
Received: by mail-pg1-x543.google.com with SMTP id x16so2168005pgj.3; Wed, 30 Sep 2020 15:28:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:subject:to:cc:references:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=9Z5EgVH7hBgTlBQvyPimTY+zeEW7yyX+taATXzPOz+A=; b=NRzsuF466n6vRRkGpOD0rvK/EF6+3ZlYKthNQejFWeaxG2LZBM9Mpa4YWKFmsJjOrJ rI3cc7B2OGa4OizGZSSgD0/U2kd9bbCYFTWJfVRLm24KD9875vO3TdDpD/l8Ye24wjZ3 OFMnDyJki8lcZxnXvGx0+Z4ZQh+Y//rm7ldP/oMl2j0fz+V6IqQClaTbnLvGP/eyczvL Q8T5ZELQkDQSq4q6fA8b2sMTCAdC3F61Uwiilb/6IuIbTPUOcQpVT0fxYn7blw21hkca JD5tdd1poCOKJHKddfJ0Dc2evOuZQ1kpGX5rRTRAQYpmninm8Wgq31/yjeTuhzeMISWj 9B9A==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:subject:to:cc:references:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=9Z5EgVH7hBgTlBQvyPimTY+zeEW7yyX+taATXzPOz+A=; b=ijVjdqyED/nHvM7KN6X9tYxSM4eMu7fzD0kLXmpSSKlespF/wDdLLQgenMPrn+6GgQ lECnQS2ogxZYJ0JLPoVVaz0y1TPDmWzSKTJgkpVE62W7/zNNyKyiHbOAx6wNqOs4340z U25uRa0ruHwoG1FKcM4pEOq/FuL1LsRBCQaoXLH6YkTyOM8ZFnkH4vD123LluT/3odNz TdzbD6mN9eP3ItMD1nMgCOhLY443rxeyA82CVFmI1mApyetzdmFR7DqspxaJ1MItU7rs j8pFFgPX2oOqEvsD0O/GS8kD0lM0GWxTAOpyW6qClICn8PbE0Tw7SkKBFTv8FCAE4aB0 J4Kg==
X-Gm-Message-State: AOAM532O8yRnbIT5A8Q9BQGgtCjzyB+SSdqJTvt9k2PWyh7H5HO01Fgi maEOMgASCoYyksiekGAhsJQ=
X-Google-Smtp-Source: ABdhPJxdbBUpQDASKa9tPhPsz8Qk+Dpru8bOVnoMYvqZ9RhPpl/UbAwmitsh507Uffgcdb01S7g3Tg==
X-Received: by 2002:a17:902:b611:b029:d3:89e2:7957 with SMTP id b17-20020a170902b611b02900d389e27957mr420386pls.19.1601504880960; Wed, 30 Sep 2020 15:28:00 -0700 (PDT)
Received: from [192.168.178.20] ([151.210.138.136]) by smtp.gmail.com with ESMTPSA id gl17sm3372747pjb.49.2020.09.30.15.27.58 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 30 Sep 2020 15:28:00 -0700 (PDT)
From: Brian E Carpenter <brian.e.carpenter@gmail.com>
To: "Rob Wilton (rwilton)" <rwilton@cisco.com>, "draft-ietf-anima-grasp-api@ietf.org" <draft-ietf-anima-grasp-api@ietf.org>
Cc: "anima@ietf.org" <anima@ietf.org>, Sheng Jiang <jiangsheng@huawei.com>
References: <MN2PR11MB4366FCFF358BBABBF64CB27CB5350@MN2PR11MB4366.namprd11.prod.outlook.com>
Message-ID: <e547f069-5138-fa04-2c63-ae7d9ead760a@gmail.com>
Date: Thu, 1 Oct 2020 11:27:55 +1300
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:60.0) Gecko/20100101 Thunderbird/60.9.1
MIME-Version: 1.0
In-Reply-To: <MN2PR11MB4366FCFF358BBABBF64CB27CB5350@MN2PR11MB4366.namprd11.prod.outlook.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/anima/stDWkP2aPH69XCCD2NEikFLrxXk>
Subject: Re: [Anima] AD review of draft-ietf-anima-grasp-api-06
X-BeenThere: anima@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Autonomic Networking Integrated Model and Approach <anima.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/anima>, <mailto:anima-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/anima/>
List-Post: <mailto:anima@ietf.org>
List-Help: <mailto:anima-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/anima>, <mailto:anima-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 30 Sep 2020 22:28:04 -0000

Thanks for this thorough review, Rob. Sorry about the previous partial reply. See responses in line.

We'll pause for any feedback before updating the draft. In particular, are there any opnions on the example logic flows issue noted below?

On 29-Sep-20 04:19, Rob Wilton (rwilton) wrote:
> Hi,
> 
> Apologies for the delay for this review.
> 
> Please see comments below.  I've also attached them in case they get truncated (which has happened on other reviews). 
> 
> Thank you for this document.  Overall, it is very well written, easy to read and easy to understand.
> 
> 
> Comments:
> 
>     1.  Introduction
> 
>        As the following figure shows, a GRASP implementation could contain
>        two major sub-layers.  The bottom is the GRASP base protocol module,
>        which is only responsible for sending and receiving GRASP messages
>        and maintaining shared data structures.  The upper layer contains
> 
> I found the text, relative to the diagram, to be somewhat confusing, although the subsequent paragraphs make its intent clearer.
> 
> I think that the top sub-layer is meant to be "GRASP Extended Function Modules", and the "Grasp Modules" is the bottom sub-layer.  However, the diagram seems to have a sub-layer split in "GRASP Modules", and doesn't necessarily equate to "GRASP base protocol module".

Perhaps we have confused two aspects: the view that the app writer has of the API, and the way it has to be implemented. More below...
 
> 
>     1.  Introduction
> 
>        It is desirable that ASAs can be designed as portable user-space
>        programs using a system-independent API.  In many implementations,
>        the GRASP module will therefore be split into two layers.  The top
>        layer is a library that provides the API and communicates directly
>        with ASAs.  The lower layer is a daemon, or a set of sub-services,
>        providing GRASP core functions that are independent of specific ASAs,
>        such as multicast handling and relaying, and common data structures
>        such as the discovery cache.  The GRASP API library would need to
>        communicate with the GRASP core via an inter-process communication
>        (IPC) mechanism.  The details of this are system-dependent.
> 
> I also find this text confusing relative to the diagram.  More naturally, I would assume that the boxes represent different processes and hence you would have IPC between ASA and say the "GRASP API Library".  If I was to draw this, then I would probably have put the GRASP API library co-located with the ASA boxes.

Yes. In fact in my demo implementation, when I added an IPC mechanism, that is essentially what I got: stub functions for all the API calls in the app context, then IPC to the core, then the "real" functions and the various daemons/threads in the core context.

I can't instantly provide old/new text to respond to this but we will try to clarify all this.

> 
> 
>     1.  Introduction
>         
>        Note that a simple autonomic node might contain very few ASAs in
>        addition to the autonomic infrastructure components described in
>        [I-D.ietf-anima-bootstrapping-keyinfra] and
>        [I-D.ietf-anima-autonomic-control-plane].  Such a node might directly
>        integrate GRASP in its code and therefore not require this API to be
>        installed.  However, the programmer would then need a deeper
>        understanding of the GRASP protocol than is needed to use the API.
>  
> Perhaps change "integrate GRASP in its code" to something like "integrate the full GRASP implementation in its code"?

Yes (but actually some cases might choose to implement a GRASP subset).

> 
>     2.2.1.  Alternative Asynchronous Mechanisms
> 
>        Thus, some ASAs need to support asynchronous operations, and
>        therefore the GRASP core must do so.  Depending on both the operating
>        system and the programming language in use, there are three main
>        techniques for such parallel operations: multi-threading, an event
>        loop structure using polling, and an event loop structure using
>        callback functions.
> 
> Rather than "there are three main techniques for such parallel operations", perhaps just "three common techniques are considered for parallel operations".

OK

> Reasoning, there are other approaches for effectively handling concurrent operations, e.g., actors, futures, cooperative threads backed by a OS thread pool ...

[Or of course different names for the same thing. Imagine trying to describe all this in terms of ADA rendezvous.]

> 
> Possibly, change "simple function calls" in the multi-threading paragraph to be "simple synchronous function calls" to emphasize the synchronous nature.

OK

> 
> 
>     2.2.2.  Multiple Negotiation Scenario
> 
>        In the callback model, for the scenario just described, the ASAs "A"
>        and "B" will each provide two instances of negotiate_step_received(),
>        one for each session.  For this reason, each ASA must be able to
>        distinguish the two sessions, and the peer's IP address is not
>        sufficient for this.  It is also not safe to rely on transport port
>        numbers for this, since future variants of GRASP might use shared
>        ports rather than a separate port per session.  This is why the GRASP
>        design includes a session identifier.  Thus, when necessary, a
>        'session_nonce' parameter is used in the API to distinguish
>        simultaneous GRASP sessions from each other, so that any number of
>        sessions may proceed asynchronously in parallel.
> 
> It wasn't obvious to me why both ASAs "A" and "B" both provide callbacks, I presume that this is because there are multiple asynchronous steps involved.  If so, would it be helpful to clarify this point?

Yes, one could build very complicated interlocking negotiations. (That is a whole can of worms, explored in draft-ciavaglia-anima-coordination, but for present purposes we just need to ensure that ASAs can handle any degree of asynchronosity, if that's a word.)

> Also in this paragraph suggest "Hence, " rather than "This is wht"

OK

> 
> 
>     2.2.3.  Overlapping Sessions and Operations
> 
>        In calls where it is used, the 'session_nonce' is an opaque read/
>        write parameter.  On the first call, it is set to a null value, and
>        the API returns a non-null 'session_nonce' value based on the GRASP
>        session identifier.  This value must be used in all subsequent calls
>        for the same session, and will be provided as a parameter in the
>        callback functions.  By this mechanism, multiple overlapping sessions
>        can be distinguished, both in the ASA and in the GRASP core.  The
>        value of the 'session_nonce" is opaque to the ASA.
>    
> Query the use of a null value, and whether that is too restrictive.

Its value on the first call is of no importance, so we should just call it a placeholder.

> 
> 
>     2.3.  API definition
> 
> Possibly this API would benefit from a short section (or even pseudo code) explaining the flow or flows of how the APIs are expected to work together.  This could also be in an appendix.  Perhaps the GRASP specification makes this obvious, in which case references back to the GRASP spec would be sufficient.

We have some example logic flows in https://tools.ietf.org/id/draft-carpenter-anima-asa-guidelines-09.html#section-appendix.b

Would a reference to that help? The authors would like that draft to be adopted by the WG. We could of course move the examples into the API draft, if people prefer that.

> 
> 
>     2.3.1.3.  Objective
> 
>        *  name (UTF-8 string) - the objective's name
> 
> Should there be a limit on the length of the name?  This question equivalently applies to the other strings/variable sized types.

I think that's implementation dependent. Because GRASP is a CBOR protocol, there is no particular limit on string length. There is a limit on the total size of a GRASP message (although I suspect we might decide to raise or remove that limit one day).
 
>        *  value - a specific data structure expressing the value of the
>           objective.  The format is language dependent, with the constraint
>           that it can be validly represented in CBOR (default integer = 0).
> 
> I didn't understand the "default integer = 0" part.  If this is a byte string then I would thought that the default might be an empty length bytestring, or perhaps a CBOR Null.

I'm not even sure now why we need to specify a default in a generic API. 

>           An essential requirement for all language mappings and all
>           implementations is that, regardless of what other options exist
>           for a language-specific represenation of the value, there is
>           always an option to use a CBOR byte string as the value.  The API
>           will then wrap this byte string in CBOR Tag 24 for transmission
>           via GRASP, and unwrap it after reception.
> 
> It was unclear whether you just meant an opaque CBOR byte string here (CBOR major type 2), or a CBOR byte string that itself contains CBOR encoded data (i.e. CBOR Tag 24).  Is there any limit on the length of the CBOR byte string that is allowed?

It's not intended to be restricted to major type 2; I'll have to check the CBOR RFC to find the correct phrase to use.

As above, the only size limit we have is the total GRASP message size.

> 
> 
>     2.3.2.  Registration
> 
> Some APIs list "Asynchronous Mechanisms:" and others don't.  It might be helpful to clarify that those APIs that don't list asynchronous mechanisms are implicitly synchronous in their behaviour.

OK

> 
> 
>         *  deregister_asa()
> 
>           -  Note - the ASA name is strictly speaking redundant in this
>              call, but is present for clarity.
> 
> Presumably it also makes it significantly harder for a rogue client to try and deregister all ASAs by looping through all asa_nonces.

True, but there is so much entropy in the nonce that this would be a challenge anyway.

> 
> 
>     2.3.4.  Negotiation
> 
>              2.  The 'session_nonce' parameter is not null.  In this case
>                  negotiation must continue.  The returned
>                  'proffered_objective' contains the first value proffered by
>                  the negotiation peer.  Note that this instance of the
>                  objective must be used in the subsequent negotiation call
>                  because it also contains the current loop count.  The
>                  'session_nonce' must be presented in all subsequent
>                  negotiation steps.
> 
> Presumably the loop count in the proffered_objective must be one greater than the loop count in the objective structure?  Would it be helpful to mention this?

Right... I'm going to look in my GRASP code before I try to rephrase that. I recall that it was a bit tricky to get it right.

> 
> I also wasn't sure whether it must be "this instance of the objective must be used", or just that the loop counter must be copied across.  E.g., functional languages would encourage these structures to be immutable.

And because of Python's vagueness about whether it's call-by-reference or call-by-value, there are cases where I've had to explicitly clone a copy of an objective to be sure that I have the right to decrement the loop counter. So yes, it's "this instance or a clone of this instance". Will rephrase.

> 
> 
>          -  Asynchronous Mechanisms:
> 
>              o  Event loop implementation: The 'session_nonce' parameter is
>                 used in read/write mode.
> 
> It is unclear to me, exactly what you mean by this, particularly because this is a return parameter.  Perhaps this could be further explained in the text, or refer back the previous text.

Yes. I think we were trying to express something that will look a bit different in C compared to high-level languages.

> 
> 
>     2.3.4.  Negotiation
> 
>         *  listen_negotiate()
> 
> I was surprised that the listen_negotate didn't include any timeout, e.g. to allow it to be used in polling fashion.  Same comment applies to listen_synchronize().

Well, in a polling solution won't the listen state just go into the event loop? But I'm a bit out of my depth there.

> 
> 
>     2.3.4.  Negotiation
> 
>         *  negotiate_wait()
>         
> It wasn't clear to me why you would want to do this, although that may be apparent in the GRASP draft.  Perhaps a cross reference might be helpful?

Yes, the idea is simply that one party in the negotiation wants to tell the other party to be patient for a while. It is described in the GRASP spec so a reference will help.

> 
> 
>     2.3.4.  Negotiation
>     
>         *  end_negotiate()
>         
> Possibly "successful" might be a better parameter name than "reply".  I think that you reply both if the negotiation is successful and also if it failed.

Yes, I don't know where that came from. In the Python implementation it's called 'result' (True for accept, False for decline).

> 
> 
>     2.3.5.  Synchronization and Flooding
> 
>        *  synchronize()
> 
>           -  If the objective was already flooded, the flooded value is
>              returned immediately in the 'result' parameter.  In this case,
>              the 'source' and 'timeout' are ignored.
> 
> Should source be quoted here?  It is unclear what it is referring to.

Good catch. That should be 'peer'.

> 
> 
>           -  Otherwise, synchronization with a discovered ASA is performed.
>              The 'peer' parameter is an 'ASA_locator' as returned by
>              discover().  If 'peer' is null, GRASP discovery is performed
>              first.
> 
> How does the GRASP discovery work in this scenario?  Does this require use of the discovery APIs?  This may require further explanation.

OK, will fix. The intention is that discovery is launched automatically.

> 
> 
>     3.  Implementation Status [RFC Editor: please remove]
> 
>        A prototype open source Python implementation of GRASP, including an
>        API similar to this document, has been used to verify the concepts
>        for the threaded model.  It may be found at
>        https://github.com/becarpenter/graspy with associated documentation
>        and demonstration ASAs.   
> 
> This section will be removed anyway, but I thought that the GIL (global interpreter lock) effectively prevented Python threads from running concurrently.

It's not that simple, because you can get true concurrency during I/O and clock waits. In any case, the Python queue and lock mechanism is valid regardless of this.
https://wiki.python.org/moin/GlobalInterpreterLock is informative.

> 
> 
> 
> Nits:
> 
> I've noted that you have chosen not to use RFC2119 language.  No issues with this, but this may come up in IESG review ...

Well, it's Informational.

> 
> 
> Use of the term "nonce".  Note in UK English this word sometimes takes an alternative meaning, depending on who you ask and which dictionary you look at: https://dictionary.cambridge.org/dictionary/english/nonce
> 
> You may wish to consider an alternative term, but please note that this is up to the authors.  I'm not objecting to this term, just making you aware, if you weren't already.

I'm not aware of a viable alternative.
 
>     Abstract:
> Maybe languages => programming languages.

OK

> 
> 
>  
>     1.  Introduction
>         As the following figure shows,
> 
> Given that the figure is a few paragraphs away (and on the next page in the txt version), perhaps give the figure a name and reference it by name.

Yes.
> 
> 
> 
>     2.3.1.3.  Objective.
> For me, I would outdent the "} objective" in the C struct example.

Yes.

> 
> represenation -> representation

Yes.

Thanks much,

    Brian
> 
> Thanks,
> Rob
>