Re: [Dime] Benjamin Kaduk's No Objection on draft-ietf-dime-group-signaling-13: (with COMMENT)

Mark Jones <mark@azu.ca> Mon, 15 March 2021 16:12 UTC

Return-Path: <mark@azu.ca>
X-Original-To: dime@ietfa.amsl.com
Delivered-To: dime@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 782A63A158B; Mon, 15 Mar 2021 09:12:00 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.9
X-Spam-Level:
X-Spam-Status: No, score=-1.9 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, NICE_REPLY_A=-0.001, RCVD_IN_DNSWL_BLOCKED=0.001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 1_nCw0nD779j; Mon, 15 Mar 2021 09:11:57 -0700 (PDT)
Received: from relay8-d.mail.gandi.net (relay8-d.mail.gandi.net [217.70.183.201]) (using TLSv1.2 with cipher ADH-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 31B553A158A; Mon, 15 Mar 2021 09:11:55 -0700 (PDT)
X-Originating-IP: 45.72.253.70
Received: from [172.18.10.17] (unknown [45.72.253.70]) (Authenticated sender: mark@azu.ca) by relay8-d.mail.gandi.net (Postfix) with ESMTPSA id 8E4DE1BF20C; Mon, 15 Mar 2021 16:11:45 +0000 (UTC)
To: Benjamin Kaduk <kaduk@mit.edu>, The IESG <iesg@ietf.org>
Cc: draft-ietf-dime-group-signaling@ietf.org, dime-chairs@ietf.org, dime@ietf.org, jounikor@gmail.com
References: <161231642581.26850.13965330226098009886@ietfa.amsl.com>
From: Mark Jones <mark@azu.ca>
Message-ID: <c7a328ba-c9dd-0cc2-e39b-8b0beb6ca34c@azu.ca>
Date: Mon, 15 Mar 2021 12:11:44 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.7.1
MIME-Version: 1.0
In-Reply-To: <161231642581.26850.13965330226098009886@ietfa.amsl.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/dime/pYRBmRBrIjQ_JHIR4Uy5LcfUW0o>
Subject: Re: [Dime] Benjamin Kaduk's No Objection on draft-ietf-dime-group-signaling-13: (with COMMENT)
X-BeenThere: dime@ietf.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Diameter Maintanence and Extentions Working Group <dime.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dime>, <mailto:dime-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dime/>
List-Post: <mailto:dime@ietf.org>
List-Help: <mailto:dime-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dime>, <mailto:dime-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 15 Mar 2021 16:12:01 -0000

Good afternoon Benjamin,

Thank you for your thorough review and rephrasing suggestions.
Our responses are inline below.

Regards
Mark

On 2021-02-02 8:40 p.m., Benjamin Kaduk via Datatracker wrote:
 > Benjamin Kaduk has entered the following ballot position for
 > draft-ietf-dime-group-signaling-13: No Objection
 >
 > When responding, please keep the subject line intact and reply to all
 > email addresses included in the To and CC lines. (Feel free to cut this
 > introductory paragraph, however.)
 >
 >
 > Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
 > for more information about IESG DISCUSS and COMMENT positions.
 >
 >
 > The document, along with other ballot positions, can be found here:
 > https://datatracker.ietf.org/doc/draft-ietf-dime-group-signaling/
 >
 >
 >
 > ----------------------------------------------------------------------
 > COMMENT:
 > ----------------------------------------------------------------------
 >
 > I agree with the other commenters that an editorial pass over the
 > document before it is sent to the RFC Editor is probably in order; I
 > only noted the more eggregious nits that have obvious fixes but skipped
 > the ones where I did not have a resolution ready at hand to suggest.
 > Barry's remarks about restrictive vs non-restrictive clauses are
 > particularly important.
 >
 > I also have several high-level issues that don't quite rise to
 > discuss-level but seem worth getting resolution on; they mostly involve
 > edge cases where the current text either leaves me unclear on what is
 > supposed to happen or seems to present conflicting guidance:
 >
 > - what's the error handling in cases of partial success?
 >    * For operations adding/removing sessions to/from groups, it seems that
 >      the general principle is that the new changes introduced with any
 >      group-relevant CCF exchange are to be treated atomically: either all
 >      group additions/removals succeed or all fail (and when it fails the
 >      session is forced into a single-session fallback mode).  But for
 >      server-initiated modifications this is enforced by "if the client
 >      can't do it, the client MUST tear down the affected session(s)", and
 >      I'm not sure if there's a case where the client can send a new
 >      request that includes some modifications and the server also tries
 >      to make some modifications in its response; such a scenario might in
 >      effect allow partial success and might also be hard to correctly
 >      interpret.
 >    * For operations acting on groups, we do allow partial success
 >      (DIAMETER_LIMITED_SUCCESS) and attempt to mandate fallback to
 >      single-session operation for affected sessions, but the specifics of
 >      how the Failed-AVP effectuates this are not clear to me (see my
 >      comment on Section 4.4.3).
 >

This was discussed during the drafting phase and we considered whether
to just document the all-or-nothing scenario. We did not want to
prohibit Diameter applications based on this draft from defining their
own partial success scenario. However, it will be eventually up to the
application to describe in details how to exactly react based on the
response from the server and the implication of "tearing down" a
session. We agree that the use of the Failed-AVP needs clarification
(see below).


 > - we require that group identifiers are globally unique, which can be
 >    done with the diameter-node namespace prefix.  But for the portion
 >    after that prefix, we seem to suggest reusing the construction from
 >    Section 8.8 of RFC 6733, which is just (in essence) a 64-bit global
 >    counter.  In the vein of draft-gont-numeric-ids-sec-considerations,
 >    that seems overly constrained, in that it reveals sequencing across
 >    group creation and rate of per-node group creation to the remote peer.
 >    It seems that a construction akin to a keyed hash of a counter would
 >    preserve the guaranteed uniqueness property but avoid leaking such
 >    information to the network.
 >

Although we're not aware of any issues encountered in the field with
the 64-bit counter in the Diameter Session-Id AVP, the implementation-
defined portion of the Session-Group-Id will allow the flexibility to
use an alternate method to achieve global uniqueness. We propose
clarifying the structure of the Session-Group-ID AVP:


    The Session-Group-Id MUST be globally unique. The Session-Group-Id includes a
    mandatory portion and an implementation-defined portion delimited by the ";"
    character. The Session-Group-Id MUST begin with the identity of the Diameter
    node that owns the session group. The remainder of the Session-Group-Id is
    implementation-defined and MAY follow the format recommended for the
    implementation-defined portion of the Session-Id AVP in section 8.8 of
    [RFC6733].

This would allow implementors to use a keyed hashes in the
implementation-defined portion.


 > - the permissions model is a bit unorthodox, with groups "owned" by
 >    their creator but the binding of a session to a group "owned" by the
 >    node that performed the binding.  It seems like there is some risk of
 >    deadlock situations or conflict, e.g., where a client has added a
 >    session to a group G but the server wants to remove the session from
 >    G, or where a session is part of a group but the owner of the group
 >    wants to delete the group.  For the latter case, my understanding is
 >    that group deletion "trumps" ownership of the session/group binding,
 >    such that deletion can proceed even while sessions are in the group,
 >    but I'm less sure what should happen in the former case (or others).
 >

Section 3.3 was intended to be an overview of the permissions involved
at each stage of the Session-Group life cycle but the second paragraph
has become unwieldy over successive edits. We've rephrased this section
to improve readability:

    Permission considerations in the context of this draft apply to the
    permission of Diameter nodes to build new session groups, to assign/
    remove a session to/from a session group and to delete an existing
    session group.

    When a client or a server decides to create a new session group,
    e.g., to group all sessions which share certain characteristics,
    this node builds a session group identifier according to the rules
    described in Section 7.3 and becomes the owner of the group.

    After the creation of a session group, a session can be added to this
    session group by either the client and the server. However, a session
    can only be removed from a session group by the Diameter node (client
    or server) that has assigned this session to the session group.

    A session group can only be deleted by the owner of the session
    group, resulting in individual treatment of the sessions that were
    assigned to this session group.

    Diameter applications with implicit support for session groups MAY
    define a more constrained permission model.  For example, a more
    constrained model could require that a client must not remove a
    session from a group which is owned by the server.  Details about
    enforcing a more constraint permission model are out of scope of this
    specification.


We've gone back and forth on keeping the table in section 3.3 and
settled on its removal. There are caveats to several of the operations
(explained in detail in Section 4) and the simplified view offered in
this table is adding confusion.

 > - Is it possible to have a group with no member sessions?  Section 3.3
 >    suggests that if the last session is removed the group should be
 >    deleted, but AFAICT this would still require the node performing the
 >    removal to unset SESSION_GROUP_STATUS_IND, and I didn't see a
 >    requirement to do that spelled out.  I do see how it is impossible to
 >    create a group directly in a state with no member sessions.
 >

A session group is implicitly deleted and its identifier released
after the last session has been removed from this session group. We
have added text to clarify this explicit/implicit deletion in Section
4.3:

    To explicitly delete a session group and release the associated
    Session-Group-Id value, the owner of a session group appends a
    single Session-Group-Info AVP having the SESSION_GROUP_STATUS_IND
    flag cleared and the Session-Group-Id AVP identifying the session
    group, which is to be deleted.  The SESSION_GROUP_ALLOCATION_ACTION
    flag of the associated Session-Group-Control-Vector AVP MUST be
    cleared.

    A session group is implicitly deleted and its identifier released
    after the last session has been removed from this session group.


 > - Is there a requirement to always send the current state of group
 >    membership when acting on a session?  I could (perhaps naively)
 >    imagine a case where due to previous interactions a session belongs to
 >    groups G1, G2, and G3, but in some subsequent request the client only
 >    mentions G1.  Does the membership in G2 and G3 get implicitly retained
 >    in the absence of an explicit unset SESSION_GROUP_ALLOCATION_ACTION or
 >    are there some other constraints that make such a scenario impossible?
 >


Unless reassigning groups for the session, this specification does
not require the node to send the currently assigned groups in each
exchange.

 > - Is there supposed to be a strong distinction between "including" an
 >    AVP and "appending" one?  I see that 6733 does make some fairly clear
 >    distinction between the terms, but it seems that (e.g.) in Section
 >    4.2.1 we use both phrasings to discuss Session-Group-Info.
 >

No distinction was intended here. We will review for consistency and
ensure "including" is used throughout.


 > Now for some section-by-section (mostly editorial or nit-level) notes.
 >
 > Abstract, Introduction
 >
 >     a million concurrent Diameter sessions.  Recent use cases have
 >     revealed the need for Diameter nodes to apply the same operation to a
 >     large group of Diameter sessions concurrently.  The Diameter base
 >
 > I note that the -00 is from 2012; are these use cases still "recent"?
 >

Ouch! We will replace "Recent" with "In some use cases, Diameter nodes
need to apply the same operation to a large group of Diameter sessions
concurrently."

 > Section 3
 >
 > As an editorial note, the way the current text jumps in to say that
 > sessions can be assigned to groups leaves the reader uncertain whether
 > this is describing preexisting functionality or the new mechanisms added
 > by this document.  A top-level intro paragraph for Section 3 that says
 > roughly "to accomodate bulk operations on Diameter sessions, the concept
 > of session groups is introduced; once sessions are added to a group, a
 > command acting on the group will affect all the member sessions" might
 > help.
 >

Agreed. We will incorporate your text as:

    In order to accommodate bulk operations on Diameter sessions, the
    concept of session groups is introduced. Once sessions are added to
    a group, a command acting on the group will affect all the member
    sessions.


 > Section 3.3
 >
 > If I understand correctly, the lines in the table for "remove a session
 > from an owned Session Group"/"remove a session from a non-owned Session
 > Group" mean only that this operation can be done sometimes, not that it
 > can always be done (per the lines about "created the assignment".  Would
 > it be helpful to indicate this, perhaps by using a different symbol for
 > those lines and adding a footnote?
 >

We concluded that it was less confusing to simply remove the table
rather than keep it and add footnotes repeating text from section 4.

 > Section 4
 >
 > While I understand the desire to keep the document structure as it is,
 > with the actual AVPs specified in Section 7, it would have been very
 > helpful to have a toplevel introductory paragraph here that mentions or
 > references that there is a containing Session-Group-Info grouped AVP
 > that contains the Session-Group-Control-Vector with information about
 > the action and group, and zero or one(?) Session-Group-Id AVP to
 > identify the group when a specific group is being identified.  This
 > would also allow clarifying whether the Session-Group-Id AVP is
 > currently only defined to appear within the Session-Group-Info.
 >


We don't believe this is needed. Grouped AVP or not, an AVP is meant to
be self-consistent and one section per AVP is consistent with other
Diameter drafts. The Session-Group-Id AVP is initially defined in the
Session-Group-Info by this draft but can be reused anywhere in future
applications as a stand-alone AVP (if required).


 > Section 4.1.1
 >
 >                                             Such applications provide
 >     intrinsic discovery for the support of session grouping capability
 >     using the assigned Application Id advertised during the capability
 >     exchange phase two Diameter peers establish a transport connection
 >     (see Section 5.3 of [RFC6733]).
 >
 > nit: I think there's a missing word here, perhaps "where" after "phase"?
 >

Agreed. We propose rephrasing as:

                                             Such applications provide
    intrinsic discovery for the support of session grouping capability
    using the assigned Application Id advertised during the capability
    exchange phase described in Section 5.3 of [RFC6733].

We also propose renaming sections 4.1.1 and 4.1.2 to avoid the
implicit/explicit terminology that has caused confusion:

OLD:

4.1.1.  Implicit Capability Discovery
4.1.2.  Explicit Capability Discovery

NEW:

4.1.1.  Capability Discovery based on the Application Id
4.1.2.  Capability Discovery based on AVP Presence


 > Section 4.2.1
 >
 >     The client may also indicate in the request that the server is
 >     responsible for the assignment of the session in one or multiple
 >     sessions owned by the server.  [...]
 >
 > nit(?): is this supposed to be "assignment of the session into one or
 > multiple session *groups* owned by the server"?  I'm having a hard time
 > understanding it as written.
 >

The client is simply indicating support for group assignment for this
session. We agree the current text is unclear and propose rephrasing as:

    The client may also indicate in the request that it supports
    assignment of the session to one or more groups by the server.
    In such a case, the client MUST include the Session-Group-Info AVP
    in the request including the Session-Group-Control-Vector AVP with the
    SESSION_GROUP_ALLOCATION_ACTION flag set but no Session-Group-Id AVP.

 >     If the assignment of the session to one or some of the multiple
 >     identified session groups fails, the session group assignment is
 >     treated as failure.  In such case the session is treated as single
 >     session without assignment to any session group by the Diameter
 >     nodes.  The server sends the response to the client and MAY include
 >     those Session-Group-Info AVPs for which the group assignment failed.
 >     The SESSION_GROUP_ALLOCATION_ACTION flag of included Session-Group-
 >     Info AVPs MUST be cleared.
 >
 > I guess I understand the part where the entire set of group-assignment
 > operations has to succeed or fail as an atomic unit, but this text
 > perhaps implies some semantics that the server is supposed to only
 > explicitly include in the response the subset of group assignments that
 > were unable to be processed, omitting the ones that could have been
 > processed (but were not processed since a failure on one means that none
 > of the operations get applied).  If that's the intent, I'd suggest being
 > a bit more explicit about what is and isn't sent.
 >

The operation is indeed atomic i.e. all or nothing, but the server can
allow the session to continue without it being assigned to any of the
requested groups. We propose rephrasing to clarify:

    If the Diameter server rejects the client's request for a group
    assignment, the server sends the response to the client, e.g., a
    service-specific auth response as per NASREQ AA-Answer [RFC7155], and
    MUST include all Session-Group-Info AVPs as received in the client's
    request (if any) while clearing the SESSION_GROUP_ALLOCATION_ACTION
    flag of the Session-Group-Control-Vector AVP.

    The server may still accept the client's request for the identified
    session to proceed despite rejecting the group assignment. The
    response sent to the client will then indicate success in the result
    code. In this case, the session is treated as single session without
    assignment to any session group by the Diameter nodes.

 >     A Diameter client, which sent a request for session initiation to a
 >     Diameter server and appended a single or multiple Session-Group-Id
 >     AVPs but cannot find any Session-Group-Info AVP in the associated
 >
 > (editorial) the phrase "cannot find" makes me wonder how hard it's
 > expected to be looking; more definitive statements about "not present"
 > seem more typical for RFC style.
 >

Agreed. We propose rephrasing as "receives an answer where the
Session-Group-Info AVP is not present"...

 > Section 4.2.3
 >
 >     When a Diameter server enforces an update to the assigned groups mid-
 >
 > nit: this seems to be the only time we use the word "enforce" in this
 > sense in the document; previous discussion seems to just use "decides to
 > make an update" or similar.
 >

Agreed. We'll review the draft and consistently use "server/client
decides to create/update/remove".

 >     answer.  The client subsequently sends a service-specific re-
 >     authorization request containing one or multiple Session-Group-Info
 >     AVPs with the SESSION_GROUP_ALLOCATION_ACTION flag set and the
 >     Session-Group-Id AVP identifying the session group to which the
 >     session had been previously assigned.  [...]
 >
 > nit: I think this has to be "group or groups" to be consistent with the
 > rest of the doc.
 >


Agreed. We propose rephrasing as "...request containing the currently
assigned groups represented in one or more Session-Group-Info AVPs."

 > Section 4.4.1
 >
 >     Either Diameter node (client or server) can request the recipient of
 >     a request to process an associated command for all sessions assigned
 >     to one or multiple groups by identifying these groups in the request.
 >     The sender of the request appends for each group, to which the
 >     command applies, a Session-Group-Info AVP including the Session-
 >     Group-Id AVP to identify the associated session group.  Both, the
 >     SESSION_GROUP_ALLOCATION_ACTION flag as well as the
 >     SESSION_GROUP_STATUS_IND flag MUST be set.
 >
 > What's the error handling if one or both listed flags are not set --
 > just ignore the request?
 >

The session SESSION_GROUP_STATUS_IND flag is only meaningful when a
Session-Group-Id AVP is present, as indicated in the definition of the
flag:

       If the Session-Group-Info AVP does not include a Session-Group-Id
       AVP, this flag is meaningless and MUST be ignored by the receiver.


 >     Action AVP to ALL_GROUPS (1) or PER_GROUP (2).  If the answer can be
 >     sent before the complete process of the request for all the sessions
 >     or if the request timeout timer is high enough, the sender MAY set
 >     the Group-Response-Action AVP to ALL_GROUPS (1) or PER_GROUP (2).
 >
 > (side note) just the phrase "high enough" doesn't give much of an
 > indication of what the criteria are and what numerical values might be
 > appropriate.  That said, it's not entirely clear how much guidance we
 > can really give in this situation.
 >

It is tough to provide concrete guidance here because the logic is
ultimately application specific. Perhaps the only recommendation that
can be given is regarding delay-sensitive scenarios as follows:

    If the sender sends a request including the Group-Response-Action AVP
    set to ALL_GROUPS (1) or PER_GROUP (2), it has to expect some delay
    before receiving the corresponding answer(s) as the answer(s) will
    only be sent back when the request is processed for all the sessions
    or all the session of a session group.  If the processing of the
    request is delay-sensitive, the sender SHOULD NOT set the
    Group-Response-Action AVP to ALL_GROUPS (1) or PER_GROUP (2).


 > Section 4.4.2
 >
 >     If the received request identifies multiple groups in multiple
 >     appended Session-Group-Id AVPs, the receiver SHOULD process the
 >     associated command for each of these groups.  If a session has been
 >     assigned to more than one of the identified groups, the receiver MUST
 >     process the associated command only once per session.
 >
 > Why is this only a SHOULD for each group -- what other behaviors could
 > the receiver do?
 >

As the operation is atomic, if one of the assignment fails, there is no
reason to continue the assignment process. The "SHOULD" here was meant
to be "only when each of the previous assignment has been successful"
but a rephrasing is proposed to clarify.

    When receiving a request to process a command for a group of sessions,
    the Diameter node identifies the relevant groups according to the included
    Session-Group-Id AVP(s) in the Session-Group-Info AVP and processes the
    group command according to the included Group-Response-Action AVP.
    If the received request identifies multiple groups in multiple
    Session-Group-Id AVPs, the receiver SHOULD process the
    associated command for each of these groups.  If a session has been
    assigned to more than one of the identified groups, the receiver MUST
    process the associated command only once per session.


 > Section 4.4.3
 >
 >     In the case of limited success, the sessions, for which the
 >     processing of the group command failed, MUST be identified using a
 >     Failed-AVP AVP as per Section 7.5 of [RFC6733].  [...]
 >
 > My reading of the referenced part of RFC 6733 is that there is a single
 > Failed-AVP pointing to a single AVP that could not be processed
 > properly.  Is there such a single failed AVP in the case where
 > processing failed for multiple sessions in the group(s)?  It seems that
 > the "largest containing AVP" that includes all failed groups might be so
 > large so as to not be useful in indicating the problem.
 >

The Failed-AVP contains the Session-Id of all the sessions that the group
command failed and that needs to be clarified here.

    In the case of limited success, the sessions, for which the
    processing of the group command failed, MUST be identified by
    including their Session-Id AVP in the Failed-AVP AVP as per
    Section 7.5 of [RFC6733].

 > Section 7.2
 >
 >     SESSION_GROUP_STATUS_IND (0x00000010)
 >
 > If there's a mnemonic for the "IND" part of "SESSION_GROUP_STATUS_IND",
 > that would be helpful to expand.
 >

"IND" is a truncated "INDICATION" but it appears redundant for a flag
name. SESSION_GROUP_STATUS is sufficient.

 > Section 9.2
 >
 > The Specification Required policy includes review by Designated Experts;
 > is there any guidance we should provide to the DEs?
 >

We don't believe this is required. The reviewer will just rely on the
definition of the AVPs and details provided in the RFC.


 > Appendix A.1
 >
 >        Discon    GASA received                  Cleanup      Idle
 >
 > Spot-checking against RFC 6733's state machine, the non-group ASA
 > received case only makes this transition when there was a previous ASR
 > that was successfully sent.  Is that am important distinction?  (Also,
 > as an editorial nit, RFC 6733 spells "Clean up" as two words.)
 >

It is assumed that an (G)ASA can only be received if (G)ASR has been
successfully sent. Otherwise, it is not possible to receive an (G)ASA.