Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

"Joel M. Halpern" <jmh@joelhalpern.com> Mon, 12 December 2016 23:37 UTC

To: Donald Eastlake <d3e3e3@gmail.com>
References: <570EB05D.20802@joelhalpern.com> <CAF4+nEHWCs7EOzMFN7HzA92DtdEzsFvFk-4zuzY4MRfeXdA4JA@mail.gmail.com> <57110E19.6050304@joelhalpern.com> <CAF4+nEHxnx8NDZAbyVvzdexoGVpA=Z56YJw2HPcr-zh44dYGEQ@mail.gmail.com> <5711B58E.8010506@joelhalpern.com> <CAF4+nEGSL90PYXaiae9z9=AYzHb+0ixenctbZ_+eomhFYLGA_Q@mail.gmail.com> <CAF4+nEF38JYn8Rc+o6TB5=+ocE185QGsJ-Sf0JuTYEtSNqQjWQ@mail.gmail.com>
From: "Joel M. Halpern" <jmh@joelhalpern.com>
Message-ID: <006871e7-e2bc-07e4-0ccc-c436a97812f4@joelhalpern.com>
Date: Mon, 12 Dec 2016 18:36:24 -0500
In-Reply-To: <CAF4+nEF38JYn8Rc+o6TB5=+ocE185QGsJ-Sf0JuTYEtSNqQjWQ@mail.gmail.com>
Archived-At: <https://mailarchive.ietf.org/arch/msg/trill/ak1-Y8SSGNlm8Ts5wXdd9RlLxXg>
Cc: "rtg-ads@ietf.org" <rtg-ads@ietf.org>, "rtg-dir@ietf.org" <rtg-dir@ietf.org>, "trill@ietf.org" <trill@ietf.org>, draft-ietf-trill-directory-assist-mechanisms.all@ietf.org
Subject: Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Thank you Donald.  Below are one major and a few minor points I noticed
while reading.  This version appears to address all my major concerns
and most of my minor concerns.

Major:
     The QTYPE table in section 3.2.1 lists the values 3 and 4 as
unused.  (This appears to have changed between versions 7 and 8,
possibly in an effort to address my earlier question about why these
values were used.)  The Pull Directory Forwarding text in section
3.2.2.2 still explicitly assigns meanings and responses to QTYPEs 3 and
4.  Either those values are to be used, in which case 3.2.1 needs to say
so, or they are not to be used and QTYPE 2 covers all the ARP-like
behaviors, in which case 3.2.2.2 needs to discuss this.

Minor:
     The text is now clear as to what the content is when frames are
included in a query (3.2.1).  It would be helpful to implementors if the
text explained the motivation for distinguishing between type 2 and
type 5 in the request, since the behavior apparently can be decided from
the frame content itself.
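To illustrate the point that the behavior can be decided from the frame content, here is a minimal Python sketch that classifies an encapsulated frame as ARP-like (ARP, RARP, or IPv6 Neighbor Discovery address resolution) or not. It assumes an untagged Ethernet II frame and an IPv6 header with no extension headers. The function and constant names are hypothetical, and the sketch deliberately does not say which QTYPE each category maps to, since that is what the draft should explain.

```python
import struct

ETHERTYPE_ARP = 0x0806
ETHERTYPE_RARP = 0x8035
ETHERTYPE_IPV6 = 0x86DD
IPV6_NEXT_HEADER_ICMPV6 = 58
# ICMPv6 Neighbor Solicitation (135) and Neighbor Advertisement (136)
ND_ADDRESS_RESOLUTION_TYPES = frozenset({135, 136})

ETH_HEADER_LEN = 14   # untagged Ethernet II: dst MAC, src MAC, EtherType
IPV6_HEADER_LEN = 40  # fixed IPv6 header; extension headers are NOT handled


def is_arp_like(frame: bytes) -> bool:
    """Return True if the frame carries ARP, RARP, or IPv6 ND address
    resolution (NS/NA). Hypothetical helper; assumes no VLAN tag and no
    IPv6 extension headers."""
    if len(frame) < ETH_HEADER_LEN:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype in (ETHERTYPE_ARP, ETHERTYPE_RARP):
        return True
    if ethertype == ETHERTYPE_IPV6 and len(frame) > ETH_HEADER_LEN + IPV6_HEADER_LEN:
        next_header = frame[ETH_HEADER_LEN + 6]  # IPv6 Next Header field
        icmp_type = frame[ETH_HEADER_LEN + IPV6_HEADER_LEN]  # first ICMPv6 byte
        return (next_header == IPV6_NEXT_HEADER_ICMPV6
                and icmp_type in ND_ADDRESS_RESOLUTION_TYPES)
    return False


# Usage: a zero-filled ARP frame versus a zero-filled IPv4 frame.
arp_frame = bytes(12) + struct.pack("!H", ETHERTYPE_ARP) + bytes(28)
ipv4_frame = bytes(12) + struct.pack("!H", 0x0800) + bytes(20)
print(is_arp_like(arp_frame), is_arp_like(ipv4_frame))  # True False
```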

     In section 3.2.2.1 on the Response format, in discussing the SIZE 
field of the response record, the text refers to errors in the QUERY 
records and to subsequent QUERY records.  I presume that this was 
intended to say RESPONSE Record in each case?

     In bullet 1 of section 3.3, at the end, in describing the 
possibility of an all-entries flush (F, P, and N bits set), I think the 
text intends that the count must be 0 to trigger this behavior.  It 
would help to say that.
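A minimal sketch of the condition as read above, assuming the F, P, and N bits and the count have already been parsed out of the message. The helper name is hypothetical, and the count == 0 requirement is the reviewer's interpretation, not confirmed draft text.

```python
def is_all_entries_flush(f_bit: bool, p_bit: bool, n_bit: bool, count: int) -> bool:
    """Reviewer's reading of section 3.3, bullet 1 (hypothetical helper):
    F, P, and N all set AND a count of zero means flush every entry."""
    return f_bit and p_bit and n_bit and count == 0


print(is_all_entries_flush(True, True, True, 0))  # True
print(is_all_entries_flush(True, True, True, 2))  # False
```

Writing the test out this way makes the role of the count explicit, which is the clarification being requested.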

On 12/11/16 12:19 AM, Donald Eastlake wrote:
> Hi Joel,
>
> Sorry for the delay, but we have attempted to respond to your points in
> version -09 of the draft. There were also changes unrelated to your
> comments, which are briefly described in
> https://www.ietf.org/mail-archive/web/trill/current/msg07572.html
>
> Additional changes in -09 include making "SHOULD" the implementation
> requirement for methods 2 and 3.
>
> Concerning the possible change to the Push Directory state machine: on
> closer inspection, it appears that adding states would require more
> extensive changes than I originally thought. In any case, this version
> adds some explanatory text in Section 2.3.2.
>
> Please take a look when convenient.
>
> Thanks,
> Donald
> ===============================
>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>  155 Beaver Street, Milford, MA 01757 USA
>  d3e3e3@gmail.com
>
> On Sat, Apr 16, 2016 at 10:03 PM, Donald Eastlake <d3e3e3@gmail.com>
> wrote:
>
>     Hi Joel,
>
>     On Fri, Apr 15, 2016 at 11:46 PM, Joel M. Halpern
>     <jmh@joelhalpern.com> wrote:
>     > If by the connectivity check to the directory server, you mean the
>     > underlying IS-IS routing reporting connectivity, then say that.
>
>     OK.
>
>     > While that
>     > is not actually interchangeable with real connectivity, it is perfectly
>     > reasonable for the WG to deem it sufficient.  I think it would only take a
>     > sentence or two to clarify for the reader that what is meant is apparent
>     > topological connectivity, as distinct from verified communication.
>
>     The phrase usually used in TRILL (see RFC 7780) is "data reachable".
>
>     Thanks,
>     Donald
>
>     > Yours,
>     > Joel
>     >
>     >
>     > On 4/15/16 11:12 PM, Donald Eastlake wrote:
>     >>
>     >> Hi Joel,
>     >>
>     >> On Fri, Apr 15, 2016 at 11:51 AM, Joel M. Halpern
>     >> <jmh@joelhalpern.com> wrote:
>     >>>
>     >>> Thank you Donald.  Points of agreement elided; some responses below
>     >>> to try to clarify my observations.  I will note that, from your
>     >>> comments about 3.1, I believe my concerns (now moved to 3.7) are
>     >>> larger, as I had assumed that the magic was in some other protocol,
>     >>> and you now say it is not defined there.
>     >>>
>     >>> Yours,
>     >>> Joel
>     >>>
>     >>> On 4/15/16 11:23 AM, Donald Eastlake wrote:
>     >>>>
>     >>>>
>     >>>> Hi Joel
>     >>>>
>     >>>> Thanks for your thorough review and comments. See below.
>     >>>>
>     >>>> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern
>     >>>> <jmh@joelhalpern.com> wrote:
>     >>>>
>     >>> ...
>     >>>
>     >>>>> Major Issues:
>     >>>>> In the state machine transitions in section 2.3.3
>     >>>>> for push servers, it appears that if the event indicating that the
>     >>>>> server is being shut down occurs while the server is already Going
>     >>>>> Stand-By or Uncompleting, the transitions indicate that this
>     >>>>> "going
>     >>>>> down" event will be lost.  A strict reading of this would seem to
>     >>>>> mean that the "go Down" event would need to recur after the
>     >>>>> timeout
>     >>>>> condition.  This would seem to be best addressed by a new state
>     >>>>> "Going-Down" whose timeout behavior is to move to down state.
>     >>>>
>     >>>>
>     >>>> I understand your point but "going down" and the like are called
>     >>>> "events or conditions" in this draft, not just events.
>     >>>> The problem with adding a single "Going-Down" state is that
>     >>>> transition
>     >>>> to that state would lose the information as to whether or not the
>     >>>> Push
>     >>>> Directory had been advertising that it was pushing complete
>     >>>> information or not. The reason to remember this is that you would
>     >>>> want
>     >>>> to behave a differently if the "going down" condition was revoked
>     >>>> before it completed. This information could be preserved in a
>     >>>> Boolean
>     >>>> pseudo variable but the current style of state machine in this
>     draft
>     >>>> avoids such pseudo variables and encodes all of the relevant push
>     >>>> directory's state into the state machine state. Thus, I can see
>     >>>> three
>     >>>> possible responses to your comment:
>     >>>>
>     >>>> 1) Change wording to emphasize that these "events or conditions" can
>     >>>> be conditions that cause a state transition some substantial time
>     >>>> after they become true.
>     >>>>
>     >>>> 2) Add two new states: (1) going down - was complete; (2) going
>     >>>> down - was incomplete.
>     >>>>
>     >>>> 3) Change the style of state machine to admit pseudo variables which
>     >>>> can be set and tested as part of the state machinery.
>     >>>>
>     >>>> Option 1 involves only minor wording changes, but adopting either
>     >>>> option 2 or option 3 involves more extensive changes, so I would
>     >>>> prefer to avoid them.
>     >>>
>     >>>
>     >>> From what I have seen, trying to build a state machine with
>     >>> conditions rather than events is fraught with problems and tends to
>     >>> lead to errors in implementation.  It amounts to hiding
>     >>> pseudo-variables inside the states, but not describing them.
>     >>> Thus, I would much prefer solution 2, but it is of course up to the
>     >>> WG.
>     >>
>     >>
>     >> Well, option 2 wouldn't be too hard. Option 3 would probably involve
>     >> the most change.
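As a rough illustration of option 2, here is a minimal Python sketch of a partial transition function in which a "going down" event is recorded immediately in one of two new states, so it cannot be lost while the server is in a transitional state. Only Down and Uncompleting come from the draft's vocabulary; ACTIVE_COMPLETE and ACTIVE_INCOMPLETE are hypothetical names, treating Uncompleting as "was complete" is an assumption, and Going Stand-By is omitted because the thread does not say which going-down state it should map to. All other transitions are left out.

```python
from enum import Enum, auto


class State(Enum):
    DOWN = auto()
    ACTIVE_INCOMPLETE = auto()          # hypothetical: not advertising complete info
    ACTIVE_COMPLETE = auto()            # hypothetical: advertising complete info
    UNCOMPLETING = auto()               # assumed: was advertising complete info
    GOING_DOWN_WAS_COMPLETE = auto()    # option 2, new state (1)
    GOING_DOWN_WAS_INCOMPLETE = auto()  # option 2, new state (2)


class Event(Enum):
    GOING_DOWN = auto()          # server is being shut down
    GOING_DOWN_REVOKED = auto()  # shutdown revoked before it completed
    TIMEOUT = auto()


# States whose completeness is remembered as "was complete" (assumption).
WAS_COMPLETE = frozenset({State.ACTIVE_COMPLETE, State.UNCOMPLETING})

# Each going-down state and the state a revocation returns to.
REVOKE_TARGET = {
    State.GOING_DOWN_WAS_COMPLETE: State.ACTIVE_COMPLETE,
    State.GOING_DOWN_WAS_INCOMPLETE: State.ACTIVE_INCOMPLETE,
}


def next_state(state: State, event: Event) -> State:
    """Partial transition function; transitions unrelated to the
    going-down issue are omitted and leave the state unchanged."""
    if event is Event.GOING_DOWN:
        if state is State.DOWN or state in REVOKE_TARGET:
            return state
        # Capture the event now, remembering prior completeness.
        if state in WAS_COMPLETE:
            return State.GOING_DOWN_WAS_COMPLETE
        return State.GOING_DOWN_WAS_INCOMPLETE
    if event is Event.TIMEOUT and state in REVOKE_TARGET:
        return State.DOWN
    if event is Event.GOING_DOWN_REVOKED and state in REVOKE_TARGET:
        return REVOKE_TARGET[state]
    return state


s = next_state(State.UNCOMPLETING, Event.GOING_DOWN)  # GOING_DOWN_WAS_COMPLETE
s = next_state(s, Event.TIMEOUT)                      # DOWN
```

Because a revocation returns to a state chosen by the remembered completeness, this also covers the concern above about behaving differently if the "going down" condition is revoked before it completes.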
>     >>
>     >>> ...
>     >>>
>     >>>>> Minor Issues:
>     >>>>> In section 2.3.3 describing the state transitions for push
>     >>>>> servers, there is an event (event 1) described as "the server was
>     >>>>> Down but is now Up."  The state transition diagram describes this
>     >>>>> as
>     >>>>> being a valid event that does not change the server's state if the
>     >>>>> server is in any state other than "Down." In one sense, this is
>     >>>>> reasonable, saying that such an event is harmless.  I would
>     >>>>> however
>     >>>>> expect some sort of logging or administrative notification, as
>     >>>>> something in the system is quite confused.
>     >>>>
>     >>>>
>     >>>> Again, I see your point, but it seems to me to be a matter of state
>     >>>> machine style. Note that the "event" is described as a condition, so
>     >>>> from that point of view, it is true anytime the state is other than
>     >>>> Down. On the other hand, if you view it as strictly an event, you are
>     >>>> left with the question of what to put at the intersection of a state
>     >>>> and event in the table when it is impossible for that event to occur
>     >>>> in that state. Some people note this with an "N/A" (not applicable)
>     >>>> entry. In fact, previous TRILL state diagrams such as in RFC 7177 use
>     >>>> "N/A", so it would probably be simplest to change to that for
>     >>>> consistency.
>     >>>
>     >>>
>     >>> I think N/A would be good.
>     >>
>     >>
>     >> OK.
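A minimal sketch of the N/A convention combined with the logging suggested above, assuming a table-driven state machine: (state, event) pairs that cannot occur are simply absent from the table, and hitting one leaves the state unchanged but logs a warning. The table contents, any state label other than Down and Going Stand-By, and the logger name are hypothetical.

```python
import logging
from typing import Dict, Tuple

logger = logging.getLogger("push_directory")  # hypothetical logger name

# (state, event) -> new state; pairs that cannot occur are absent (N/A).
Transitions = Dict[Tuple[str, str], str]


def step(table: Transitions, state: str, event: str) -> str:
    """Apply one event. An N/A pair leaves the state unchanged but is
    logged, since seeing it means something in the system is confused."""
    new_state = table.get((state, event))
    if new_state is None:
        logger.warning("N/A event %r in state %r; state unchanged", event, state)
        return state
    return new_state


# Usage with a one-entry, hypothetical table: event 1 is only valid in Down.
table: Transitions = {("Down", "event 1"): "Up (hypothetical)"}
print(step(table, "Down", "event 1"))            # Up (hypothetical)
print(step(table, "Going Stand-By", "event 1"))  # warning logged; Going Stand-By
```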
>     >>
>     >>> ...
>     >>>
>     >>>>> Text in section 3.2.2.1 on lifetimes and the information
>     >>>>> maintenance in section 3.3 imply that the clients and servers must
>     >>>>> maintain a connection. Presumably, this is required already by the
>     >>>>> RBridge Channel protocol, and I understand that we should not
>     >>>>> repeat
>     >>>>> the entire protocol here.  It would seem to make readers' lives MUCH
>     >>>>> simpler if the text noted that the RBridge Channel protocol
>     >>>>> requires
>     >>>>> that there be a maintained connection between the client and the
>     >>>>> server, and that these mechanisms leverage the presence of that
>     >>>>> connection.
>     >>>>
>     >>>>
>     >>>> The basic RBridge Channel protocol [RFC7178] is a datagram protocol
>     >>>> rather than a connection protocol. So there is no guaranteed
>     >>>> continuity of connection between RBridges that have previously
>     >>>> exchanged RBridge Channel messages. But connection would only be
>     >>>> lost if the network partitions, since RBridge Channel messages look
>     >>>> like data packets to any transit RBridges and will get forwarded as
>     >>>> long as there is any route. Network partition is immediately visible
>     >>>> in the link state database to the RBridges at both ends of an RBridge
>     >>>> Channel exchange.  Section 3.7 provides that if a Pull Directory is
>     >>>> no longer reachable (i.e., RBridge Channel protocol packets would no
>     >>>> longer get through), then all pull responses from that Pull
>     >>>> Directory MUST be discarded, since cache consistency update messages
>     >>>> can't get through.
>     >>>> Perhaps a reference to Section 3.7 should be added to Section 3.3.
>     >>>
>     >>>
>     >>> I don't think a reference to 3.7 is sufficient, although it is
>     >>> helpful.  If the protocol is a datagram protocol, and if it is
>     >>> important to discard data from unreachable pull servers, then I think
>     >>> 3.7 NEEDS to say more than just ~if you happen to magically figure out
>     >>> you can't reach the server, discard data it has given you.~  From the
>     >>> rest of the text, this is an important and unspecified protocol
>     >>> mechanism.
>     >>
>     >>
>     >> Figuring out whether/how you can reach other RBridges is a basic
>     >> function of TRILL IS-IS based routing, not something "magical".
>     >> Whenever there is a topology change, an RBridge MUST determine routes
>     >> to all data reachable RBridges in the new topology. If there was an
>     >> RBridge previously reachable but no longer reachable, as would be the
>     >> case for all RBridges on the other side of a network partition, this
>     >> MUST be noticed so that, for example, all MAC reachability
>     >> information associated with each of the no longer reachable RBridges
>     >> can be discarded. It does not seem like much of a stretch to believe
>     >> that an RBridge would keep track of the Pull Directory or Directories
>     >> it was using, each of which will be some other RBridge, and notice
>     >> when a topology change makes any of them inaccessible. But I have no
>     >> problem adding some wording to make this clearer.
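A minimal sketch of the client-side bookkeeping described above, assuming the client caches pull responses per Pull Directory and is handed the set of data reachable RBridge nicknames after each route computation. The class, method names, and nickname strings are hypothetical; the point is only that the check runs off the topology change that IS-IS routing already reports, with no extra connection state.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class PullCache:
    """Hypothetical client-side cache of pull responses, keyed by the
    nickname of the Pull Directory RBridge that supplied them."""
    entries: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def store(self, directory: str, key: str, response: str) -> None:
        self.entries.setdefault(directory, {})[key] = response

    def on_topology_change(self, data_reachable: Set[str]) -> Set[str]:
        """Call after routes are recomputed. Discards every response from a
        Pull Directory that is no longer data reachable; returns those dropped."""
        lost = {d for d in self.entries if d not in data_reachable}
        for d in lost:
            del self.entries[d]
        return lost


cache = PullCache()
cache.store("RB1", "00-00-5E-00-53-01", "response-1")
cache.store("RB2", "00-00-5E-00-53-02", "response-2")
print(cache.on_topology_change({"RB1", "RB3"}))  # {'RB2'}
```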
>     >>
>     >>> ...
>     >>> Regarding the flooding flag and behavior (long text elided), I don't
>     >>> think there is anything wrong with the intended behavior.  It is just
>     >>> that the very brief description of the FL flag leads the reader to an
>     >>> incorrect expectation.  Yes, it gets sorted out, but that is not
>     >>> good.  What I would suggest is that, when the flag is defined (with
>     >>> whatever name you choose), you note that "for QTYPEs 2, 3, and 4,
>     >>> the flag indicates that the server should flood its response."
>     >>
>     >>
>     >> We can work on clarifying the wording.
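A minimal sketch of the suggested FL semantics as proposed at this point in the thread (QTYPEs 2, 3, and 4; the later review at the top of this thread notes that 3 and 4 were subsequently listed as unused). The constant and function names are hypothetical.

```python
# QTYPEs to which, in the wording suggested above, the FL flag applies.
FLOOD_QTYPES = frozenset({2, 3, 4})


def server_should_flood(qtype: int, fl_flag: bool) -> bool:
    """Hypothetical helper: flood the response only when FL is set and
    the QTYPE is one the flag applies to; otherwise respond normally."""
    return fl_flag and qtype in FLOOD_QTYPES


print(server_should_flood(2, True), server_should_flood(5, True))  # True False
```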
>     >>
>     >> Thanks,
>     >> Donald