Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

"Joel M. Halpern" <jmh@joelhalpern.com> Tue, 13 December 2016 20:26 UTC

To: Donald Eastlake <Donald.Eastlake@huawei.com>
References: <570EB05D.20802@joelhalpern.com> <CAF4+nEHWCs7EOzMFN7HzA92DtdEzsFvFk-4zuzY4MRfeXdA4JA@mail.gmail.com> <57110E19.6050304@joelhalpern.com> <CAF4+nEHxnx8NDZAbyVvzdexoGVpA=Z56YJw2HPcr-zh44dYGEQ@mail.gmail.com> <5711B58E.8010506@joelhalpern.com> <CAF4+nEGSL90PYXaiae9z9=AYzHb+0ixenctbZ_+eomhFYLGA_Q@mail.gmail.com> <CAF4+nEF38JYn8Rc+o6TB5=+ocE185QGsJ-Sf0JuTYEtSNqQjWQ@mail.gmail.com> <006871e7-e2bc-07e4-0ccc-c436a97812f4@joelhalpern.com> <C5BD54C085F1DB4D9B6B5BFF7ACE182B6EA50028@dfweml501-mbx>
From: "Joel M. Halpern" <jmh@joelhalpern.com>
Message-ID: <f70fba9c-2e3d-8bb5-54cc-40353a748362@joelhalpern.com>
Date: Tue, 13 Dec 2016 15:26:35 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.12; rv:45.0) Gecko/20100101 Thunderbird/45.5.1
MIME-Version: 1.0
In-Reply-To: <C5BD54C085F1DB4D9B6B5BFF7ACE182B6EA50028@dfweml501-mbx>
Content-Type: text/plain; charset="windows-1252"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <https://mailarchive.ietf.org/arch/msg/trill/U1K4EqbhzVcplcvNzyvTjWtn6BM>
Cc: "rtg-ads@ietf.org" <rtg-ads@ietf.org>, "rtg-dir@ietf.org" <rtg-dir@ietf.org>, "draft-ietf-trill-directory-assist-mechanisms.all@ietf.org" <draft-ietf-trill-directory-assist-mechanisms.all@ietf.org>, "trill@ietf.org" <trill@ietf.org>
Subject: Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Thanks.  That works for me.  I suspect the 3.2.1 / 3.2.2.2 disconnect 
was a skipped correction.

Yours,
Joel

On 12/13/16 3:23 PM, Donald Eastlake wrote:
> Hi Joel,
>
> Thanks for your prompt response. See below at <de>
>
> -----Original Message-----
> From: trill [mailto:trill-bounces@ietf.org] On Behalf Of Joel M. Halpern
> Sent: Monday, December 12, 2016 6:36 PM
> To: Donald Eastlake
> Cc: rtg-ads@ietf.org; rtg-dir@ietf.org; trill@ietf.org; draft-ietf-trill-directory-assist-mechanisms.all@ietf.org
> Subject: Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt
>
> Thank you Donald.  One major and a few minor points I noticed while
> reading.  This does look to have addressed all my major concerns, and
> most of my minor concerns.
>
> <de> Thanks.
>
> Major:
>      The QTYPE table in section 3.2.1 lists the values 3 and 4 as
> unused.  (This appears to have changed between versions 7 and 8.
> Possibly in an effort to address my earlier question about why these
> values were used.)  The  Pull Directory Forwarding text in section
> 3.2.2.2 still explicitly assigns meanings and responses to QTYPEs 3 and
> 4.  Either those values are to be used, in which case 3.2.1 needs to say
> so, or they are not to be used and 2 is used for all the ARP-like
> behaviors, in which case 3.2.2.2 needs to discuss this.
>
> <de> Sorry, 3.2.2.2 was overlooked when 3.2.2.1 was updated. This should be easy to fix.
>
> <de> I do see a difference between QTYPE 2 and QTYPE 5.
> 	QTYPE 2 can be seen as saying to ignore the MAC destination address, look at the Ethertype, and process as an ARP, ND, or RARP packet (or reject if none of these).
> 	QTYPE 5 can be seen as saying to ignore the Ethertype and do various lookups and/or forwarding based on the MAC destination address.
> 	These seem like different services, although I suppose you could guess heuristically which was wanted.
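
A minimal sketch of the QTYPE 2 / QTYPE 5 distinction described above, assuming an untagged Ethernet frame is carried in the query. The function name, the returned strings, and the simplified ND check (an IPv6 frame whose first next-header is ICMPv6 with type 135, no extension headers) are illustrative assumptions and are not taken from the draft.

```python
import struct

# Well-known Ethertype and protocol numbers.
ETHERTYPE_ARP = 0x0806
ETHERTYPE_RARP = 0x8035
ETHERTYPE_IPV6 = 0x86DD
IPPROTO_ICMPV6 = 58
ND_NEIGHBOR_SOLICITATION = 135

# QTYPE values as discussed in this thread.
QTYPE_ARP_LIKE = 2   # ignore the MAC destination, dispatch on Ethertype
QTYPE_MAC_BASED = 5  # ignore the Ethertype, act on the MAC destination

ETH_HEADER_LEN = 14   # untagged frame assumed (no VLAN tag)
IPV6_HEADER_LEN = 40


def classify_frame_query(qtype: int, frame: bytes) -> str:
    """Hypothetical helper: say how a frame-carrying query would be handled."""
    if len(frame) < ETH_HEADER_LEN:
        return "reject: truncated frame"
    dst_mac = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])

    if qtype == QTYPE_MAC_BASED:
        # QTYPE 5: the Ethertype is not consulted at all.
        return "mac-lookup:" + ":".join(f"{b:02x}" for b in dst_mac)

    if qtype == QTYPE_ARP_LIKE:
        # QTYPE 2: the MAC destination is not consulted at all.
        if ethertype == ETHERTYPE_ARP:
            return "arp"
        if ethertype == ETHERTYPE_RARP:
            return "rarp"
        if (
            ethertype == ETHERTYPE_IPV6
            and len(frame) > ETH_HEADER_LEN + IPV6_HEADER_LEN
            and frame[ETH_HEADER_LEN + 6] == IPPROTO_ICMPV6
            and frame[ETH_HEADER_LEN + IPV6_HEADER_LEN] == ND_NEIGHBOR_SOLICITATION
        ):
            return "nd"
        return "reject: not ARP, ND, or RARP"

    return "reject: unsupported QTYPE"
```

The same frame can yield different handling depending on the QTYPE, which is the motivation for keeping the two values distinct rather than guessing from the frame content.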
>
> Minor:
>      The text is now clear as to what the content is when frames are
> included in a query (3.2.1).  It would seem helpful to implementors if
> the motivation for distinguishing between type 2 and type 5 in the
> request were explained, since the behavior is apparently decidable based
> on the frame content itself.
>
> <de> OK. Something like my text above could be included.
>
>      In section 3.2.2.1 on the Response format, in discussing the SIZE
> field of the response record, the text refers to errors in the QUERY
> records and to subsequent QUERY records.  I presume that this was
> intended to say RESPONSE Record in each case?
>
> <de> Yup. Looks like a copy and paste error that slipped by.
>
>      In bullet 1 of section 3.3, at the end, in describing the
> possibility of an all-entries flush (F, P, and N bits set), I think the
> text intends that the count must be 0 to trigger this behavior.  It
> would help to say that.
>
> <de> OK. Seems fairly clear to me but it can't hurt to make it clearer.
>
> <de>Thanks,
> Donald
> ==========================================
> Donald E. Eastlake, 3rd     Donald.Eastlake@huawei.com
> 155 Beaver Street              +1-508-333-2270
>  Milford, MA 01757 USA
>
>
> On 12/11/16 12:19 AM, Donald Eastlake wrote:
>> Hi Joel,
>>
>> Sorry for the delay but we have attempted to respond to your points in
>> version -09 of the draft. There were also changes unrelated to your
>> comments which are briefly described in
>> https://www.ietf.org/mail-archive/web/trill/current/msg07572.html
>>
>> Additional changes in -09 include making "SHOULD" the implementation
>> requirement for methods 2 and 3.
>>
>> Concerning the possible change to the Push Directory state machine,
>> looking at this it appears that changes by adding states would have to
>> be more extensive than I originally thought. In any case, in this
>> version, some explanatory text has been added in Section 2.3.2.
>>
>> Please take a look when convenient.
>>
>> Thanks,
>> Donald
>> ===============================
>>  Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>  155 Beaver Street, Milford, MA 01757 USA
>>  d3e3e3@gmail.com <mailto:d3e3e3@gmail.com>
>>
>> On Sat, Apr 16, 2016 at 10:03 PM, Donald Eastlake <d3e3e3@gmail.com
>> <mailto:d3e3e3@gmail.com>> wrote:
>>
>>     Hi Joel,
>>
>>     On Fri, Apr 15, 2016 at 11:46 PM, Joel M. Halpern
>>     <jmh@joelhalpern.com <mailto:jmh@joelhalpern.com>> wrote:
>>     > If by the connectivity check to the directory server, you mean the
>>     > underlying IS-IS routing reporting connectivity, then say that.
>>
>>     OK.
>>
>>     > While that
>>     > is not actually interchangeable with real connectivity, it is perfectly
>>     > reasonable for the WG to deem it sufficient.  I think it would only take a
>>     > sentence or two to clarify for the reader that what is meant is apparent
>>     > topological connectivity, as distinct from verified communication.
>>
>>     The phrase usually used in TRILL (See RFC 7780) is "data reachable".
>>
>>     Thanks,
>>     Donald
>>     =============================
>>      Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>      155 Beaver Street, Milford, MA 01757 USA
>>      d3e3e3@gmail.com <mailto:d3e3e3@gmail.com>
>>
>>     > Yours,
>>     > Joel
>>     >
>>     >
>>     > On 4/15/16 11:12 PM, Donald Eastlake wrote:
>>     >>
>>     >> Hi Joel,
>>     >>
>>     >> On Fri, Apr 15, 2016 at 11:51 AM, Joel M. Halpern
>>     <jmh@joelhalpern.com <mailto:jmh@joelhalpern.com>>
>>     >> wrote:
>>     >>>
>>     >>> Thank you Donald.  Points of agreement elided, some responses to try to
>>     >>> clarify my observations.  I will note that from your comments about 3.1,
>>     >>> I believe my concerns, now moved to 3.7, are larger, as I had assumed that
>>     >>> the magic was in some other protocol, and you now say it is not defined
>>     >>> there.
>>     >>>
>>     >>> Yours,
>>     >>> Joel
>>     >>>
>>     >>> On 4/15/16 11:23 AM, Donald Eastlake wrote:
>>     >>>>
>>     >>>>
>>     >>>> Hi Joel
>>     >>>>
>>     >>>> Thanks for your thorough review and comments. See below
>>     >>>>
>>     >>>> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern
>>     <jmh@joelhalpern.com <mailto:jmh@joelhalpern.com>
>>     >>>>   <mailto:jmh@joelhalpern.com <mailto:jmh@joelhalpern.com>> wrote:
>>     >>>>
>>     >>> ...
>>     >>>
>>     >>>>> Major Issues:
>>     >>>>> In the state machine transitions in section 2.3.3
>>     >>>>> for push servers, it appears that if the event indicating that the
>>     >>>>> server is being shut down occurs while the server is already Going
>>     >>>>> Stand-By or Uncompleting, the transitions indicate that this
>>     >>>>> "going down" event will be lost.  A strict reading of this would
>>     >>>>> seem to mean that the "go Down" event would need to recur after the
>>     >>>>> timeout condition.  This would seem to be best addressed by a new
>>     >>>>> state "Going-Down" whose timeout behavior is to move to down state.
>>     >>>>
>>     >>>>
>>     >>>> I understand your point but "going down" and the like are called
>>     >>>> "events or conditions" in this draft, not just events.
>>     >>>> The problem with adding a single "Going-Down" state is that
>>     >>>> transition to that state would lose the information as to whether
>>     >>>> the Push Directory had been advertising that it was pushing complete
>>     >>>> information or not. The reason to remember this is that you would
>>     >>>> want to behave differently if the "going down" condition was revoked
>>     >>>> before it completed. This information could be preserved in a
>>     >>>> Boolean pseudo variable but the current style of state machine in
>>     >>>> this draft avoids such pseudo variables and encodes all of the
>>     >>>> relevant push directory's state into the state machine state. Thus,
>>     >>>> I can see three possible responses to your comment:
>>     >>>>
>>     >>>> 1) Change wording to emphasize that these "events or conditions" can
>>     >>>> be conditions that cause a state transition some substantial time
>>     >>>> after they become true.
>>     >>>>
>>     >>>> 2) Add two new states: (1) going down - was complete; (2) going
>>     >>>> down - was incomplete.
>>     >>>>
>>     >>>> 3) Change the style of state machine to admit pseudo variables which
>>     >>>> can be set and tested as part of the state machinery.
>>     >>>>
>>     >>>> Option 1 is just some minor wording changes but adopting either
>>     >>>> option 2 or 3 involves more extensive changes so I would prefer to
>>     >>>> avoid them.
>>     >>>
>>     >>>
>>     >>> From what I have seen, trying to build a state machine with conditions
>>     >>> rather than events is fraught with problems and tends to lead to errors
>>     >>> in implementation.  It amounts to hiding pseudo-variables inside the
>>     >>> states, but not describing them.
>>     >>> Thus, I would much prefer solution 2, but it is of course up to the WG.
>>     >>
>>     >>
>>     >> Well, option 2 wouldn't be too hard. Option 3 would probably involve the
>>     >> most change.
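
A minimal sketch of what option 2 could look like as an explicit transition table. The state and event names here are simplified placeholders chosen for illustration and are not the state set of the draft; the point is only that two "going down" states let a revoked shutdown return to the right place without a pseudo variable, and that pairs absent from the table play the role of "N/A" entries.

```python
from enum import Enum, auto


class State(Enum):
    # Simplified, hypothetical states; the draft's state machine has more.
    DOWN = auto()
    ACTIVE_INCOMPLETE = auto()
    ACTIVE_COMPLETE = auto()
    GOING_DOWN_WAS_INCOMPLETE = auto()
    GOING_DOWN_WAS_COMPLETE = auto()


class Event(Enum):
    GO_DOWN = auto()          # shutdown requested
    GO_DOWN_REVOKED = auto()  # shutdown request withdrawn before timeout
    TIMEOUT = auto()          # going-down timer expired


# Only legal (state, event) pairs are listed; everything else is "N/A".
TRANSITIONS = {
    (State.ACTIVE_INCOMPLETE, Event.GO_DOWN): State.GOING_DOWN_WAS_INCOMPLETE,
    (State.ACTIVE_COMPLETE, Event.GO_DOWN): State.GOING_DOWN_WAS_COMPLETE,
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.GO_DOWN_REVOKED): State.ACTIVE_INCOMPLETE,
    (State.GOING_DOWN_WAS_COMPLETE, Event.GO_DOWN_REVOKED): State.ACTIVE_COMPLETE,
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.TIMEOUT): State.DOWN,
    (State.GOING_DOWN_WAS_COMPLETE, Event.TIMEOUT): State.DOWN,
}


def step(state, event):
    """Return the next state, or None when the pair is N/A."""
    return TRANSITIONS.get((state, event))
```

Because the "was complete" information lives in the state itself, the going-down request is never lost and no variable outside the table is needed.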
>>     >>
>>     >>> ...
>>     >>>
>>     >>>>> Minor Issues:
>>     >>>>> In section 2.3.3 describing the state transitions for push
>>     >>>>> servers, there is an event (event 1) described as "the server was
>>     >>>>> Down but is now Up."  The state transition diagram describes this
>>     >>>>> as being a valid event that does not change the server's state if
>>     >>>>> the server is in any state other than "Down." In one sense, this is
>>     >>>>> reasonable, saying that such an event is harmless.  I would however
>>     >>>>> expect some sort of logging or administrative notification, as
>>     >>>>> something in the system is quite confused.
>>     >>>>
>>     >>>>
>>     >>>> Again, I see your point but it seems to me to be a matter of state
>>     >>>> machine style. Note that the "event" is described as a condition, so
>>     >>>> from that point of view, it is true anytime the state is other than
>>     >>>> Down. On the other hand, if you view it as strictly an event, you
>>     >>>> are left with the question of what to put at the intersection of a
>>     >>>> state and event in the table when it is impossible for that event to
>>     >>>> occur in that state. Some people note this with an "N/A" (not
>>     >>>> applicable) entry. In fact, previous TRILL state diagrams such as in
>>     >>>> RFC 7177 use "N/A" so it would probably be simplest to change to
>>     >>>> that for consistency.
>>     >>>
>>     >>>
>>     >>> I think N/A would be good.
>>     >>
>>     >>
>>     >> OK.
>>     >>
>>     >>> ...
>>     >>>
>>     >>>>> Text in section 3.2.2.1 on lifetimes and the information
>>     >>>>> maintenance in section 3.3 imply that the clients and servers must
>>     >>>>> maintain a connection. Presumably, this is required already by the
>>     >>>>> RBridge Channel protocol, and I understand that we should not
>>     >>>>> repeat the entire protocol here.  It would seem to make the
>>     >>>>> reader's life MUCH simpler if the text noted that the RBridge
>>     >>>>> Channel protocol requires that there be a maintained connection
>>     >>>>> between the client and the server, and that these mechanisms
>>     >>>>> leverage the presence of that connection.
>>     >>>>
>>     >>>>
>>     >>>> The basic RBridge Channel protocol [RFC7178] is a datagram protocol
>>     >>>> rather than a connection protocol. So there is no guaranteed
>>     >>>> continuity of connection between RBridges that have previously
>>     >>>> exchanged RBridge Channel messages. But connection would only be
>>     >>>> lost if the network partitions since RBridge Channel messages look
>>     >>>> like data packets to any transit RBridges and will get forwarded as
>>     >>>> long as there is any route. Network partition is immediately visible
>>     >>>> in the link state database to the RBridges at both ends of an
>>     >>>> RBridge Channel exchange.  Section 3.7 provides that if a Pull
>>     >>>> Directory is no longer reachable (i.e., RBridge Channel protocol
>>     >>>> packets would no longer get through), then all pull responses from
>>     >>>> that Pull Directory MUST be discarded since cache consistency update
>>     >>>> messages can't get through.
>>     >>>> Perhaps a reference to Section 3.7 should be added to Section 3.3.
>>     >>>
>>     >>>
>>     >>> I don't think a reference to 3.7 is sufficient, although it is helpful.
>>     >>> If the protocol is a datagram protocol, and if it is important to discard
>>     >>> data from unreachable pull servers, then I think 3.7 NEEDS to say more
>>     >>> than just ~if you happen to magically figure out you can't reach the
>>     >>> server, discard data it has given you.~  From the rest of the text, this
>>     >>> is an important and unspecified protocol mechanism.
>>     >>
>>     >>
>>     >> Figuring out whether/how you can reach other RBridges is a basic
>>     >> function of TRILL IS-IS based routing, not something "magical".
>>     >> Whenever there is a topology change, an RBridge MUST determine routes
>>     >> to all data reachable RBridges in the new topology. If there was an
>>     >> RBridge previously reachable but no longer reachable, as would be the
>>     >> case for all RBridges on the other side of a network partition, this
>>     >> MUST be noticed so that, for example, all MAC reachability information
>>     >> associated with each of the no longer reachable RBridges can be
>>     >> discarded.
>>     >> It does not seem like much of a stretch to believe that an RBridge would
>>     >> keep track of the Pull Directory or Directories it was using, each of
>>     >> which will be some other RBridge, and notice when a topology change
>>     >> makes any of them inaccessible. But I have no problem adding some
>>     >> wording to make this clearer.
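
A minimal sketch of the bookkeeping described above, assuming the client keys its cached pull responses by the directory that supplied them and is handed the set of data reachable RBridges after each topology change. The class, method names, and nickname strings are hypothetical and not from the draft.

```python
class PullResponseCache:
    """Hypothetical client-side cache of Pull Directory responses."""

    def __init__(self):
        # directory identifier -> {query key: cached response}
        self._by_directory = {}

    def store(self, directory, key, response):
        """Remember a response and which Pull Directory it came from."""
        self._by_directory.setdefault(directory, {})[key] = response

    def lookup(self, key):
        """Return a cached response for key, or None if nothing is cached."""
        for responses in self._by_directory.values():
            if key in responses:
                return responses[key]
        return None

    def on_topology_change(self, reachable):
        """Discard everything learned from directories no longer reachable.

        `reachable` is assumed to be the set of RBridges that the IS-IS
        link state database reports as data reachable after the change.
        Returns the list of directories whose responses were discarded.
        """
        lost = [d for d in self._by_directory if d not in reachable]
        for directory in lost:
            del self._by_directory[directory]
        return lost
```

For example, after a partition that cuts off one directory, calling `on_topology_change` with the remaining reachable set drops only the responses that directory supplied.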
>>     >>
>>     >>> ...
>>     >>> In the flooding flag and behavior, (long text elided) I don't think there
>>     >>> is anything wrong with the intended behavior.  It is just that the very
>>     >>> brief description of the FL flag leads the reader to an incorrect
>>     >>> expectation.  Yes, it gets sorted out, but that is not good.  What I
>>     >>> would suggest is when the flag is defined (with whatever name you choose)
>>     >>> note that "for the qtypes 2, 3, and 4, the flag indicates that the server
>>     >>> should flood its response."
>>     >>
>>     >>
>>     >> We can work on clarifying the wording.
>>     >>
>>     >> Thanks,
>>     >> Donald
>>     >> =============================
>>     >>   Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>     >>   155 Beaver Street, Milford, MA 01757 USA
>>     >>   d3e3e3@gmail.com <mailto:d3e3e3@gmail.com>
>>     >>
>>     >
>>
>>
>