Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

"Joel M. Halpern" <> Sat, 16 April 2016 03:46 UTC


If by the connectivity check to the directory server, you mean the 
underlying IS-IS routing reporting connectivity, then say that.  While 
that is not actually interchangeable with real connectivity, it is 
perfectly reasonable for the WG to deem it sufficient.  I think it would 
only take a sentence or two to clarify for the reader that what is meant 
is apparent topological connectivity, as distinct from verified 
connectivity.


On 4/15/16 11:12 PM, Donald Eastlake wrote:
> Hi Joel,
> On Fri, Apr 15, 2016 at 11:51 AM, Joel M. Halpern <> wrote:
>> Thank you Donald.  Points of agreement elided, some responses to try to
>> clarify my observations.  I will note that from your comments about 3.1, I
>> believe my concerns, now moved to 3.7, are larger, as I had assumed that the
>> magic was in some other protocol, and you now say it is not defined there.
>> Yours,
>> Joel
>> On 4/15/16 11:23 AM, Donald Eastlake wrote:
>>> Hi Joel
>>> Thanks for your thorough review and comments. See below
>>> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern <> wrote:
>> ...
>>>> Major Issues:
>>>> In the state machine transitions in section 2.3.3 for push servers,
>>>> it appears that if the event indicating that the server is being
>>>> shut down occurs while the server is already Going Stand-By or
>>>> Uncompleting, the transitions indicate that this "going down" event
>>>> will be lost.  A strict reading of this would seem to mean that the
>>>> "go Down" event would need to recur after the timeout condition.
>>>> This would seem to be best addressed by a new state "Going-Down"
>>>> whose timeout behavior is to move to down state.
>>> I understand your point but "going down" and the like are called
>>> "events or conditions" in this draft, not just events.
>>> The problem with adding a single "Going-Down" state is that transition
>>> to that state would lose the information as to whether or not the Push
>>> Directory had been advertising that it was pushing complete
>>> information. The reason to remember this is that you would want to
>>> behave differently if the "going down" condition was revoked before it
>>> completed. This information could be preserved in a Boolean pseudo
>>> variable but the current style of state machine in this draft avoids
>>> such pseudo variables and encodes all of the relevant push directory's
>>> state into the state machine state. Thus, I can see three possible
>>> responses to your comment:
>>> 1) Change wording to emphasize that these "events or conditions" can
>>> be conditions that cause a state transition some substantial time
>>> after they become true.
>>> 2) Add two new states: (1) going down - was complete; (2) going down -
>>> was incomplete.
>>> 3) Change the style of state machine to admit pseudo variables which
>>> can be set and tested as part of the state machinery.
>>> Option 1 is just some minor wording changes but adopting either
>>> option 2 or 3 involves more extensive changes so I would prefer to
>>> avoid them.
>> From what I have seen, trying to build a state machine with conditions
>> rather than events is fraught with problems and tends to lead to errors in
>> implementation.  It amounts to hiding pseudo-variables inside the states,
>> but not describing them.
>> Thus, I would much prefer solution 2, but it is of course up to the WG.
> Well, option 2 wouldn't be too hard. Option 3 would probably involve the most
> change.
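
[Editorial illustration, not part of the original thread.] Option 2 above can be pictured as a transition table with two distinct going-down states, so that a "go down" event is never lost and the completeness information survives a revocation. This is only a minimal sketch: the state and event names are hypothetical simplifications rather than the draft's actual state set, and returning to the originating state on revocation is an assumption made for illustration.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical, simplified state names (not the draft's full state set).
    DOWN = auto()
    ACTIVE_INCOMPLETE = auto()
    ACTIVE_COMPLETE = auto()
    GOING_DOWN_WAS_COMPLETE = auto()
    GOING_DOWN_WAS_INCOMPLETE = auto()


class Event(Enum):
    # Hypothetical event names.
    GO_DOWN = auto()
    GO_DOWN_REVOKED = auto()
    TIMEOUT = auto()


# The state itself records whether complete information was being pushed,
# so no Boolean pseudo variable is needed.
TRANSITIONS = {
    (State.ACTIVE_COMPLETE, Event.GO_DOWN): State.GOING_DOWN_WAS_COMPLETE,
    (State.ACTIVE_INCOMPLETE, Event.GO_DOWN): State.GOING_DOWN_WAS_INCOMPLETE,
    # Assumption: a revoked shutdown returns to the state it came from.
    (State.GOING_DOWN_WAS_COMPLETE, Event.GO_DOWN_REVOKED): State.ACTIVE_COMPLETE,
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.GO_DOWN_REVOKED): State.ACTIVE_INCOMPLETE,
    # Either going-down state times out into DOWN.
    (State.GOING_DOWN_WAS_COMPLETE, Event.TIMEOUT): State.DOWN,
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.TIMEOUT): State.DOWN,
}


def step(state, event):
    """Return the next state, or None if the pair is not in the table."""
    return TRANSITIONS.get((state, event))
```

The cost is two extra rows and a handful of extra table cells, which is the trade-off being weighed against option 3's pseudo variables.
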
>> ...
>>>> Minor Issues:
>>>> In section 2.3.3 describing the state transitions for push servers,
>>>> there is an event (event 1) described as "the server was Down but is
>>>> now Up."  The state transition diagram describes this as being a
>>>> valid event that does not change the server's state if the server is
>>>> in any state other than "Down." In one sense, this is reasonable,
>>>> saying that such an event is harmless.  I would however expect some
>>>> sort of logging or administrative notification, as something in the
>>>> system is quite confused.
>>> Again, I see your point but it seems to me to be a matter of state
>>> machine style. Note that the "event" is described as a condition, so
>>> from that point of view, it is true anytime the state is other than
>>> Down. On the other hand, if you view it as strictly an event, you are
>>> left with the question of what to put at the intersection of a state
>>> and event in the table when it is impossible for that event to occur
>>> in that state. Some people note this with an "N/A" (not applicable)
>>> entry. In fact, previous TRILL state diagrams such as in RFC 7177 use
>>> "N/A" so it would probably be simplest to change to that for
>>> consistency.
>> I think N/A would be good.
> OK.
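
[Editorial illustration, not part of the original thread.] One way to read the agreed "N/A" entries together with the earlier request for logging is sketched below: a state/event pair marked N/A leaves the state unchanged but emits a warning, since it signals that something in the system is confused. The table contents, the event name `server_up`, and the Down to Stand-By transition are assumptions for illustration and are not taken from the draft.

```python
import logging

NA = None  # marker for a "not applicable" table entry

# Hypothetical fragment of a transition table for the "now Up" event.
TABLE = {
    ("Down", "server_up"): "Stand-By",  # assumed target state
    ("Stand-By", "server_up"): NA,
    ("Active", "server_up"): NA,
}


def handle(state, event, log=logging.getLogger("push_directory")):
    """Apply an event; N/A entries keep the state and log a warning."""
    next_state = TABLE.get((state, event), NA)
    if next_state is NA:
        log.warning("event %r is N/A in state %r; state unchanged", event, state)
        return state
    return next_state
```
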
>> ...
>>>> Text in section on lifetimes and the information maintenance in
>>>> section 3.3 imply that the clients and servers must maintain a
>>>> connection. Presumably, this is required already by the RBridge
>>>> Channel protocol, and I understand that we should not repeat the
>>>> entire protocol here.  It would seem to make the reader's life MUCH
>>>> simpler if the text noted that the RBridge Channel protocol requires
>>>> that there be a maintained connection between the client and the
>>>> server, and that these mechanisms leverage the presence of that
>>>> connection.
>>> The basic RBridge Channel protocol [RFC7178] is a datagram protocol
>>> rather than a connection protocol. So there is no guaranteed
>>> continuity of connection between RBridges that have previously
>>> exchanged RBridge Channel messages. But connection would only be lost
>>> if the network partitions since RBridge Channel messages look like
>>> data packets to any transit RBridges and will get forwarded as long as
>>> there is any route. Network partition is immediately visible in the
>>> link state database to the RBridges at both ends of an RBridge Channel
>>> exchange.  Section 3.7 provides that if a Pull Directory is no longer
>>> reachable (i.e., RBridge Channel protocol packets would no longer get
>>> through), then all pull responses from that Pull Directory MUST be
>>> discarded since cache consistency update messages can't get through.
>>> Perhaps a reference to Section 3.7 should be added to Section 3.3.
>> I don't think a reference to 3.7 is sufficient, although it is helpful.
>> If the protocol is a datagram protocol, and if it is important to discard
>> data from unreachable pull servers, then I think 3.7 NEEDS to say more than
>> just ~if you happen to magically figure out you can't reach the server,
>> discard data it has given you.~  From the rest of the text, this is an
>> important and unspecified protocol mechanism.
> Figuring out whether/how you can reach other RBridges is a basic
> function of TRILL IS-IS based routing, not something "magical".
> Whenever there is a topology change, an RBridge MUST determine routes
> to all data reachable RBridges in the new topology. If there was an
> RBridge previously reachable but no longer reachable, as would be the
> case for all RBridges on the other side of a network partition, this
> MUST be noticed so that, for example, all MAC reachability information
> associated with each of the no longer reachable RBridges can be discarded.
> It does not seem like much of a stretch to believe that an RBridge would
> keep track of the Pull Directory or Directories it was using, each of
> which will be some other RBridge, and notice when a topology change
> makes any of them inaccessible. But I have no problem adding some
> wording to make this clearer.
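
[Editorial illustration, not part of the original thread.] The behavior described above, noticing after a topology change that a Pull Directory is no longer reachable and discarding what it supplied, might be sketched as follows. This is a simplification under stated assumptions: the `adjacency` dictionary stands in for the link state database (with links assumed to be already two-way checked), a breadth-first search stands in for the real IS-IS route computation, and all names are hypothetical. It captures apparent topological connectivity only, not verified connectivity.

```python
from collections import deque


def reachable(adjacency, source):
    """Return the set of nodes topologically reachable from source (BFS)."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def prune_pull_cache(cache, adjacency, me):
    """Keep only cached pull responses whose directory is still reachable.

    cache maps a Pull Directory's RBridge name to the responses it supplied.
    """
    alive = reachable(adjacency, me)
    return {server: entries for server, entries in cache.items() if server in alive}
```

For example, with a partition separating RB1 and RB2 from RB3 and RB4, calling `prune_pull_cache` from RB1 would drop everything cached from a Pull Directory at RB4 while keeping entries from RB2.
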
>> ...
>> In the flooding flag and behavior, (long text elided) I don't think there is
>> anything wrong with the intended behavior.  It is just that the very brief
>> description of the FL flag leads the reader to an incorrect expectation.
>> Yes, it gets sorted out, but that is not good.  What I would suggest is when
>> the flag is defined (with whatever name you choose) note that "for the
>> qtypes 2, 3, and 4, the flag indicates that the server should flood its
>> response."
> We can work on clarifying the wording.
> Thanks,
> Donald
> =============================
>   Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>   155 Beaver Street, Milford, MA 01757 USA