Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Donald Eastlake, Sun, 17 April 2016 02:03 UTC

From: Donald Eastlake
Date: Sat, 16 Apr 2016 22:03:22 -0400
To: "Joel M. Halpern"
Subject: Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Hi Joel,

On Fri, Apr 15, 2016 at 11:46 PM, Joel M. Halpern wrote:
> If by the connectivity check to the directory server, you mean the
> underlying IS-IS routing reporting connectivity, then say that. While that
> is not actually interchangeable with real connectivity, it is perfectly
> reasonable for the WG to deem it sufficient. I think it would only take a
> sentence or two to clarify for the reader that what is meant is apparent
> topological connectivity, as distinct from verified communication.

The phrase usually used in TRILL (see RFC 7780) is "data reachable".

 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA

> Yours,
> Joel
> On 4/15/16 11:12 PM, Donald Eastlake wrote:
>> Hi Joel,
>> On Fri, Apr 15, 2016 at 11:51 AM, Joel M. Halpern wrote:
>>> Thank you, Donald. Points of agreement elided; some responses to try to
>>> clarify my observations. I will note that from your comments about 3.1, I
>>> believe my concerns, now moved to 3.7, are larger, as I had assumed that
>>> the magic was in some other protocol, and you now say it is not defined
>>> there.
>>> Yours,
>>> Joel
>>> On 4/15/16 11:23 AM, Donald Eastlake wrote:
>>>> Hi Joel
>>>> Thanks for your thorough review and comments. See below
>>>> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern wrote:
>>> ...
>>>>> Major Issues:
>>>>> In the state machine transitions in section 2.3.3 for push servers, it
>>>>> appears that if the event indicating that the server is being shut down
>>>>> occurs while the server is already Going Stand-By or Uncompleting, the
>>>>> transitions indicate that this "going down" event will be lost. A strict
>>>>> reading of this would seem to mean that the "go Down" event would need to
>>>>> recur after the timeout condition. This would seem to be best addressed
>>>>> by a new state "Going-Down" whose timeout behavior is to move to the Down
>>>>> state.
>>>> I understand your point, but "going down" and the like are called
>>>> "events or conditions" in this draft, not just events.
>>>> The problem with adding a single "Going-Down" state is that transition to
>>>> that state would lose the information as to whether or not the Push
>>>> Directory had been advertising that it was pushing complete information.
>>>> The reason to remember this is that you would want to behave differently
>>>> if the "going down" condition was revoked before it completed. This
>>>> information could be preserved in a Boolean pseudo variable, but the
>>>> current style of state machine in this draft avoids such pseudo variables
>>>> and encodes all of the relevant push directory's state into the state
>>>> machine state. Thus, I can see three possible responses to your comment:
>>>> 1) Change wording to emphasize that these "events or conditions" can be
>>>>    conditions that cause a state transition some substantial time after
>>>>    they become true.
>>>> 2) Add two new states: (1) going down - was complete; (2) going down -
>>>>    was incomplete.
>>>> 3) Change the style of state machine to admit pseudo variables which can
>>>>    be set and tested as part of the state machinery.
>>>> Option 1 is just some minor wording changes, but adopting either option 2
>>>> or option 3 involves more extensive changes, so I would prefer to avoid
>>>> them.
>>> From what I have seen, trying to build a state machine with conditions
>>> rather than events is fraught with problems and tends to lead to errors in
>>> implementation. It amounts to hiding pseudo-variables inside the states,
>>> but not describing them.
>>> Thus, I would much prefer solution 2, but it is of course up to the WG.
>> Well, option 2 wouldn't be too hard. Option 3 would probably involve the
>> most change.
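Option 2 can be sketched in a few lines of Python. The sketch shows how splitting "going down" into two states keeps the completeness information without a pseudo variable. It is not the draft's state table. PUSHING_COMPLETE and PUSHING_INCOMPLETE are hypothetical stand-ins for the draft's other states, and the event names are illustrative. Any combination that is not listed leaves the state unchanged.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical stand-ins; these are not the state names used in the draft.
    DOWN = auto()
    PUSHING_COMPLETE = auto()
    PUSHING_INCOMPLETE = auto()
    # The two new states proposed in option 2.
    GOING_DOWN_WAS_COMPLETE = auto()
    GOING_DOWN_WAS_INCOMPLETE = auto()


class Event(Enum):
    GOING_DOWN = auto()          # shutdown of the server has been requested
    GOING_DOWN_REVOKED = auto()  # the shutdown was cancelled before completing
    TIMEOUT = auto()             # the going-down period has expired


# (current state, event) -> next state. The "was complete" bit lives in the
# state itself, so revoking the shutdown can restore the earlier behavior.
TRANSITIONS = {
    (State.PUSHING_COMPLETE, Event.GOING_DOWN): State.GOING_DOWN_WAS_COMPLETE,
    (State.PUSHING_INCOMPLETE, Event.GOING_DOWN): State.GOING_DOWN_WAS_INCOMPLETE,
    (State.GOING_DOWN_WAS_COMPLETE, Event.GOING_DOWN_REVOKED): State.PUSHING_COMPLETE,
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.GOING_DOWN_REVOKED): State.PUSHING_INCOMPLETE,
    (State.GOING_DOWN_WAS_COMPLETE, Event.TIMEOUT): State.DOWN,
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.TIMEOUT): State.DOWN,
}


def next_state(state: State, event: Event) -> State:
    """Return the next state; unlisted combinations leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)
```

The pending shutdown is recorded in the state itself rather than being dropped, which addresses the concern raised about the original table. Option 3 would instead keep a single Going-Down state plus a Boolean. The information carried is the same, but it sits outside the state name.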
>>> ...
>>>>> Minor Issues:
>>>>> In section 2.3.3 describing the state transitions for push servers,
>>>>> there is an event (event 1) described as "the server was Down but is
>>>>> now Up." The state transition diagram describes this as being a valid
>>>>> event that does not change the server's state if the server is in any
>>>>> state other than "Down." In one sense, this is reasonable, saying that
>>>>> such an event is harmless. I would, however, expect some sort of logging
>>>>> or administrative notification, as something in the system is quite
>>>>> confused.
>>>> Again, I see your point, but it seems to me to be a matter of state
>>>> machine style. Note that the "event" is described as a condition, so from
>>>> that point of view, it is true anytime the state is other than Down. On
>>>> the other hand, if you view it as strictly an event, you are left with
>>>> the question of what to put at the intersection of a state and event in
>>>> the table when it is impossible for that event to occur in that state.
>>>> Some people note this with an "N/A" (not applicable) entry. In fact,
>>>> previous TRILL state diagrams, such as in RFC 7177, use "N/A", so it
>>>> would probably be simplest to change to that for consistency.
>>> I think N/A would be good.
>> OK.
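The sketch below shows roughly what an "N/A" entry could mean for an implementation. It marks impossible state/event combinations explicitly and logs them if they ever occur, following the logging suggestion in the minor issue above. Only event 1 ("the server was Down but is now Up") and the state names Down, Going Stand-By, and Uncompleting come from the thread. The successor state "Up" and the choice to stay in the current state after an N/A event are assumptions made for the example.

```python
import logging
from typing import Dict, Optional, Tuple

log = logging.getLogger("push_directory")

NA = None  # an "N/A" cell: this event cannot occur in this state

EVENT_1 = "server was Down but is now Up"

# Hypothetical excerpt of a state/event table. "Up" stands in for whatever
# state the draft actually enters; only the N/A cells matter here.
TABLE: Dict[Tuple[str, str], Optional[str]] = {
    ("Down", EVENT_1): "Up",
    ("Going Stand-By", EVENT_1): NA,
    ("Uncompleting", EVENT_1): NA,
}


def apply_event(state: str, event: str) -> str:
    new_state = TABLE[(state, event)]
    if new_state is NA:
        # Reaching an N/A cell means something in the system is confused,
        # so report it rather than treating it as a harmless no-op.
        log.warning("N/A event %r received in state %r", event, state)
        return state
    return new_state
```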
>>> ...
>>>>> Text in the section on lifetimes and information maintenance in section
>>>>> 3.3 implies that the clients and servers must maintain a connection.
>>>>> Presumably, this is already required by the RBridge Channel protocol,
>>>>> and I understand that we should not repeat the entire protocol here. It
>>>>> would seem to make the reader's life MUCH simpler if the text noted that
>>>>> the RBridge Channel protocol requires that there be a maintained
>>>>> connection between the client and the server, and that these mechanisms
>>>>> leverage the presence of that connection.
>>>> The basic RBridge Channel protocol [RFC7178] is a datagram protocol
>>>> rather than a connection protocol. So there is no guaranteed continuity
>>>> of connection between RBridges that have previously exchanged RBridge
>>>> Channel messages. But connection would only be lost if the network
>>>> partitions, since RBridge Channel messages look like data packets to any
>>>> transit RBridges and will get forwarded as long as there is any route.
>>>> Network partition is immediately visible in the link state database to
>>>> the RBridges at both ends of an RBridge Channel exchange. Section 3.7
>>>> provides that if a Pull Directory is no longer reachable (i.e., RBridge
>>>> Channel protocol packets would no longer get through), then all pull
>>>> responses from that Pull Directory MUST be discarded, since cache
>>>> consistency update messages can't get through. Perhaps a reference to
>>>> Section 3.7 should be added to Section 3.3.
>>> I don't think a reference to 3.7 is sufficient, although it is helpful.
>>> If the protocol is a datagram protocol, and if it is important to discard
>>> data from unreachable pull servers, then I think 3.7 NEEDS to say more
>>> than just ~if you happen to magically figure out you can't reach the
>>> server, discard data it has given you.~ From the rest of the text, this
>>> is an important and unspecified protocol mechanism.
>> Figuring out whether/how you can reach other RBridges is a basic
>> function of TRILL IS-IS based routing, not something "magical".
>> Whenever there is a topology change, an RBridge MUST determine routes
>> to all data reachable RBridges in the new topology. If there was an
>> RBridge previously reachable but no longer reachable, as would be the
>> case for all RBridges on the other side of a network partition, this
>> MUST be noticed so that, for example, all MAC reachability information
>> associated with each of the no longer reachable RBridges can be discarded.
>> It does not seem like much of a stretch to believe that an RBridge would
>> keep track of the Pull Directory or Directories it was using, each of
>> which will be some other RBridge, and notice when a topology change
>> makes any of them inaccessible. But I have no problem adding some
>> wording to make this clearer.
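The bookkeeping described above can be sketched in Python. The sketch assumes the link state database is simplified to a map from each RBridge to the set of neighbors it advertises. After a topology change, it computes the set of reachable RBridges and drops cached pull responses from any Pull Directory that is no longer reachable. A link is used only if both ends report it, as in the IS-IS two-way connectivity check. This is a simplification and does not model everything in the RFC 7780 notion of "data reachable". PullCache and its fields are hypothetical names.

```python
from collections import deque
from typing import Dict, List, Set


def data_reachable(lsdb: Dict[str, Set[str]], me: str) -> Set[str]:
    """Return the RBridges reachable from `me` over two-way links."""
    seen = {me}
    queue = deque([me])
    while queue:
        node = queue.popleft()
        for nbr in lsdb.get(node, set()):
            # Use a link only if the neighbor also reports it (two-way check).
            if nbr not in seen and node in lsdb.get(nbr, set()):
                seen.add(nbr)
                queue.append(nbr)
    return seen


class PullCache:
    """Hypothetical cache of responses received from Pull Directories."""

    def __init__(self) -> None:
        # Pull Directory nickname -> {query: cached response}
        self.responses: Dict[str, Dict[str, str]] = {}

    def on_topology_change(self, lsdb: Dict[str, Set[str]], me: str) -> List[str]:
        """Discard responses from unreachable Pull Directories; return them."""
        reachable = data_reachable(lsdb, me)
        lost = [server for server in self.responses if server not in reachable]
        for server in lost:
            # Cache consistency updates from this server can no longer arrive.
            del self.responses[server]
        return lost
```

For example, if the link between B and C disappears, a client at A that was using Pull Directory C would discard everything it had cached from C.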
>>> ...
>>> In the flooding flag and behavior, (long text elided) I don't think there
>>> is anything wrong with the intended behavior. It is just that the very
>>> brief description of the FL flag leads the reader to an incorrect
>>> expectation. Yes, it gets sorted out, but that is not good. What I would
>>> suggest is: when the flag is defined (with whatever name you choose),
>>> note that "for qtypes 2, 3, and 4, the flag indicates that the server
>>> should flood its response."
>> We can work on clarifying the wording.
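For illustration only, the wording suggested above could be read as the check below. The qtype values 2, 3, and 4 come from that suggestion. Treating every other qtype as "do not flood" is an assumption of this sketch, not something the draft states, and the function name is hypothetical.

```python
# Query types for which, per the suggested wording, the flag requests flooding.
FLOOD_QTYPES = frozenset({2, 3, 4})


def should_flood_response(qtype: int, fl_flag: bool) -> bool:
    """Hypothetical server-side check: flood only if the flag is set and the
    qtype is one the suggested wording covers."""
    return fl_flag and qtype in FLOOD_QTYPES
```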
>> Thanks,
>> Donald
>> =============================
>>   Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>   155 Beaver Street, Milford, MA 01757 USA