Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

"Joel M. Halpern" <jmh@joelhalpern.com> Fri, 15 April 2016 15:52 UTC

To: Donald Eastlake <d3e3e3@gmail.com>
References: <570EB05D.20802@joelhalpern.com> <CAF4+nEHWCs7EOzMFN7HzA92DtdEzsFvFk-4zuzY4MRfeXdA4JA@mail.gmail.com>
From: "Joel M. Halpern" <jmh@joelhalpern.com>
Message-ID: <57110E19.6050304@joelhalpern.com>
Date: Fri, 15 Apr 2016 11:51:53 -0400
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:38.0) Gecko/20100101 Thunderbird/38.7.2
MIME-Version: 1.0
In-Reply-To: <CAF4+nEHWCs7EOzMFN7HzA92DtdEzsFvFk-4zuzY4MRfeXdA4JA@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: <http://mailarchive.ietf.org/arch/msg/trill/m1txH7MpwYYxlOIcbTmJ4ea_t6U>
Cc: "rtg-ads@ietf.org" <rtg-ads@ietf.org>, "rtg-dir@ietf.org" <rtg-dir@ietf.org>, "trill@ietf.org" <trill@ietf.org>, draft-ietf-trill-directory-assist-mechanisms.all@ietf.org
Subject: Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Thank you, Donald.  I have elided the points of agreement; the responses
below try to clarify my observations.  I will note that, given your
comments about 3.1, I believe my concerns, now moved to 3.7, are larger:
I had assumed that the magic was in some other protocol, and you now say
it is not defined there.

Yours,
Joel

On 4/15/16 11:23 AM, Donald Eastlake wrote:
> Hi Joel
>
> Thanks for your thorough review and comments. See below
>
> On Wed, Apr 13, 2016 at 4:47 PM, Joel M. Halpern
> <jmh@joelhalpern.com> wrote:
>
...
>> Major Issues:
>>
>> In the state machine transitions in section 2.3.3 for push servers,
>> it appears that if the event indicating that the server is being
>> shut down occurs while the server is already Going Stand-By or
>> Uncompleting, the transitions indicate that this "going down" event
>> will be lost.  A strict reading of this would seem to mean that the
>> "go Down" event would need to recur after the timeout condition.
>> This would seem to be best addressed by a new state "Going-Down"
>> whose timeout behavior is to move to down state.
>
> I understand your point but "going down" and the like are called
> "events or conditions" in this draft, not just events.
>
> The problem with adding a single "Going-Down" state is that
> transition to that state would lose the information as to whether or
> not the Push Directory had been advertising that it was pushing
> complete information. The reason to remember this is that you would
> want to behave differently if the "going down" condition was revoked
> before it completed. This information could be preserved in a Boolean
> pseudo variable but the current style of state machine in this draft
> avoids such pseudo variables and encodes all of the relevant push
> directory's state into the state machine state. Thus, I can see three
> possible responses to your comment:
>
> 1) Change wording to emphasize that these "events or conditions" can
>    be conditions that cause a state transition some substantial time
>    after they become true.
>
> 2) Add two new states: (1) going down - was complete; (2) going down -
>    was incomplete.
>
> 3) Change the style of state machine to admit pseudo variables which
>    can be set and tested as part of the state machinery.
>
> Option 1 is just some minor wording changes but adopting either
> option 2 or 3 involves more extensive changes so I would prefer to
> avoid them.
>

From what I have seen, trying to build a state machine with conditions 
rather than events is fraught with problems and tends to lead to errors 
in implementation.  It amounts to hiding pseudo-variables inside the 
states, but not describing them.

Thus, I would much prefer solution 2, but it is of course up to the WG.
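
To make option 2 concrete, here is a minimal sketch of an event-driven
transition table with the two going-down states.  It is an illustration
only, not text from the draft: the ACTIVE_* state names, the event names,
and the chosen transitions are assumptions, and the real state machine in
section 2.3.3 has more states than shown here.

```python
from enum import Enum, auto


class State(Enum):
    DOWN = auto()
    ACTIVE_INCOMPLETE = auto()          # placeholder name, not from the draft
    ACTIVE_COMPLETE = auto()            # placeholder name, not from the draft
    GOING_DOWN_WAS_INCOMPLETE = auto()  # option 2, state (2)
    GOING_DOWN_WAS_COMPLETE = auto()    # option 2, state (1)


class Event(Enum):                      # hypothetical event names
    GO_DOWN = auto()
    GO_DOWN_REVOKED = auto()
    TIMEOUT = auto()


# Every (state, event) pair that is allowed to occur. A missing pair is
# treated as "N/A". These transitions are assumptions for illustration.
TRANSITIONS = {
    (State.ACTIVE_COMPLETE, Event.GO_DOWN): State.GOING_DOWN_WAS_COMPLETE,
    (State.ACTIVE_INCOMPLETE, Event.GO_DOWN): State.GOING_DOWN_WAS_INCOMPLETE,
    (State.GOING_DOWN_WAS_COMPLETE, Event.TIMEOUT): State.DOWN,
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.TIMEOUT): State.DOWN,
    (State.GOING_DOWN_WAS_COMPLETE, Event.GO_DOWN_REVOKED): State.ACTIVE_COMPLETE,
    (State.GOING_DOWN_WAS_INCOMPLETE, Event.GO_DOWN_REVOKED): State.ACTIVE_INCOMPLETE,
}


def step(state, event):
    """Return the next state, or raise ValueError for an N/A combination."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"N/A: {event.name} in state {state.name}") from None
```

Because the state itself records whether the directory was complete, a
revoked "going down" returns to the right place without any hidden
variable, and the go-down event is consumed once rather than needing to
recur after a timeout.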

...

>> Minor Issues:
>
>> In section 2.3.3 describing the state transitions for push servers,
>> there is an event (event 1) described as "the server was Down but is
>> now Up."  The state transition diagram describes this as being a
>> valid event that does not change the server's state if the server is
>> in any state other than "Down." In one sense, this is reasonable,
>> saying that such an event is harmless.  I would however expect some
>> sort of logging or administrative notification, as something in the
>> system is quite confused.
>
> Again, I see your point but it seems to me to be a matter of state
> machine style. Note that the "event" is described as a condition, so
> from that point of view, it is true anytime the state is other than
> Down. On the other hand, if you view it as strictly an event, you are
> left with the question of what to put at the intersection of a state
> and event in the table when it is impossible for that event to occur
> in that state. Some people note this with an "N/A" (not applicable)
> entry. In fact, previous TRILL state diagrams such as in RFC 7177 use
> "N/A" so it would probably be simplest to change to that for
> consistency.

I think N/A would be good.

...
>> Text in section 3.2.2.1 on lifetimes and the information maintenance
>> in section 3.3 imply that the clients and servers must maintain a
>> connection. Presumably, this is required already by the RBridge
>> Channel protocol, and I understand that we should not repeat the
>> entire protocol here.  It would seem to make the reader's life MUCH
>> simpler if the text noted that the RBridge Channel protocol requires
>> that there be a maintained connection between the client and the
>> server, and that these mechanisms leverage the presence of that
>> connection.
>
> The basic RBridge Channel protocol [RFC7178] is a datagram protocol
> rather than a connection protocol. So there is no guaranteed
> continuity of connection between RBridges that have previously
> exchanged RBridge Channel messages. But connection would only be lost
> if the network partitions since RBridge Channel messages look like
> data packets to any transit RBridges and will get forwarded as long as
> there is any route. Network partition is immediately visible in the
> link state database to the RBridges at both ends of an RBridge Channel
> exchange.  Section 3.7 provides that if a Pull Directory is no longer
> reachable (i.e., RBridge Channel protocol packets would no longer get
> through), then all pull responses from that Pull Directory MUST be
> discarded since cache consistency update messages can't get through.
>
> Perhaps a reference to Section 3.7 should be added to Section 3.3.
>

I don't think a reference to 3.7 is sufficient, although it is helpful.
If the protocol is a datagram protocol, and if it is important to 
discard data from unreachable pull servers, then I think 3.7 NEEDS to 
say more than just ~if you happen to magically figure out you can't 
reach the server, discard data it has given you.~  From the rest of the 
text, this is an important and unspecified protocol mechanism.
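
As an illustration of the kind of mechanism 3.7 could spell out, here is
a rough sketch that derives reachability from a link state view and
discards cached pull responses from servers that are no longer
reachable.  None of this comes from the draft: the adjacency map, the
cache layout, and both function names are assumptions made up for the
example.

```python
from collections import deque


def reachable_from(adjacency, source):
    """Return the set of nodes reachable from source (breadth-first search).

    adjacency is a hypothetical stand-in for the link state database:
    a dict mapping each RBridge name to an iterable of its neighbors.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def purge_unreachable(cache, adjacency, local):
    """Discard cached pull responses whose server is unreachable from local.

    cache maps a Pull Directory server name to the responses cached from
    it. Returns the sorted names of the servers whose entries were dropped.
    """
    reachable = reachable_from(adjacency, local)
    dropped = sorted(server for server in cache if server not in reachable)
    for server in dropped:
        del cache[server]
    return dropped
```

A client would run something like purge_unreachable each time its link
state view changes; the point is that the trigger and the check are
stated explicitly rather than left to the implementer.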

...
On the flooding flag and behavior (long text elided), I don't think 
there is anything wrong with the intended behavior.  It is just that the 
very brief description of the FL flag leads the reader to an incorrect 
expectation.  Yes, it gets sorted out, but that is not good.  What I 
would suggest is that when the flag is defined (with whatever name you 
choose), the text note that "for qtypes 2, 3, and 4, the flag indicates 
that the server should flood its response."

...
>
> Thanks,
>
> Donald