Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

"Susan Hares" <shares@ndzh.com> Wed, 13 April 2016 22:53 UTC

From: "Susan Hares" <shares@ndzh.com>
To: "'Joel M. Halpern'" <jmh@joelhalpern.com>, <rtg-ads@ietf.org>
References: <570EB05D.20802@joelhalpern.com>
In-Reply-To: <570EB05D.20802@joelhalpern.com>
Date: Wed, 13 Apr 2016 18:53:42 -0400
Message-ID: <02c401d195d7$550d7e70$ff287b50$@ndzh.com>
Archived-At: <http://mailarchive.ietf.org/arch/msg/trill/53LOOkm7Xep637M9uBvgFxcyhp8>
Cc: rtg-dir@ietf.org, trill@ietf.org, draft-ietf-trill-directory-assist-mechanisms.all@ietf.org
Subject: Re: [trill] RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Joel: 

Thank you for the review of this document.  As co-chair, I will work with the authors to try to address this issue. 

Sue 

-----Original Message-----
From: Joel M. Halpern [mailto:jmh@joelhalpern.com] 
Sent: Wednesday, April 13, 2016 4:47 PM
To: rtg-ads@ietf.org
Cc: rtg-dir@ietf.org; draft-ietf-trill-directory-assist-mechanisms.all@ietf.org; trill@ietf.org
Subject: RtgDir review of draft-ietf-trill-directory-assist-mechanisms-07.txt

Hello,

I have been selected as the Routing Directorate reviewer for this draft. The Routing Directorate seeks to review all routing or routing-related drafts as they pass through IETF last call and IESG review, and sometimes on special request. The purpose of the review is to provide assistance to the Routing ADs. For more information about the Routing Directorate, please see http://trac.tools.ietf.org/area/rtg/trac/wiki/RtgDir

Although these comments are primarily for the use of the Routing ADs, it would be helpful if you could consider them along with any other IETF Last Call comments that you receive, and strive to resolve them through discussion or by updating the draft.

Document: draft-ietf-trill-directory-assist-mechanisms-07.txt
Reviewer: Joel Halpern
Review Date: 13-April-2016
IETF LC End Date: N/A
Intended Status: Proposed Standard

Summary: I have significant concerns about this document and recommend that the Routing ADs discuss these issues further with the authors.
     I do believe that the major issues are easily resolvable.  I have tried to provide my best guess at text that would resolve each of them.
     I would like to see the minor issues discussed and preferably addressed.

Major Issues:
     In the state machine transitions in section 2.3.3 for push servers, it appears that if the event indicating that the server is being shut down occurs while the server is already Going Stand-By or Uncompleting, the "going down" event will be lost.  A strict reading of this would seem to mean that the "go Down" event would need to recur after the timeout condition.  This would seem to be best addressed by a new state, "Going-Down", whose timeout behavior is to move to the Down state.
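
To illustrate the concern, here is a minimal sketch in Python, not taken from the draft. It models only the states named above plus the proposed "Going-Down" state. The event names, the timeout targets for Going Stand-By and Uncompleting, and the rule that unlisted (state, event) pairs leave the state unchanged are all assumptions made for illustration.

```python
from enum import Enum, auto

class State(Enum):
    DOWN = auto()
    STAND_BY = auto()
    ACTIVE = auto()
    GOING_STAND_BY = auto()
    UNCOMPLETING = auto()
    GOING_DOWN = auto()      # the new state proposed in this review

class Event(Enum):
    SHUTDOWN = auto()        # hypothetical name for "server is being shut down"
    TIMEOUT = auto()         # hypothetical name for the timeout condition

# Strict reading: no SHUTDOWN entry for the transitional states, so the
# event is silently dropped there. Timeout targets are assumptions.
STRICT_READING = {
    (State.GOING_STAND_BY, Event.TIMEOUT): State.STAND_BY,
    (State.UNCOMPLETING, Event.TIMEOUT): State.ACTIVE,
}

# Proposed fix: SHUTDOWN in a transitional state moves to GOING_DOWN,
# whose timeout behavior is to move to DOWN.
WITH_GOING_DOWN = {
    **STRICT_READING,
    (State.GOING_STAND_BY, Event.SHUTDOWN): State.GOING_DOWN,
    (State.UNCOMPLETING, Event.SHUTDOWN): State.GOING_DOWN,
    (State.GOING_DOWN, Event.TIMEOUT): State.DOWN,
}

def run(table, state, events):
    """Apply events in order; pairs missing from the table leave the state unchanged."""
    for event in events:
        state = table.get((state, event), state)
    return state

events = [Event.SHUTDOWN, Event.TIMEOUT]
lost = run(STRICT_READING, State.GOING_STAND_BY, events)
fixed = run(WITH_GOING_DOWN, State.GOING_STAND_BY, events)
print(lost.name, fixed.name)
```

Under these assumptions the strict reading ends in Stand-By, so the shutdown request has to be issued again, while the table with "Going-Down" ends in Down.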

In section 2.3.2, the descriptions for events 3 and 5 are identical.  I believe from the state transitions that event 3 is supposed to reflect the server NOT having complete data when the Activate condition is met.

In section 3.2.1 there is provision for using a received frame as a Query.  There are type indications as to what the type of the frame is.  I believe that the intent is that the query always contains the full received Ethernet Frame, no matter what the type is.  But it does not say that.  So one could also conclude that for ARP, what I should send is the ARP message, and for ND, the ND message, etc.  I believe the text needs to be clarified.  If my guess is correct that the full Ethernet Frame is to be sent in all cases, then explanatory text as to why the various type codes exist would seem helpful, since the received frame contains enough information to support decoding.
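
The two readings can be made concrete with a small Python sketch. It builds an untagged Ethernet frame carrying an IPv4 ARP request and shows what each interpretation would place in the query. The function name and addresses are illustrative only, the absence of a VLAN tag is an assumption, and nothing here reflects the draft's actual query encoding.

```python
import struct

ETH_HEADER_LEN = 14          # dst MAC (6) + src MAC (6) + EtherType (2), no VLAN tag
ETHERTYPE_ARP = 0x0806

def build_arp_request_frame(src_mac: bytes, src_ip: bytes, target_ip: bytes) -> bytes:
    """Build a broadcast Ethernet frame carrying an IPv4-over-Ethernet ARP request."""
    eth_header = b"\xff" * 6 + src_mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = (
        struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)   # htype, ptype, hlen, plen, op=request
        + src_mac + src_ip                          # sender hardware / protocol address
        + b"\x00" * 6 + target_ip                   # target hardware / protocol address
    )
    return eth_header + arp

frame = build_arp_request_frame(
    src_mac=bytes.fromhex("020000000001"),
    src_ip=bytes([192, 0, 2, 1]),
    target_ip=bytes([192, 0, 2, 2]),
)

# Reading A: the query carries the full received Ethernet frame.
full_frame_query = frame
# Reading B: the query carries only the ARP message.
arp_only_query = frame[ETH_HEADER_LEN:]

# With reading A the type is recoverable from the frame itself.
ethertype = struct.unpack("!H", frame[12:14])[0]
print(len(full_frame_query), len(arp_only_query), hex(ethertype))
```

With the full frame, the EtherType is available to the receiver, which is why separate type codes seem to need an explanation. With the ARP-only reading, the type code is the only indication of how to decode the data.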



Minor Issues:
     In section 2.3.3 describing the state transitions for push servers, there is an event (event 1) described as "the server was Down but is now Up."  The state transition diagram describes this as being a valid event that does not change the server's state if the server is in any state other than "Down." In one sense, this is reasonable, saying that such an event is harmless.  I would however expect some sort of logging or administrative notification, as something in the system is quite confused.
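
A minimal sketch of the kind of notification meant here, in Python. The handler name, the logger name, and the choice of "Stand-By" as the state entered from "Down" are assumptions for illustration, not statements about the draft.

```python
import logging

logger = logging.getLogger("push_directory_server")  # hypothetical logger name

def on_server_up(state: str) -> str:
    """Handle event 1 ("the server was Down but is now Up")."""
    if state == "Down":
        # Assumed next state; the draft's transition table is authoritative.
        return "Stand-By"
    # Harmless for the state machine, but it suggests something upstream
    # is confused, so tell the operator rather than ignoring it silently.
    logger.warning("Unexpected 'Up' event in state %r; state left unchanged", state)
    return state
```

For example, `on_server_up("Active")` leaves the state as "Active" and emits a warning, while `on_server_up("Down")` proceeds normally without logging.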

     Should section 2.4 include a note that reliance on information completeness means there are windows, when new entities join the space represented by a particular TRILL data label, during which packets for that destination may be dropped because clients have not yet received the updated information?  I believe this window is small, and it would be quite reasonable to note that in such text as well.

     Text in section 3.2.2.1 on lifetimes and the information maintenance in section 3.3 imply that the clients and servers must maintain a connection.  Presumably, this is required already by the RBridge Channel protocol, and I understand that we should not repeat the entire protocol here.  It would seem to make readers' lives MUCH simpler if the text noted that the RBridge Channel protocol requires that there be a maintained connection between the client and the server, and that these mechanisms leverage the presence of that connection.

     In section 3.2.2.2 on Pull directory forwarding, I expect to see text describing to whom the Pull server will flood the received request.  Instead, the text appears to say that it is the response that will be flooded.  More importantly, the descriptive text talks about sending the response, which would normally be a description of sending the response to the requestor, not sending it to someone else.
     In a related confusion, it seems very strange that a "flood" request will result in sending an underlying packet unicast to the destination.  This may be just terminology, but it seems likely to confuse implementors.  Maybe the flag should be called the Forward flag, with a note in the definition that it normally causes the response to be sent to multiple parties, but in the case of a raw MAC frame, results in the packet being forwarded to the destination or flooded, as the server can manage?

     In the description in section 3.3 of Cache management, in the text on method one in which the servers keep minimal state, it would seem that a large health warning is needed, as this method will cause all clients to discard all positive data whenever any positive data at the server changes (even if no client is using the modified data.)  This makes a flapping end station an attack on the cache of all clients!
     It strikes me that the working group could help get robust deployment by making method 3 (tracking what you told clients) a SHOULD.  (I grant that it is not a MUST, as the other choices do work.)
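
The difference can be seen in a toy model, sketched here in Python. It is not the draft's message format or algorithm: the class, function names, and addresses are hypothetical, and "method 1" and "method 3" are modeled only as "flush everything" versus "flush what the server recorded telling each client."

```python
class Client:
    def __init__(self, name, cached):
        self.name = name
        self.cache = dict(cached)        # positive entries: address -> location

def make_clients():
    return [
        Client("a", {"h1": "rb1", "h2": "rb2"}),
        Client("b", {"h2": "rb2", "h3": "rb3"}),
        Client("c", {"h1": "rb1", "h3": "rb3"}),
    ]

def flush_all(clients):
    """Method 1 model: the server keeps minimal state, so every client drops all positive data."""
    dropped = 0
    for client in clients:
        dropped += len(client.cache)
        client.cache.clear()
    return dropped

def flush_tracked(clients, told, changed_key):
    """Method 3 model: the server tracks what it told each client and targets only that entry."""
    dropped = 0
    for client in clients:
        if changed_key in told[client.name]:
            told[client.name].discard(changed_key)
            if client.cache.pop(changed_key, None) is not None:
                dropped += 1
    return dropped

# An end station "flapper" changes at the server; no client has it cached.
clients_m1 = make_clients()
dropped_m1 = flush_all(clients_m1)

clients_m3 = make_clients()
told = {c.name: set(c.cache) for c in clients_m3}
dropped_m3 = flush_tracked(clients_m3, told, "flapper")

print(dropped_m1, dropped_m3)
```

In this model a single change to an entry nobody uses costs the clients six cached entries under method 1 and none under method 3, and every further flap repeats that cost.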

Editorial Issues / Nits :