Re: [6tisch] Benjamin Kaduk's Discuss on draft-ietf-6tisch-msf-12: (with DISCUSS and COMMENT)

Benjamin Kaduk <kaduk@mit.edu> Tue, 24 March 2020 19:25 UTC

Date: Tue, 24 Mar 2020 12:25:10 -0700
From: Benjamin Kaduk <kaduk@mit.edu>
To: Tengfei Chang <tengfei.chang@gmail.com>
Cc: The IESG <iesg@ietf.org>, draft-ietf-6tisch-msf@ietf.org, 6tisch <6tisch@ietf.org>, 6tisch-chairs@ietf.org, "Pascal Thubert (pthubert)" <pthubert@cisco.com>
Message-ID: <20200324192510.GE50174@kduck.mit.edu>
References: <158394932747.1671.4699004253009791924@ietfa.amsl.com> <CAAdgstSMOf7wDSfbWMv5tEzpx1=otQZX_TZ+Xevm77f-1ZztNw@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To: <CAAdgstSMOf7wDSfbWMv5tEzpx1=otQZX_TZ+Xevm77f-1ZztNw@mail.gmail.com>
User-Agent: Mutt/1.12.1 (2019-06-15)
Archived-At: <https://mailarchive.ietf.org/arch/msg/6tisch/fFj-DDZn9DVLB9kV6hUTeLNrvX0>
Subject: Re: [6tisch] Benjamin Kaduk's Discuss on draft-ietf-6tisch-msf-12: (with DISCUSS and COMMENT)

Hi Tengfei,

Also inline.

On Tue, Mar 24, 2020 at 12:22:02PM +0100, Tengfei Chang wrote:
>    Hi Benjamin,
>    I replied inline starting with '>'
>    Thanks so much for those detailed comments!
>    On Wed, Mar 11, 2020 at 6:55 PM Benjamin Kaduk via Datatracker
>    <noreply@ietf.org> wrote:
> 
>      Benjamin Kaduk has entered the following ballot position for
>      draft-ietf-6tisch-msf-12: Discuss
> 
>      When responding, please keep the subject line intact and reply to all
>      email addresses included in the To and CC lines. (Feel free to cut this
>      introductory paragraph, however.)
> 
>      Please refer to
>      https://www.ietf.org/iesg/statement/discuss-criteria.html
>      for more information about IESG DISCUSS and COMMENT positions.
> 
>      The document, along with other ballot positions, can be found here:
>      https://datatracker.ietf.org/doc/draft-ietf-6tisch-msf/
> 
>      ----------------------------------------------------------------------
>      DISCUSS:
>      ----------------------------------------------------------------------
> 
>      I'm concerned that the scheduling function for autonomous cells can
>      cause an infinite loop in the case of hash collision -- Section 3
>      specifies that AutoTxCell always takes precedence over AutoRxCell, but
>      if those two cells collide, the corresponding cells on the peer in
>      question will also collide.  If both peers try to send at the same time
>      and the hashes collide, they will both attempt to transmit indefinitely
>      and never be received.
> 
>     
>    > Notice that the AutoTxCell is a shared cell, where the back-off
   mechanism is applied.
>    > In case there is a collision on that cell, a back-off with a different
   exponent will be used on each side.
>    > The cell will then be used as an AutoTxCell on each side at different
   times.

Ah, it seems I was misinterpreting "take precedence over" to apply to the
entire local scheduling, not merely the case when independent tx and rx
scheduling land on the same cell.  Thanks for clarifying here; is there
anything useful to say in the document about how even if there is a
collision in the assigned slot there's still a Tx backoff, so the cell is
usable for Rx some of the time?

>      There seems to be some "passing the buck" going on with respect to
>      rate-limiting unauthenticated (join) traffic:
>      draft-ietf-6tisch-minimal-security (Section 6.1.1) says that the SF
>      "SHOULD NOT allocate additional cells as a result of traffic with code
>      point AF43"; this document is implementing a SF, and yet we try to avoid
>      the issue, saying that "[t]he at IPv6 layer SHOULD ensure that this join
>      traffic is rate-limited before it is passed to 6top sublayer where MSF
>      can observe it".  I think we need a clear and consistent story about
>      where this rate-limiting is supposed to happen.
> 
>    > Thanks for the comments! This has been discussed in some previous
   revisions of MSF.
>    > It is not "passing the buck" but a decision based on the scheduling
   function and security context.
>    > To avoid a layer violation, upper-layer information is NOT supposed
   to be visible to the link layer, where 6P and MSF are.

If we assume strict layering so that IP information is not visible to the
link layer where the scheduling function lives, then isn't that a flaw in
draft-ietf-6tisch-minimal-security to say that the scheduling function
should do [something relying on IP-layer information]?

>    > But with regard to security, it seems this is not avoidable.
>    > IMO, the scheduling function aims to provide an algorithm to
   add/remove cells according to traffic.
>    > The traffic could contain unauthenticated join requests from both
   normal devices and malicious devices.
>    > The function does NOT have enough information to differentiate them.
>    > We are assuming some other entity outside of MSF needs to resolve this
   issue.

Nonetheless, we're currently not fulfilling a requirement that a SF should
meet.  If that requirement is unattainable, the requirement should be
modified or removed; if not, we should attain the requirement.

>    > If we assume the security info in the IPv6 header is passed to MSF, we
   could abandon the rate-limiting approach and simply skip a slot if an
   AF43 packet is sent on that slot.
>    > Hence the adaptation to traffic never happens for traffic marked as
   AF43.
> 
>      ----------------------------------------------------------------------
>      COMMENT:
>      ----------------------------------------------------------------------
> 
>      I support Roman's Discuss -- we need more information for this to be a
>      useful reference; even what seem to be the official DASFAA 1997
>      proceedings (https://dblp.org/db/conf/dasfaa/dasfaa97) do not have an
>      associated document.
> 
>      Basing various scheduling aspects on (a hash of) the EUI64 ties
>      functionality to a persistent identifier for a device.  How significant
>      a disruption would be incurred if a device periodically changes its
>      presented EUI64 for anonymization purposes?
> 
>    > I assume you are talking about a malicious device?
>    > There is no doubt this will influence the performance of the joining
   process for normal devices.
>    > But normal devices still have a chance to join.
>    > The join proxy won't be affected either, since the cell will be removed
   right after the packet is sent out.

I was thinking a non-malicious device, just one that (for example) changes
its physical location frequently, and wants to change its EUI64 when it
does so, to avoid that location being tracked and correlated over time.
That said, your answer still seems to answer my question, and since normal
devices will still have a chance to join, it seems like we probably do not
need to add text to discuss this situation.

>      There seems to be a general pattern of "if you don't have a
>      6P-negotiated Tx cell, install an AutoTxCell to send your one message
>      and then remove it after sending"; I wonder if it would be easier on the
>      reader to consolidate this as a general principle and not repeat the
>      details every time it occurs.
> 
>    > Yes, this is the feature of autonomous cells. Not sure if it would be
   easier to understand if stated just one time.
>    > There are small differences for each adding/removing, e.g., which node
   does so, parent/JP?
>    > I personally feel it's clearer to repeat this every time, with the
   various types of node, thus highlighting the differences.

Okay.  Thank you for considering the idea.

>      Requirements Language
> 
>      "NOT RECOMMENDED" is not in the RFC2119 boilerplate (but is a BCP 14
>      keyword).
> 
>    > Thanks for pointing this out. It will be removed in the next revision.
>    > We also updated the reference to RFC8174 instead of RFC2119.

Oops, I think my comment was unclear.
RFC 8174 has a paragraph in it that you should copy/paste into your
document to replace this one.  ("NOT RECOMMENDED" is included in that
paragraph in RFC 8174.)

Also, you should cite both RFC 2119 and RFC 8174, not just RFC 8174 -- BCP
14 comprises both of them together.

>      Section 1
> 
>         the 6 steps described in Section 4.  The end state of the join
>         process is that the node is synchronized to the network, has mutually
>         authenticated to the network, has identified a routing parent, and
> 
>      nit(?): I guess maybe "mutually authenticated with" is more correct for
>      the bidirectional operation.
> 
>    > will update in next revision. 
> 
>         It does so for 3 reasons: to match the link-layer resources to the
>         traffic, to handle changing parent, to handle a schedule collision.
> 
>      nit: end the list with "or" (or "and"?).
> 
>    > will update in next revision.   
> 
>         MSF works closely with RPL, specifically the routing parent defined
>         in [RFC6550].  This specification only describes how MSF works with
>         one routing parent, which is phrased as "selected parent".  The
> 
>      nit: I suggest '''one routing parent; this parent is referred to as the
>      "selected parent"'''.
> 
>    > will update in next revision.   
> 
>         activity of MSF towards to single routing parent is called as a "MSF
> 
>      nit: "towards the"
> 
>    > will update in next revision.   
> 
>         *  We added sections on the interface to the minimal 6TiSCH
>            configuration (Section 2), the use of the SIGNAL command
>            (Section 6), the MSF constants (Section 14), the MSF statistics
>            (Section 15).
> 
>      nit: end the list with "and".
> 
>    > will update in next revision.   
> 
>      Section 2
> 
>         In a TSCH network, time is sliced up into time slots.  The time slots
>         are grouped as one of more slotframes which repeat over time.  The
> 
>      nit(?): should this be "one or more"?
> 
>    > It should be "one or multiple slotframes". Will update in next revision.
> 
>         channel) is indicated as a cell of TSCH schedule.  MSF is one of the
>         policies defining how to manage the TSCH schedule.
> 
>      nit: if there is only one such policy active at a given time for a given
>      network, I suggest "MSF is a policy for managing the TSCH schedule".
>      (If multiple policies are active simultaneously, no change is needed.)
> 
>    > As indicated in RFC8480: A node MAY implement multiple SFs  and run them
>    at the same time.
>    > so MSF is one of the policies defining how to manage the TSCH schedule.

Thank you for the reference, and sorry for missing it.

>         MSF uses the minimal cell for broadcast frames such as Enhanced
>         Beacons (EBs) [IEEE802154] and broadcast DODAG Information Objects
>         (DIOs) [RFC6550].  Cells scheduled by MSF are meant to be used only
>         for unicast frames.
> 
>      If this paragraph was moved before the previous paragraph, then EB and
>      DIO would be defined before their first usage.
> 
>    > Maybe I misunderstand. Do you mean you prefer to move this
   paragraph before the previous one?
>    > The EB and DIO are defined in the references, so I am not sure we still
   need to define them in MSF.

That is my preference, but I defer to your preference where it differs from
mine.

>         bandwidth of minimal cell.  One of the algorithm met the rule is the
>         Trickle timer defined in [RFC6206] which is applied on DIO messages
>         [RFC6550].  However, any such algorithm of limiting the broadcast
> 
>      nit(?): "One of the algorithms that fulfills this requirement"?
> 
>    > will update accordingly. 
> 
>         MSF RECOMMENDS the use of 3 slotframes.  MSF schedules autonomous
>         cells at Slotframe 1 (Section 3) and 6P negotiated cells at Slotframe
>         2 (Section 5) , while Slotframe 0 is used for the bootstrap traffic
>         as defined in the Minimal 6TiSCH Configuration.  It is RECOMMENDED to
>         use the same slotframe length for Slotframe 0, 1 and 2.  Thus it is
> 
>      Perhaps this is just a question of writing style, but if an
>      implementation is free to use an alternative SF or a variant of MSF,
>      could we not say that "MSF uses 3 slotframes", "MSF uses the same
>      slotframe length for", etc.?
> 
>    > updated to "3 slotframes are used in MSF. " , "The same slotframe length
>    for Slotframe 0, 1 and 2 is RECOMMENDED".
> 
>      Section 3
> 
>      Is there any risk of unwanted correlation between slot and channel
>      offsets when using the same hash function and input for both
>      calculations?
> 
>         hash function.  Other optional parameters defined in SAX determine
>         the performance of SAX hash function.  Those parameters could be
>         broadcasted in EB frame or pre-configured.  For interoperability
>         purposes, an example how the hash function is implemented is detailed
>         in Appendix B.
> 
>      Given the lack of usable reference for [SAX-DASFAA], I assume that the
>      content in Appendix B is going to be used as a specification, not just
>      an example.
> 
>    > the new reference for SAX is updated in the new revision. 
> 
>         *  The AutoRxCell MUST always remain scheduled after synchronized.
> 
>      nit: s/synchronized/synchronization/
> 
>         AutoRxCell.  In case of conflicting with a negotiated cell,
>         autonomous cells take precedence over negotiated cell, which is
>         stated in [IEEE802154].  However, when the Slotframe 0, 1 and 2 use
>         the same length value, it is possible for negotiated cell to avoid
>         the collision with AutoRxCell.
> 
>      Presumably this factors in to the recommendation to have the three
>      listed slotframes use the same length, but mentioning it explicitly
>      (whether here or where the recommendation is made) might be nice.
> 
>    > it is mentioned before as:  The same slotframe length for Slotframe 0, 1
>    and 2 is RECOMMENDED.

I agree that it is mentioned before.  My point is that we have the
recommendation to use the same slotframe length (Section 2) in a different
place from discussion about why having the same slotframe length is
beneficial (here), so the reader has to remember and make the connection.
If we mention both the recommendation and the reason for the recommendation
in the same place, the reader has to do less work.

>      Section 4
> 
>         network.  Alternative behaviors may involved, for example, when
>         alternative security solution is used for the network.  Section 4.1
> 
>      nit: singular/plural mismatch "behaviors"/"solution is used"
> 
>    > will be fixed in next revision. 
> 
>      Section 4.1
> 
>         A node implementing MSF SHOULD implement the Minimal Security
>         Framework for 6TiSCH [I-D.ietf-6tisch-minimal-security].  As a
> 
>      Didn't this get renamed to CoJP?
> 
>    > Thanks for pointing it out! Will update in next revision. 
> 
>      Section 4.2
> 
>      I a little bit wonder if there is a better description than "available
>      frequencies" but don't have one to offer.
> 
>    > The frequency to be selected is randomly picked. None is preferred
   over the others.

I was not sure if this was "available" in the sense of "my hardware radio
has a list of frequencies that it can tune to", "the channels that my
network cycles amongst", or "the channels not already scheduled at this
time".

>      Section 4.3
> 
>         While the exact behavior is implementation-specific, it is
>         RECOMMENDED that after having received the first EB, a node keeps
>         listen for at most MAX_EB_DELAY seconds until it has received EBs
>         from NUM_NEIGHBOURS_TO_WAIT distinct neighbors, which is defined in
>         [RFC8180].
> 
>      nit(?): this phrasing implies that only NUM_NEIGHBOURS_TO_WAIT is
>      defined in RFC 8180, but MAX_EB_DELAY is also defined there.
> 
>    > The "which" here indicates the whole behavior. 
>    > It will be rephrased  as "This behavior is defined in [RFC8180]".
> 
>      not-nit: this phrasing is ambiguous as to whether one of MAX_EB_DELAY
>      and NUM_NEIGHBOURS_TO_WAIT is sufficient to move to the next step or
>      whether both are required.
> 
>    > The two actually describe two situations:
>    > 1. Keep listening; when EBs from NUM_NEIGHBOURS_TO_WAIT neighbors have
   been received, the node stops listening and synchronizes to one of the
   neighbors.
>    > 2. If, after the MAX_EB_DELAY timeout, EBs have been received from
   fewer than NUM_NEIGHBOURS_TO_WAIT neighbors, the node stops listening as
   well and synchronizes to the neighbor or one of the neighbors.

Okay.  I would suggest to s/at most MAX_EB_DELAY seconds until it has
received/at most MAX_EB_DELAY seconds or until it has received/, then.
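
To illustrate the reading I have in mind (either condition is sufficient to
stop listening), here is a minimal sketch.  The function name and its
arguments are hypothetical and not taken from the draft or from RFC 8180;
elapsed time is assumed to be measured from reception of the first EB.

```python
def should_stop_listening(elapsed_s, neighbors_heard,
                          max_eb_delay_s, num_neighbours_to_wait):
    """Return True once either termination condition holds.

    elapsed_s: seconds since the first EB was received (assumption).
    neighbors_heard: set of distinct neighbor identifiers whose EBs were seen.
    """
    timed_out = elapsed_s >= max_eb_delay_s
    enough_neighbors = len(neighbors_heard) >= num_neighbours_to_wait
    # "or", not "and": the timeout bounds the wait even if too few
    # neighbors have been heard.
    return timed_out or enough_neighbors
```

With "and" in place of "or", a node that only ever hears one neighbor would
listen forever, which is the ambiguity the suggested wording removes.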

Also, I see that the -14 has changed this from RECOMMENDED to MAY; my naive
expectation would be that it is still RECOMMENDED, but I don't remember if
another reviewer's comment prompted this change.

>      Section 4.4
> 
>         After selected a JP, a node generates a Join Request and installs an
>         AutoTxCell to the JP.  The Join Request is then sent by the pledge to
>         its JP over the AutoTxCell.  The AutoTxCell is removed by the pledge
> 
>      editorial: I'd suggest s/its JP/its selected JP/
> 
>    > Will be updated in next revision.  
> 
>         Response is sent out.  The pledge receives the Join Response from its
>         AutoRxCell, thereby learns the keying material used in the network,
>         as well as other configurations, and becomes a "joined node".
> 
>      nit: maybe "other configuration values" or "other configuration
>      settings"?
> 
>    > Will be updated in next revision. 
> 
>      Section 4.6
> 
>         Once it has selected a routing parent, the joined node MUST generate
>         a 6P ADD Request and install an AutoTxCell to that parent.  The 6P
>         ADD Request is sent out through the AutoTxCell with the following
>         fields:
> 
>         *  CellOptions: set to TX=1,RX=0,SHARED=0
>         *  NumCells: set to 1
>         *  CellList: at least 5 cells, chosen according to Section 8
> 
>      Is this listing describing the contents of the ADD request or the
>      AutoTxCell used to send it?  (I presume the former, in which case I
>      suggest to use "containing" or similar in preference to "with".)
> 
>    > yes, it is the former. Will update in the next revision. 
> 
>      Section 5.1
> 
>         The goal of MSF is to manage the communication schedule in the 6TiSCH
>         schedule in a distributed manner.  For a node, this translates into
>         monitoring the current usage of the cells it has to the selected
>         parent:
> 
>      Is this goal strictly limited to traffic "to the selected parent" vs.
>      all traffic?
> 
>    > Theoretically MSF is not limited to traffic to the selected parent; it
   applies to any neighbor.
>    > However, all the experiments we have run to verify MSF involve the
   selected parent only.
>    > Hence, we state here "the selected parent" only.

I think the stated scope of applicability of the specification is not
limited to just the experiments that have been performed so far, so there
does not seem much justification for saying that "this translates into
monitoring [...] to the selected parent".

>         *  If the node determines that the number of link-layer frames it is
>            attempting to exchange with the selected parent per unit of time
>            is larger than the capacity offered by the TSCH negotiated cells
>            it has scheduled with it, the node issues a 6P ADD command to that
>            parent to add cells to the TSCH schedule.
>         *  If the traffic is lower than the capacity, the node issues a 6P
>            DELETE command to that parent to delete cells from the TSCH
>            schedule.
> 
>      As written, this would potentially lead to oscillation when demand is
>      basically at capacity, due to the quantization of capacity.  Perhaps
>      some provisioning for hysteresis is appropriate?
> 
>    > Yes; in the MSF cell usage algorithm that follows, more cells are
   scheduled than needed.
>    > The text here is to explain the basic concept of this scheduling
   function.
> 
>         The cell option of cells listed in CellList in 6P Request frame
>         SHOULD be either Tx=1 only or Rx=1 only.  Both NumCellsElapsed and
>         NumCellsUsed counters can be used to both type of negotiated cells.
> 
>      Would this be more clear as "(Tx=1,Rx=0) or (Tx=0,Rx=1)"?
> 
>    > Yes it's more clear. Will update in next revision 
> 
>         *  NumCellsElapsed is incremented by exactly 1 when the current cell
>            is AutoRxCell.
> 
>      This holds for all peers/parents we're keeping counters for, so the
>      AutoRxCell can get "double counted"?
> 
>    > One pair of counters is associated with one neighbor.
>    > If there are multiple parents, then there are multiple NumCellsElapsed
   counters, one for each of the parents.

I agree.  It seems that when an AutoRxCell occurs, the NumCellsElapsed
counter will increment in all of the counters (i.e., for each parent).
This is in some sense "double counting" that cell.  I'm not sure whether
this has a negative effect on the usefulness of the statistics, especially
in the (unlikely) case when there are a large number of parents.
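
A small sketch of what I mean by "double counting", under my reading of the
text (one counter pair per neighbor, and every pair is incremented when the
AutoRxCell elapses).  The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CellCounters:
    """Per-neighbor counter pair (names mirror the draft's counters)."""
    num_cells_elapsed: int = 0
    num_cells_used: int = 0


def on_auto_rx_cell(counters_by_neighbor):
    """Apply one elapsed AutoRxCell to every tracked neighbor."""
    for counters in counters_by_neighbor.values():
        counters.num_cells_elapsed += 1


counters = {"parentA": CellCounters(), "parentB": CellCounters()}
on_auto_rx_cell(counters)
# One physical cell has elapsed, but the sum over neighbors is now 2.
total_elapsed = sum(c.num_cells_elapsed for c in counters.values())
```

Each individual counter is still correct for its own neighbor; the question
is only whether the per-neighbor usage ratio remains meaningful when the
same cell is charged to many neighbors.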

>         In case that a node booted or disappeared from the network, the cell
>         reserved at the selected parent may be kept in the schedule forever.
>         A clean-up mechanism MUST be provided to resolve this issue.  The
>         clean-up mechanism is implementation-specific.  It could either be a
>         periodic polling to the neighbors the nodes have negotiated cells
>         with, or monitoring the activities on those cells.  The goal is to
>         confirm those negotiated cells are not used anymore by the associated
>         neighbors and remove them from the schedule.
> 
>      I'm not sure that "monitoring the activities on those cells" is safe
>      with the current level of specification; if a node negotiates a 6P
>      transmit cell to a parent and uses it only sparingly, with the parent
>      eventually reclaiming it due to inactivity, I don't see a mechanism by
>      which the node will reliably discover the negotiated cell to be
>      nonfunctional and fall back to (e.g.) the corresponding AutoTxCell.  It
>      may be most prudent to just not mention that as an example (a "periodic
>      polling" procedure does not seem to have the same potential for
>      information skew)
> 
>    > Thanks for the comment! I will just remove that sentence from this
>    paragraph.
> 
>      Section 5.3
> 
>         schedule is executed and the node sends frames to that parent.  When
>         NumTx reaches MAX_NUMTX, both NumTx and NumTxAck MUST be divided by
>         2.  For example, when MAX_NUMTX is set to 256, from NumTx=255 and
>         NumTxAck=127, the counters become NumTx=128 and NumTxAck=64 if one
>         frame is sent to the parent with an Acknowledgment received.  This
>         operation does not change the value of the PDR, but allows the
>         counters to keep incrementing.  The value of MAX_NUMTX is
>         implementation-specific.
> 
>      Does MAX_NUMTX need to be a power of two (to avoid errors when the
>      division occurs)?
> 
>    > Agree, it's better to be a power of two. Will state in the text. 
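
To make the arithmetic concrete, here is a minimal sketch of the counter
update, assuming integer division and assuming the halving is applied as
soon as NumTx reaches MAX_NUMTX.  The function name and signature are mine,
not the draft's.

```python
def record_tx(num_tx, num_tx_ack, acked, max_numtx=256):
    """Count one transmission, then halve both counters at the cap."""
    num_tx += 1
    if acked:
        num_tx_ack += 1
    if num_tx >= max_numtx:
        # Integer halving (assumption): exact for NumTx when max_numtx
        # is even, but NumTxAck may be odd and round down.
        num_tx //= 2
        num_tx_ack //= 2
    return num_tx, num_tx_ack


# The example from the draft text: 255/127 plus one acknowledged frame.
example = record_tx(255, 127, acked=True)   # (128, 64)
```

Note that even with MAX_NUMTX a power of two, NumTxAck can be odd when the
cap is reached (255/126 plus one acknowledged frame gives 128/63), so the
PDR is preserved only approximately in that case.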
> 
>         4.  For any other cell, it compares its PDR against that of the cell
>             with the highest PDR.  If the difference is larger than
>             RELOCATE_PDRTHRES, it triggers the relocation of that cell using
>             a 6P RELOCATE command.
> 
>      The recommended RELOCATE_PDRTHRES is given as "50 %".  Is this
>      "difference" performed as a subtraction (so that if the highest PDR is
>      less than 50%, no cells can ever be relocated) or a ratio (a PDR that's
>      half than the maximum PDR or smaller will trigger relocation)?
> 
>    > This "difference" is performed as a subtraction.
>    > Yes, if the highest PDR is less than 50%, no cell can be
   relocated.
>    > But one can't tell whether those cells have bad link quality or suffer
   from collisions.
>    > If the PDR of all cells is that low, there is a high chance the routing
   will be affected and switch to another neighbor.
>    > In experiments, we never encountered a highest PDR below 50% all the
   time.

I strongly suggest changing the wording to be clear that it is the
"subtraction" interpretation that's desired.  Perhaps "If the difference
(PDR_highest - PDR_thiscell) is larger than RELOCATE_PDRTHRES"?
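
Spelled out as a sketch (cell identifiers and the function name are
hypothetical; PDR values are fractions in [0, 1] and the 0.5 default stands
for the recommended "50 %"):

```python
def cells_to_relocate(pdr_by_cell, relocate_pdrthres=0.5):
    """Return cells whose PDR trails the best cell by more than the threshold.

    Uses the subtraction interpretation:
    PDR_highest - PDR_thiscell > RELOCATE_PDRTHRES.
    """
    if not pdr_by_cell:
        return []
    pdr_highest = max(pdr_by_cell.values())
    return [cell for cell, pdr in pdr_by_cell.items()
            if pdr_highest - pdr > relocate_pdrthres]
```

Under this interpretation, a schedule whose best cell has a PDR of 0.45 never
relocates anything, even a cell with a PDR of 0, which is the corner case
discussed above.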

>      Section 7
> 
>      Maybe reference Section 17.1 where the allocation will occur?
> 
>    > Will add this in next revision. 
> 
>      Section 8
> 
>         *  The slotOffset of a cell in the CellList SHOULD be randomly and
>            uniformly chosen among all the slotOffset values that satisfy the
>            restrictions above.
>         *  The channelOffset of a cell in the CellList SHOULD be randomly and
>            uniformly chosen in [0..numFrequencies], where numFrequencies
>            represents the number of frequencies a node can communicate on.
> 
>      Do these random selections need to be independent from each other?  (I
>      note that the selection for the autonomous cells are not.)
> 
>    > For channelOffset, they are selected independently at random.
>    > For slotOffset, once a slotOffset is picked, it cannot be selected
   again the next time a slotOffset is chosen.
>    > This is indicated in the text already as "chosen among all the
   slotOffset values that satisfy the
         restrictions above"

I was trying to get at a different point, I think: is there expected to be
correlation between the actual slotOffset and channelOffset values for a
given cell, as opposed to having them be completely independent selections?
In the case of the autonomous cells, since we use the same hash function
and input to the hash function for selecting both values, there is a
correlation between the two values.  Such a correlation might in theory
produce occasional scenarios that are very problematic, whereas if the
channel and slot offsets are chosen independently, such "very problematic"
scenarios are expected to be much less common (based on the obvious/naive
mathematical model).
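
To show what I mean by both values coming from the same input, here is a
rough sketch.  The hash below only loosely follows the shift-add-xor shape
discussed for Appendix B; the parameter defaults (h0, l_bit, r_bit), the
exact step order, and the way the two offsets are derived from it are my
assumptions, not a restatement of the draft.

```python
def sax_like_hash(data: bytes, t: int, h0: int = 0,
                  l_bit: int = 0, r_bit: int = 1) -> int:
    """Shift-add-xor style hash reduced modulo t (illustrative only)."""
    h = h0
    for c in data:
        total = (h << l_bit) + (h >> r_bit) + c   # sum of shifts and byte
        h = (total ^ h) % t                       # xor with h, reduce mod t
    return h


def autonomous_cell(eui64: bytes, slotframe_len: int, num_ch_offset: int):
    """Derive (slotOffset, channelOffset) from the same EUI-64 input."""
    # Assumption: slot 0 is reserved for the minimal cell, hence the "1 +".
    slot_offset = 1 + sax_like_hash(eui64, slotframe_len - 1)
    channel_offset = sax_like_hash(eui64, num_ch_offset)
    return slot_offset, channel_offset
```

Both outputs are deterministic functions of the one EUI-64, so they cannot be
statistically independent in the way two separate random draws would be;
whether the resulting correlation matters in practice is the open question.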

> 
>      Section 9
> 
>      Is there a reference for these three parameters (MAXBE, MAXRETRIES,
>      SLOTFRAME_LENGTH)?  SLOTFRAME_LENGTH seems new in this document and is
>      listed in the table in Section 14, but the other two are not listed
>      there.
> 
>    > MAXBE and MAXRETRIES are defined in the IEEE802.15.4 standard.
>    > Their values vary across network systems, according to size and
   density.
>    > Hence we didn't give a recommended value in this draft.

Ah, I see now.  It might be helpful to note somewhere that MAXBE and
MAXRETRIES are defined by 802.15.4, though I expect most readers of this
document to already be at least somewhat familiar with 802.15.4.

>      Section 14
> 
>      Why is MAX_NUMTX not listed in the table?

Should MAX_NUMTX be listed in the table?

>      Can we really give a recommended NUM_CH_OFFSET value, since this is in
>      effect dependent on the number of channels available?
> 
>    > We give a recommended value as this is a parameter used in the SAX
   hashing algorithm.
>    > This doesn't prevent implementers from using other values.
> 
>      KA_PERIOD is defined but not used elsewhere in the document.
> 
>    > This is a legacy of MSF draft, which we forgot to remove. Will update in
>    next revision 
> 
>      What are the considerations in using a power of 10 vs. a power of 2 as
>      MAX_NUM_CELLS?
> 
>    > We pick power of 10 simply because it's easy for reader to understand.
>    Nothing specific.
>    > There is no restriction to use power of 2, such as 128. 
> 
>      Section 16
> 
>         MSF defines a series of "rules" for the node to follow.  It triggers
>         several actions, that are carried out by the protocols defined in the
>         following specifications: the Minimal IPv6 over the TSCH Mode of IEEE
>         802.15.4e (6TiSCH) Configuration [RFC8180], the 6TiSCH Operation
> 
>      I'd suggest a brief note that the security considerations of those
>      protocols continue to apply (even though it ought to be obvious);
>      reading them could help a reader understand the behavior of this
>      document as well.
> 
>         Sublayer Protocol (6P) [RFC8480], and the Minimal Security Framework
>         for 6TiSCH [I-D.ietf-6tisch-minimal-security].  In particular, MSF
> 
>      [CoJP again]
> 
>         prevent it from receiving the join response.  This situation should
>         be detected through the absence of a particular node from the network
>         and handled by the network administrator through out-of-band means,
>         e.g. by moving the node outside the radio range of the attacker.
> 
>      "the radio range of the attacker" is not exactly a fixed constant ...
>      attackers are not in general bound by legal limits and can increase Tx
>      power subject only to their equipment and budget.
> 
>    > Yes, I agree. For action, I will simply remove the example. 
> 
>         MSF adapts to traffics containing packets from IP layer.  It is
>         possible that the IP packet has a non-zero DSCP (Diffserv Code Point
>         [RFC2597]) value in its IPv6 header.  The decision whether to hand
> 
>      RFC 2597 is talking more about specifically assured forwarding PHB
>      groups
>      than "DSCP codepoint"s per se.
> 
>    > Yes, RFC2474 is the one that defines the DSCP codepoint. Will update
   the reference.

This text was also changed to fix a pluralization nit, but over-corrected.
Please s/containing packet/containing packets/.

>      Section 18.1
> 
>      RFC 6206 seems to only be used as an example (Trickle), and could
>      probably be informative.
> 
>      RFC 8505 might also not need to be normative.
> 
>    > They will be moved to informative reference section 
> 
>      Appendix B
> 
>         In MSF, the T is replaced by the length slotframe 1.  String s is
> 
>      nit: "length of"
> 
>         2.  sum the value of L_shift(h,l_bit), R_shift(h,r_bit) and ci
> 
>      Is this addition performed in "infinite precision" integer arithmetic or
>      limited to the output width of h, e.g., by modular division?  (It's not
>      clear to me whether this is the role T plays or not.)
> 
>    > As far as I know, a sum like this is used by most of the classic string
   hashing functions.
>    > The deeper reason for using a sum here is more of a mathematics
   question, on which I am not an expert :-(
>    > T is used here as the modulus to make sure the result falls into the
   range of the slotframe (to pick slotOffset) or the available frequencies
   (to pick channelOffset).

It sounds like this sum is performed modulo T as well?  (I am genuinely not
sure.)  I'm also not sure whether it's worth mentioning that fact; perhaps
just leaving the text as-is is best.

>         8.  assign the result of Step 5 to h
> 
>      The value from step 5 *is* h, so taken literally this says "assign h to
>      h" and is not needed.
> 
>    >  Yes, this step is removed in next revision.
>    Thanks so much for your comments. Will prepare revision 13 to resolve
>    them!

Thank you for the updates!

I will await further clarification about whether changes to
draft-ietf-6tisch-minimal-security are required in order for this document
to realistically be able to meet the requirements from that document.

-Ben