Re: [Roll] Way forward for draft-clausen-lln-rpl-experiences

Philip Levis <pal@cs.stanford.edu> Wed, 06 June 2012 14:40 UTC

From: Philip Levis <pal@cs.stanford.edu>
Date: Wed, 06 Jun 2012 07:25:50 -0700
To: Thomas Heide Clausen <IETF@ThomasClausen.org>
Cc: roll WG <roll@ietf.org>, Michael Richardson <mcr@sandelman.ca>

Responses inline.

On Jun 3, 2012, at 5:27 PM, Thomas Heide Clausen wrote:

>> 10 assumes that a node only uses DIO reception to allow a parent; the specification is pretty clear that you should check the parent is usable (section 1.1). You're taking a bad implementation decision and assuming there isn't another way to do things. 
>> 
> 
> That is definitely a fair point to bring up.
> 
> We agree that various external mechanisms can contribute to make a smart, or even very smart, parent selection - but, as section 1.1 points out, as bi-directional links are a minimum requirement, the fact that RPL doesn't specify a minimum, basic mechanism for ensuring that is highly problematic. NUD is the only mechanism explicitly called out in RPL (section 8.2, 9.8, for example), which is why the observations in section 10 concern RPL when using NUD (and not some other "good (but unspecified) design decision").
> 
> We do, however, understand (but correct us if we are wrong) that you agree that the mechanisms for this problem, as specified in the RPL RFC, are insufficient for building and operating networks. That is exactly the point that we are trying to make also.
> 
> Where we may differ in opinion (do we?) is that we think the IETF should specify such mechanisms: in order to make interoperable implementations, some sort of agreement on what a "good implementation decision" is, is required, in particular if it requires messages to be exchanged.

In the abstract, I agree -- the IETF should specify such mechanisms (assuming their specification is needed for interoperability). However, I think you're taking a dubious philosophical position which is contrary to all other IETF specifications. No RFC specifies every single detail of implementation in its entirety. There seems to be a pull for this from the industrial side, and it's completely mistaken. If we specify everything immediately, in some cases we will be forced to make decisions with very limited information and some of those decisions will be bad. Take a look at the early ZigBee standards. An RFC is not an implementer's guide: it's a tension between clarity and flexibility, so that we can continue to produce interoperable implementations with increasingly better techniques and algorithms. TCP is a canonical example of this.

In that way, I think your draft, as currently written, is disingenuous. I'm sure I could point out plenty of similar examples in OLSR, or OLSRv2, or almost any other IETF specification of non-trivial complexity. This concern or criticism is not specific to RFC6550, but to the RFC process and the notion of an RFC itself. I'm open to having such a debate, but my guess is that your position would collapse once you consider the implications of applying it generally. 


>> For 11, there are implementations of RPL smaller than 50kB; they do not implement every feature, but that was kind of the point of the protocol, that it could be implemented on a sliding scale of implementation complexity. The TinyOS implementation, for example, is, I believe, ~20kB, less than half the size. You don't report what architecture the 50kB is for; clearly it would be more for a 32-bit than a 16-bit architecture.
> 
> Phil, would you clarify this for us? 
> 
> As the specification reads, only the MOP flag can specify what the DODAG root expects from the routers in the network; if a router receives a MOP flag that it doesn't support, it won't be able to join as anything other than a leaf.
> 
> There are no negotiations, or discovery of capabilities, we think, nor is it clear what exactly the "sliding scale" you refer to is. (Sure, we can see the MOP-scale: DIO only, DAO-nonstoring, DAO-storing - but beyond that, what do we miss?)

Leaf-only operation. Whether or not you'll float. 
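
To make the MOP point concrete, here is a minimal sketch of the join rule as described above: a node that does not support the advertised Mode of Operation can still join, but only as a leaf. The MOP values follow RFC 6550; the function name join_role and the string role labels are made up for illustration and come from no implementation.

```python
from enum import IntEnum


class Mop(IntEnum):
    """Mode of Operation values carried in the DIO (RFC 6550)."""
    NO_DOWNWARD_ROUTES = 0
    NON_STORING = 1
    STORING_NO_MULTICAST = 2
    STORING_WITH_MULTICAST = 3


def join_role(advertised, supported):
    """Hypothetical helper: decide how a node may join a DODAG.

    A node that implements the advertised MOP can act as a router;
    otherwise it can only attach as a leaf.
    """
    return "router" if advertised in supported else "leaf"


# A small node that only implements non-storing mode.
small_node = {Mop.NON_STORING}
print(join_role(Mop.NON_STORING, small_node))           # supported MOP
print(join_role(Mop.STORING_NO_MULTICAST, small_node))  # unsupported MOP
```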

> 
> Short of heavy-handed administrative configuration, the only realistic way of ensuring interoperability of devices from different vendors is to support the full spec.
> 
> But don't just take our word on the complexity issues:
> 
> 	http://comments.gmane.org/gmane.os.contiki.devel/12102
> 
> The implementation that we cite (URL is in the draft) is the ContikiRPL implementation, which only supports storing mode, and which consumes these ~50 KB (in the above link, they talk about 44KB being their code size). This is on a 16-bit MSP430 Tmote Sky.

Right -- what fraction of the 44kB is RPL, and what fraction is the Contiki kernel? Saying 44kB is the size of RPL is similar to saying that xml2rfc is >2GB because Windows is >2GB. I realize that my analogy is far more extreme than the one above, but it seems that you're playing fast and loose with numbers in a way to support your argument and so again being disingenuous. That erodes my confidence in your claims.


>> For 12, "implementations may exhibit a bad performance if not carefully implemented."  I think it is safe to say this is true for almost ANY protocol.
> 
> We think we get your point, and this phrase is indeed a little like "You can write Fortran77 in any language" ;)
> 
>> A specification is not intended to be a complete statement of efficient implementation, otherwise you give little latitude to future improvements and good engineering.
> 
> This is very true, and we agree that a specification should not (necessarily) be indicating the most efficient way of doing something. 
> 
> Alas, for example, for DAO messages, as the diffusion mechanism is not well specified, this may lead to not just inefficiency, but actual traffic loss by way of connectivity not being accurately reflected in the information sets on RPL, in a very mechanical way. 
> 
> It would, of course, be perfectly fine to specify an inefficient - but working - method, and allow future improvements and good engineering to better that. Unfortunately, in this case, no working diffusion method is given (i.e., it may lead to the above issues).
> 
> In addition, it is not just about efficient implementation from one vendor that could mitigate these problems, but rather about allowing for interoperable implementations in the same routing domain.

I don't think anyone expects an RFC to be perfectly precise. This is one reason why there are bake-offs. It could be that we take 4 RPL implementations, each of which has a slightly different DAO diffusion approach, and they all work together fine. Or they don't -- at which point we, as a working group, have found a specific issue through practice, and can write an RFC to better specify the solution.



>> For 13, this assumes that a wireless network has a stable topology which the protocol can converge to. Wireless networks are often NOT stable: one cannot expect a protocol to converge on a dynamic graph.
> 
> That is absolutely correct. This observation absolutely applies to the testbed in which these trickle experiments were conducted.
> 
> The point we want to make here is that, contrary to what we observe in simulations and modeling, when deploying trickle in wireless networks, one should expect more control traffic than seen in theory and simulations. We may be able to do an editorial pass to make this section clearer -- especially as it seems that we do agree on the fundamental property that's being discussed?

If your point is that simulation and theoretical analysis of wireless networks are typically far from reality and therefore generally not very elucidating for real-world behavior, I'd agree with that. But I think there have been a bunch of experimental studies of how Trickle behaves in a dynamic and challenging network. The CTP paper, for example. One of the things we found in CTP is that the routing topology is most dynamic when the network first starts up. Over time, if implemented carefully, nodes settle on stable rather than dynamic links and the control traffic continually decreases over the period of hours and even days. Fundamentally, nodes have to probe the RF environment to figure out which links are good and stable, and this takes time. There are some shortcuts (e.g., using physical layer information), but including something PHY-specific in an RFC is a big no-no.
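
As a rough sketch of why control traffic falls off once links stabilize, consider only the interval rule of the Trickle algorithm (RFC 6206): the interval doubles each time it expires without an inconsistency, up to a maximum, and resets to the minimum when an inconsistency is detected. A node sends at most one message per interval, so the number of intervals bounds its control traffic. The parameter values and the reset times below are made up for illustration; the random transmission point and the suppression counter are left out.

```python
def trickle_intervals(duration, imin, doublings, reset_times=()):
    """Return the start times of Trickle intervals within `duration` seconds.

    imin        -- minimum interval length in seconds
    doublings   -- how many times the interval may double (Imax in RFC 6206)
    reset_times -- hypothetical times at which an inconsistency is detected
    """
    imax = imin * (2 ** doublings)
    resets = sorted(reset_times)
    starts = []
    now, interval = 0.0, imin
    while now < duration:
        starts.append(now)
        end = now + interval
        # First inconsistency (if any) that falls inside this interval.
        hit = next((r for r in resets if now < r <= end), None)
        if hit is not None and interval > imin:
            # Inconsistency: restart immediately with the minimum interval.
            now, interval = hit, imin
        else:
            # Quiet interval (or already at Imin): let it expire, then double.
            now, interval = end, min(interval * 2, imax)
    return starts


# One hour, Imin = 1 s, up to 8 doublings (Imax = 256 s).
stable = trickle_intervals(3600, 1, 8)
churny = trickle_intervals(3600, 1, 8, reset_times=range(60, 3600, 60))
print(len(stable), len(churny))
```

With a stable topology the intervals quickly reach the maximum length and stay there; with an inconsistency every minute the node keeps dropping back to short intervals, which is the extra control traffic one sees while links are still being probed.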

> 
>> 14 is similarly confused about what a wireless network looks like. How can the state of a distributed system based on a dynamic topology be "consistent?" I think this is a fundamental misunderstanding of how the network works.
> 
> We are a little unclear on what you mean here, or what you suggest that we disagree on. Is it the use of "inconsistent" in the penultimate paragraph of page 24? 
> 

Yes.

Phil