[manet] Ready for WGLC: Advancing draft-ietf-manet-olsrv2-dat-metric

Thomas Clausen <thomas@thomasclausen.org> Mon, 28 July 2014 16:35 UTC

From: Thomas Clausen <thomas@thomasclausen.org>
Date: Mon, 28 Jul 2014 18:35:21 +0200
Message-Id: <D45879AA-544A-4EB1-86F8-7DC8C7C1065E@thomasclausen.org>
To: manet <manet@ietf.org>, "<manet-chairs@tools.ietf.org>" <manet-chairs@tools.ietf.org>, manet-ads <manet-ads@tools.ietf.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/manet/v4NbwZj8n7B5LYnaUMRMBCDWItE
Cc: "Dr. Emmanuel Baccelli" <Emmanuel.Baccelli@inria.fr>
Subject: [manet] Ready for WGLC: Advancing draft-ietf-manet-olsrv2-dat-metric

Dear WG Chairs, ADs,
Dear Henning, Emmanuel, all,

One of the authors recently indicated by email that he believes draft-ietf-manet-olsrv2-dat-metric is ready for WGLC.

As a reminder, this document aims for publication as an Experimental RFC.

I have, therefore, reviewed the document carefully. In my opinion, the author is right: this document is ready for WGLC, and my comments can (at the authors’ discretion) be addressed either by spinning a new revision now, or during the WGLC.

Moreover, I believe that it is a highly important document for the WG to produce. Currently, hop count is the only metric specified, although experience shows that hop-count metrics have well-known issues; OLSRv2 therefore supports (of course), and encourages the development of, other metric types.

Is this metric “the one and only, be-all-end-all” of metrics? Probably not. But it has the merit of having been developed through extensive testing with OLSRv1 (RFC3626). That probably means that it is also applicable to similar deployments with OLSRv2, given the (algorithmic) similarity between OLSRv1 and OLSRv2. Experiments will tell us, and so publication as an Experimental RFC seems appropriate.

With that being said, I previously indicated that I had identified a few nits, and the lack of a section describing “the experiment(s) that publishing this RFC will enable” is one of them. Our benevolent ADs are “strongly suggesting” such sections in Experimental RFCs, and I think that we as a WG should try not to irritate them by sending forward documents that lack one.

I think that a subsection of the introduction would do quite nicely (see how we did it in OLSRv2-MT, or how they did it in RFC6971). Other than the “will this work as well in OLSRv2 as it did in OLSRv1” experiment, I think that there are a lot of interesting experiments possible. Below, I will try to point out some things that jumped out at me when I read the document, and which I think could benefit from a separate section and discussion.

The challenge, IMO, would be to define the set of “experiments necessary to determine whether to take this to PS, or to declare this metric historic”. I would *love* to have a better metric than hop count standardized, but I do not think that we’re quite there yet. So perhaps some back-and-forth on which experiment(s) would allow us to determine this would be a good idea?

I have a handful of issues and a handful of nits, which I expect the authors will consider along with any other WGLC comments they may receive. Of course, if a new version is spun before WGLC is requested, then I’ll gladly review that as well. Point being: let’s get a call started on this document; none of my nits and issues are of the sort that should block issuing a WGLC.

Issues:
	
	Placeholder for the “Needs a ‘The Experiments’ Section”, which was discussed above ;)

	It appears evident, when reading the introduction and applicability statement,
	that this metric must surely need some “hooks” into the data forwarding
	path or into the L2 data structures, and, as you indicate, it also needs to learn
	something about the static properties of the L2 (the link rate). For the latter, you
	state that DLEP can be used, as can static configuration. For the former,
	one is left with bated breath until section 9.3, wherein it is kinda-sorta
	hinted that no such hooks are needed: the RFC5444 packets exchanged suffice.
	Unless you plan on pulling an Agatha Christie on the reader, perhaps
	something regarding this could be stated in the applicability statement?

	The document uses the ill-defined term “node” — which is incorrect in a
	specification. Please harmonize to the use of “Router” or “OLSRv2 Router”
	as is done in RFC7181, both for correctness and for coherence with 7181
	and related specifications. (And, “node” occurs only thrice, so should be
	fairly easy to replace ;) )

	I am not sure if the term “mesh networks” is one that we’re actually using
	a lot; I believe that we’ve said “OLSRv2 routed network”, to be precise,
	in other documents. Technically, there’s nothing wrong with “mesh networks”,
	except that many people do (incorrectly) think “layer 2” when they hear that
	term. I would suggest that we do what we can to avoid confusion ;)

	Would you mind replacing the following entry in the References:
		OLD:
	   [OLSRV2]   Clausen, T., Jacquet, P., and C. Dearlove, "The Optimized
        	      Link State Routing Protocol version 2", draft-ietf-manet-
	              olsrv2-19 , March 2013.

		NEW:
	   [RFC7181]   Clausen, T., Dearlove, C., Jacquet, P., and U. Herberg, "The
	              Optimized Link State Routing Protocol Version 2", RFC 7181,
	              April 2014.

Nits:
	Introduction, first paragraph: this is oddly phrased, notably as RFC3626 was not a “standard” but an “Experimental RFC”:

		OLD:
		   One of the major shortcomings of OLSR [RFC3626] is the missing of a
		   link cost metric between mesh nodes.  Operational experience with
		   mesh networks gathered since the standardization of OLSR has revealed
		   that wireless networks links can have highly variable and
		   heterogeneous properties.  This makes a hopcount metric insufficient
		   for effective mesh routing.

		NEW:
		   One of the shortcomings of OLSR [RFC3626] is the lack of a granular
		   link cost metric between mesh nodes.  Operational experience with
		   mesh networks gathered since the publication of OLSR [RFC3626] has revealed
		   that wireless network links can have highly variable and
		   heterogeneous properties.  This makes a hopcount metric insufficient
		   for effective mesh routing.

	Ultimate paragraph in introduction:
		OLD:
		   This document describes a Directional Airtime routing metric for
		   OLSRv2, a successor of the OLSR.org routing metric for [RFC3626].  It
		   takes both the loss rate and the link speed into account to provide a
		   more accurate picture of the mesh network links.

		NEW:
		   This document describes a Directional Airtime routing metric for
		   OLSRv2, a successor to the ETX-derived OLSR.org routing metric for [RFC3626].  It
		   takes both the loss rate and the link speed into account to provide a
		   more accurate picture of the links within the network.

	Terminology:
		When discussing “UNDEFINED”, could you please specify that it is used only in the
		descriptions of the processing in this document, and that the value chosen for UNDEFINED
		does not need to be agreed upon across routers in a deployment?
		If that is the case, then please remove the last sentence, “Might be -1 for this protocol”.

	Applicability Statement, last sentence of the penultimate paragraph: could this be phrased
		so that the example clearly reads as an example?

		OLD:
		   It might be necessary to increase the
		   data-rate of the multicast transmissions, e.g. set the multicast
		   data-rate to 6 MBit/s if you use IEEE 802.11g only.

		NEW:
		   It might be necessary to increase the data-rate of the multicast
		   transmissions; for example, with IEEE 802.11g only, the multicast
		   data-rate could be set to 6 MBit/s.

	Applicability statement, ultimate paragraph: I have a bit of a hard time parsing this:
		   The metric can only handle a certain range of packet loss and unicast
		   data-rate.  Maximum packet loss is "ETX 8" (1 of 8 packets is
		   successfully sent to the receiver, without link layer
		   retransmissions), the unicast data-rate can be between 1 kBit/s and 2
		   GBit/s. The metric has been designed for data-rates of 1 MBit/s and
		   hundreds of MBit/s.

		Specifically, what does “The metric” refer to? “The DAT metric”? (If
		so, then I suggest saying that.) A little later, you talk about “ETX 8”,
		which isn’t gibberish to someone versed in over-the-air metrics, but which
		most definitely is to someone who’s not. The notation “ETX 8” is
		not introduced; can you rephrase that? Perhaps by simply removing
		“ETX 8” and expanding on the parenthesis?
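		For readers less versed in over-the-air metrics, the bound could be illustrated
		as follows. ETX [MOBICOM03] is the expected number of transmissions needed for
		one successful delivery, i.e. the inverse of the delivery ratio (the product of
		forward and reverse delivery ratios in [MOBICOM03]), so “ETX 8” means that 1 out
		of 8 packets gets through. A minimal Python sketch, assuming a single delivery
		ratio per link and using hypothetical names (not taken from the draft) for the
		limits quoted above:

```python
# Hypothetical names for the operating range quoted from the draft.
MAX_ETX = 8                     # "1 of 8 packets is successfully sent"
MIN_RATE_BPS = 1_000            # 1 kBit/s
MAX_RATE_BPS = 2_000_000_000    # 2 GBit/s


def etx(delivery_ratio: float) -> float:
    """Expected transmission count for a link with delivery ratio in (0, 1]."""
    if not 0.0 < delivery_ratio <= 1.0:
        raise ValueError("delivery ratio must be in (0, 1]")
    return 1.0 / delivery_ratio


def within_supported_range(delivery_ratio: float, rate_bps: int) -> bool:
    """True if both the loss and the unicast data-rate fall inside the quoted range."""
    return etx(delivery_ratio) <= MAX_ETX and MIN_RATE_BPS <= rate_bps <= MAX_RATE_BPS


print(etx(1 / 8))                                  # 8.0
print(within_supported_range(1 / 8, 54_000_000))   # True
print(within_supported_range(1 / 10, 54_000_000))  # False: ETX 10 exceeds 8
```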

	Probably a nit, but isn’t the title of section 4 misspelled (I think that it should be “rationalE”)?

	Section 4, first paragraph:

		OLD:
		   The Directional Airtime Metric has been inspired by the publications
		   on the ETX [MOBICOM03] and ETT [MOBICOM04] metric, but has several
		   key differences.

		NEW:
		   The Directional Airtime Metric has been inspired by the publications
		   on the ETX [MOBICOM03] and ETT [MOBICOM04] metrics, but differs from
		   both of these in several ways.

	Section 4, general:
		Thank you for this section, this rationale is very helpful and educative, and
		establishes the relationship to RFC7181 quite well.
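		As background for that relationship: ETT [MOBICOM04] scales ETX by the time it
		takes to send one packet of S bits at link rate B (ETT = ETX × S / B), which is
		how loss rate and link speed end up in one airtime cost. The Python sketch below
		shows only this generic ETT computation, not the DAT formula from the draft; the
		1500-byte packet size and the function names are assumptions for illustration.

```python
PACKET_BITS = 1500 * 8  # assumed 1500-byte packet


def ett_seconds(delivery_ratio: float, packet_bits: int, rate_bps: float) -> float:
    """ETT = ETX * S / B: expected airtime to deliver one packet of S bits at B bit/s."""
    etx = 1.0 / delivery_ratio
    return etx * packet_bits / rate_bps


def path_cost(links: list[tuple[float, int, float]]) -> float:
    """Sum of per-hop ETT values along a path (route metrics are additive)."""
    return sum(ett_seconds(ratio, bits, rate) for ratio, bits, rate in links)


one_hop_slow = path_cost([(1.0, PACKET_BITS, 1e6)])       # lossless 1 MBit/s hop
two_hop_fast = path_cost([(0.9, PACKET_BITS, 54e6)] * 2)  # two 90%-delivery 54 MBit/s hops
print(two_hop_fast < one_hop_slow)  # True
```

		Hop count would prefer the single slow hop; an airtime cost prefers the two
		fast, slightly lossy hops.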

	Section 6, I have some general questions:
		Do any of these need to be exchanged between two routers, over the air?
		I am asking because the descriptions seem either too specific or too vague, e.g.:
		“Floating point number between 1,0 and 2.0, large enough to….”
		

	Section 6.1 “Recommended Values”
		This almost reads as if it could be part of “The Experiment” section, perhaps like this:

			“In the community networks where this has been deployed, which
			 include […], [….], and [….], the values below were those that
			 experiments showed to work satisfactorily. Understanding that these
			 networks represent one type of OLSRv2 deployment, and may not
			 be representative of all possible OLSRv2 deployments, nor even of
			 all possible community network deployments, this leads to two questions
			 where further experimentation is required:

				o	Is this set of parameters generally applicable to Community 
					Mesh Networks, or are there adaptations required?

				o	Is this set of parameters generally applicable to other OLSRv2
					deployments, or are there different parameter sets which apply?
			“

			Obviously, another part of the experiment would be “…and, can we come 
			up with an even better metric that doesn’t need parametrization at all, yet
			would work everywhere?” — although that may be one of those pie-in-the-sky
			things.

	Section 7 starts a bit oddly; how about:
			OLD:
			   This specification defines the following constants, which cannot be
			   changed without making the metric outputs incomparable:

			NEW:
			   This specification defines two constants, on which all OLSRv2
			   routers participating in the same deployment must agree.
			   Two routers that use different values for these constants will not be able
			   to generate metric values that can be correctly interpreted by both. These
			   constants are:

		But I also want to say that it is awesome that you point this out so clearly.
		Is there any doubt as to whether those constants are “universal”, or is there some
		experiment to be done here? [My gut feeling tells me “no”, but my gut has
		been wrong before…]

	Section 8, first paragraph:

		I would appreciate it if you would point out whether you need to update the Information
		Bases in RFC6130, specifically whether you need to update elements other than those
		that you specify in this document.

		I would also appreciate a pointer to any information in the Information Bases of
		RFC6130 that you use (but do not update).

		Finally, I would like to suggest:

			OLD:
			  This specification extends the Link Set Tuples of the Interface
			   Information Base, as defined in [RFC6130] section 7.1, by the
			   following additional elements for each link tuple when being used
			   with this metric:

			NEW:
			  This specification extends the Link Set of the Interface
			   Information Base, as defined in [RFC6130] section 7.1, by adding the
			   following elements to each link tuple:

			And, with this, I would add L_DAT_last_pkt_seqno to the list, at the same
			level as the other six elements. I know that it is not always used,
			but with the caveat you give (which you should keep, in a modified form),
			that can be left for an implementer to figure out. The extension in this
			document needs it, so it’s something that this document imposes on RFC6130.
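			To illustrate why the extension needs L_DAT_last_pkt_seqno: a receiver can
			infer lost packets from gaps in the 16-bit RFC5444 packet sequence numbers it
			receives from a neighbor. The Python sketch below is hypothetical; apart from
			L_DAT_last_pkt_seqno, the field names are not the elements defined in the
			draft, and the gap counting is a generic technique rather than the draft's
			processing.

```python
from dataclasses import dataclass
from typing import Optional

SEQNO_MODULUS = 1 << 16  # RFC 5444 packet sequence numbers are 16-bit


@dataclass
class DatLinkTuple:
    """Hypothetical Link Tuple extension; only L_DAT_last_pkt_seqno is named in the draft."""
    L_DAT_last_pkt_seqno: Optional[int] = None  # None until the first packet arrives
    received: int = 0  # hypothetical counter
    lost: int = 0      # hypothetical counter

    def on_packet(self, seqno: int) -> None:
        """Count one received packet, and any gap before it as lost packets."""
        if self.L_DAT_last_pkt_seqno is not None:
            # Distance from the previous seqno, modulo 2^16 to survive wrap-around.
            # Reordering and router restarts are ignored in this sketch.
            gap = (seqno - self.L_DAT_last_pkt_seqno) % SEQNO_MODULUS
            if gap == 0:
                return  # duplicate packet: ignore it
            self.lost += gap - 1
        self.received += 1
        self.L_DAT_last_pkt_seqno = seqno

    def delivery_ratio(self) -> float:
        total = self.received + self.lost
        return self.received / total if total else 0.0


link = DatLinkTuple()
for seqno in (65534, 65535, 0, 3):  # wraps around, then skips 1 and 2
    link.on_packet(seqno)
print(link.received, link.lost)  # 4 2
```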

	Section 9.2: I suggest that the title be rephrased to something like
		“Minimum Requirements for an OLSRv2 Implementation Using This Metric”.
		I would actually also like to see this piece of information included in
		the applicability statement.
		
	Section 9.2: what would happen if no INTERVAL_TIME TLV is included? Would there be
		a possible fallback mode? Worst case, could you not make an assumption based on
		the VALIDITY_TIME TLV present? How bad would that be? Do we know? Or is this
		something worth experimenting with (for the experiments section)? Right now, that
		particular experiment is not enabled by this document (as it requires INTERVAL_TIME),
		so perhaps it is “known to be FUBAR”, in which case stating that (and why) in the
		experiments section would actually be awesome.

	Section 9.3: when I read this, my initial thought (from reading the paragraph between
		9.3 and 9.3.1) was “Oh, OK, so this tells me what to do if my implementation
		is not able to add the packet sequence number”, but then 9.3.1 immediately
		kills this hope. So, is this not a requirement that would also belong in 9.2?

		OK, I am being intentionally daft here... I think that you need to point out clearly
		(though section 9 does not seem quite the right place for it) that:

			“There are these two modes of operation. In either case, the 
			 requirements from 9.2 apply. If you use mode B, then the
			 requirements from 9.2 AND 9.3 apply.

			 For both modes of operation, the processing in section X applies.
			 If you use mode B, then the processing in section Y also applies.”
			
		I say this because section 9, “Packets and Messages”, starts out looking like
		what is expected of an RFC5444 packet/message specification, but from 9.3.1 onward
		diverts into a set of processing directives.

		I think that the processing directives are clear and well written (thank you
		for adopting a non-ambiguous and clear format for this, it is refreshing and
		appreciated), but I think that it would be awesome to have a slightly different
		structure, something like this (don’t get hung up on the specifics of “Mode B”,
		you have a better term, and this is just to indicate the basic idea):

			9. Packets and Messages
			9.1 Definitions
			9.2 Requirements for an OLSRv2 Implementation Using This Metric
			9.3 Additional Requirements for an OLSRv2 Implementation Using Mode B

			10. Processing Directives
			10.1 Processing Directives for “Mode A”
			10.2 Additional Processing Directives for when using “Mode B”

		I note that section 10 in the above would have a ton of subsections, essentially
		capturing the current sections 9.3.1 through 11.

	Section 12 - IANA: this baffled me. Do we not need a code-point assigned from
		the LINK_METRIC type extension space? Or was a decision made not to do this,
		and instead to use the 224-255 type extension space for this experiment?
		If that is the case, then I think that some documentation of this would
		be good to include in this section.

		That said, I would be very, very supportive of the assignment of a Type Extension to
		LINK_METRIC for the DAT metric type.

	Section 13 - Security: this needs a review; notably, RFC6622 has been obsoleted by
		RFC7182. I think that you probably also want to look at RFC7183, and consider
		whether some of the provisions from the mandatory-to-implement (MTI) part of RFC7183
		might guard against a few of the things discussed there?

	Appendix B
		You use “Linkspeed” here, but “link speed” in the introduction, and “link-speed”
		in section 5. Then, in some places, you talk about “unicast data rate”. Are
		those the same concept, or different ones? Could you do an editorial pass and
		unify this terminology, please?

Respectfully,

Thomas