[manet] Ready for WGLC: Advancing draft-ietf-manet-olsrv2-dat-metric

Thomas Clausen <thomas@thomasclausen.org> Mon, 28 July 2014 16:35 UTC

From: Thomas Clausen <thomas@thomasclausen.org>
Date: Mon, 28 Jul 2014 18:35:21 +0200
Message-Id: <D45879AA-544A-4EB1-86F8-7DC8C7C1065E@thomasclausen.org>
To: manet <manet@ietf.org>, "<manet-chairs@tools.ietf.org>" <manet-chairs@tools.ietf.org>, manet-ads <manet-ads@tools.ietf.org>
Archived-At: http://mailarchive.ietf.org/arch/msg/manet/v4NbwZj8n7B5LYnaUMRMBCDWItE
Cc: "Dr. Emmanuel Baccelli" <Emmanuel.Baccelli@inria.fr>

Dear WG Chairs, ADs,
Dear Henning, Emmanuel, all,

One of the authors recently indicated by email that he believes draft-ietf-manet-olsrv2-dat-metric is ready for WGLC.

As a reminder, this document aims for publication as an Experimental RFC.

I have, therefore, reviewed the document carefully. In my opinion, the author is right: this document is ready for WGLC, and my comments can (at the authors' discretion) be addressed either by spinning a new revision now, or during the WGLC.

Moreover, I believe that it is a highly important document for the WG to produce. Currently, hop-count metrics are all that’s specified, although experience shows that they have well-known issues; OLSRv2 therefore supports (of course) other metric types and encourages their development.

Is this metric “the one and only, be-all-end-all” of metrics? Probably not. But it has the merit of having been developed through extensive testing in OLSRv1 (RFC3626). That probably means that it’s also applicable to similar deployments with OLSRv2, given the (algorithmic) similarity between OLSRv1 and OLSRv2. Experiments will tell us, and so publication as an Experimental RFC seems appropriate.

With that being said, I previously indicated that I have identified a few nits, and the lack of a “This is the experiment(s) that publishing this RFC will enable” section is one. Our benevolent ADs are “strongly suggesting” such sections in Experimental RFCs, and I think that we as a WG should try not to intentionally irritate them by sending documents forward which do not have that.

I think that a subsection to the introduction would do quite nicely (see how we did in OLSRv2-MT, or how they did in RFC6971). Other than the “will this work as well in OLSRv2 as it did in OLSRv1” experiment, I think that there are lots of interesting experiments possible. Through the below, I will try to point to some things that jump out at me when I read the document, and which I think could benefit from a separate section and discussion.

The challenge, IMO, would be to define a set of “experiments necessary to determine if taking this to PS, or declaring this metric historic”. I would *love* to have a better metric than hop-count standardized, but I do not think that we’re quite there yet. So perhaps some back-and-forth there would be a good idea, as to what experiment(s) would allow us to determine this?

I have a handful of issues and a handful of nits, which I expect the authors will consider along with any other WGLC comments they may receive. Of course, if a new version is spun before WGLC is requested, then I’ll gladly review that, also. Point being: let’s get a call started on this document; none of my nits and issues are of the sort that should block issuing a WGLC.

Issues:

Placeholder for the “Needs a ‘The Experiments’ Section”, which was discussed above ;)

It appears evident when reading the introduction and applicability statement,
that this metric surely must need some “hooks” into the data forwarding
path, or into the L2 data structures, and, as you indicate, also needs to learn
something about the static properties of the L2 (link rate). For the latter, you
state that DLEP can be used, as well as static configuration. For the former,
one is left with bated breath until section 9.3, wherein it is kinda-sorta
hinted that one doesn’t: the RFC5444 packets exchanged suffice.
Unless you plan on thus pulling an Agatha Christie on the reader, perhaps
something regarding this could be stated in the applicability statement?
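
To illustrate why the RFC5444 packets alone could suffice, here is a minimal
sketch (not taken from the draft) of counting received and lost packets purely
from the 16-bit RFC5444 packet sequence numbers seen on a link. The function
names and the simple in-order counting are assumptions for illustration only;
the processing actually specified by the draft is in its section 9.3 and
following, and may differ.

```python
SEQNO_MODULUS = 1 << 16  # RFC 5444 packet sequence numbers are 16 bits wide


def seqno_gap(previous: int, current: int) -> int:
    """Distance from previous to current sequence number, with wraparound."""
    return (current - previous) % SEQNO_MODULUS


def count_received_and_lost(seqnos):
    """Return (received, lost) for sequence numbers in arrival order.

    Assumption (hypothetical simplification): packets arrive in order,
    so any gap larger than 1 is counted as loss. Exact duplicates are
    ignored. Reordering is NOT handled by this sketch.
    """
    received = 0
    lost = 0
    previous = None
    for seqno in seqnos:
        if previous is not None:
            gap = seqno_gap(previous, seqno)
            if gap == 0:
                continue  # duplicate of the previous packet, ignore
            lost += gap - 1  # sequence numbers skipped between the two
        received += 1
        previous = seqno
    return received, lost


if __name__ == "__main__":
    # Sequence number 0 is missing across the 65535 -> 1 wraparound.
    print(count_received_and_lost([65534, 65535, 1, 2]))  # (4, 1)
```

No hook into the data forwarding path is needed for this kind of counting,
only the packet headers of the routing protocol traffic itself.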

The document uses the ill-defined term “node” — which is incorrect in a
specification. Please harmonize to the use of “Router” or “OLSRv2 Router”
as is done in RFC7181, both for correctness and for coherence with 7181
and related specifications. (And, “node” occurs only thrice, so should be
fairly easy to replace ;) )

I am not sure if the term “mesh networks” is one that we’re actually using
a lot - I believe that we’ve said “OLSRv2 routed network”, to be precise,
in other documents. Technically, there’s nothing wrong with “mesh networks”
except that many people do (incorrectly) think “layer 2” when they hear that
term. I would suggest to do what we can to avoid confusion ;)

Would you mind replacing a reference in the References:
OLD:
[OLSRV2] Clausen, T., Jacquet, P., and C. Dearlove, "The Optimized
Link State Routing Protocol version 2", draft-ietf-manet-
olsrv2-19 , March 2013.

NEW:
[RFC7181] Clausen, T., Dearlove, C., Jacquet, P., and U. Herberg, "The
Optimized Link State Routing Protocol Version 2", RFC 7181, April 2014.

Nits:
Introduction, 1st paragraph is oddly phrased, notably as RFC3626 was not a “standard” but an “Experimental RFC”:

OLD:
One of the major shortcomings of OLSR [RFC3626] is the missing of a
link cost metric between mesh nodes. Operational experience with
mesh networks gathered since the standardization of OLSR has revealed
that wireless networks links can have highly variable and
heterogeneous properties. This makes a hopcount metric insufficient
for effective mesh routing.

NEW:
One of the shortcomings of OLSR [RFC3626] is the lack of a granular
link cost metric between mesh nodes. Operational experience with
mesh networks gathered since the publication of OLSR [RFC3626] has revealed
that wireless networks links can have highly variable and
heterogeneous properties. This makes a hopcount metric insufficient
for effective mesh routing.

Ultimate paragraph in introduction:
OLD:
This document describes a Directional Airtime routing metric for
OLSRv2, a successor of the OLSR.org routing metric for [RFC3626]. It
takes both the loss rate and the link speed into account to provide a
more accurate picture of the mesh network links.

NEW:
This document describes a Directional Airtime routing metric for
OLSRv2, a successor to the ETX-derived OLSR.org routing metric for [RFC3626]. It
takes both the loss rate and the link speed into account to provide a
more accurate picture of the links within the network.

Terminology:
When discussing “UNDEFINED”, please specify that this is used only for the
descriptions of the processing in this document, and that the “value for UNDEFINED”
does not need to be agreed upon across routers in a deployment.
If that is the case, then please remove the last sentence, “Might be -1 for this protocol”.

Applicability Statement, last sentence in the penultimate paragraph: is it possible to
make it a little clearer that the example is just an example?

OLD:
It might be necessary to increase the
data-rate of the multicast transmissions, e.g. set the multicast
data-rate to 6 MBit/s if you use IEEE 802.11g only.

NEW:
It might, for example in IEEE 802.11g, be necessary to increase the
data-rate of the multicast transmissions, e.g. set the multicast
data-rate to 6 MBit/s.

Applicability statement, ultimate paragraph, I have a little bit of a hard time parsing:
The metric can only handle a certain range of packet loss and unicast
data-rate. Maximum packet loss is "ETX 8" (1 of 8 packets is
successfully sent to the receiver, without link layer
retransmissions), the unicast data-rate can be between 1 kBit/s and 2
GBit/s. The metric has been designed for data-rates of 1 MBit/s and
hundreds of MBit/s.

Specifically, “The metric”: what does that correspond to? “The DAT metric”? (If
so, then I suggest saying that.) A little bit later, you talk about “ETX 8”,
which to someone versed in over-the-air metrics isn’t gibberish, but which
to someone who’s not, most definitely is. The notation “ETX 8” is
not introduced - can you rephrase that? Perhaps by simply removing
“ETX 8” and expanding on the parenthesis?
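
For readers not versed in the notation, here is a minimal sketch of how I read
“ETX 8”, assuming the common one-directional definition ETX = packets sent /
packets received (the draft may define it differently). The function name, and
the choice to clamp values at the stated maximum, are assumptions for
illustration only and are not taken from the draft.

```python
from fractions import Fraction

# Assumed from the quoted text: "1 of 8 packets is successfully sent".
MAX_ETX = Fraction(8)


def directional_etx(received: int, sent: int) -> Fraction:
    """Expected transmissions per delivered packet in one direction.

    Values are clamped to the range [1, MAX_ETX]; the clamping itself
    is an assumption of this sketch.
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    if received <= 0:
        return MAX_ETX  # nothing delivered: report the worst supported value
    raw = Fraction(sent, received)
    return max(Fraction(1), min(raw, MAX_ETX))


if __name__ == "__main__":
    print(directional_etx(1, 8))   # 8: the "ETX 8" limit, 1 of 8 delivered
    print(directional_etx(8, 8))   # 1: a loss-free link
    print(directional_etx(1, 16))  # 8: worse than the limit, clamped
```

Spelling this out in words in the applicability statement (a link on which
only 1 of 8 packets arrives) would avoid the unexplained notation.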

Probably a nit, but isn’t the title for section 4 wrong (I think that it is “rationalE”)?

Section 4, first paragraph:

OLD:
The Directional Airtime Metric has been inspired by the publications
on the ETX [MOBICOM03] and ETT [MOBICOM04] metric, but has several
key differences.

NEW:
The Directional Airtime Metric has been inspired by the publications
on the ETX [MOBICOM03] and ETT [MOBICOM04] metric, but differs from
both of these in several ways.

Section 4, general:
Thank you for this section, this rationale is very helpful and educative, and
establishes the relationship to RFC7181 quite well.

Section 6, I have some general questions:
Do any of these need to be exchanged between two routers, over the air?
I am asking because the description seems either too specific or too vague:
“Floating point number between 1,0 and 2.0, large enough to….”

Section 6.1 “Recommended Values”
This almost reads as if it could be part of the “The Experiment” section, thus:

“In the community networks, where this has been deployed, which
includes […], [….], and [….], the values below were those which
experiments showed to work satisfactorily. Understanding that these
networks represent one type of OLSRv2 deployment, and may not
be representative of all possible OLSRv2 deployments, nor even of
all possible community network deployments, this leads to two questions,
where further experimentation is required:

o Is this set of parameters generally applicable to Community
Mesh Networks, or are there adaptations required?

o Is this set of parameters generally applicable to other OLSRv2
deployments, or are there different parameter sets which apply?
“

Obviously, another part of the experiment would be “…and, can we come
up with an even better metric that doesn’t need parametrization at all, yet
would work everywhere?” — although that may be one of those pie-in-the-sky
things.

Section 7 starts a bit oddly, how about:
OLD:
This specification defines the following constants, which cannot be
changed without making the metric outputs incomparable:

NEW:
This specification defines two constants, agreement on which is
required, from all the OLSRv2 routers participating in the same deployment.
Two routers which use different values for these constants will not be able
to generate metric values which can be correctly interpreted by both. These
constants are:

But I also want to say that it is awesome that you do point this out so clearly.
Is there any doubt as to whether those constants are “universal”, or is there some
experiment to be done here? [My gut-feeling tells me “no”, but my gut has
been wrong before…]

Section 8, first paragraph:

I would appreciate it if you would point out whether you need to update the Information
Bases in RFC6130, specifically whether you need to update elements other than those
which you specify in this document.

I would also appreciate a pointer to whether you use (but do not update) information in
the information bases in RFC6130.

Finally, I would like to suggest:

OLD:
This specification extends the Link Set Tuples of the Interface
Information Base, as defined in [RFC6130] section 7.1, by the
following additional elements for each link tuple when being used
with this metric:

NEW:
This specification extends the Link Set of the Interface
Information Base, as defined in [RFC6130] section 7.1, by adding the
following elements to each link tuple:

And, with this, I would add L_DAT_last_pkt_seqno to the list on the same
level as the other 6 elements. I know that it is not always used,
but with the caveat you give (which you should keep in a modified form)
that can be left for an implementer to figure out. The extension in this
document needs it, so it’s something that this doc imposes on 6130.

Section 9.2, suggest that the title be rephrased to something like
“Minimum Requirements to an OLSRv2 Implementation for using this metric”
I would actually also like to see this piece of information included in
the applicability statement.

Section 9.2, what would happen, would there be a possible back-up mode in
case no INTERVAL_TIME TLV is included? Worst case, could you not
make an assumption based on a VALIDITY_TIME TLV present? How
bad would that be? Do we know? Or, is this something worth experimenting
with (to the experiments section)? Right now, that particular experiment
is not enabled by this document (as it requires INTERVAL_TIME), so
perhaps it is “known to be FUBAR” — in which case, actually, stating
that (and why) in the experiments section would be awesome.

Section 9.3, when I read this my initial thought (from reading the paragraph between
9.3 and 9.3.1) was “Oh, ok, so this tells me what to do if my implementation
is not able to add the packet sequence number” - but then, 9.3.1 immediately
kills this hope. So, is this not a requirement, that would belong in 9.2 also?

OK, I am being intentionally daft here... I think that you need to point out clearly,
but section 9 does not seem quite the right place for that, that:

“There are these two modes of operation. In either case, the
requirements from 9.2 apply. If you use mode B, then the
requirements from 9.2 AND 9.3 apply.

For both modes of operation, the processing in section X applies.
If you use mode B, then the processing in section Y also applies”

This, because section 9 “Packets and messages” starts out looking more
like what is expected from a 5444 packet/message, but from 9.3.1 diverges
more into a set of processing directives.

I think that the processing directives are clear and well written (thank you
for adopting a non-ambiguous and clear format for this, it is refreshing and
appreciated), but I think that it would be awesome to have a slightly different
structure, something like this (don’t get hung up on the specifics of “Mode B”,
you have a better term, and this is just to indicate the basic idea):

9. Packets and Messages
9.1 Definitions
9.2 Requirements to an OLSRv2 Implementation Using This Metric
9.3 Additional Requirements to an OLSRv2 Implementation using Mode B

10. Processing Directives
10.1 Processing Directives for “Mode A”
10.2 Additional Processing Directives for when using “Mode B”

I note that section 10 in the above would have a ton of subsections, essentially
capturing the current 9.3.1 to 11.

Section 12 - IANA — this baffled me. Do we not need a code-point assigned from
among the LINK_METRIC space? Or, was a decision made to not do this,
and to instead use the 224-255 type extension space for this experiment?
I would think that if that was the case, then some documentation of this would
be good to include in this section.

But, I would be very very supportive of an assignment of a Type Extension to
LINK_METRIC for this DAT metric type.

Section 13 - Security, this needs a review; notably RFC6622 has been obsoleted by
RFC7182. I think that you probably also want to look at RFC7183, and consider
if, perhaps, some of the provisions from the MTI part of 7183 would not guard
against a few things there?

Appendix B
You use “Linkspeed” here, but “link speed” in the introduction, and “link-speed”
in section 5. Then, in some areas, you talk about “unicast data rate” — are
those the same, or different concepts? Could you do an editorial pass and
unify this terminology, please?

Respectfully,

Thomas
