Re: [manet] I-D Action: draft-ietf-manet-olsrv2-dat-metric-03.txt

Henning Rogge <hrogge@gmail.com> Tue, 25 November 2014 07:34 UTC

In-Reply-To: <87ppcco5eg.wl-jch@pps.univ-paris-diderot.fr>
References: <20141124125651.10916.18205.idtracker@ietfa.amsl.com> <87ppcco5eg.wl-jch@pps.univ-paris-diderot.fr>
From: Henning Rogge <hrogge@gmail.com>
Date: Tue, 25 Nov 2014 08:34:02 +0100
Message-ID: <CAGnRvuryy-ms8aU9zpB+ViWvaW+UjQPVczx0PVdVdP6UpzxhHA@mail.gmail.com>
To: Juliusz Chroboczek <jch@pps.univ-paris-diderot.fr>
Archived-At: http://mailarchive.ietf.org/arch/msg/manet/bdlhcTRwtRsyCCFRFYqwHZvXRNU
Cc: MANET IETF <manet@ietf.org>
Subject: Re: [manet] I-D Action: draft-ietf-manet-olsrv2-dat-metric-03.txt

Hi Juliusz,

Thank you for the quick review!

On Mon, Nov 24, 2014 at 6:49 PM, Juliusz Chroboczek
<jch@pps.univ-paris-diderot.fr> wrote:
> I strongly support publication of this document as an Experimental RFC.
>
> Minor nits follow.
>
>
> 1. Introduction
>
> The introduction seems to imply that DAT is preferable to ETX ("less
> efficient than in the past, which is one reason to move to a different
> metric").  I don't understand that -- it is my feeling that DAT is just
> a variant of ETX that's easier to implement.  If you stand by your claim,
> then that deserves more discussion.

Are you talking about ETT? Because ETX doesn't include the link speed.

We have already had this situation in a few community mesh networks,
where they had to cut a few long-range links. Those links were low
speed (2 Mbit/s in the example I remember) but had no packet loss.
Because ETX assigns a cost of 1 to every lossless link, it degenerates
to hop count there, so it concentrated traffic on these links instead
of routing it over a series of high-speed links (also without packet
loss).

So I think there is no question that a link-speed-based metric is
better than ETX.
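To make the ETX argument concrete, here is a small Python sketch comparing one lossless 2 Mbit/s link with three lossless hops at an assumed 54 Mbit/s. The airtime cost is a simplified stand-in (expected transmissions times the time to send an assumed 1500-byte packet), not the DAT formula from the draft.

```python
# Illustrative comparison only: a simplified airtime cost, not the DAT formula.
PACKET_BITS = 1500 * 8  # assumed packet size


def etx(d_fwd: float, d_rev: float) -> float:
    """Classic ETX of one link: 1 / (forward delivery * reverse delivery)."""
    return 1.0 / (d_fwd * d_rev)


def path_etx(links):
    """Sum of link ETX values; links are (d_fwd, d_rev, bitrate) tuples."""
    return sum(etx(df, dr) for df, dr, _ in links)


def path_airtime(links):
    """Expected transmissions times seconds per transmission, summed per hop."""
    return sum(etx(df, dr) * PACKET_BITS / rate for df, dr, rate in links)


long_link = [(1.0, 1.0, 2e6)]         # one lossless 2 Mbit/s hop
short_hops = [(1.0, 1.0, 54e6)] * 3   # three lossless hops, assumed 54 Mbit/s

print(path_etx(long_link), path_etx(short_hops))  # 1.0 3.0: ETX picks the long link
print(path_airtime(long_link) > path_airtime(short_hops))  # True: airtime picks the short hops
```

With no loss, every ETX term is 1, so the comparison is pure hop count; only a cost that includes the bitrate sees that the single slow hop occupies the channel about nine times longer.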

> 3. Applicability Statement
>
> Please clarify what happens when 8 packets in a row are lost.  Is the link
> dropped, or will it be used as the least-preferred link?  (The latter,
> judging from 10.2.5.2.)

8 packets in a row get lost?

The packet loss algorithm described only cares about the total
number of received and lost packets in the measurement period, not
whether the losses were consecutive. If you lose more than 7/8 of them
(regardless of the absolute number), the "packet loss" part of the
metric goes to its highest value.
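A minimal sketch of a loss penalty that depends only on the totals in the window and saturates at 7/8 loss. The name `loss_factor` and the ETX-like 1/(1 - loss) shape are assumptions for illustration, not the exact computation in the draft.

```python
def loss_factor(received: int, total: int, max_loss: float = 7 / 8) -> float:
    """Hypothetical ETX-like penalty for one measurement window.

    Only the totals matter, so eight losses in a row count the same as
    eight scattered losses. Loss above max_loss is clamped, so the
    penalty saturates at 1 / (1 - max_loss), i.e. 8 for 7/8.
    """
    if total <= 0 or not 0 <= received <= total:
        raise ValueError("need 0 <= received <= total and total > 0")
    loss = min(1.0 - received / total, max_loss)
    return 1.0 / (1.0 - loss)


print(loss_factor(100, 100))  # 1.0: no loss
print(loss_factor(50, 100))   # 2.0: half the packets lost
print(loss_factor(5, 100))    # 8.0: 95% loss is clamped to 7/8
```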

Dropping links is not the job of the metric; it's the job of the
validity timeout and the link hysteresis.

> (FWIW, experimental data indicate that 802.11 remains useful when 7/8
> packets are lost, pre-ARQ -- so your "7 or 8" is a little close for
> comfort.)

I think the change to 7/8 came from a discussion with you. With 7/8
packet loss you will lose about 50% of your packets even after ARQ,
which would nearly kill any TCP connection. I don't think it makes
much sense to measure how much worse than 7/8 loss you are. It's just
"worst link packet loss while still alive".
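The "about 50% after ARQ" figure can be checked with a back-of-the-envelope calculation, assuming each transmission attempt is lost independently with probability 7/8. The number of attempts depends on the MAC's retry limit, so several values are shown.

```python
def residual_loss(per_try_loss: float, attempts: int) -> float:
    """Probability that a unicast frame is lost on every one of `attempts` tries,
    assuming independent losses."""
    return per_try_loss ** attempts


for attempts in (1, 4, 5, 7):
    print(attempts, round(residual_loss(7 / 8, attempts), 3))
# 1 0.875
# 4 0.586
# 5 0.513
# 7 0.393
```

Around five attempts leave roughly half the frames lost, which matches the estimate above; real losses are often correlated, which would make the residual loss worse.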

> 4. Rationale
>
> "The Directional Airtime Metric measures the incoming loss rate".  This is
> the loss rate in the "wrong" direction -- it's the loss rate in the
> outgoing direction that we're interested in.

Please have a look at http://tools.ietf.org/html/rfc7181#section-8.1

" L_in_metric will be specified by a process that is external to this
specification..."

The job of the metric is to calculate the cost of the incoming links.

>  I agree with you that this
> approximation gives reasonable results when combined with bidirectional
> reachability detection (or whatever it is called in NHDP), but this
> deserves some more discussion.

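OLSRv2 uses directional metrics; it doesn't combine the incoming and
outgoing metrics. Each router computes L_in_metric for its incoming
links and reports it to the neighbor in its HELLO messages, and the
neighbor uses the reported value as the L_out_metric of the same link,
so the outgoing cost is still measured, just by the router at the
receiving end.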
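In OLSRv2 each router reports the incoming metric it measured in its HELLO messages, and the neighbor takes that value as its outgoing metric for the same link. The toy model below shows that exchange; the `Router` class and its methods are hypothetical stand-ins for the real HELLO processing and LINK_METRIC TLVs of RFC 7181.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy model; real OLSRv2 carries these values in HELLO LINK_METRIC TLVs."""
    name: str
    in_metric: dict = field(default_factory=dict)   # neighbor -> cost of neighbor->self, measured here
    out_metric: dict = field(default_factory=dict)  # neighbor -> cost of self->neighbor, learned from HELLOs

    def hello_report(self) -> dict:
        """Incoming metrics this router would advertise to its neighbors."""
        return dict(self.in_metric)

    def receive_hello(self, sender: "Router", report: dict) -> None:
        # The sender measured the link self->sender as its incoming link,
        # so its value is our outgoing metric towards the sender.
        if self.name in report:
            self.out_metric[sender.name] = report[self.name]


a, b = Router("A"), Router("B")
a.in_metric["B"] = 10  # A measures the link B->A
b.in_metric["A"] = 40  # B measures the link A->B
a.receive_hello(b, b.hello_report())
b.receive_hello(a, a.hello_report())
print(a.out_metric, b.out_metric)  # {'B': 40} {'A': 10}
```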

> 5. Metric Functioning & Overview
>
> You're speaking about "packet loss", it is not clear that you measure
> packet loss for protocol packets only.

I will adapt the text in Section 3 now that options A, B and C for
packet loss gathering are gone. Thank you for finding this.

> It may be worth restating that
> that protocol packets are always sent over multicast (is that correct?).

OLSRv2/NHDP control traffic is normally sent over multicast, but I
know that people have used unicast for OLSRv1 on directional
point-to-point links. I'm not sure if we need to mention this.

> 6. Protocol Parameters
>
> "Two routers with different values..."  The consequences of this are not
> clear.  What happens if routers are configured with different parameters?
> Will the network collapse, or will you merely end up with sub-optimal
> routing choices?  (I understand it's the latter, correct?)
>
> "DAT_HELLO_TIMEOUT_FACTOR = 1.2".  I'm using 1.5 in Babel, which gives
> more margin for late packets while still reliably avoiding mistaking the
> next hello for the previous one.

I don't have a strong opinion on this. I think the value was 1.5 in
an earlier version of the document but was reduced later. 1.5 might
be too large if you use 25% jitter: jitter always decreases the
interval between messages, so you could get two Hellos within 1.5
Hello intervals, and the timeout would not catch the lost Hello in
between.
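The jitter argument can be written down as two inequalities. Assuming jitter only shortens each interval to somewhere between (1 - j) * I and I, as RFC 5148 style jitter does, a timeout factor must exceed 1 so that a timely Hello never trips it, and must stay below 2 * (1 - j) so that it fires before the Hello following a single lost one can arrive. The helper name is hypothetical and propagation delay is ignored.

```python
def timeout_factor_ok(factor: float, max_jitter: float) -> bool:
    """Check a Hello timeout factor against jitter that only shortens intervals.

    With the Hello interval normalised to 1, every gap lies in
    [1 - max_jitter, 1]. A timely Hello arrives at most 1 interval after
    the previous one, and the Hello after a single lost one arrives at
    least 2 * (1 - max_jitter) intervals after the last received one.
    """
    never_trips_on_time = factor > 1.0
    catches_one_loss = factor < 2.0 * (1.0 - max_jitter)
    return never_trips_on_time and catches_one_loss


print(timeout_factor_ok(1.2, 0.25))  # True
print(timeout_factor_ok(1.5, 0.25))  # False: 1.5 is not below 2 * 0.75
print(timeout_factor_ok(1.5, 0.0))   # True: fine without jitter
```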

> 8. Data structures
>
> I don't think that citing DLEP is useful here, it only adds confusion by
> implying that this specification is dependent on DLEP.

DLEP is one potential source for the incoming bitrate, so I think it's
worth mentioning here.

> Why are you using L_DAT_rx_bitrate instead of the tx bitrate?  (Given that
> you only have packet loss in the "wrong" direction, it's not clear to me
> whether you should be using link speed in the "right" or in the "wrong"
> direction; more discussion would be welcome.)

Because I calculate the incoming link metric.

> 9.2
>
> Should a router using this metric ensure that there is a Hello in each
> packet it sends?  Or would the cost be prohibitive?
>
> (FWIW, Babel doesn't make that requirement, but it only measures packet
> loss for packets containing a Hello.)

Not necessary, because every RFC5444 packet can contain a packet
sequence number.

> 9.3 Link Loss Gathering
>
> "SHOULD be carried out".  Why SHOULD?

Because if the packet does not contain a packet sequence number, or
you are unable to get it, you don't need to do it. The algorithm will
then automatically fall back to a "Hello timeout based" packet loss
estimation.

Maybe we could phrase it as "If the packet contains a packet sequence
number, it MUST be processed this way"?

> "packets without packet sequence number MUST NOT be processed".  Why not?
> I'd say that if you have either INTERVAL_TIME or seqno, then you can
> manage.  If you have neither, then you're out of luck, of course.  At any
> rate, this MUST NOT seems to contradict the stand you're taking in 9.2.
> More discussion would be welcome.

You are mixing up Hello (message) processing with packet sequence
number processing. The packet header does not contain INTERVAL_TIME or
VALIDITY_TIME TLVs.

> Step 2.3.  What happens when L_DAT_last_pkt_seqno = pkt_seqno?  Should you
> be dealing with this case specially?  Especially if you take into account
> my suggestion in 10.1 below.

Normally this would be an implementation bug, because the packet
sequence number should be increased for every sent packet.

We could add a check for this, but I am not sure it would really help.

> (FWIW, in Babel we treat specially the case where pkt_seqno is between
> L_DAT_last_pkt_seqno - 15 and L_DAT_last_pkt_seqno -- we "undo history"
> in that case.)

The algorithm does not store which sequence numbers were received and
which were not, so it cannot undo the history.

A sequence number going backward by a few steps would be considered a
forward jump of 6553x, because the 16-bit sequence number wraps around.
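A sketch of the modulo-2^16 comparison that produces this effect, assuming the 16-bit packet sequence number of RFC 5444; `seqno_delta` is a hypothetical helper name.

```python
SEQNO_MODULUS = 1 << 16  # RFC 5444 packet sequence numbers are 16 bits wide


def seqno_delta(last: int, current: int) -> int:
    """Forward distance from `last` to `current`, modulo 2^16."""
    return (current - last) % SEQNO_MODULUS


print(seqno_delta(7, 10))     # 3: normal forward step (2 packets lost)
print(seqno_delta(65535, 0))  # 1: wrap-around handled correctly
print(seqno_delta(10, 7))     # 65533: a step back of 3 looks like a huge jump
```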

> Step 2.3.  This case deserves further comments.  It implies that the
> neighbouring router has rebooted (since an extended outage would have
> caused your various timers to trigger); is it justified to keep the
> statistics, or should the whole state for this neighbour be reset?  (No
> objection to keeping the current behaviour, just make it clear what's
> going on here.)

I think it's worth keeping the statistics: the whole thing happened
within a Hello validity time (otherwise the link tuple would have been
removed), so the data is still valid.

> 10.1
>
> Shouldn't you be incrementing L_DAT_last_pkt_seqno?  (It doesn't matter
> much, since you'll reset the value the next time you receive a Hello, but
> you've concluded that you've lost a packet, so you might as well reflect
> it.  Doing this, however, implies further complexity in step 2.3 of 9.3.)

No.

This function does two jobs.

If you have a neighbor that does not deliver packet sequence numbers
(or you just cannot receive them), it counts a Hello timeout as a
packet loss.

If you have a neighbor that DOES deliver packet sequence numbers, it
counts the Hello timeout events since the last received packet. This
count is used to estimate the loss, but the estimate is discarded as
soon as the next packet arrives, because its sequence number then
shows how many packets were actually lost.
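A simplified model of these two jobs, with hypothetical names. It keeps only running counters, like the algorithm described, but it is not the draft's exact procedure. Without sequence numbers, `on_packet` is assumed to be called once per received Hello.

```python
class LinkLossCounter:
    """Hypothetical per-neighbor loss counters for one measurement window."""

    def __init__(self, uses_seqno: bool):
        self.uses_seqno = uses_seqno
        self.received = 0
        self.lost = 0
        self.provisional = 0   # Hello timeouts since the last received packet
        self.last_seqno = None

    def on_hello_timeout(self) -> None:
        if self.uses_seqno:
            # Only an estimate: the next sequence number will show the real gap.
            self.provisional += 1
        else:
            # Without sequence numbers a missed Hello is the only loss signal.
            self.lost += 1

    def on_packet(self, seqno=None) -> None:
        self.received += 1
        if self.uses_seqno and seqno is not None:
            if self.last_seqno is not None:
                gap = (seqno - self.last_seqno) % 65536
                self.lost += max(gap - 1, 0)
            self.last_seqno = seqno
        self.provisional = 0   # discard the timeout-based estimate

    def loss_ratio(self) -> float:
        lost = self.lost + self.provisional
        total = self.received + lost
        return lost / total if total else 0.0


link = LinkLossCounter(uses_seqno=True)
link.on_packet(100)
link.on_hello_timeout()
print(link.loss_ratio())   # 0.5: one received, one provisional loss
link.on_packet(104)        # the gap shows 3 packets were actually lost
print(link.loss_ratio())   # 0.6: 3 lost out of 5
```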

Henning Rogge