Re: [aqm] [Cerowrt-devel] ping loss "considered harmful"

Dave Taht <dave.taht@gmail.com> Thu, 05 March 2015 20:53 UTC

From: Dave Taht <dave.taht@gmail.com>
To: Matt Taggart <matt@lackof.org>
Cc: "NZNOG@list.waikato.ac.nz" <NZNOG@list.waikato.ac.nz>, "aqm@ietf.org" <aqm@ietf.org>, "cerowrt-devel@lists.bufferbloat.net" <cerowrt-devel@lists.bufferbloat.net>, bloat <bloat@lists.bufferbloat.net>

I had spoken to someone at NZNOG who promised to combine mrtg +
smokeping, or cacti + smokeping, so as to be able to get long-term
latency and bandwidth numbers on one graph. cc added.

On Thu, Mar 5, 2015 at 12:38 PM, Matt Taggart <matt@lackof.org> wrote:
> Dave Taht writes:
>
>> wow. It never registered to me that users might make a value judgement
>> based on the amount of ping *loss*, rather than latency, and in looking back in time, I can
>> think of multiple people that have said things based on their
>> perception that losing pings was bad, and that sqm-scripts was "worse
>> than something else because of it."
>
> This thread makes me realize that my standard method of measuring latency
> over time might have issues. I use smokeping
>
>   http://oss.oetiker.ch/smokeping/


In sqm-scripts's case, possibly all you have been collecting is
largely worst-case behavior, which I don't mind collecting, as it tends
to be pretty good. :)

However, I have been unclear. In the main sqm code (modern; I don't
know what version you have), IF you enable dscp squashing on inbound
(the default), you end up with a single fq_codel queue, not 3, with no
classification or ping prioritization. (It is the default because of
all the re-marking I have seen from Comcast.)

So if you are, as I am, monitoring your boxes from the outside, there
is no classification and prioritization present for ping.

Do a tc -s qdisc show dev ifbwhatever (the interface name varies by
platform) to see how many queues you have. Here is an example of a
single-queue inbound rate limiter + fq_codel (yea! packet drop AND ecn
working great!):

root@lorna-gw:~# tc -s qdisc show dev ifb4ge00
qdisc htb 1: root refcnt 2 r2q 10 default 10 direct_packets_stat 0 direct_qlen 32
 Sent 168443514948 bytes 334370551 pkt (dropped 0, overlimits 143273498 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300 target 5.0ms interval 100.0ms ecn
 Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044
  new_flows_len 0 old_flows_len 1

root@lorna-gw:~# uptime
 12:45:35 up 54 days, 22:33,  load average: 0.05, 0.05, 0.04
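
If you would rather not count queues by eye, here is a minimal sketch
that tallies the qdiscs in output like the above and pulls out the
fq_codel drop and ECN mark counters. The function name
summarize_qdiscs and the regexes are my own illustration, not part of
sqm-scripts or tc, and they assume the text layout shown in this
message (other tc versions may print differently).

```python
import re
from collections import Counter

# Matches a qdisc header such as "qdisc fq_codel 110:" (handle is hex).
QDISC_HEADER = re.compile(r"\bqdisc (\w+) ([0-9a-f]+):")

# Sample copied from this message, including the mail client's line wraps.
SAMPLE = """\
qdisc htb 1: root refcnt 2 r2q 10 default 10 direct_packets_stat 0
direct_qlen 32
 Sent 168443514948 bytes 334370551 pkt (dropped 0, overlimits
143273498 requeues 0)
 backlog 0b 0p requeues 0
qdisc fq_codel 110: parent 1:10 limit 1001p flows 1024 quantum 300
target 5.0ms interval 100.0ms ecn
 Sent 168443514948 bytes 334370551 pkt (dropped 17480, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  maxpacket 1514 drop_overlimit 0 new_flow_count 125872421 ecn_mark 1044
  new_flows_len 0 old_flows_len 1
"""


def summarize_qdiscs(tc_output):
    """Count qdiscs by kind and sum fq_codel drops and ECN marks."""
    # Collapse all whitespace so wrapped lines do not split a field.
    text = " ".join(tc_output.split())
    headers = list(QDISC_HEADER.finditer(text))
    kinds = Counter()
    dropped = 0
    ecn_marks = 0
    for i, header in enumerate(headers):
        kind = header.group(1)
        kinds[kind] += 1
        # A qdisc's section runs until the next qdisc header (or the end).
        end = headers[i + 1].start() if i + 1 < len(headers) else len(text)
        section = text[header.start():end]
        if kind == "fq_codel":
            m = re.search(r"\(dropped (\d+),", section)
            if m:
                dropped += int(m.group(1))
            m = re.search(r"\becn_mark (\d+)", section)
            if m:
                ecn_marks += int(m.group(1))
    return {
        "kinds": dict(kinds),
        "fq_codel_dropped": dropped,
        "fq_codel_ecn_marks": ecn_marks,
    }


summary = summarize_qdiscs(SAMPLE)
print(summary)
```

One fq_codel entry under the htb shaper means a single queue, so ping
is getting no special treatment on inbound.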

DSCP classification, in general, is only useful from within your own
network, going outward.

> which is a really nice way of measuring and visualizing packet loss and
> variations in latency. I am using the default probe type which uses fping
> (ICMP http://www.fping.org/ ).

I LOVE smokeping and wish very much we had a way to combine it with
mrtg data to see latency AND bandwidth at the same time.
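
As a rough idea of what that combination could look like, here is a
minimal sketch that lines up latency and bandwidth samples on a shared
time grid. It assumes the samples have already been exported as
(unix_timestamp, value) pairs; it does not read smokeping or mrtg data
files, and merge_series and the 300-second step are my own
placeholders.

```python
from statistics import mean, median


def merge_series(latency, bandwidth, step=300):
    """Join (timestamp, value) samples that fall in the same time bucket.

    Returns rows of (bucket_start, median_latency, mean_bandwidth) for
    every bucket in which both series have at least one sample.
    """
    def bucketize(samples):
        buckets = {}
        for ts, value in samples:
            # Round each timestamp down to the start of its bucket.
            buckets.setdefault(ts - ts % step, []).append(value)
        return buckets

    lat = bucketize(latency)
    bw = bucketize(bandwidth)
    return [
        (start, median(lat[start]), mean(bw[start]))
        for start in sorted(lat.keys() & bw.keys())
    ]


# Hypothetical samples: latency in ms, bandwidth in kbit/s.
latency_ms = [(0, 10.0), (100, 30.0), (300, 20.0)]
bandwidth_kbit = [(0, 1000.0), (300, 2000.0), (600, 500.0)]
rows = merge_series(latency_ms, bandwidth_kbit)
for row in rows:
    print(row)
```

With both numbers in one row per interval, it becomes easy to see
whether latency climbs when the link is loaded.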

>
> It has been working well, I set it up for a site in advance of setting up
> SQM and then afterwards I can see the changes and determine if more tuning
> is needed.  But if ICMP is having its priority adjusted (up or down), then
> the results might not reflect the latency of other services.
>
> Fortunately the nice thing is that many other probe types exist
>
>   http://oss.oetiker.ch/smokeping/probe/index.en.html
>
> So which probe types would be good to use for bufferbloat measurement? I
> guess the answer is "whatever is important to you", but I also suspect
> there is a set of things that ISPs are known to mess with.
> HTTP? But also maybe HTTPS in case they are doing some sort of transparent
> proxy?
> DNS?
> SIP?
> I suppose you could even do explicit checks for things like Netflix (but
> then it's easy to go off on a tangent of building a net neutrality
> observatory).
>
> On a somewhat related note, I was once using smokeping to measure a fiber
> link to a bandwidth provider and had it configured to ping the router IP on
> the other side of the link. In talking to one of their engineers, I learned
> that they deprioritize ICMP when talking _with_ their routers, so my
> measurements weren't valid. (I don't know if they deprioritize ICMP traffic
> going _through_ their routers)

I do strongly recommend deprioritizing ping slightly, and as I noted, I
have seen many a broken script that actually prioritized it, which is
foolish, at best.
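
Since ICMP may be pushed up or down in priority along the path, one
alternative is to time something that travels with ordinary traffic,
such as a TCP handshake. Here is a minimal sketch of that idea. The
function name tcp_connect_ms is my own, this is not how any smokeping
probe is implemented, and the time includes local connect overhead as
well as the network round trip.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Return the time to complete a TCP connect in milliseconds.

    Returns None if the connection fails or times out, so that loss and
    latency can be tallied separately by the caller.
    """
    start = time.perf_counter()
    try:
        # create_connection resolves the host and performs the handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


if __name__ == "__main__":
    # Hypothetical target; substitute a service you actually care about.
    print(tcp_connect_ms("127.0.0.1", 80))
```

Keeping failures as None rather than as a huge latency value avoids
mixing up "lost" with "slow", which is the confusion this thread
started with.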

I keep hoping multiple (many!) someones here will go have lunch with
their company's oft lonely, oft starving sysadmin(s), to ask them what
they are doing as to firewalling, QoS and traffic shaping. Most of the
ones I have talked to are quite eager to show off their work, which is
unfortunately often of wildly varying quality and complexity.

I find that an offer of sake and sushi is most conducive to getting
that conversation started.

I certainly would like to see more default corporate
firewall/QoS/shaping rules than I have personally, for various
platforms. Someone's got to have some good ideas in them... and it
would be nice to know how far the bad ones have propagated.

> --
> Matt Taggart
> matt@lackof.org
>
>



-- 
Dave Täht
Let's make wifi fast, less jittery and reliable again!

https://plus.google.com/u/0/107942175615993706558/posts/TVX3o84jjmb