Re: [ippm] [Bloat] Fwd: New Version Notification for draft-cpaasch-ippm-responsiveness-00.txt

Toerless Eckert <tte@cs.fau.de> Tue, 21 September 2021 20:50 UTC

Date: Tue, 21 Sep 2021 22:50:03 +0200
From: Toerless Eckert <tte@cs.fau.de>
To: Erik Auerswald <auerswal@unix-ag.uni-kl.de>
Cc: bloat@lists.bufferbloat.net, draft-cpaasch-ippm-responsiveness@ietf.org, ippm@ietf.org

Dear authors,

Thanks for the draft.

a) Can you please update the naming of the draft so that people who
   remember RPM will find it? Something like:

   draft-cpaasch-ippm-rpm-bufferbloat-metric-00
   Round-trips Per Minute (RPM) under load - a Metric for bufferbloat.

b) The draft does not mention, or at least does not have a separate
   section to discuss, where the server against which the test is run
   is located. It should have such a section. I can think of at least
   two key options:
   - the server used for the service in question (e.g., where the content comes from),
   - a server at a well-defined location in the access network provider.

c) I fear that b) leads to the biggest current issue with the metric:
   The longer the path is, such as the full path to a server, the more
   useful the metric is for the user. But the user will effectively get
   a per-service metric. To make this more fun for the authors: imagine
   the AppleTV server nodes have a worse path to a particular user than
   the Netflix servers. Or vice versa.

   If we just use a path to some fixed point in the access provider,
   then we take away the users' ability to beat up their OTT services
   to improve their paths.

   If we use only a path toward the service, it will be harder to
   hold the service provider accountable if the service provider is bad.

   So, obviously, I would like to have all three RPMs: to Netflix, to
   AppleTV, and to a well-defined server in Comcast. Then I can
   triangulate where my bufferbloat problem is, as sketched below.
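
   To make the triangulation idea concrete, here is a minimal sketch
   (the target names, the threshold, and the classification logic are
   my own illustration, not something the draft defines):

   def locate_bufferbloat(rpm_isp, rpm_netflix, rpm_appletv, low=500.0):
       # Hypothetical threshold: below 'low' RPM we call a path bloated.
       if rpm_isp < low:
           # Even the short path into the access network is bloated,
           # so every service measured through it will look bad.
           return "likely in the access network"
       bloated = [name for name, rpm in
                  [("Netflix path", rpm_netflix), ("AppleTV path", rpm_appletv)]
                  if rpm < low]
       return " and ".join(bloated) or "no significant bufferbloat observed"

   print(locate_bufferbloat(rpm_isp=2000, rpm_netflix=300, rpm_appletv=1800))
   # => "Netflix path"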

d) Worse yet, without having seen more example numbers (a reference
   pointing to some well-collected RPM numbers would be excellent), my
   concern is that instead of fixing bufferbloat on paths, we would
   simply encourage OTTs to co-locate servers at the access provider's
   own measurement point, aka: as close as possible to the subscriber.

e) To solve d), maybe two ideas:

   - Relevant for improving bufferbloat is only (lRPM - iRPM), where
     lRPM would be your current RPM, i.e., under (l)oaded conditions,
     and iRPM is the (i)dle RPM. This still does not take away from
     the fact that a path with more queuing hops or higher queue loads
     will fare worse than a path with shorter physical propagation
     latency, but it does make the metric focus significantly on
     queueing, and it should help a lot when we compare services that
     might not have servers in the user's metro area.

   - lRPM/m - RPM under load per mile (roughly):

     - Measure the idle RTT in units of msec (iRTT).

     - Measure the RTT under load in units of msec (lRTT).

     - Just take iRTT as a measure of the path length. Normalizing it
       to an absolute distance is not of first-order importance; we
       are primarily interested in a relative number, and this keeps
       the example calculation simple.

     - The RTT increase because of queueing is (lRTT - iRTT).

     - (lRTT - iRTT) / iRTT is therefore something like queueing RTT
       per unit of path stretch. I think this is the relative number
       we want.

     - RPM = iRTT / (lRTT - iRTT) * 1000 turns this into a number
       that increases with the desired non-bufferbloat performance
       and keeps enough significant digits before the decimal point
       (see the sketch after the examples below).

     - Example:
        idle RTT:  5 msec, loaded RTT: 20 msec =>  333 RPM
        idle RTT: 10 msec, loaded RTT: 20 msec => 1000 RPM
        idle RTT: 15 msec, loaded RTT: 20 msec => 3000 RPM

        This nicely shows how the RPM goes up when the physical
        path itself gets longer but the relevant loaded RTT stays
        the same.

        idle RTT:  5 msec, loaded RTT: 20 msec => 333 RPM
        idle RTT: 10 msec, loaded RTT: 40 msec => 333 RPM
        idle RTT: 15 msec, loaded RTT: 60 msec => 333 RPM

        This nicely shows that we can have servers at different
        physical distances and get the same RPM number when the
        bufferbloat is the same, i.e., 15 msec worth of bufferbloat
        for every 5 msec of propagation latency.
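
   For concreteness, here is a minimal sketch of the calculation
   (the function name and the handling of the lRTT <= iRTT corner
   case are my own assumptions, not part of the proposal):

   def normalized_rpm(idle_rtt_ms, loaded_rtt_ms):
       # RPM = iRTT / (lRTT - iRTT) * 1000, higher is better.
       queueing_ms = loaded_rtt_ms - idle_rtt_ms  # RTT increase from queueing
       if queueing_ms <= 0:
           return float("inf")  # assumption: no measurable bufferbloat
       return idle_rtt_ms / queueing_ms * 1000

   # Reproduces the example numbers above (333, 1000, 3000, 333, 333):
   for idle, loaded in [(5, 20), (10, 20), (15, 20), (10, 40), (15, 60)]:
       print(f"iRTT={idle:2} msec, lRTT={loaded:2} msec"
             f" => {normalized_rpm(idle, loaded):.0f} RPM")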

f) I can see how you would NOT want the type of metric I am
   proposing, because it focuses only on the bufferbloat
   factor, and you may want to stick to the full experience of
   the user, where the propagation latency unmistakably cannot
   be ignored. But to repeat from above:

   If we do not use a metric that fairly treats paths of different
   propagation latencies as the same with respect to performance, I am
   quite persuaded we will continue to just see the big services
   win out, because they can more easily afford to get closer
   to the user with their (rented/time-shared/owned) servers.

   Aka: right now, RPM is a metric that will specifically
   make it easier for a big provider of streaming,
   such as that of the authors, to position themselves better
   against smaller services streaming from further away.

Cheers
    Toerless