Re: [tcpm] draft-ietf-tcpm-rack-05 review

Yuchung Cheng <> Wed, 11 September 2019 19:20 UTC

From: Yuchung Cheng <>
Date: Wed, 11 Sep 2019 12:19:16 -0700
To: Emmanuel Lochin <>
Cc: " Extensions" <>, Kuhn Nicolas <>, Priyaranjan Jha <>, Neal Cardwell <>

Re-sending in plaintext to avoid the tcpm filter

Hi Emmanuel,

Sorry for the late reply, and thanks for testing RACK. Some questions:
1) What's the congestion control? I assume it's the default (i.e. CUBIC).

2) What's the value of sysctl net.ipv4.tcp_no_metrics_save? I assume
it's the default 0, meaning TCP saves some metrics to be reused by
future flows toward the same IP.

3) What's the bandwidth of the bottleneck? Or is the BDP generally
beyond the 5000 packets in your test?
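For reference, one quick way to dump the relevant knobs on the test machine (a minimal sketch; the /proc/sys paths are standard on Linux, but the helper name is mine):

```python
def read_sysctl(name, default=None):
    """Read a sysctl value via /proc/sys; return `default` if unavailable."""
    path = "/proc/sys/" + name.replace(".", "/")
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return default

for knob in ("net.ipv4.tcp_congestion_control",
             "net.ipv4.tcp_no_metrics_save",
             "net.ipv4.tcp_max_reordering"):
    print(knob, "=", read_sysctl(knob, "<unavailable>"))
```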

If my assumptions above are correct, I speculate that the performance
gap is caused by a spurious early exit from TCP CUBIC slow start. This
happens because RACK initially fails to catch some of the reordering
with its initial RTT/4 reordering window. Many RACK parameters are
tuned based on Google's traffic, where the nominal RTT is much lower
than 500 ms.
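The RTT dependence is easy to see numerically. A toy model of the initial reordering window (assuming simply one quarter of the minimum RTT; the variable names are mine):

```python
def init_reo_wnd(min_rtt_s):
    """RACK's initial reordering window: min RTT / 4."""
    return min_rtt_s / 4.0

# At a datacenter-like 1 ms RTT the window is 0.25 ms; at the 500 ms
# RTT in this test it is 125 ms. Reordering that spans more than the
# window is initially treated as loss, cutting slow start short.
print(init_reo_wnd(0.001))  # 0.00025
print(init_reo_wnd(0.5))    # 0.125
```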

In contrast, the Linux dupack implementation (labeled 3DUPACK in
your graph) has two features:
a) It dynamically extends the dupthresh up to a maximum of 300 packets
(sysctl net.ipv4.tcp_max_reordering), which makes TCP very resilient
when out-of-order delivery is due to reordering rather than drops.
b) The high dupthresh is cached and reused for new connections toward
the same destination IP via (2).
Thus it's possible that all subsequent flows in a test start with the
dupthresh at its maximum, making them bullet-proof against any
reordering by never entering fast recovery falsely.
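A sketch of that dupthresh adaptation (simplified; the real logic lives in Linux's tcp_input.c, and the names here are illustrative):

```python
TCP_MAX_REORDERING = 300  # default of sysctl net.ipv4.tcp_max_reordering

def update_dupthresh(dupthresh, observed_distance):
    """Grow dupthresh to the observed reordering distance, capped at
    the sysctl maximum; it never shrinks within a connection."""
    return min(max(dupthresh, observed_distance), TCP_MAX_REORDERING)

d = 3                          # classic 3-dupack starting point
d = update_dupthresh(d, 50)    # reordering of 50 packets observed
d = update_dupthresh(d, 1000)  # extreme reordering: capped at 300
print(d)  # 300
```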

To check my theory, could you provide tcpdump captures of each test
(RACK vs. non-RACK)? Only header captures are needed.