[tsvwg] Fwd: [PATCH v5 net-next 00/13] tcp: BIG TCP implementation

Dave Taht <dave.taht@gmail.com> Wed, 11 May 2022 03:14 UTC

MIME-Version: 1.0
References: <20220509222149.1763877-1-eric.dumazet@gmail.com>
In-Reply-To: <20220509222149.1763877-1-eric.dumazet@gmail.com>
From: Dave Taht <dave.taht@gmail.com>
Date: Tue, 10 May 2022 20:14:26 -0700
Message-ID: <CAA93jw7-okiSuFU-hxBW1SH6u4aU0T6rXqDpj6aTv1ZPo5qvOA@mail.gmail.com>
To: tsvwg IETF list <tsvwg@ietf.org>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://mailarchive.ietf.org/arch/msg/tsvwg/yxdMVZ6Pu4JnJZp9YO5s4kZiWM0>
Subject: [tsvwg] Fwd: [PATCH v5 net-next 00/13] tcp: BIG TCP implementation

I am not aware of how well this list keeps up with developments in my world.

But it is really remarkable to finally see this facility appear, so
many years after it was first thought about.

https://lwn.net/Articles/884104/

---------- Forwarded message ---------
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, May 9, 2022 at 3:34 PM
Subject: [PATCH v5 net-next 00/13] tcp: BIG TCP implementation
To: David S . Miller <davem@davemloft.net>, Jakub Kicinski
<kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: netdev <netdev@vger.kernel.org>, Alexander Duyck
<alexanderduyck@fb.com>, Coco Li <lixiaoyan@google.com>, Eric Dumazet
<edumazet@google.com>, Eric Dumazet <eric.dumazet@gmail.com>


From: Eric Dumazet <edumazet@google.com>

This series implements BIG TCP as presented in netdev 0x15:

https://netdevconf.info/0x15/session.html?BIG-TCP

Jonathan Corbet made a nice summary: https://lwn.net/Articles/884104/

The standard TSO/GRO packet limit is 64KB.

With BIG TCP, we allow bigger TSO/GRO packet sizes for IPv6 traffic.

Note that this feature is not enabled by default, because it might
break some eBPF programs that assume the TCP header immediately follows
the IPv6 header.

While tcpdump recognizes the HBH/Jumbo header, standard pcap filters
are unable to skip over IPv6 extension headers.
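
To illustrate why code that assumes the TCP header sits at a fixed offset
can break, here is a minimal Python sketch (not taken from the patch series)
that builds an IPv6 header followed by a Hop-by-Hop header carrying the
RFC 2675 Jumbo Payload option, then locates the TCP header by skipping it.
The helper names build_jumbo_prefix and locate_tcp are hypothetical, and the
sketch only handles a single leading Hop-by-Hop header.

```python
import struct

IPV6_HDR_LEN = 40
NEXTHDR_HOP = 0         # Hop-by-Hop Options header
NEXTHDR_TCP = 6
JUMBO_OPT_TYPE = 0xC2   # Jumbo Payload option type (RFC 2675)


def build_jumbo_prefix(jumbo_len: int) -> bytes:
    """Hypothetical helper: IPv6 header + HBH header with a Jumbo Payload option."""
    vtf = 6 << 28  # version 6, traffic class 0, flow label 0
    # The 16-bit Payload Length field is 0 in a jumbogram; next header is HBH.
    ipv6 = struct.pack("!IHBB", vtf, 0, NEXTHDR_HOP, 64) + bytes(16) + bytes(16)
    # HBH: next header, hdr ext len (0 means 8 bytes), option type, option len, 32-bit length
    hbh = struct.pack("!BBBBI", NEXTHDR_TCP, 0, JUMBO_OPT_TYPE, 4, jumbo_len)
    return ipv6 + hbh


def locate_tcp(packet: bytes):
    """Return (tcp_offset, payload_len), skipping one leading HBH header if present."""
    _, plen, nexthdr, _ = struct.unpack_from("!IHBB", packet, 0)
    offset = IPV6_HDR_LEN
    if nexthdr == NEXTHDR_HOP:
        nexthdr, ext_len = struct.unpack_from("!BB", packet, offset)
        hbh_len = (ext_len + 1) * 8
        pos, end = offset + 2, offset + hbh_len
        while pos < end:  # walk the TLV-encoded options
            opt_type = packet[pos]
            if opt_type == 0:  # Pad1 is a single byte with no length field
                pos += 1
                continue
            opt_len = packet[pos + 1]
            if opt_type == JUMBO_OPT_TYPE and opt_len == 4:
                (plen,) = struct.unpack_from("!I", packet, pos + 2)
            pos += 2 + opt_len
        offset += hbh_len
    if nexthdr != NEXTHDR_TCP:
        raise ValueError("next header is not TCP")
    return offset, plen


if __name__ == "__main__":
    print(locate_tcp(build_jumbo_prefix(185000)))
```

With the Hop-by-Hop header present, the TCP header starts at offset 48
rather than 40, which is exactly the case a fixed-offset parser misses.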

Reducing the number of packets traversing the networking stack usually
improves performance, as shown in this experiment using a 100Gbit NIC and
a 4K MTU.
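
As a rough back-of-the-envelope sketch of what a larger limit means, the
Python below estimates how many MTU-sized segments fit in one TSO/GRO packet
under each limit. The exact MTU (4096 bytes), the 12-byte TCP timestamp
option, and the helper name segments_per_packet are assumptions made for
illustration, not values taken from the series.

```python
MTU = 4096           # assumption: "4K MTU" taken as 4096 bytes
IPV6_HDR = 40        # fixed IPv6 header size
TCP_HDR = 20 + 12    # assumption: base TCP header plus timestamp option
MSS = MTU - IPV6_HDR - TCP_HDR


def segments_per_packet(max_size: int, mss: int = MSS) -> int:
    """Approximate number of full-sized segments in one TSO/GRO packet."""
    return max_size // mss


if __name__ == "__main__":
    for label, limit in (("64KB limit", 65536), ("BIG TCP, 185000", 185000)):
        print(f"{label}: about {segments_per_packet(limit)} segments per packet")
```

Fewer, larger packets for the same amount of data means fewer trips through
the stack, which is the effect the measurements are meant to show.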

'Standard' performance with current (64KB) limits.
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23 -- -r80000,80000 \
  -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT | tail -1; done
77           138          183          8542.19
79           143          178          8215.28
70           117          164          9543.39
80           144          176          8183.71
78           126          155          9108.47
80           146          184          8115.19
71           113          165          9510.96
74           113          164          9518.74
79           137          178          8575.04
73           111          171          9561.73

Now enable BIG TCP on both hosts.

ip link set dev eth0 gro_max_size 185000 gso_max_size 185000
for i in {1..10}; do ./netperf -t TCP_RR -H iroa23 -- -r80000,80000 \
  -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT | tail -1; done
57           83           117          13871.38
64           118          155          11432.94
65           116          148          11507.62
60           105          136          12645.15
60           103          135          12760.34
60           102          134          12832.64
62           109          132          10877.68
58           82           115          14052.93
57           83           124          14212.58
57           82           119          14196.01

We see an increase in transactions per second, and lower latencies as well.
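
To put a number on that, the following Python sketch averages each column of
the two runs above and prints the relative change. The rows are copied from
the netperf output in this message; the column order follows the -O list on
the command line, and the helper name column_means is hypothetical.

```python
from statistics import mean

NAMES = ("MIN_LATENCY", "P90_LATENCY", "P99_LATENCY", "THROUGHPUT")

# Rows copied from the run with the standard limits.
baseline = [
    (77, 138, 183, 8542.19), (79, 143, 178, 8215.28), (70, 117, 164, 9543.39),
    (80, 144, 176, 8183.71), (78, 126, 155, 9108.47), (80, 146, 184, 8115.19),
    (71, 113, 165, 9510.96), (74, 113, 164, 9518.74), (79, 137, 178, 8575.04),
    (73, 111, 171, 9561.73),
]

# Rows copied from the run with BIG TCP enabled on both hosts.
big_tcp = [
    (57, 83, 117, 13871.38), (64, 118, 155, 11432.94), (65, 116, 148, 11507.62),
    (60, 105, 136, 12645.15), (60, 103, 135, 12760.34), (60, 102, 134, 12832.64),
    (62, 109, 132, 10877.68), (58, 82, 115, 14052.93), (57, 83, 124, 14212.58),
    (57, 82, 119, 14196.01),
]


def column_means(rows):
    """Average each column across all runs."""
    return [mean(col) for col in zip(*rows)]


base = column_means(baseline)
big = column_means(big_tcp)

if __name__ == "__main__":
    for name, b, g in zip(NAMES, base, big):
        print(f"{name:12s} {b:10.2f} -> {g:10.2f} ({(g - b) / b:+.1%})")
```

Ten runs per configuration is a small sample, so the averages are best read
as an indication of the trend rather than a precise figure.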

v5: Replaced two patches (that were adding new attributes) with patches
    from Alexander Duyck. Idea is to reuse existing gso_max_size/gro_max_size

v4: Rebased on top of Jakub series (Merge branch 'tso-gso-limit-split')
    max_tso_size is now family independent.

v3: Fixed a typo in RFC number (Alexander)
    Added Reviewed-by: tags from Tariq on mlx4/mlx5 parts.

v2: Removed the MAX_SKB_FRAGS change, this belongs to a different series.
    Addressed feedback from Alexander and the nvidia folks.




Alexander Duyck (2):
  net: allow gso_max_size to exceed 65536
  net: allow gro_max_size to exceed 65536

Coco Li (2):
  ipv6: Add hop-by-hop header to jumbograms in ip6_output
  mlx5: support BIG TCP packets

Eric Dumazet (9):
  net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes
  net: limit GSO_MAX_SIZE to 524280 bytes
  tcp_cubic: make hystart_ack_delay() aware of BIG TCP
  ipv6: add struct hop_jumbo_hdr definition
  ipv6/gso: remove temporary HBH/jumbo header
  ipv6/gro: insert temporary HBH/jumbo header
  net: loopback: enable BIG TCP packets
  veth: enable BIG TCP packets
  mlx4: support BIG TCP packets

 drivers/net/ethernet/amd/xgbe/xgbe.h          |  3 +-
 .../net/ethernet/mellanox/mlx4/en_netdev.c    |  3 +
 drivers/net/ethernet/mellanox/mlx4/en_tx.c    | 47 +++++++++--
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  1 +
 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  2 +-
 .../net/ethernet/mellanox/mlx5/core/en_tx.c   | 84 +++++++++++++++----
 drivers/net/ethernet/sfc/ef100_nic.c          |  3 +-
 drivers/net/ethernet/sfc/falcon/tx.c          |  3 +-
 drivers/net/ethernet/sfc/tx_common.c          |  3 +-
 drivers/net/ethernet/synopsys/dwc-xlgmac.h    |  3 +-
 drivers/net/hyperv/rndis_filter.c             |  2 +-
 drivers/net/loopback.c                        |  2 +
 drivers/net/veth.c                            |  1 +
 drivers/scsi/fcoe/fcoe.c                      |  2 +-
 include/linux/ipv6.h                          |  1 +
 include/linux/netdevice.h                     | 16 +++-
 include/net/ipv6.h                            | 44 ++++++++++
 include/uapi/linux/if_link.h                  |  2 +
 net/bpf/test_run.c                            |  2 +-
 net/core/dev.c                                |  7 +-
 net/core/gro.c                                |  8 ++
 net/core/rtnetlink.c                          | 16 ++--
 net/core/sock.c                               |  4 +
 net/ipv4/tcp_bbr.c                            |  2 +-
 net/ipv4/tcp_cubic.c                          |  4 +-
 net/ipv4/tcp_output.c                         |  2 +-
 net/ipv6/ip6_offload.c                        | 56 ++++++++++++-
 net/ipv6/ip6_output.c                         | 22 ++++-
 net/sctp/output.c                             |  3 +-
 tools/include/uapi/linux/if_link.h            |  2 +
 30 files changed, 291 insertions(+), 59 deletions(-)

--
2.36.0.512.ge40c2bad7a-goog



-- 
FQ World Domination pending: https://blog.cerowrt.org/post/state_of_fq_codel/
Dave Täht CEO, TekLibre, LLC