[tcpm] Comments on draft-ietf-tcpm-tcp-edo-13
"Iwashima, Kuniyuki" <kuniyu@amazon.co.jp> Wed, 10 July 2024 01:39 UTC
From: "Iwashima, Kuniyuki" <kuniyu@amazon.co.jp>
To: "touch@strayalpha.com" <touch@strayalpha.com>
CC: "tcpm@ietf.org Extensions" <tcpm@ietf.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/a67_gGtjAAesMzQa-YpppMCCPGw>
Hi Joe,
I meant to send a summary of my feedback on the EDO draft and related work after IETF 118, but it slipped my mind.
Sorry for the long delay (and thanks for the heads-up, Yoshi).
## Suggestion
I have three suggestions about EDO fallback, EDO Extension limit, and logging.
1) EDO fallback
In the sentence below,
>> The EDO Extension option MAY be used only if confirmed when the
connection transitions to the ESTABLISHED state, e.g., a client is
enabled after receiving the EDO Supported option in the SYN/ACK and
the server is enabled after seeing the EDO Extension option in the
-> the server is enabled after seeing the _valid_ EDO Extension option in the <-
final ACK of the three-way handshake. If either of those segments
lacks the appropriate EDO option, the connection MUST NOT use any
EDO options on any other segments.
I'd suggest adding "valid" as shown above to make it clear that "lacks the appropriate EDO option" in the next sentence means
i) ACK of 3WHS lacks EDO Extension option
ii) ACK of 3WHS has invalid/malformed EDO Extension
and these segments should be dropped instead of triggering non-EDO fallback.
2) EDO Extension limit
As I mentioned before, I think it would be better to clarify that EDO Extension MUST NOT be included more than once in a segment.
If a TCP header includes the same option multiple times, Linux uses the last one and ignores the preceding ones.
However, EDO Extension requires copying data into the linear buffer, so the number of such operations within a single segment should be limited to avoid DoS.
3) Logging
In the following sentence, the second MUST should be changed to SHOULD.
>> If EDO has been negotiated, any subsequent segments arriving
without the EDO Extension option MUST be silently ignored. Such
events MAY be logged as warning errors and logging MUST be rate
-> events MAY be logged as warning errors and logging SHOULD be rate <-
limited.
The reason is that two other places in the draft say that EDO SHOULD, not MUST, log such events:
>> When an endpoint receives a segment using the 6-byte EDO
Extension option, it MUST validate the Segment_Length field with the
length of the segment as indicated in the TCP pseudoheader. If the
segment lengths do not match, the segment MUST be discarded and an
error SHOULD be logged in a rate-limited manner.
>> Due to the potential impacts of legacy middleboxes (discussed in
Section 7), a TCP implementation supporting EDO SHOULD log any
events within an EDO connection when options that are malformed or
show other evidence of tampering arrive.
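A minimal sketch of what "logged in a rate-limited manner" could look like, for illustration only. It is not taken from the draft or from the Linux implementation; the class name, the fixed-window policy, and the default limits are all assumptions.

```python
import time


class RateLimitedLogger:
    """Emit at most `burst` warnings per `interval` seconds (hypothetical helper)."""

    def __init__(self, burst=5, interval=1.0, clock=time.monotonic, sink=print):
        self.burst = burst
        self.interval = interval
        self.clock = clock          # injectable so the limiter can be tested
        self.sink = sink            # where accepted messages go
        self.window_start = clock()
        self.emitted = 0
        self.suppressed = 0

    def warn(self, msg):
        """Return True if the message was logged, False if it was suppressed."""
        now = self.clock()
        if now - self.window_start >= self.interval:
            # New window: report how many warnings were dropped, then reset.
            if self.suppressed:
                self.sink(f"{self.suppressed} EDO warnings suppressed")
            self.window_start = now
            self.emitted = 0
            self.suppressed = 0
        if self.emitted < self.burst:
            self.emitted += 1
            self.sink(msg)
            return True
        self.suppressed += 1
        return False
```

Whether the limit is a MUST or a SHOULD, the segment is discarded either way; the limiter only bounds how much log output a flood of malformed segments can produce.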
Those are all of my suggestions.
In my implementation, EDO options are parsed as follows:
- SYN segments
  - EDO Supported (length == 2) is parsed
  - other EDO option (length != 2) is ignored (option is skipped)
- non-SYN segments
  - if not negotiated, all EDO options are ignored (option is skipped)
  - if negotiated,
    - EDO Supported or EDO option with invalid length (!= 4/6) is ignored (option is skipped)
    - segment with multiple EDO Extension is dropped
    - segment with invalid header/segment length is dropped
    - EDO Extension is parsed
If this behaviour is correct, I think I read the draft as intended and have no other comments.
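The parsing rules above can be sketched roughly as follows. This is an illustrative model, not the kernel code: the function and enum names are hypothetical, options are assumed to be already split out of the header, and the "invalid header/segment length" check is assumed to mean a Header_Length below 20 bytes or above the segment length, or a Segment_Length that differs from the pseudoheader length.

```python
import struct
from enum import Enum

MIN_TCP_HEADER = 20  # bytes, fixed TCP header without options


class Verdict(Enum):
    IGNORED = "no EDO option acted on"
    SUPPORTED = "EDO Supported parsed"
    EXTENSION = "EDO Extension parsed"
    DROP = "segment dropped"


def classify_edo(option_bodies, is_syn, negotiated, tcp_length):
    """Apply the listed rules to the EDO options of one segment.

    option_bodies: bytes following kind/length (and ExID, if used) of each
                   EDO option, so lengths 2/4/6 map to 0/2/4 body bytes.
    tcp_length:    TCP segment length (header plus payload) from the
                   pseudoheader.
    Returns (verdict, header_length or None).
    """
    if is_syn:
        # Only EDO Supported (length 2, empty body) counts on a SYN.
        if any(len(body) == 0 for body in option_bodies):
            return Verdict.SUPPORTED, None
        return Verdict.IGNORED, None

    if not negotiated:
        return Verdict.IGNORED, None

    # Skip EDO Supported and anything whose length is not 4 or 6.
    extensions = [b for b in option_bodies if len(b) in (2, 4)]
    if not extensions:
        # Not covered by the list; the draft text quoted earlier says such
        # segments are silently ignored, which is left to the caller here.
        return Verdict.IGNORED, None
    if len(extensions) > 1:
        return Verdict.DROP, None

    body = extensions[0]
    header_length = struct.unpack("!H", body[:2])[0]
    if not MIN_TCP_HEADER <= header_length <= tcp_length:
        return Verdict.DROP, None
    if len(body) == 4:
        segment_length = struct.unpack("!H", body[2:])[0]
        if segment_length != tcp_length:
            return Verdict.DROP, None
    return Verdict.EXTENSION, header_length
```

For example, an option carrying Header_Length 28 and Segment_Length 29 on a negotiated connection yields `(Verdict.EXTENSION, 28)` when the pseudoheader length is 29, and `Verdict.DROP` when it is 30.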
## Implementation
Here are the EDO implementations for the Linux kernel, tcpdump, and packetdrill.
- https://github.com/q2ven/linux/tree/edo
- https://github.com/nsdyoshi/tcpdump/tree/tcp-edo-option
- https://github.com/q2ven/packetdrill/tree/edo
The basic uAPI is shown in this presentation:
https://www.youtube.com/watch?si=bbBuPaAE5y__3IZW&t=2575&v=Na1FYAjLBeU&feature=youtu.be
Note that the constant TCP_EXT_DATA_OFFSET for setsockopt() is 44.
The packetdrill repo has some scripts to test all possible EDO fallback scenarios, and SNMP stats can be used to check the negotiation result.
---8<---
# ./packetdrill ../tcp/edo/edo-fallback-server-syn.pkt
# nstat | grep EDO
TcpExtTCPEDOFallbackSYN 1 0.0
# ./packetdrill ../tcp/edo/edo-fallback-client-synack.pkt
# nstat | grep EDO
TcpExtTCPEDOFallbackSYNACK 1 0.0
# ./packetdrill ../tcp/edo/edo-fallback-server-ack.pkt
# nstat | grep EDO
TcpExtTCPEDOFallbackACK 1 0.0
# ./packetdrill ../tcp/edo/edo-fallback-cross-syn.pkt
# nstat | grep EDO
TcpExtTCPEDOFallbackSYN 1 0.0
# ./packetdrill ../tcp/edo/edo-fallback-cross-synack.pkt
# nstat | grep EDO
TcpExtTCPEDOFallbackSYNACK 1 0.0
# ./packetdrill ../tcp/edo/edo-sack.pkt
# nstat | grep EDO
TcpExtTCPEDOSuccess 1 0.0
---8<---
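To check these counters from a test harness rather than by eye, the `nstat` output could be parsed along these lines. This is a sketch under assumptions: the function name is hypothetical, and it relies only on the three-column line format and the `TcpExtTCPEDO` counter prefix visible in the output above.

```python
def parse_edo_counters(nstat_output):
    """Collect the TcpExtTCPEDO* counters from nstat text into a dict."""
    counters = {}
    for line in nstat_output.splitlines():
        fields = line.split()
        # Expected shape: "<name> <count> <rate>"; skip anything else.
        if len(fields) >= 2 and fields[0].startswith("TcpExtTCPEDO"):
            counters[fields[0]] = int(fields[1])
    return counters


sample = "TcpExtTCPEDOFallbackSYN 1 0.0\nTcpExtTCPEDOSuccess 1 0.0\n"
print(parse_edo_counters(sample))
```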
Another script shows that a header longer than 60 bytes can be sent with EDO Extension.
In this example, 4 SACK blocks are sent with an 8-byte EDO Extension (including ExID 0x0ed0).
---8<---
# ./packetdrill ../tcp/edo/edo-sack.pkt -v
...
connect syscall: 1720572792.643631
outbound sniffed packet: 0.000044 S 2116065219:2116065219(0) win 65535 <edoOK,mss 1440,sackOK,TS val 3401991024 ecr 0,nop,wscale 8>
inbound injected packet: 0.000101 S. 0:0(0) ack 2116065220 win 65535 <nop,nop,edoOK,mss 9000,sackOK>
outbound sniffed packet: 0.000114 . 2116065220:2116065220(0) ack 1 win 65535 <edo 28 28>
inbound injected packet: 0.000128 . 12:13(1) ack 2116065220 win 65535 <edo 28 29>
outbound sniffed packet: 0.000133 . 2116065220:2116065220(0) ack 1 win 65535 <edo 40 40,nop,nop,sack 12:13>
inbound injected packet: 0.000147 . 10:11(1) ack 2116065220 win 65535 <edo 28 29>
outbound sniffed packet: 0.000154 . 2116065220:2116065220(0) ack 1 win 65535 <edo 48 48,nop,nop,sack 10:11 12:13>
inbound injected packet: 0.000168 . 8:9(1) ack 2116065220 win 65535 <edo 28 29>
outbound sniffed packet: 0.000172 . 2116065220:2116065220(0) ack 1 win 65535 <edo 56 56,nop,nop,sack 8:9 10:11 12:13>
inbound injected packet: 0.000185 . 6:7(1) ack 2116065220 win 65535 <edo 28 29>
outbound sniffed packet: 0.000299 . 2116065220:2116065220(0) ack 1 win 65535 <edo 64 64,nop,nop,sack 6:7 8:9 10:11 12:13>
---8<---
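The header lengths in the trace can be reproduced with simple arithmetic. The sketch below assumes the 8-byte EDO Extension consists of kind, length, the 2-byte ExID, Header_Length, and Segment_Length, and that the SACK option is preceded by two NOPs as in the trace; the constant and function names are illustrative only.

```python
TCP_BASE_HEADER = 20   # fixed TCP header
EDO_EXTENSION = 8      # kind + length + ExID + Header_Length + Segment_Length
NOP = 1
SACK_OVERHEAD = 2      # kind + length
SACK_BLOCK = 8         # left edge + right edge, 32 bits each


def edo_header_length(sack_blocks, nops=2):
    """TCP header length in bytes for an ACK carrying EDO and N SACK blocks."""
    length = TCP_BASE_HEADER + EDO_EXTENSION
    if sack_blocks:
        length += nops * NOP + SACK_OVERHEAD + sack_blocks * SACK_BLOCK
    return length


for blocks in range(5):
    print(blocks, edo_header_length(blocks))
# 0 28
# 1 40
# 2 48
# 3 56
# 4 64
```

These values match the first number in each outbound `edo` option above. On the inbound segments, `edo 28 29` is a 28-byte header followed by 1 byte of payload.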
Also, with MPTCP, we can see the effect of EDO Extension easily. Note that GRO must be turned off.
---8<---
Client:
$ sudo ethtool -K enp39s0 gro off
$ sudo sysctl net.ipv4.tcp_ext_data_offset=2
$ mptcpize run iperf3 -c 10.0.0.36 -t 3
Server:
$ sudo ethtool -K enp39s0 gro off
$ sudo sysctl net.ipv4.tcp_ext_data_offset=2
$ mptcpize run iperf3 -s
tcpdump:
IP 10.0.0.241.59560 > 10.0.0.36.5201: Flags [.], seq 84863089:84872006, ack 1, win 491, options [exp-edo hlen 64 seglen 8981 ,nop,nop,TS val 3197826953 ecr 2719828622,mptcp 22 dss ack 3678162956 seq 2861367221292019959 subseq 2596157316 len 8917,nop,nop], length 8917
---8<---
Moreover, there is a debug sysctl knob to include NOPs after EDO.
---8<---
$ sudo sysctl net.ipv4.tcp_nops=128
tcpdump:
IP 10.0.0.36.5201 > 10.0.0.241.44092: Flags [.], ack 47501509, win 41719, options [exp-edo hlen 180 seglen 180 ,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,nop,TS val 2719945746 ecr 3197944077,mptcp 12 dss ack 8813009640328282684], length 0
---8<---
Please let me know if you find any weird or non-compliant behaviour.
I hope this helps move things forward.
Best regards,
Kuniyuki