Re: [tcpm] Comments on draft-ietf-tcpm-tcp-edo-13

"Iwashima, Kuniyuki" <kuniyu@amazon.co.jp> Tue, 22 August 2023 01:34 UTC

Return-Path: <prvs=591e1719c=kuniyu@amazon.co.jp>
X-Original-To: tcpm@ietfa.amsl.com
Delivered-To: tcpm@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 9C7BCC17EB44 for <tcpm@ietfa.amsl.com>; Mon, 21 Aug 2023 18:34:00 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -7.106
X-Spam-Level:
X-Spam-Status: No, score=-7.106 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_HI=-5, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, RCVD_IN_ZEN_BLOCKED_OPENDNS=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01, URIBL_DBL_BLOCKED_OPENDNS=0.001, URIBL_ZEN_BLOCKED_OPENDNS=0.001] autolearn=ham autolearn_force=no
Authentication-Results: ietfa.amsl.com (amavisd-new); dkim=pass (1024-bit key) header.d=amazon.co.jp
Received: from mail.ietf.org ([50.223.129.194]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id cgsR3I2Ei9Ap for <tcpm@ietfa.amsl.com>; Mon, 21 Aug 2023 18:33:56 -0700 (PDT)
Received: from smtp-fw-9102.amazon.com (smtp-fw-9102.amazon.com [207.171.184.29]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id B59ABC169526 for <tcpm@ietf.org>; Mon, 21 Aug 2023 18:33:56 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.co.jp; i=@amazon.co.jp; q=dns/txt; s=amazon201209; t=1692668037; x=1724204037; h=from:to:cc:subject:date:message-id:content-id: content-transfer-encoding:mime-version; bh=SBbxH1moWA1kPQksJFkK3JEn+LXFBZAm7srnHp5zY7o=; b=msU8pmUmXCnS04rrgcN4cd3vqiMWocuNFY7szzFP0C2PuOe9Fu4kEXXD h7xQt2+ITQ0G5thQL8X26ny3TwF3fcePbEIw178Akbwp0kaxJn4gCM2/K U6uCr0UI5lkKWMWPv/UztEtv0bH0cBcpmhVnFnTYtKmYo1f6p37Gr+CwG I=;
X-IronPort-AV: E=Sophos;i="6.01,191,1684800000"; d="scan'208";a="358658617"
Received: from pdx4-co-svc-p1-lb2-vlan3.amazon.com (HELO email-inbound-relay-pdx-2b-m6i4x-a893d89c.us-west-2.amazon.com) ([10.25.36.214]) by smtp-border-fw-9102.sea19.amazon.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 22 Aug 2023 01:33:56 +0000
Received: from EX19MTAUWC002.ant.amazon.com (pdx1-ws-svc-p6-lb9-vlan3.pdx.amazon.com [10.236.137.198]) by email-inbound-relay-pdx-2b-m6i4x-a893d89c.us-west-2.amazon.com (Postfix) with ESMTPS id 972B440DD8; Tue, 22 Aug 2023 01:33:55 +0000 (UTC)
Received: from EX19D004ANA003.ant.amazon.com (10.37.240.184) by EX19MTAUWC002.ant.amazon.com (10.250.64.143) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.2.1118.30; Tue, 22 Aug 2023 01:33:51 +0000
Received: from EX19D004ANA001.ant.amazon.com (10.37.240.138) by EX19D004ANA003.ant.amazon.com (10.37.240.184) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA) id 15.2.1118.37; Tue, 22 Aug 2023 01:33:50 +0000
Received: from EX19D004ANA001.ant.amazon.com ([fe80::f099:cbca:cc6b:91ec]) by EX19D004ANA001.ant.amazon.com ([fe80::f099:cbca:cc6b:91ec%5]) with mapi id 15.02.1118.037; Tue, 22 Aug 2023 01:33:50 +0000
From: "Iwashima, Kuniyuki" <kuniyu@amazon.co.jp>
To: "touch@strayalpha.com" <touch@strayalpha.com>
CC: "tcpm@ietf.org Extensions" <tcpm@ietf.org>, "Nishida, Yoshi" <nyoshif@amazon.com>, "Iwashima, Kuniyuki" <kuniyu@amazon.co.jp>
Thread-Topic: [tcpm] Comments on draft-ietf-tcpm-tcp-edo-13
Thread-Index: AQHZ1Ji0QmyDJOizWkylh2nkt4AAHw==
Date: Tue, 22 Aug 2023 01:33:50 +0000
Message-ID: <B3DFD665-F0D5-49AF-9F46-87FDF498B9D7@amazon.co.jp>
Accept-Language: ja-JP, en-US
Content-Language: en-US
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
x-originating-ip: [10.135.203.70]
Content-Type: text/plain; charset="utf-8"
Content-ID: <148516AA9CB5C5498CAA315C01967613@amazon.com>
Content-Transfer-Encoding: base64
MIME-Version: 1.0
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/Fr5xjQBMZWREF8M8AbOi07-w66k>
Subject: Re: [tcpm] Comments on draft-ietf-tcpm-tcp-edo-13
X-BeenThere: tcpm@ietf.org
X-Mailman-Version: 2.1.39
Precedence: list
List-Id: TCP Maintenance and Minor Extensions Working Group <tcpm.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tcpm>, <mailto:tcpm-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/tcpm/>
List-Post: <mailto:tcpm@ietf.org>
List-Help: <mailto:tcpm-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tcpm>, <mailto:tcpm-request@ietf.org?subject=subscribe>
X-List-Received-Date: Tue, 22 Aug 2023 01:34:00 -0000

From: "touch@strayalpha.com" <touch@strayalpha.com>
Date: Fri, 18 Aug 2023 20:34:40 -0700
> Hi, Kuniyuki,
> 
> > On Aug 18, 2023, at 4:11 PM, Iwashima, Kuniyuki <kuniyu=40amazon.co.jp@dmarc.ietf.org> wrote:
> > 
> > From: "touch@strayalpha.com" <touch@strayalpha.com>
> > Date: Wed, 16 Aug 2023 18:01:19 -0700
> >> Hi, Kuniyui (if I read your email address correctly)
> >> 
> >>> On Aug 16, 2023, at 3:47 PM, Iwashima, Kuniyuki <kuniyu@amazon.co.jp> wrote:
> >>> 
> >>> Hello Joe,
> >>> 
> >>> I'm Kuniyuki from AWS and recently working on this topic with Yoshi.
> >>> I have prototyped Extended Data Offset in Linux based on net-next.git and now
> >>> extending packetdrill to test EDO.
> >>> 
> >>> I have two comments about EDO fallback mechanism and option processing.
> >>> 
> >>> Let's say a middlebox merges the final ACK of 3WHS and the following segment
> >>> (3rd & 4th packets in the script below).
> >> 
> >> At that point, you have a misbehaving middlebox. It’s bad enough that it modifies TCP packets in-flight (that’s not supported by TCP), but that’s especially true if an unrecognized option is seen. At that point, the middle box should be “hands off’ that flow” at a minimum.
> >> 
> >>> ---8<---
> >>> 0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
> >>> +0  setsockopt(3, SOL_TCP, TCP_EXT_DATA_OFFSET, [1], 4) = 0
> >>> +0  bind(3, ..., ...) = 0
> >>> +0  listen(3, 1) = 0
> >>> 
> >>> +0  < S 0:0(0) win 1000 <mss 1000,edoOK>
> >>> +0  > S. 0:0(0) ack 1 <edoOK, mss 1460>
> >>> +0  < . 1:1(0) ack 1 win 1000 <edo>
> >>> +0  < . 1:2(1) ack 1 win 1000 <edo>
> >>> ---8<---
> >>> 
> >>> If the client uses the longer variant or the middle box changes the header length, 
> >>> the server notices that the packet has an invalid EDO options.
> >>> So, the server falls back to non-EDO mode as mentioned in this part.
> >> 
> >> Here’s the flaw with even analyzing this:
> >> 
> >> 	TCP options are offered in a SYN by a client
> >> 		using EDO Supported, where the client goes to SYN_SENT
> >> 	TCP options are offered by by the server in the SYN+ACK
> >> 		using EDO Supported, where the server goes to SYN_RECEIVED
> >> 	TCP options are confirmed by the client to the server in the final ACK
> >> 		using EDO Extension where the client goes to ESTABLISHED
> >> 		and the client can now start sending data 
> >> 	TCP options are confirmed at the server upon receipt of the final ACK
> >> 		where the server goes to ESTABLISHED
> >> 
> >> Once in ESTABLISHED, packets without EDO Extension are considered invalid.
> >> 
> >> So here’s the case you describe above:
> >> 	TCP options are offered in a SYN by a client
> >> 		using EDO Supported, where the client goes to SYN_SENT
> >> 	TCP options are offered by by the server in the SYN+ACK
> >> 		using EDO Supported, where the server goes to SYN_RECEIVED
> >> 	TCP options are confirmed by the client to the server in the final ACK
> >> 		using EDO Extension where the client goes to ESTABLISHED
> >> 		and the client can now start sending data 
> >>>> (which it does - the 4th packet)
> >>>> final ACK and data packets get merged and the result has an invalid EDO Extension 
> >> see below **
> >> 	TCP options are  NOT confirmed at the server upon receipt of the final ACK
> >> 		so the server stays in SYN_RECEIVED
> >> 
> >> At this point, the connection will fail. The final ACK goes back to the server but will lack a valid EDO Extension.
> >> 
> >> So the client is in ESTABLISHED but the server stays in SYN_RECEVED. It will try to resend the previous ACK, so the server should resend the final ACK of the TWHS, which will NOT confirm receipt of the data the client sent.
> >> 
> >> At this point, no progress is made on any packets that the middlebox alters. So either one side times out or they both sit there and get no data through, at which point one side times out. At least that’s what I think so far…
> >> 
> >> Note that the server never “falls back” to non-EDO mode here. Malformed packets are silently ignored.
> >> 
> >>> ---8<---
> >>>>> The EDO Extension option MAY be used only if confirmed when the
> >>>  connection transitions to the ESTABLISHED state, e.g., a client is
> >>>  enabled after receiving the EDO Supported option in the SYN/ACK and
> >>>  the server is enabled after seeing the EDO Extension option in the
> >>>  final ACK of the three-way handshake. If either of those segments
> >>>  lacks the appropriate EDO option, the connection MUST NOT use any
> >>>  EDO options on any other segments.
> >>> ---8<---
> >>> 
> >>> If the server accepts the packet and 3WHS completes,
> >> 
> >> It doesn’t (see above).
> >> 
> >>> EDO is disabled at 
> >>> the server but enabled at the client.
> >> 
> >> That doesn’t happen; the server backs off only if the ACK gets through without EDO Extension, not if the option is there and malformed.
> > 
> > Thank you for explanation.
> > This is what I wanted to confirm and how my implementation behaves.
> > 
> >> 
> >>> The server will sends an ACK for the (originally 4th) segment without EDO, 
> >> 
> >> Won’t happen (see above).
> >> 
> >>> but the segment will be discarded at the client side.
> >>> In this case, the client cannot receive any ACK and data from the server.
> >>> 
> >>> So, I think the following parts MUST be applied to the final ACK of 3WHS as well.
> >>> It would be helpful to clarify that we MUST NOT complete 3WHS with an invalid EDO.
> >> 
> >> That’s possibly useful to explain, but it’s already in the requirements AFAICT. Please let me know if you see otherwise.
> > 
> > I read "lacks the appropriate EDO option" differently:
> > 
> >  (1) lacks EDO Extension in ACK of 3WHS
> >  (2) has an invalid EDO Extension in ACK of 3WHS
> 
> It really means:
> 1) lacks EDO Supported in SYN or SYN-ACK
> 2) lacks EDO Extension in all other segments
> 
> > In the situation above, the server side (SYN_RECV) "has not been
> > negotiated" EDO yet because of (2), and I read the following part
> > as "option MUST be ignored"
> 
> The server side has not yet successfully negotiated EDO because it is not in ESTABLISHED yet. So it MUST ignore EDO Established.
> 
> But the client is in ESTABLISHED, so it successfully negotiated EDO and MUST include EDO Extension in all segments.
> 
> > ---8<---
> >>> If EDO has not been negotiated and agreed, the EDO Extension
> >   option MUST be silently ignored on subsequent segments.
> > ---8<---
> > 
> > and was wondering if it conflicts with this part and which should be
> > prioritised.
> > 
> > ---8<---
> >>> ... When the
> >   EDO Header Length is invalid, the TCP segment MUST be silently
> >   dropped.
> > ---8<---
> 
> If a device is in “ignore EDO Extension”, it ignores it. It doesn’t attempt to check whether EDO Header Length is valid.
> 
> > If the server ignores the option only (this should not happen though),
> > the server can process the ACK and transitions to ESTABLISHED state.
> 
> Yes. So to correct the steps above (see >> **), it’s 
> 
> >> So here’s the case you describe above:
> >> 	TCP options are offered in a SYN by a client
> >> 		using EDO Supported, where the client goes to SYN_SENT
> >> 	TCP options are offered by by the server in the SYN+ACK
> >> 		using EDO Supported, where the server goes to SYN_RECEIVED
> >> 	TCP options are confirmed by the client to the server in the final ACK
> >> 		using EDO Extension where the client goes to ESTABLISHED
> >> 		and the client can now start sending data 
> >>>> (which it does - the 4th packet)
> >>>> final ACK and data packets get merged and the result has an invalid EDO Extension 
> 
> 
> 	So at this point, the server gets an ACK with a malformed EDO Extension.
> 
> Here’s why:
>    >> The EDO Extension option indicates the total length of the
>    header. The EDO Header_length field MUST NOT exceed that of the
>    total segment size (i.e., TCP Length).
> When an option is malformed, it is basically “not present”.

Yes, basically it is except for the ACK of 3WHS, otherwise we
could fall back.

We should fall back to non-EDO mode only if the ACK does not have
EDO-Extension.


> So if this happens, I would assume the entire packet would be dropped. Malformed options should always cause TCP to drop the entire segment.

For EDO, I agree.


> 
> Or this could happen, which states the reaction explicitly:
>    >> The EDO Header Length MUST be at least as large as the TCP Data
>    Offset field of the segment in which they both appear. When the EDO
>    Header Length equals the Data Offset length, the EDO Extension
>    option is present but it does not extend the option space. When the
>    EDO Header Length is invalid, the TCP segment MUST be silently
>    dropped.
> Either way, the server should NOT transition to ESTABLISHED. It should silently drop the ACK.
> 
> At this point, one of two things happens:
> 	a) the client resends the ACK and it gets through
> 		at this point, the client and server both think EDO is active
> 		but the same thing happens every time either path experiences a middlebox that messes up the EDO option
> 			both sides REQUIRE the option
> 			if a packet goes through with messed up EDO Extension, it gets silently ignored
> 			if not, it works fine
> 
> 	b) the client resends the ACK and data again and they get merged
> 		at this point, the server again silently ignores the packet because of a malformed option.
> So we could update the first quoted paragraph above to be more explicit about the silent drop,

That sounds better to me because...

> but IMO TCP already assumes that all malformed options cause silent drop of the entire packet.

... at least Linux does not drop the segment even if the option is
malformed (e.g. length is invalid).  tcp_parse_options() returns void
and just ignores the malformed option.

https://github.com/torvalds/linux/blob/master/net/ipv4/tcp_input.c#L4037-L4164



> 
> > Then, the server did not agree EDO and "MUST NOT use any EDO options
> > on any other segments", leading to non-EDO fallback.
> > 
> > ---8<---
> > The EDO Extension option MAY be used only if confirmed when the
> > connection transitions to the ESTABLISHED state
> > ---8<---
> > 
> > 
> >> 
> >>> 
> >>> ---8<---
> >>>>> The EDO Header Length MUST be at least as large as the TCP Data
> >>>  Offset field of the segment in which they both appear. When the EDO
> >>>  Header Length equals the Data Offset length, the EDO Extension
> >>>  option is present but it does not extend the option space. When the
> >>>  EDO Header Length is invalid, the TCP segment MUST be silently
> >>>  dropped.
> >>> ...
> >>>>> When an endpoint receives a segment using the 6-byte EDO
> >>>  Extension option, it MUST validate the Segment_Length field with the
> >>>  length of the segment as indicated in the TCP pseudoheader. If the
> >>>  segment lengths do not match, the segment MUST be discarded and an
> >>>  error SHOULD be logged in a rate-limited manner.
> >>> ---8<---
> >>> 
> >>> Also, after that, client may send out another packet for retransmission and 
> >>> newly queued data.
> >> 
> >> I don’t see that happening, as per above.
> >> 
> >>> On the server side, EDO is disabled, but the packet will be processed ignoring
> >>> only EDO option.
> >>> Then, options after 60 bytes would be recv()ed accidentally by user.
> >>> 
> >>> ---8<---
> >>>>> If EDO has not been negotiated and agreed, the EDO Extension
> >>>  option MUST be silently ignored on subsequent segments.
> >>> ---8<---
> >>> 
> >>> I think this behaviour is not intended.
> >>> So, I think it's better to make it clear that what MUST be ignored should be
> >>> the whole segment, instead of EDO Extension option.
> >> 
> >> 
> >> We did say that elsewhere (the MUST NOT in the first quoted text above), but yes, this one should have said to drop the segment, not just the option.
> >> 
> >> Was that the only change?
> > 
> > Another change I added in my kernel is like:
> > 
> >  EDO Extension MUST NOT be used more than once in a segment.
> 
> Yes; that’s generally true of all options except NOP (are there any exceptions?? I haven’t done an exhaustive search).
> 
> > In the current draft, we can use multiple EDO Extension if all of them
> > meet these conditions:
> > 
> >  (1) The EDO Extension option MUST occur within the space indicated by
> >      the TCP Data Offset.
> > 
> >  (2) The EDO Header Length MUST be at least as large as the TCP Data
> >      Offset field of the segment in which they both appear.
> > 
> > For example, we can craft a header like this.
> > 
> >  doff : 15 (60 bytes)
> >  EDO1 : 61
> >  EDO2 : 62
> >  EDO3 : 63
> >  ...
> >  <-- 60 bytes -->
> > 
> > This just slows down parsing and could be exploited for DoS.
> > 
> > Also, an EDO could have length less than other preceding EDO, and this
> > is confusing.
> > 
> >  EDO1 : 80
> >  EDO2 : 65
> >  EDO3 : 70
> > 
> > I do not see a good reason to support these use cases.  So, to make EDO
> > handling simpler and safer, I think we MUST restrict this kind of EDO 
> > Extension use.
> 
> Yes - so we do need to add explicit “no more than once” for EDO Extension.
> 
> Do we need to explicitly add “drop entire segment if malformed?” - it would only apply to EDO-capable endpoints, but in the example here, it would make it more clear that the server never proceeds to ESTABLISHED.

I think it is worth adding not to confuse future readers.

And yes, it's only applied to EDO-capable endpoints.

On an endpoint that is not aware of EDO, EDO Extension will be just
ignored.  It's sender's bad to send EDO Extension to such an endpoint.

On an EDO-capable endpoint, when EDO should be ignored depends on the state.

  LISTEN  : EDO Extension in SYN     <-- should be ignored
  SYN_SENT: EDO Extension in SYN+ACK <-- should be ignored
  SYN_RECV: EDO Extension in ACK     <-- must be checked

On the server side, SYN_RECV socket is EDO-capable but could be non-EDO
endpoint depending on the following ACK.  At this point, the socket is
not non-EDO-capable endpoint.

  TCP_ESTABLISHED
    1) EDO mode    : EDO Extension must be checked
    2) non EDO mode: EDO Extension should be ignored

If it falls back to non-EDO, EDO Extension in the following segments
should be ignored as if the endpoint is non-EDO-capable endpoint.