[tcpm] FW: Draft-ietf-lwig-tcp-constrained-node-networks-04

"Scharf, Michael" <Michael.Scharf@hs-esslingen.de> Wed, 13 March 2019 09:44 UTC

From: "Scharf, Michael" <Michael.Scharf@hs-esslingen.de>
To: "tcpm@ietf.org" <tcpm@ietf.org>
CC: "cheshire@apple.com" <cheshire@apple.com>, Carles Gomez <carlesgo@entel.upc.edu>, Jon Crowcroft <jon.crowcroft@cl.cam.ac.uk>, Ted Lemon <elemon@apple.com>, Vividh Siddha <vividh@apple.com>, Vincent Lubet <vlubet@apple.com>, Christoph Paasch <cpaasch@apple.com>
Thread-Topic: Draft-ietf-lwig-tcp-constrained-node-networks-04
Thread-Index: AQHU2TX1/fKUvmMCbU+1+9M5UQ7ZpaYJT4Lw
Date: Wed, 13 Mar 2019 09:44:24 +0000
Message-ID: <6EC6417807D9754DA64F3087E2E2E03E2D259C73@rznt8114.rznt.rzdir.fht-esslingen.de>
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/af1lXhcCeU79LkKVqnWu7mrxZkk>
Subject: [tcpm] FW: Draft-ietf-lwig-tcp-constrained-node-networks-04

Forwarded to the TCPM list (with Stuart's permission)

Thanks!

Michael


-----Original Message-----
From: cheshire@apple.com [mailto:cheshire@apple.com] 
Sent: Wednesday, March 13, 2019 1:45 AM
To: Carles Gomez; Jon Crowcroft; Scharf, Michael
Cc: Ted Lemon; Vividh Siddha; Vincent Lubet; Christoph Paasch
Subject: Draft-ietf-lwig-tcp-constrained-node-networks-04

Dear authors,

I read your document draft-ietf-lwig-tcp-constrained-node-networks-04 last weekend.

Thank you for writing this. It is an excellent resource. I am constantly battling in the Zigbee and Thread communities about why for many jobs TCP is the right protocol, and we have been pushing hard for more Thread device vendors to make TCP available in their SDKs. Having this document to recommend to people is extremely helpful.

Some feedback:

   A TCP implementation needs to support options 0, 1 and 2 [RFC0793].

Since your primary audience is people uninformed about TCP, it would be good to explain what this means, instead of making them hunt in an ancient document.

Perhaps say: “A TCP implementation needs to support, at a minimum, TCP options 2, 1 and 0. These are, respectively, the Maximum Segment Size (MSS) option, the No-Operation option, and the End Of Option List marker [RFC0793]. None of these are a substantial burden to support. A TCP implementation is permitted to silently ignore all other TCP options.”
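
To illustrate how small that burden is, here is a minimal sketch of an options parser that understands only kinds 0, 1 and 2 and skips everything else. The function name parse_tcp_options and the returned dictionary layout are hypothetical, chosen for this example only; a real stack would parse in place rather than build a dictionary.

```python
def parse_tcp_options(options: bytes) -> dict:
    """Walk a TCP options field, acting on kinds 0, 1 and 2 only.

    Hypothetical helper for illustration; not taken from any real stack.
    """
    result = {"mss": None, "ignored_kinds": []}
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:
            # End Of Option List: stop parsing.
            break
        if kind == 1:
            # No-Operation: a single padding byte with no length field.
            i += 1
            continue
        # Every other option is laid out as kind, length, data.
        if i + 1 >= len(options):
            break  # truncated option, stop
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed length, stop
        if kind == 2 and length == 4:
            # Maximum Segment Size: 16-bit value in network byte order.
            result["mss"] = int.from_bytes(options[i + 2:i + 4], "big")
        else:
            # Unknown (or unexpectedly sized) option: skip it silently.
            result["ignored_kinds"].append(kind)
        i += length
    return result
```

Given an MSS option followed by two No-Operation bytes and a Timestamps option (kind 8), this sketch records the MSS and steps over the Timestamps option using its length byte.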

   An IPv6 datagram size exceeding 1280 bytes can be avoided by setting
   the TCP MSS not larger than 1220 bytes.  (Note: IP version 6 is assumed.)

This advice assumes the remote peer sends no TCP options. Perhaps say: “This assumes that the remote sender will use no TCP options, aside from possibly the MSS option, which is only used in the initial TCP SYN packet.”

Note that some platforms will unconditionally include TCP timestamps, which add 12 bytes to the TCP header. Maybe to accommodate this, the constrained device should advertise an MSS of 1200 bytes, to leave 20 bytes for any (unrequested) TCP options?
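
The arithmetic behind these numbers can be sketched as follows. The constant and function names are made up for this example; the header sizes are the fixed 40-byte IPv6 header and the 20-byte TCP header without options, and the sketch assumes no IPv6 extension headers.

```python
IPV6_MIN_MTU = 1280      # minimum link MTU that IPv6 requires
IPV6_HEADER = 40         # fixed IPv6 header, no extension headers assumed
TCP_BASE_HEADER = 20     # TCP header without options


def max_payload(mtu: int = IPV6_MIN_MTU, option_bytes: int = 0) -> int:
    """Largest TCP payload that keeps the IPv6 datagram within mtu bytes."""
    return mtu - IPV6_HEADER - TCP_BASE_HEADER - option_bytes


print(max_payload())                  # no TCP options
print(max_payload(option_bytes=12))   # peer adds padded TCP timestamps
print(max_payload(option_bytes=20))   # 20 bytes reserved for any options
```

With no options the result is the 1220 bytes quoted from the draft; reserving 20 bytes for options gives the suggested 1200.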

   If in such conditions the peer
   device is administered by the same entity managing the constrained
   device, it is recommended to disable delayed ACKs at the peer side.

I disagree that it is good advice to recommend disabling delayed ACKs, especially on an already-constrained network, where wasteful traffic makes a bad situation worse.

Attached below is some text I wrote for draft-ietf-dnsop-session-signal (DNS Stateful Operations) on the subject of delayed ACKs.

The “split hack” is also bad because it sends two packets instead of one; more wasteful traffic that makes a bad situation worse.

Perhaps we need to define a mechanism that allows a stack with a single-MSS transmit buffer to avoid the delayed ACK penalty? Perhaps a new TCP “ack now please” option? Perhaps some other creative crafting of the packet (imaginative use of SACK, DSACK, invalid ACK number, etc.) that will cause the remote peer to bypass its usual delayed ACK timer?

My colleague Christoph Paasch suggested one idea for how to do this: if a segment is sent marked ECT, and subsequent segments are sent *not* marked ECT, then Linux will ACK those non-ECT segments immediately because Linux assumes they must be retransmissions. This, or something like it, could be a good hack for single-MSS constrained senders.

Stuart Cheshire

--

From draft-ietf-dnsop-session-signal (DNS Stateful Operations):

9.5.  TCP Delayed Acknowledgement Considerations

   Most modern implementations of the Transmission Control Protocol
   (TCP) include a feature called "Delayed Acknowledgement" [RFC1122].

   Without this feature, TCP can be very wasteful on the network.  For
   illustration, consider a simple example like remote login using a
   very simple TCP implementation that lacks delayed acks.  When the
   user types a keystroke, a data packet is sent.  When the data packet
   arrives at the server, the simple TCP implementation sends an
   immediate acknowledgement.  Mere milliseconds later, the server
   process reads the one byte of keystroke data, and consequently the
   simple TCP implementation sends an immediate window update.  Mere
   milliseconds later, the server process generates the character echo
   and sends this data back in reply.  The simple TCP implementation
   then sends this data packet immediately too.  In this case, this
   simple TCP implementation sends a burst of three packets almost
   instantaneously (ack, window update, data).

   Clearly it would be more efficient if the TCP implementation were to
   combine the three separate packets into one, and this is what the
   delayed ack feature enables.

   With delayed ack, the TCP implementation waits after receiving a data
   packet, typically for 200 ms, and then sends its ack if (a) more data
   packet(s) arrive, (b) the receiving process generates some reply
   data, or (c) 200 ms elapse without either of the above occurring.
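
   As an illustration that is not part of the draft text, the three
   conditions above can be modelled with a small function.  The name
   ack_trigger, the event labels, and the tuple layout are assumptions
   made for this sketch; real stacks implement this with timers inside
   the kernel.

```python
DELAYED_ACK_TIMEOUT_MS = 200  # typical value cited in the text above


def ack_trigger(events, timeout_ms=DELAYED_ACK_TIMEOUT_MS):
    """Return (time_ms, reason) for when the delayed ack is sent.

    events is a list of (time_ms, kind) tuples, measured from the arrival
    of the data packet, where kind is "data" (another data packet arrived)
    or "reply" (the receiving process generated reply data).
    """
    for time_ms, kind in sorted(events):
        if time_ms >= timeout_ms:
            break  # the timer fires before this event happens
        if kind == "data":
            return (time_ms, "more data arrived")           # case (a)
        if kind == "reply":
            return (time_ms, "ack combined with reply")     # case (b)
    return (timeout_ms, "timer expired")                    # case (c)
```

   For example, a reply generated 5 ms after the data arrives carries
   the ack at 5 ms, while a message that elicits no reply is only
   acknowledged when the 200 ms timer expires.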

   With delayed ack, remote login becomes much more efficient,
   generating just one packet instead of three for each character echo.

   The logic of delayed ack is that the 200 ms delay cannot do any
   significant harm.  If something at the other end were waiting for
   something, then the receiving process should generate the reply that
   the thing at the other end is waiting for, and TCP will then
   immediately send that reply (combined with the ack and window
   update).  And if the receiving process does not in fact generate any
   reply for this particular message, then by definition the thing at
   the other end cannot be waiting for anything.  Therefore, the 200 ms
   delay is harmless.

   This assumption may be true unless the sender is using Nagle's
   algorithm, a similar efficiency feature, created to protect the
   network from poorly written client software that performs many rapid
   small writes in succession.  Nagle's algorithm allows these small
   writes to be coalesced into larger, less wasteful packets.

   Unfortunately, Nagle's algorithm and delayed ack, two valuable
   efficiency features, can interact badly with each other when used
   together [NagleDA].

   DSO request messages elicit responses; DSO unidirectional messages
   and DSO response messages do not.

   For DSO request messages, which do elicit responses, Nagle's
   algorithm and delayed ack work as intended.

   For DSO messages that do not elicit responses, the delayed ack
   mechanism causes the ack to be delayed by 200 ms.  The 200 ms delay
   on the ack can in turn cause Nagle's algorithm to prevent the sender
   from sending any more data for 200 ms until the awaited ack arrives.
   On an enterprise Gigabit Ethernet (GigE) backbone with sub-
   millisecond round-trip times, a 200 ms delay is enormous in
   comparison.

   When this issue is raised, there are two solutions that are often
   offered, neither of them ideal:

   1.  Disable delayed ack.  For DSO messages that elicit no response,
       removing delayed ack avoids the needless 200 ms delay and sends
       back an immediate ack that tells Nagle's algorithm that it should
       immediately grant the sender permission to send its next packet.
       Unfortunately, for DSO messages that *do* elicit a response,
       removing delayed ack removes the efficiency gains of combining
       acks with data, and the responder will now send two or three
       packets instead of one.

   2.  Disable Nagle's algorithm.  When acks are delayed by the delayed
       ack algorithm, removing Nagle's algorithm prevents the sender
       from being blocked from sending its next small packet
       immediately.  Unfortunately, on a network with a higher round-
       trip time, removing Nagle's algorithm removes the efficiency
       gains of combining multiple small packets into fewer larger ones,
       with the goal of limiting the number of small packets in flight
       at any one time.
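
   As an illustration that is not part of the draft text, this is
   roughly what the two workarounds look like through the sockets API.
   It is shown only to make the options concrete, not to recommend
   either one.  The helper names are hypothetical.  TCP_NODELAY is the
   standard socket option that disables Nagle's algorithm;
   TCP_QUICKACK is a Linux-specific option that is not permanent, so
   the sketch checks for it at run time.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Workaround 2: turn off Nagle's algorithm on the sending side."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def request_quick_ack(sock: socket.socket) -> bool:
    """Workaround 1: ask the receiving stack to ack without delay.

    Returns False where TCP_QUICKACK is unavailable (it is Linux-specific).
    On Linux the setting is not permanent, so callers typically set it
    again after each receive.
    """
    option = getattr(socket, "TCP_QUICKACK", None)
    if option is None:
        return False
    sock.setsockopt(socket.IPPROTO_TCP, option, 1)
    return True
```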

   The problem here is that with DSO messages that elicit no response,
   the TCP implementation is stuck waiting, unsure if a response is
   about to be generated or whether the TCP implementation should go
   ahead and send an ack and window update.

   The solution is networking APIs that allow the receiver to inform the
   TCP implementation that a received message has been read, processed,
   and no response for this message will be generated.  TCP can then
   stop waiting for a response that will never come, and immediately go
   ahead and send an ack and window update.

   For implementations of DSO, disabling delayed ack is NOT RECOMMENDED
   because of the harm this can do to the network.

   For implementations of DSO, disabling Nagle's algorithm is NOT
   RECOMMENDED because of the harm this can do to the network.

   At the time that this document is being prepared for publication, it
   is known that at least one TCP implementation provides the ability
   for the recipient of a TCP message to signal that it is not going to
   send a response, and hence the delayed ack mechanism can stop
   waiting.  Implementations on operating systems where this feature is
   available SHOULD make use of it.

--