[tcpm] 793bis: question on TCP RTO timer

Geert Jan de Groot <GeertJan.deGroot@ymbk.nl> Fri, 27 August 2021 10:52 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/bgcy9gR98O2obie-dEfNXai-B4A>

Hi folks,

I do apologise for bringing up this question this late in the game, but
the lab test results are not pretty.

The issue is on 793bis section 3.8.1: the RTO timer. The 793bis document 
correctly points to RFC1122, RFC2988, RFC6298.

RFC6298 (and 2988) describe:
    (2.4) Whenever RTO is computed, if it is less than 1 second, then the
          RTO SHOULD be rounded up to 1 second.

I am now in discussions with an embedded OS manufacturer who clamps the
RTO timer to a minimum of 1 second, even if the network delay and SRTT
are much smaller (say, two hosts connected over a typical 1000BASE-T
network). That means that if a single packet is lost, the TCP connection
freezes for one second, because it has to wait for the RTO timer to
fire, and that timer is at least one second per requirement (2.4) of
RFC6298 above.
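
To make the effect concrete, here is a rough sketch of the RFC6298
calculation (section 2, with alpha = 1/8, beta = 1/4, K = 4). The
function name, the 1 ms clock granularity and the 200 microsecond LAN
round-trip time are my own assumptions for illustration; they are not
taken from the OS in question.

```python
# Sketch of the RFC 6298 RTO computation (section 2).
# Assumptions for illustration only: 1 ms clock granularity and a
# steady 200 microsecond RTT between two hosts on the same LAN.

ALPHA = 1 / 8   # SRTT gain, RFC 6298 (2.3)
BETA = 1 / 4    # RTTVAR gain, RFC 6298 (2.3)
K = 4           # variance multiplier, RFC 6298 (2.2)


def rto_after_samples(rtt_samples, clock_granularity=0.001, min_rto=1.0):
    """Return the RTO in seconds after feeding in all RTT samples.

    min_rto=1.0 models rule (2.4); min_rto=0.0 shows the value the
    estimator would produce without the one-second floor.
    """
    srtt = None
    rttvar = None
    for r in rtt_samples:
        if srtt is None:
            # First measurement, RFC 6298 (2.2)
            srtt = r
            rttvar = r / 2
        else:
            # Subsequent measurements, RFC 6298 (2.3);
            # RTTVAR is updated before SRTT.
            rttvar = (1 - BETA) * rttvar + BETA * abs(srtt - r)
            srtt = (1 - ALPHA) * srtt + ALPHA * r
    if srtt is None:
        raise ValueError("need at least one RTT sample")
    rto = srtt + max(clock_granularity, K * rttvar)
    return max(rto, min_rto)


lan_samples = [0.0002] * 20  # assumed LAN RTT of 200 microseconds

print("RTO with 1 s floor:   ", rto_after_samples(lan_samples))
print("RTO without the floor:", rto_after_samples(lan_samples, min_rto=0.0))
```

Under these assumptions the unclamped estimate is dominated by the
clock granularity and lands in the low milliseconds, while rule (2.4)
raises it to a full second.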

Put like this, a connection pushing 1 Gbit/s of traffic freezes for one
second if a single packet is dropped, and the OS manufacturer claims
this is correct. My feeling says it isn't: we measure SRTT for a reason,
and a dropped packet is simply a congestion signal to lower the sending
rate, not a reason to freeze traffic for a second waiting for the RTO to
trigger. However, reading the draft and 6298/2988, I can't definitively
fault the OS for its one-second freeze.
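
For a sense of scale, a back-of-the-envelope calculation of what a
one-second stall costs on such a link. The 200 microsecond RTT is again
an assumed figure for two hosts on the same LAN, and the helper names
are made up for this example.

```python
# Rough cost of a one-second RTO stall on a 1 Gbit/s link.
# The 200 microsecond RTT is an assumption, not a measurement.

def stalled_bytes(link_bits_per_s, stall_s):
    """Bytes the link could have carried while the sender sat idle."""
    return link_bits_per_s * stall_s / 8


def rtts_per_stall(stall_s, rtt_s):
    """How many round-trip times fit into the stall."""
    return stall_s / rtt_s


print("bytes not sent:", stalled_bytes(1e9, 1.0))
print("RTTs spent idle:", rtts_per_stall(1.0, 0.0002))
```

That is 125 megabytes of unused capacity, and roughly 5000 round trips
of waiting, for one lost packet.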

Before sending this to the list, for fear of kicking up dust, I asked
Wesley, and he referred me to RFC8961, which suggests using the RTT,
with exponential backoff, to estimate the RTO timer value without the
one-second minimum of RFC6298.

I wonder what the list thinks. When the one-second rule was first
mentioned, networks were significantly slower than they are today, and
it only seems logical that timers based on measured traffic (with
appropriate backoff measures) make more sense than a fixed one-second
timer.

Clue appreciated,

Geert Jan