[tcpm] increased back-off across different retransmissions

Marc <gaardiolor@gmail.com> Fri, 17 November 2017 14:59 UTC

Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/I0vgQ4GMI92Xd74o_yAFNpLEinA>

Hello,

I'm reading RFC 6298, which specifies the TCP retransmission timer and its
back-off. One detail is not very clear in this RFC, or maybe I'm missing it.
Hope someone can point me in the right direction!

In our network we have a link that's sometimes congested. When that's the
case, we sometimes have a TCP connection that just has a streak of bad luck
and loses multiple packets in a row. See the attachment for an example, taken
on a switch connected to the server. Unfortunately I'm not allowed to share
the PCAP, but I hope the tcpdump information contains everything
needed (time, ip, port, win, seq, len, ack, sack, sle, sre, ...).

Client: 1.1.1.1
Server: 2.2.2.2

So some packets from Server to Client are dropped. I've marked the packets
listed below in the attachment as P1 through P7.

Packet 1: seq 135650, len 1348, dropped
Packet 2: seq 136998, len 1348, received <-- so this one is not dropped
Packet 3: seq 138346, len 1348, dropped
Packet 4: seq 139694, len 81, dropped
Packet 5: seq 139775, len 1348, dropped
Packet 6: seq 141123, len 1348, dropped
Packet 7: seq 142471, len 1221, dropped

Then the retransmissions, marked in the attachment as R1 through R5:
- Retransmission 1: 0.5 sec after dropped 'Packet 1', the packet with seq
135650, len 1348 is retransmitted.
Ack <10ms. The ack # is 138346, which is higher than seq 135650 + len 1348 =
136998 because the client is also ack'ing 'Packet 2'.

- Retransmission 2: 1.1 seconds after Retransmission 1, packet with seq
138346, len 1348 is retransmitted.
Ack <10ms, ack # 139694 (matches seq+len of Retransmission 2). This
Retransmission covers dropped 'Packet 3'

- Retransmission 3: 2.8 seconds after Retransmission 2, packet with seq
139694, len 1348 is retransmitted.
Ack <10ms, ack # 141042 (matches seq+len of Retransmission 3). Note that
this packet contains data from dropped 'Packet 4' and 'Packet 5'

- Retransmission 4: 7.1 seconds after Retransmission 3, packet with seq
141042, len 1348 is retransmitted.
Ack <10ms, ack # 142390 (matches seq+len of Retransmission 4). Note that
this packet contains data from dropped 'Packet 5' and 'Packet 6'

- Retransmission 5: 17.8 seconds after Retransmission 4, packet with seq
142390, len 1302 is retransmitted.
Ack <10ms, ack # 143692 (matches seq+len of Retransmission 5). Note that
this packet contains data from dropped 'Packet 6' and 'Packet 7'
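
The bookkeeping in the two lists above can be double-checked with a small
Python sketch. The tuples are simply copied from the lists; the helper names
`covered` and `gap_ratios` are made up for illustration and are not part of
any capture tool.

```python
# Original segments: (label, seq, length), copied from the packet list.
packets = [
    ("P1", 135650, 1348),
    ("P2", 136998, 1348),  # the only segment that was not dropped
    ("P3", 138346, 1348),
    ("P4", 139694, 81),
    ("P5", 139775, 1348),
    ("P6", 141123, 1348),
    ("P7", 142471, 1221),
]

# Retransmissions: (label, seq, length, ack seen, seconds since previous event).
retransmissions = [
    ("R1", 135650, 1348, 138346, 0.5),
    ("R2", 138346, 1348, 139694, 1.1),
    ("R3", 139694, 1348, 141042, 2.8),
    ("R4", 141042, 1348, 142390, 7.1),
    ("R5", 142390, 1302, 143692, 17.8),
]

def covered(seq, length):
    """Labels of the original segments whose byte range overlaps [seq, seq+length)."""
    end = seq + length
    return [label for label, p_seq, p_len in packets
            if p_seq < end and seq < p_seq + p_len]

def gap_ratios():
    """Ratio between each retransmission delay and the one before it."""
    gaps = [r[4] for r in retransmissions]
    return [round(later / earlier, 2) for earlier, later in zip(gaps, gaps[1:])]

for label, seq, length, ack, _gap in retransmissions:
    print(label, covered(seq, length), "ack == seq+len:", ack == seq + length)
print("back-off ratios:", gap_ratios())
```

This reports R1 -> P1, R2 -> P3, R3 -> P4+P5, R4 -> P5+P6 and R5 -> P6+P7.
Only R1's ack differs from seq+len, because it also covers 'Packet 2'. The
ratios between successive delays come out at about 2.2, 2.55, 2.54 and 2.51.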

So the back-off keeps increasing across _different_ retransmissions, even
though each of those retransmissions is ack'ed very quickly. I know there is
a chance that those acks were triggered by the original transmission, so the
server can't take a new RTT sample from them, but still: I'm not sure that
exponentially increasing the timer (by a factor of ~2.5 in my case, actually)
across different ack'ed retransmissions is the right strategy here. 10 lost
packets (which isn't pretty, I know) would in my case mean roughly 500
seconds of waiting.
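
For a rough idea of how the waiting time adds up, here is a minimal Python
sketch. It assumes that every further lost segment costs one full timeout
and that the timer is never reset or capped. Both are assumptions of the
sketch: RFC 6298 allows an upper bound on the RTO of at least 60 seconds,
which is ignored here. The names `backoff_schedule` and `total_wait` are
hypothetical.

```python
def backoff_schedule(initial_rto, factor, losses):
    """Timeout (in seconds) that precedes each of `losses` consecutive retransmissions."""
    return [initial_rto * factor ** i for i in range(losses)]

def total_wait(initial_rto, factor, losses):
    """Sum of all timeouts, i.e. total time spent waiting for the timer."""
    return sum(backoff_schedule(initial_rto, factor, losses))

# Doubling as in RFC 6298 section 5.5, starting from the 0.5 s seen in the trace.
print(total_wait(0.5, 2.0, 10))   # 511.5

# Same start, but with the ~2.5 factor observed between retransmissions.
print(total_wait(0.5, 2.5, 10))
```

With plain doubling, ten consecutive timeouts starting at 0.5 s add up to
511.5 s, which is where the figure of about 500 seconds comes from. With the
observed factor of ~2.5 the total would be several times larger.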

Am I right that the RFC (or Karn's algorithm) doesn't make a statement
about this particular scenario? If so, should it be clarified? Or is this
such an obviously bad strategy that it would be overkill to explicitly
forbid it in an RFC?

Thanks,

Marc