Re: [tcpm] increased back-off across different retransmissions

Yuchung Cheng <ycheng@google.com> Sat, 18 November 2017 23:56 UTC

In-Reply-To: <CAPxJK5BEHuxoH0eOkVUunkXSam-P2nUgrZ6Qi5Ncg9OAW9uQTw@mail.gmail.com>
References: <CAPxJK5BEHuxoH0eOkVUunkXSam-P2nUgrZ6Qi5Ncg9OAW9uQTw@mail.gmail.com>
From: Yuchung Cheng <ycheng@google.com>
Date: Sat, 18 Nov 2017 15:56:06 -0800
Message-ID: <CAK6E8=dLrMALMPyaqVDsNyRgXNgn004YP88WJxAnaHyGLRj97A@mail.gmail.com>
To: Marc <gaardiolor@gmail.com>
Cc: "tcpm@ietf.org Extensions" <tcpm@ietf.org>
Content-Type: multipart/alternative; boundary="001a11461c722b9f32055e4a9bab"
Archived-At: <https://mailarchive.ietf.org/arch/msg/tcpm/hXPr3IVh-hNkhbZhSWeWLdFZ8-I>
Subject: Re: [tcpm] increased back-off across different retransmissions

On Fri, Nov 17, 2017 at 6:59 AM, Marc <gaardiolor@gmail.com> wrote:

> Hello,
>
> I'm reading RFC 6298, which mentions the TCP back-off algorithm. One
> detail is not very clear in this RFC, or maybe I'm missing it. Hope someone
> can point me in the right direction!
>
> In our network we have a link that's sometimes congested. When that's the
> case, sometimes we have a TCP connection that just has a streak of bad luck
> and loses multiple packets in a row. See attachment for an example, taken
> on a switch connected to the server. Unfortunately I'm not allowed to share
> the PCAP.. But I think / hope the tcpdump information contains everything
> needed (time, ip, port, win, seq, len, ack, sack, sle, sre..)
>
> Client: 1.1.1.1
> Server: 2.2.2.2
>
> So some packets from Server to Client are dropped. I've marked the packets
> below in attachment with P1 until P7.
>
> Packet 1: seq 135650, len 1348, dropped
> Packet 2: seq 136998, len 1348, received <-- so this one is not dropped
> Packet 3: seq 138346, len 1348, dropped
> Packet 4: seq 139694, len 81, dropped
> Packet 5: seq 139775, len 1348, dropped
> Packet 6: seq 141123, len 1348, dropped
> Packet 7: seq 142471, len 1221, dropped
>
> Then the retransmissions, marked in attachment with R1 until R5:
> - Retransmission 1: 0.5 sec after dropped 'Packet 1', packet with seq
> 135650, len 1348 is retransmitted.
> Ack <10ms. Now the ack # is 138346, higher than seq135650 + len1348 =
> 136998, the client is also ack'ing 'Packet 2'.
>
> - Retransmission 2: 1.1 seconds after Retransmission 1, packet with seq
> 138346, len 1348 is retransmitted.
> Ack <10ms, ack # 139694 (matches seq+len of Retransmission 2). This
> Retransmission covers dropped 'Packet 3'
>
The sender does not seem to implement the TCP RFCs correctly. According to
RFC 5681, upon the first timeout the sender should reduce cwnd to 1 segment
and enter slow start. Upon receiving the ACK for the retransmission of P1
(which also covers P2), cwnd would be 2, so the sender should have
retransmitted P3 and P4 right away. The next round trip would retransmit
P5, P6 and P7. This is called slow start after timeout, and it should take
only 0.5s + 10ms * 3.
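
As a rough illustration of that timeline, here is a minimal Python sketch of
slow start after a timeout. It is not taken from any real TCP stack. The
function name, the segment labels, and the assumption that every
retransmitted segment is ACKed individually within one 10 ms round trip are
all hypothetical simplifications.

```python
def slow_start_recovery(lost_segments, rto=0.5, rtt=0.010):
    """Simulate slow start after a retransmission timeout.

    Assumptions (illustrative only): cwnd is counted in segments, it
    restarts at 1 after the timeout, and it grows by one segment for
    every retransmitted segment that gets ACKed in a round trip.
    Returns (elapsed_seconds, list_of_bursts_per_round).
    """
    cwnd = 1                      # loss window after the timeout
    remaining = list(lost_segments)
    elapsed = rto                 # time spent waiting for the first timeout
    rounds = []
    while remaining:
        # Send as many lost segments as cwnd allows in this round trip.
        burst, remaining = remaining[:cwnd], remaining[cwnd:]
        rounds.append(burst)
        elapsed += rtt            # one round trip until the ACKs arrive
        cwnd += len(burst)        # slow start: +1 segment per ACKed segment
    return elapsed, rounds


elapsed, rounds = slow_start_recovery(["P1", "P3", "P4", "P5", "P6", "P7"])
for i, burst in enumerate(rounds, start=1):
    print(f"round {i}: retransmit {', '.join(burst)}")
print(f"total: about {elapsed:.2f} s")
```

Under these assumptions the six lost segments are repaired in three round
trips after the first timeout (P1, then P3 and P4, then P5 to P7), rather
than in one timeout per segment.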

Regarding the exponential backoff, RFC 6298 section 5.2 (end) states that
the RTO is recomputed whenever a new RTT sample is acquired, hence clearing
the backoff. But in your case there shouldn't be any backoff at all.
Normally we'd see exponential backoff only if the first unacked packet after
a timeout is lost again (and again), which indicates severe congestion.
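
To make the interaction between backoff and RTT samples concrete, here is a
small Python sketch of the RFC 6298 estimator (alpha = 1/8, beta = 1/4,
K = 4). The class name, method names and the 200 ms minimum RTO used in the
example are assumptions for illustration; RFC 6298 itself recommends a
1 second minimum, and the sender in this thread clearly uses something
smaller. The caller is assumed to follow Karn's algorithm and feed in only
unambiguous samples.

```python
class RtoEstimator:
    """Sketch of the RFC 6298 RTO computation (names are hypothetical)."""

    ALPHA = 1 / 8   # gain for SRTT
    BETA = 1 / 4    # gain for RTTVAR
    K = 4           # variance multiplier

    def __init__(self, granularity=0.001, min_rto=1.0, max_rto=60.0):
        self.g = granularity      # clock granularity G, in seconds
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0            # initial RTO before any measurement

    def on_rtt_sample(self, r):
        """Recompute the RTO from a new RTT sample; discards any backoff."""
        if self.srtt is None:
            # First measurement.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self):
        """Back off the timer: double the RTO, up to the maximum."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto


est = RtoEstimator(min_rto=0.2)   # 200 ms floor is an assumption
est.on_rtt_sample(0.010)          # RTO clamps to the 0.2 s floor
est.on_timeout()                  # backed off to 0.4 s
est.on_timeout()                  # backed off to 0.8 s
est.on_rtt_sample(0.010)          # new sample: RTO collapses back to 0.2 s
print(f"RTO after new sample: {est.rto:.3f} s")
```

The point of the sketch is that the doubled value lives only in the current
RTO: the next valid RTT sample recomputes it from SRTT and RTTVAR, so the
backoff does not carry over to later segments.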

HTH


>
> - Retransmission 3: 2.8 seconds after Retransmission 2, packet with seq
> 139694, len 1348 is retransmitted.
> Ack <10ms, ack # 141042 (matches seq+len of Retransmission 3). Note that
> this packet contains data from dropped 'Packet 4' and 'Packet 5'
>
> - Retransmission 4: 7.1 seconds after Retransmission 3, packet with seq
> 141042, len 1348 is retransmitted.
> Ack <10ms, ack # 142390 (matches seq+len of Retransmission 4). Note that
> this packet contains data from dropped 'Packet 5' and 'Packet 6'
>
> - Retransmission 5: 17.8 seconds after Retransmission 4, packet with seq
> 142390, len 1302 is retransmitted.
> Ack <10ms, ack # 143692 (matches seq+len of Retransmission 5). Note that
> this packet contains data from dropped 'Packet 6' and 'Packet 7'
>
> So the back-off is increased across _different_ retransmissions, despite
> the fact those retransmissions are ack'ed very quickly. Now I know there is
> a chance those acks are coming from the original transmission so the server
> can't calculate a new RTT, but well.. I'm not sure if exponentially (~ x2.5
> in my case actually..) increasing the timer across different ack'ed
> retransmissions is the right strategy here. 10 lost packets (which isn't
> pretty, I know) in my case would mean 500 seconds of waiting.
>
> Am I right that the RFC (or Karn's algorithm) doesn't make a statement
> about this particular scenario ? In that case, should it be clarified ? Or
> is this such an obvious bad strategy that it's overkill to explicitly
> forbid it per RFC ?
>
> Thanks,
>
> Marc