Re: [quicwg/base-drafts] ECN verification text (#2752)

Martin Thomson <notifications@github.com> Wed, 22 May 2019 17:33 UTC


martinthomson commented on this pull request.



> -experiencing congestion.
+It is possible for faulty network devices on path to corrupt or erroneously drop
+all IP packets with ECN Capable Transport (ECT) codepoints set.  To provide
+robust connectivity in the presence of such devices, each endpoint independently
+verifies and enables use of ECN.  Even if not setting ECT codepoints on packets
+it transmits, the endpoint SHOULD provide feedback about ECN markings received
+if they are accessible.
+
+To verify both that a path supports ECN and that the peer can provide ECN
+feedback, an endpoint initially sets the ECT(0) codepoint in the IP header of
+all outgoing packets {{!RFC8311}}.
+
+If all packets with ECT codepoints are dropped by a network device, endpoints
+can detect loss of the packets and MAY cease setting ECT codepoints in
+subsequent packets.  As one concrete strategy, an endpoint could disable setting
+of ECT codepoints if Initial packets with ECT codepoints set are deemed lost.

I think that it is going to need to be all.  But that leads to the question: do you let the handshake time out? Do you interleave ECT-marked and unmarked packets? Or do you start with ECT and switch the markings off after a few tries (and how many tries)?

Also, this needs to be all marked packets sent on a given path.  It is not just Initial packets, but also Handshake or 0-RTT and maybe those with short headers after a migration.

>  If verification fails, then the endpoint ceases setting ECT codepoints in
 subsequent IP packets with the expectation that either the network path or the
 peer does not support ECN.
 
-If an endpoint sets ECT codepoints on outgoing IP packets and encounters a
-retransmission timeout due to the absence of acknowledgments from the peer (see
-{{QUIC-RECOVERY}}), or if an endpoint has reason to believe that an element on
-the network path might be corrupting ECN codepoints, the endpoint MAY cease
-setting ECT codepoints in subsequent packets.  Doing so allows the connection to
-be resilient to network elements that corrupt ECN codepoints in the IP header or
-drop packets with ECT or CE codepoints in the IP header.
+If an ECT codepoint set is not dropped or corrupted by a network device, then a

This paragraph doesn't really seem to say anything now.  You have deleted the timeout thing.  So it's not clear how you decide that a path *doesn't* support ECN now.

The point of having a timeout is to detect a black hole.  One nice thing about QUIC is that the idle timeout can be longer than any ECN validation timeout, so - at the risk of suggesting a design - maybe we could recommend that an endpoint mark the first packets it sends on any path, then stop marking until ECN support is confirmed.  We don't need to set a fixed rule, but it probably makes sense to recommend something.  The first N packets (where an N of 10 produces pretty high confidence) or one round trip (which will be estimated based on the previous path, but that is OK) might work.  If you don't get a positive result, which might happen if the first few packets are lost for other reasons, we'll disable ECN more often than is ideal, but we can allow the test to be retried.
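
To make the suggestion concrete, here is a rough sketch of such a probe window in Python. Everything in it is hypothetical: the class name `EcnProbe`, its methods, the default of 10 packets, and the choice to end the window at whichever limit is reached first are assumptions for illustration, not anything the draft specifies.

```python
class EcnProbe:
    """Hypothetical probe window: mark only the first packets sent on a path."""

    def __init__(self, max_packets=10, rtt_estimate=0.1, start=0.0):
        self.max_packets = max_packets    # N; 10 is an assumed default
        self.rtt_estimate = rtt_estimate  # seconds; may come from the previous path
        self._restart(start)

    def _restart(self, now):
        # Open a fresh window: reset the packet budget and the deadline.
        self.sent = 0
        self.deadline = now + self.rtt_estimate

    def should_mark(self, now):
        # Assumption: the window closes at whichever limit is hit first,
        # N marked packets or one estimated round trip.
        return self.sent < self.max_packets and now < self.deadline

    def on_marked_packet_sent(self):
        self.sent += 1

    def retry(self, now):
        # The test can be repeated if the first attempt was inconclusive,
        # for example because early packets were lost for unrelated reasons.
        self._restart(now)
```

The caller passes timestamps in explicitly, so the window can be driven by whatever clock the stack already uses for loss recovery.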

(Maybe you could add this above the bullet list above, where I can't comment:
"A path is verified if ECT markings are either forwarded unchanged or changed by the network to Congestion Experienced (ECN-CE)."
)

>  If verification fails, then the endpoint ceases setting ECT codepoints in
 subsequent IP packets with the expectation that either the network path or the
 peer does not support ECN.
 
-If an endpoint sets ECT codepoints on outgoing IP packets and encounters a
-retransmission timeout due to the absence of acknowledgments from the peer (see
-{{QUIC-RECOVERY}}), or if an endpoint has reason to believe that an element on
-the network path might be corrupting ECN codepoints, the endpoint MAY cease
-setting ECT codepoints in subsequent packets.  Doing so allows the connection to
-be resilient to network elements that corrupt ECN codepoints in the IP header or
-drop packets with ECT or CE codepoints in the IP header.
+If an ECT codepoint set is not dropped or corrupted by a network device, then a
+received packet contains either the codepoint sent by the peer or the Congestion
+Experienced (CE) codepoint set by a network device that is experiencing
+congestion.
+
+Upon successful verification, an endpoint continues to set ECT codepoints in
+subsequent packets with the expectation that the path is ECN-capable.

This is not a permanent condition.  If only one packet was marked, you might consider the path ECN-capable based on an acknowledgement of that packet.  However, that could be the result of random corruption.

I think that we can just say "until validation of ECN counts fails".  So we have the following algorithm:

For each path:

0. set ECN state to "testing"
1. after N packets or M round trips, set ECN state to "unknown"
2. on every ACK, check ECN counts
3. if ECN validation fails, set ECN state to "failed"
4. if ECN validation passes and ECN state is "unknown", set it to "capable"
5. when sending packets, mark with ECT(0) if state is "testing" or "capable"
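
The steps above can be sketched as a small per-path state machine. This is only an illustration: `PathEcn`, its method names, and the default N of 10 are hypothetical, the "M round trips" alternative in step 1 is left out for brevity, and the actual check of the ECN counts is assumed to happen elsewhere and arrive as a boolean. The sketch uses a single name, UNKNOWN, for the state a path enters once the testing window ends.

```python
from enum import Enum


class EcnState(Enum):
    TESTING = "testing"
    UNKNOWN = "unknown"
    CAPABLE = "capable"
    FAILED = "failed"


class PathEcn:
    """Hypothetical per-path ECN state tracker."""

    def __init__(self, max_test_packets=10):
        # Step 0: every new path starts in "testing".
        self.state = EcnState.TESTING
        self.max_test_packets = max_test_packets  # N; 10 is an assumed default
        self.sent_while_testing = 0

    def should_mark(self):
        # Step 5: mark with ECT(0) only while testing or once capable.
        return self.state in (EcnState.TESTING, EcnState.CAPABLE)

    def on_packet_sent(self):
        # Step 1: after N packets, stop marking and wait for a verdict.
        if self.state is EcnState.TESTING:
            self.sent_while_testing += 1
            if self.sent_while_testing >= self.max_test_packets:
                self.state = EcnState.UNKNOWN

    def on_ack(self, counts_valid):
        # Steps 2 to 4: counts_valid is the outcome of validating the
        # ECN counts carried in this ACK (computed by the caller).
        if not counts_valid:
            self.state = EcnState.FAILED
        elif self.state is EcnState.UNKNOWN:
            self.state = EcnState.CAPABLE
```

Note that a path in the "failed" state stays there in this sketch; retrying the test would mean creating a fresh `PathEcn` for the path.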

(I know that this is unchanged text, but I just realized that we could be clearer.)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/quicwg/base-drafts/pull/2752#pullrequestreview-240766617