[PANRG] New ECN section for

Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com> Sun, 14 February 2021 20:02 UTC

From: Spencer Dawkins at IETF <spencerdawkins.ietf@gmail.com>
Date: Sun, 14 Feb 2021 14:02:12 -0600
Message-ID: <CAKKJt-fUSg8ro1YGm2bRWQXf862RPbEZjzmUHwb+RMcce9YFmw@mail.gmail.com>
To: panrg@irtf.org
Cc: Martin Duke <martin.h.duke@gmail.com>, Colin Perkins <csp@csperkins.org>
Archived-At: <https://mailarchive.ietf.org/arch/msg/panrg/kjoRxcVrOmHwr_lnBj_CSHiLskw>
Subject: [PANRG] New ECN section for

Dear PANRG,

I have https://github.com/panrg/draft-dawkins-panrg-what-not-to-do/pull/6
open on GitHub, to add this material, as we've talked about on the mailing
list.

I would love to have comments from anyone familiar with the history of ECN,
or its current practice.

If you don't use GitHub (there may be a few of you), here's what's new:

I added this material to Section 4:

4.13.  Ability to Recover From Missteps

   If early implementers discover problems with a new feature, that
   feature is likely to be disabled, and convincing implementers to
   re-enable it can be very difficult and can take years or decades
   (see Section 6.9).

I asserted that "One Chance" was invariant, in Table 1, as

    +-----------------------------------------------------+-----------+
    | One Chance to Achieve Deployment (Section 4.13)     | Invariant |
    +-----------------------------------------------------+-----------+

I added this material to Section 6:

6.9.  Explicit Congestion Notification (ECN)

   The suggested references for Explicit Congestion Notification (ECN)
   are:

   *  Recommendations on Queue Management and Congestion Avoidance in
      the Internet [RFC2309]

   *  A Proposal to add Explicit Congestion Notification (ECN) to IP
      [RFC2481]

   *  The Addition of Explicit Congestion Notification (ECN) to IP
      [RFC3168]

   *  Implementation Report on Experiences with Various TCP RFCs
      [vista-impl], slides 6 and 7

   *  Implementation and Deployment of ECN [SallyFloyd]

   In the early 1990s, the large majority of Internet traffic used TCP
   as its transport protocol, but TCP had no way to detect path
   congestion before the path was so congested that packets were being
   dropped.  These congestion events could affect all senders using a
   path, either by "lockout", where long-lived flows monopolized the
   queues along a path, or by "full queues", where queues remained
   full, or almost full, for a long period of time.

   In response to this situation, "Active Queue Management" (AQM) was
   deployed in the network.  A number of AQM disciplines have been
   deployed, but one common approach was that routers dropped packets
   when a threshold buffer length was reached, so that transport
   protocols like TCP that were responsive to loss would detect this
   loss and reduce their sending rates.  Random Early Detection (RED)
   was one such proposal in the IETF.  As the name suggests, a router
   using RED as its AQM discipline that detected time-averaged queue
   lengths passing a threshold would probabilistically choose incoming
   packets to drop [RFC2309].
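
   The drop decision described above can be sketched roughly as
   follows.  This is a simplified illustration, not the algorithm as
   specified anywhere: the class name, the parameter names (min_th,
   max_th, max_p, weight), and their default values are all assumptions
   chosen for readability, and the refinements found in published RED
   descriptions are omitted.

```python
import random


class RedQueue:
    """Simplified RED-style drop decision (illustrative sketch only)."""

    def __init__(self, min_th=5.0, max_th=15.0, max_p=0.1, weight=0.002,
                 rng=None):
        # All thresholds and weights here are hypothetical example values.
        self.min_th = min_th    # below this average, never drop
        self.max_th = max_th    # at or above this average, always drop
        self.max_p = max_p      # drop probability just below max_th
        self.weight = weight    # smoothing weight for the moving average
        self.avg = 0.0          # time-averaged queue length
        self.rng = rng or random.Random()

    def drop_probability(self):
        """Map the averaged queue length to a drop probability."""
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        span = self.max_th - self.min_th
        return self.max_p * (self.avg - self.min_th) / span

    def on_arrival(self, queue_len):
        """Update the average; return True if this packet should be dropped."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.rng.random() < self.drop_probability()
```

   Averaging the queue length, rather than using the instantaneous
   value, lets short bursts pass while sustained queue growth triggers
   early drops.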

   Researchers suggested providing "explicit congestion notifications"
   to senders when routers along the path detected that their queues
   were building, so that some senders would "slow down" as if a loss
   had occurred.  The path queues would then have time to drain, and
   the path would still have sufficient buffer capacity to accommodate
   bursty arrivals of packets from other senders.  This was proposed as
   an Experiment in [RFC2481], and standardized in [RFC3168].

   A key aspect of ECN was the use of IP header fields rather than IP
   options to carry explicit congestion notifications, since the
   proponents recognized that

      Many routers process the "regular" headers in IP packets more
      efficiently than they process the header information in IP
      options.
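
   As a rough illustration of what carrying the notification in a
   "regular" header field means: in [RFC3168] the ECN field is the two
   low-order bits of the IPv4 TOS octet / IPv6 Traffic Class field, with
   codepoints Not-ECT (00), ECT(1) (01), ECT(0) (10), and CE (11).  The
   sketch below reads and marks that field.  The function and constant
   names are hypothetical, chosen only for this example.

```python
ECN_MASK = 0b11  # two low-order bits of the TOS / Traffic Class octet

# Codepoint values as defined in RFC 3168.
ECN_NAMES = {
    0b00: "Not-ECT",
    0b01: "ECT(1)",
    0b10: "ECT(0)",
    0b11: "CE",
}


def ecn_codepoint(tos_octet: int) -> str:
    """Return the name of the ECN codepoint carried in the octet."""
    if not 0 <= tos_octet <= 0xFF:
        raise ValueError("TOS / Traffic Class must fit in one octet")
    return ECN_NAMES[tos_octet & ECN_MASK]


def mark_congestion(tos_octet: int) -> int:
    """Set CE on an ECN-capable packet; leave other packets unchanged.

    A packet marked Not-ECT is returned as-is: a router would have to
    signal congestion for it by dropping, which is outside this sketch.
    """
    if ecn_codepoint(tos_octet) == "Not-ECT":
        return tos_octet
    return tos_octet | ECN_MASK


# Example: a packet carrying ECT(0) is marked CE; the upper six bits
# of the octet are preserved.
marked = mark_congestion(0xBA)
```

   Because the field sits in the fixed header, a router can test and
   set it with a mask, without parsing IP options.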

6.9.1.  Reasons for Non-deployment

   The proponents of ECN did so much right, anticipating many of the
   Lessons Learned now recognized in Section 4.  They recognized the
   need to support incremental deployment (Section 4.2).  They
   considered the impact on router throughput (Section 4.8).  They even
   considered trust issues between end nodes and the network, both for
   non-compliant end nodes (Section 4.10) and non-compliant routers
   (Section 4.9).

   They were rewarded with ECN being implemented in major operating
   systems, both for end nodes and for routers.  A number of
   implementations are listed under "Implementation and Deployment of
   ECN" at [SallyFloyd].

   What they did not anticipate was routers that would crash when they
   saw bits 6 and 7 in the IPv4 TOS octet [RFC0791]/IPv6 Traffic Class
   field [RFC2460], which [RFC2481] redefined to be "currently unused",
   being set to a non-zero value.

   As described in [vista-impl],

      Intermediate Gateway Device problem #1: one of the most popular
      versions from one of the most popular vendors.  When a data packet
      arrives with either ECT(0) or ECT(1) (indicating successful ECN
      capability negotiation) indicated, router crashed.  Cannot be
      recovered at TCP layer

   The implementation described in [vista-impl], which would be run on
   a significant percentage of Internet end nodes, was shipped with ECN
   disabled, as was true for several of the other implementations
   listed under "Implementation and Deployment of ECN" at [SallyFloyd].
   Even if router vendors subsequently fixed their implementations, ECN
   was still disabled on end nodes.  Given the tradeoff between the
   benefits of enabling ECN (somewhat better behavior during
   congestion) and the risks of enabling ECN (possibly crashing a
   router somewhere along the path), ECN tended to stay disabled for
   decades afterwards on implementations that supported it.

6.9.2.  Lessons Learned

   Of the contributions included in Section 6, ECN may be unique in
   providing these lessons:

   *  Even if you do everything right, you may trip over implementation
      bugs in devices you know nothing about, that will cause severe
      problems that prevent successful deployment of your path aware
      technology.

   *  After implementations disable your path aware technology, it may
      take years, or even decades, to convince implementers to re-enable
      it by default.

   These two lessons, taken together, could be summarized as "you get
   one chance to get it right".

Best,

Spencer