SUGGESTED ARQ TEXT Advice for Internet Subnetwork Designers ID
Dr G Fairhurst <gorry@erg.abdn.ac.uk> Tue, 24 April 2001 12:51 UTC
Message-ID: <3AE576D2.24827CD@erg.abdn.ac.uk>
Date: Tue, 24 Apr 2001 13:51:30 +0100
From: Dr G Fairhurst <gorry@erg.abdn.ac.uk>
Reply-To: gorry@erg.abdn.ac.uk
Organization: erg.abdn.ac.uk
To: pilc@grc.nasa.gov
Subject: SUGGESTED ARQ TEXT Advice for Internet Subnetwork Designers ID
References: <0703A3E1D430D411866100508BDFCF3328B89A@CTOEXCH1>
Some people have asked me about progress on clarifying the ARQ text included in the current link ID. Following the IETF meeting in Minneapolis, the co-authors listed below suggest a replacement for the ARQ text in the link draft. This is based on the text in the last issued link ID. We hope this new text clarifies the debate on ARQ persistency at IETF-49 and IETF-50. We also intend it to be consistent with the remainder of the link draft and with the related ARQ draft (draft-ietf-pilc-link-arq-issues-XX.txt). The replacement text below has not changed since 12th April 2001.

Gorry Fairhurst
Lloyd Wood
Reiner Ludwig

In the coming weeks, a revision of the ARQ draft (draft-ietf-pilc-link-arq-issues-XX.txt), based on feedback received and any further comments, will be issued to give a more detailed discussion of the ARQ issues.

--------

TCP vs Link Layer Retransmission

Error recovery generally involves the retransmission of lost or corrupted data when explicitly or implicitly requested by the receiver. It can also involve the generation and transmission of redundant information that lets the receiver regenerate or correct some amount of lost or corrupted data without needing an explicit retransmission of that data. The retransmission approach, widely known as "ARQ" (Automatic Repeat reQuest) for largely historical reasons, is found in many computer networking protocols. The redundant-information approach, using error control coding (of which Forward Error Correction, or FEC, is a well-known example), is usually implemented in the data link layer, very close to the physical layer. Many link layers combine coding and ARQ retransmission to improve performance. Depending on the layer where it is implemented, error control can operate on an end-to-end basis or over a shorter span such as a single link. TCP is the most important example of an end-to-end protocol that uses an ARQ strategy.
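The cost difference between recovering over a single link and recovering end-to-end can be sketched with a little arithmetic. The Python below is a minimal model, assuming independent frame losses with probability p, loss-free link acknowledgements, and an IP packet split into n equal link frames that are all sent even if an earlier one is lost. The function names and the 1500-byte packet, 100-byte frame, and 5% loss figures are hypothetical illustrations, not measurements.

```python
def _check_loss_rate(p: float) -> None:
    if not 0.0 <= p < 1.0:
        raise ValueError("frame loss rate must be in [0, 1)")


def expected_frames_link_arq(n_frames: int, p: float) -> float:
    """Mean frame transmissions per packet when the link retransmits each
    lost frame locally (unbounded link-layer ARQ).

    Each frame needs a geometric number of attempts with mean 1 / (1 - p).
    """
    _check_loss_rate(p)
    return n_frames / (1.0 - p)


def expected_frames_end_to_end(n_frames: int, p: float) -> float:
    """Mean frame transmissions per packet when only the end-to-end protocol
    recovers: losing any one frame loses the whole packet, and every retry
    resends all n_frames frames.

    A packet survives with probability (1 - p) ** n_frames, so on average it
    needs 1 / (1 - p) ** n_frames attempts of n_frames frames each.
    """
    _check_loss_rate(p)
    return n_frames / (1.0 - p) ** n_frames


if __name__ == "__main__":
    n = 1500 // 100   # hypothetical: 1500-byte IP packet over 100-byte frames
    p = 0.05          # hypothetical per-frame loss rate
    print(f"link-layer ARQ : {expected_frames_link_arq(n, p):.1f} frames/packet")
    print(f"end-to-end only: {expected_frames_end_to_end(n, p):.1f} frames/packet")
```

With these illustrative numbers, end-to-end-only recovery needs roughly twice as many frame transmissions (about 32 versus about 16), before even counting the longer end-to-end retransmission delay.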
A large number of link layer protocols use ARQ, most often some flavor of HDLC [ISO3309]. Examples include the X.25 link layer, the AX.25 protocol used in amateur packet radio, 802.11 wireless LANs, and the reliable link layer specified in IEEE 802.2.

As explained in the introduction, only end-to-end error recovery can ensure a reliable service to the application. But some subnetworks (e.g., many wireless links) also require link layer error recovery as a performance enhancement. For example, many cellular links have small physical frame sizes (< 100 bytes) and relatively high frame loss rates. Relying entirely on end-to-end error recovery clearly degrades performance: the loss of any one frame loses the whole packet, and a retransmission across the end-to-end path takes much longer to arrive than a link-local retransmission. Thus, link-layer error recovery can often increase end-to-end performance. As a result, link-layer and end-to-end recovery often co-exist, which can lead to inefficient interactions between the two layers of ARQ protocols.

This inter-layer "competition" might lead to the following wasteful situation. When the link layer retransmits a packet, the link latency momentarily increases. Since TCP bases its retransmission timeout on prior measurements of end-to-end latency, including that of the link in question, this sudden increase in latency may trigger an unnecessary retransmission by TCP of a packet that the link layer is still retransmitting. Such spurious end-to-end retransmissions generate unnecessary load and reduce end-to-end throughput. There may even be multiple copies of the same packet in the same link queue at the same time. In general, the competing error recovery is caused by an inner control loop (link layer error recovery) reacting to the same signal as an outer control loop (end-to-end error recovery) without any coordination between the two loops.
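A minimal Python sketch of this competition, assuming the TCP sender computes its timeout as in RFC 2988 (RTO = SRTT + 4*RTTVAR, with a 1-second minimum and clock granularity ignored). `RtoEstimator`, `spurious_timeout`, and all delay values are hypothetical; the sketch only shows that a few link-layer retransmissions can push one segment's round-trip time past an RTO that was tuned to the loss-free path.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8          # RFC 2988 gain for SRTT
BETA = 1 / 4           # RFC 2988 gain for RTTVAR
K = 4                  # RFC 2988 variance multiplier
MIN_RTO_S = 1.0        # RFC 2988 lower bound on RTO
INITIAL_RTO_S = 3.0    # RFC 2988 RTO before any RTT sample


@dataclass
class RtoEstimator:
    """Smoothed RTT / RTO estimator (clock granularity ignored)."""
    srtt: Optional[float] = None
    rttvar: float = 0.0

    def update(self, rtt_s: float) -> None:
        if self.srtt is None:          # first measurement
            self.srtt = rtt_s
            self.rttvar = rtt_s / 2
        else:                          # RTTVAR is updated using the old SRTT
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - rtt_s)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * rtt_s

    @property
    def rto(self) -> float:
        if self.srtt is None:
            return INITIAL_RTO_S
        return max(MIN_RTO_S, self.srtt + K * self.rttvar)


def spurious_timeout(est: RtoEstimator, path_rtt_s: float,
                     link_retx_interval_s: float, link_retries: int) -> bool:
    """Hypothetical helper: True if link-layer retries delay a segment's ACK
    beyond the current RTO, so TCP would retransmit a segment the link is
    still delivering. Each link retry is assumed to add one fixed interval."""
    delayed_rtt_s = path_rtt_s + link_retries * link_retx_interval_s
    return delayed_rtt_s > est.rto


if __name__ == "__main__":
    est = RtoEstimator()
    for _ in range(20):                # hypothetical steady 600 ms path RTT
        est.update(0.6)
    for retries in range(5):           # hypothetical 200 ms per link retry
        print(retries, spurious_timeout(est, 0.6, 0.2, retries))
```

With these numbers the RTO settles at its 1-second floor, so three or more link retries are enough to trigger a spurious TCP retransmission.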
Note that this is an efficiency issue; TCP continues to provide reliable end-to-end delivery over such links. This raises the question of how persistent a link layer sender should be in performing retransmission. We define link layer (LL) ARQ persistency as the maximum time that a particular link will spend trying to transfer a packet before the packet can be discarded. This deliberately simplified definition says nothing about the maximum number of retransmissions, retransmission strategies, queue sizes, queuing disciplines, transmission delays, or the like. We use the term LL ARQ persistency instead of a term such as 'maximum link layer packet holding time' because the definition closely relates to link layer error recovery. For example, on links that implement straightforward error recovery strategies, LL ARQ persistency will often correspond to a maximum number of retransmissions permitted per link layer frame [ARQ-DRAFT].

For link layers that do not or cannot differentiate between flows (e.g., due to network layer encryption), the LL ARQ persistency should be small. This avoids any harmful effects or performance degradation resulting from indiscriminate high persistence. A detailed discussion of these issues is provided in [ARQ-DRAFT].

However, when a link layer is able to identify separate flows [ARQ-DRAFT] and isolate the effects of ARQ on different flows sharing the same link (or when all flows observe common patterns of loss, e.g., an outage), the LL ARQ persistency should be high for flows using reliable unicast transport protocols (e.g., TCP) and must be low for all other flows. Setting the LL ARQ persistency larger than the longest link outage would allow TCP to rapidly restore transmission without waiting for a retransmission timeout, generally improving TCP performance in the face of transient outages. However, excessively high persistence may be disadvantageous (a practical upper limit of 30-60 seconds may be desirable).
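One way to express this policy is sketched below in Python, assuming the link can tell whether a packet belongs to a reliable unicast transport flow. The constants `LOW_PERSISTENCY_S` and `OUTAGE_MARGIN_S` are hypothetical values (the text only says "small" and "larger than the longest outage"); only the 30 s cap is taken from the 30-60 s range above. The conversion to a retry count assumes every attempt takes a fixed time, which real links need not satisfy.

```python
import math
from enum import Enum

# Hypothetical policy constants.
LOW_PERSISTENCY_S = 0.5    # assumed "small" persistency for all other flows
OUTAGE_MARGIN_S = 1.0      # assumed margin above the longest expected outage
UPPER_LIMIT_S = 30.0       # lower end of the 30-60 s practical limit in the text


class Transport(Enum):
    RELIABLE_UNICAST = "reliable_unicast"   # e.g. TCP
    OTHER = "other"                         # e.g. UDP, multicast, unknown


def ll_arq_persistency(transport: Transport, can_isolate_flows: bool,
                       longest_outage_s: float) -> float:
    """Maximum time (s) the link keeps trying to deliver one packet.

    can_isolate_flows is True when the link can identify and isolate flows,
    or when all flows see common loss patterns (e.g. an outage).
    """
    if not can_isolate_flows:
        # Flows cannot be told apart (e.g. network-layer encryption).
        return LOW_PERSISTENCY_S
    if transport is not Transport.RELIABLE_UNICAST:
        return LOW_PERSISTENCY_S
    # Outlast the longest expected outage, but never exceed the practical cap.
    return min(longest_outage_s + OUTAGE_MARGIN_S, UPPER_LIMIT_S)


def max_retransmissions(persistency_s: float, attempt_time_s: float) -> int:
    """Translate a time-based persistency into a per-frame retry limit,
    assuming each attempt (transmit plus wait for the link ACK) takes
    attempt_time_s seconds."""
    if attempt_time_s <= 0:
        raise ValueError("attempt_time_s must be positive")
    attempts = math.floor(persistency_s / attempt_time_s)
    return max(attempts - 1, 0)   # the first attempt is not a retransmission


if __name__ == "__main__":
    p = ll_arq_persistency(Transport.RELIABLE_UNICAST, True, longest_outage_s=10.0)
    print(p, max_retransmissions(p, attempt_time_s=0.25))
```

Defining persistency in seconds and deriving the retry limit from it keeps the policy independent of any particular link's frame timing.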
Implementation of such schemes remains a research issue. (See also the section "Recovery from Subnetwork Outages".)

Recovery from Subnetwork Outages

Some types of subnetworks, particularly mobile radio, are subject to frequent but temporary outages. For example, an active cellular data user may drive or walk into an area (such as a tunnel) that is out of range of any base station. No packets will be successfully delivered until the user returns to an area with coverage. The Internet protocols currently provide no standard way for a subnetwork to explicitly notify an upper layer protocol (e.g., TCP) that it is experiencing an outage, as distinct from severe congestion. Under these circumstances TCP will, after each unsuccessful retransmission, wait even longer before trying again; this is its "exponential backoff" algorithm, which doubles the retransmission timeout each time. Since there is also currently no way for a subnetwork to explicitly notify TCP when it is again operational, TCP will not discover this until its next retransmission attempt. If TCP has backed off several times, this may be long after the subnetwork has recovered. This can lead to extremely poor TCP performance over such subnetworks.

It is therefore highly desirable that a subnetwork subject to outages not silently discard packets during an outage. Ideally, it should define an interface to the next higher layer (i.e., IP) that allows it to refuse packets during an outage, and to automatically ask IP for new packets when it is again able to deliver them. If it cannot do this, then the subnetwork should hold onto at least some of the packets it accepts during an outage and attempt to deliver them when the subnetwork comes back up.

Note that it is *not* necessary to avoid any and all packet drops during an outage. The purpose of holding onto a packet during an outage, either in the subnetwork or at the IP layer, is so that its eventual delivery will implicitly notify TCP that the subnetwork is again operational.
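To make the cost of exponential backoff during an outage concrete, the Python sketch below computes when TCP's first retransmission after an outage would fire, assuming a segment is lost at the moment the outage begins, every retransmission during the outage is also lost, and the timeout doubles each time. The 1 s initial RTO and the 64 s cap are assumed defaults (RFC 2988 allows an RTO maximum of no less than 60 s), and `first_retry_after_outage` is a hypothetical helper.

```python
from typing import Tuple


def first_retry_after_outage(outage_s: float, initial_rto_s: float = 1.0,
                             max_rto_s: float = 64.0) -> Tuple[float, float]:
    """Return (time of TCP's first retransmission after the outage ends,
    idle time between the end of the outage and that retransmission).

    Assumes a segment is sent and lost at t = 0 when the outage begins, every
    retransmission during the outage is also lost, and the timeout doubles
    after each unsuccessful retransmission, capped at max_rto_s.
    """
    if outage_s < 0 or initial_rto_s <= 0:
        raise ValueError("outage_s must be >= 0 and initial_rto_s > 0")
    t = 0.0
    timeout = initial_rto_s
    while True:
        t += timeout                 # next retransmission fires at time t
        if t >= outage_s:            # subnetwork is back up: this one succeeds
            return t, t - outage_s
        timeout = min(timeout * 2, max_rto_s)   # exponential backoff


if __name__ == "__main__":
    for outage in (5.0, 40.0, 100.0):
        retry_at, idle = first_retry_after_outage(outage)
        print(f"outage {outage:5.1f} s -> retry at {retry_at:5.1f} s, idle {idle:4.1f} s")
```

For example, under these assumptions a 40 s outage leaves the path idle for 23 s after it recovers, because retransmissions fall at t = 1, 3, 7, 15, and 31 s and the next one is not due until t = 63 s.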
This is to enhance performance, not to ensure reliability; as discussed earlier, reliability can only be provided properly on an end-to-end basis. Only a single packet per TCP connection needs to be held in this way: its delivery generates a TCP acknowledgement that causes the TCP sender to recover from any additional losses once the flow resumes. Because it would be a layering violation (and possibly a performance hit) for IP or a subnetwork to look at the TCP headers of the packets it carries (which would in any event be impossible if IPsec encryption is in use), it would be reasonable for the IP or subnetwork layers to choose, as a design parameter, some small number of packets that they will retain during an outage.
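A minimal Python sketch of the fallback behaviour described above, for a subnetwork that cannot refuse packets during an outage. It retains at most `hold_limit` packets, treats them as opaque bytes (no TCP header inspection, so it also works when IPsec is in use), and transmits them when the link returns, then invokes an optional `on_ready` callback standing in for the "ask IP for new packets" interface. `OutageHoldQueue`, its methods, and the choice to keep the most recent packets (evicting the oldest) are hypothetical design choices, not taken from the draft.

```python
from collections import deque
from typing import Callable, Deque, Optional


class OutageHoldQueue:
    """Hypothetical subnetwork send queue that retains a small, fixed number
    of packets during an outage instead of silently discarding all of them."""

    def __init__(self, hold_limit: int, transmit: Callable[[bytes], None],
                 on_ready: Optional[Callable[[], None]] = None) -> None:
        if hold_limit < 1:
            raise ValueError("hold_limit must be at least 1")
        self.link_up = True
        self.transmit = transmit
        self.on_ready = on_ready
        # deque(maxlen=...) evicts the OLDEST entry when full, so the packets
        # retained are the most recent hold_limit ones (a design choice).
        self.held: Deque[bytes] = deque(maxlen=hold_limit)
        self.dropped = 0

    def send(self, packet: bytes) -> None:
        if self.link_up:
            self.transmit(packet)
            return
        if len(self.held) == self.held.maxlen:
            self.dropped += 1          # the oldest held packet is about to be evicted
        self.held.append(packet)       # packet is opaque: no header inspection

    def outage_started(self) -> None:
        self.link_up = False

    def outage_ended(self) -> None:
        self.link_up = True
        # Delivering held packets implicitly tells TCP the path works again.
        while self.held:
            self.transmit(self.held.popleft())
        if self.on_ready is not None:
            self.on_ready()            # stand-in for asking IP for new packets
```

Keeping the limit small bounds the memory spent on an outage while still giving each active connection a good chance of having one packet delivered when the link recovers.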