Re: Comments to "Advice for Internet Subnetwork Designers"

Reiner Ludwig <Reiner.Ludwig@ericsson.com> Thu, 23 November 2000 15:17 UTC

Message-Id: <5.0.1.4.0.20001123152149.0244ab10@chapelle.ericsson.se>
X-Sender: eedrel@chapelle.ericsson.se
X-Mailer: QUALCOMM Windows Eudora Version 5.0.1
Date: Thu, 23 Nov 2000 16:17:19 +0100
To: karn@qualcomm.com
From: Reiner Ludwig <Reiner.Ludwig@ericsson.com>
Subject: Re: Comments to "Advice for Internet Subnetwork Designers"
Cc: pilc@grc.nasa.gov
In-Reply-To: <200011230045.QAA17520@patty.ka9q.net>
References: <4.3.2.7.0.20000801115412.00b847e0@chapelle.ericsson.se> < <5E5172B4DE05D311B3AB0008C75DA941019EF683@edeacnt100.eed.ericsson.se> <5E5172B4DE05D311B3AB0008C75DA941019EF683@edeacnt100.eed.ericsson.se> <4.3.2.7.0.20000801115412.00b847e0@chapelle.ericsson.se>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format="flowed"
Sender: owner-pilc@lerc.nasa.gov
Precedence: bulk
Status: RO
Content-Length: 7893
Lines: 180

At 01:45 23.11.00, Phil Karn wrote:
>Going back to some mail of yours from early August, I've been giving
>your comments a lot of thought, and I have to admit I'm moving closer
>to your point of view on some things.

I'm glad to hear this. Ideally, we can get our views converged on this 
matter so that we can get this section "Reliability and Error Control" 
straight in the "Advice for Internet Subnetwork Designers".

Maybe we can make this a topic at the PILC meeting in San Diego. I would 
be happy to present my view ... if the WG chairs think that's appropriate.


> >>What you really want to avoid are the nasty interactions between TCP
> >>and link/physical layer retransmissions that can occur in certain link
> >>operating regions.
>
> >People often talk about those interactions as if there were so many of
> >them. What are the interactions you are talking about?
>
> >I only know of two such interactions:
> >(1) spurious timeouts that lead to a go-back-N retransmission mode in
> >TCP, and
>
>A compliant TCP never does go-back-N; it only retransmits the oldest
>unacked segment.  But even this could be a spurious retransmission if the
>earlier delay was caused by a link layer retransmission.

Please check out section 3.1 in this paper:
http://www.acm.org/sigcomm/ccr/archive/2000/jan00/ccr-200001-ludwig.html

A compliant TCP has no choice but to go into go-back-N after a 
spurious timeout; i.e., a spurious timeout forces TCP into go-back-N.

I have just submitted an Internet Draft to TSVWG on this matter:
http://search.ietf.org/internet-drafts/draft-ludwig-tsvwg-tcp-eifel-alg-00.txt
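
For readers without the draft at hand, here is a minimal sketch of
timestamp-based detection of a spurious retransmit. It assumes the TCP
timestamps option is in use and follows the published descriptions of the
Eifel idea only loosely; the class and method names are hypothetical and
are not taken from the draft.

```python
class SpuriousRetransmitDetector:
    """Sketch: flag a retransmit as spurious when the first ACK that
    follows it echoes a timestamp older than the retransmit itself."""

    def __init__(self):
        # Timestamp value carried by the first retransmit, if any.
        self.retransmit_ts = None

    def on_retransmit(self, tsval):
        # Remember only the first retransmit of this recovery episode.
        if self.retransmit_ts is None:
            self.retransmit_ts = tsval

    def on_ack(self, tsecr):
        """Return True if this ACK shows the retransmit was spurious."""
        if self.retransmit_ts is None:
            return False
        # An echoed timestamp older than the retransmit means the ACK was
        # triggered by the original segment, which was only delayed.
        spurious = tsecr < self.retransmit_ts
        self.retransmit_ts = None
        return spurious
```

A sender that sees `on_ack` return True could restore its congestion
state and avoid resending the rest of the outstanding window.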


> >(2) out-of-order delivery at the link layer that falsely triggers fast
> >retransmit / fast recovery.
>
>I think this is not as significant a problem in practice, since most
>link layers already enforce packet ordering.

I agree with respect to existing link layers. I'm not so sure, though, 
whether we should also recommend in-order delivery for future wireless links.


> >Given that PILC recommends that link layers should avoid out-of-order
> >delivery (also for the sake of not causing trouble for TCP/IP header
> >compression) this leaves spurious timeouts as the cardinal problem.
>
>But now I'm no longer sure that ordered delivery at the link level is
>even a good thing.
>
>The basic problem is that ordered delivery can only be done by
>intentionally delaying any packets that would otherwise be delivered
>out of order. This is harmless or even beneficial if the delayed
>packets belong to the same TCP connection.  But if they don't, then
>you merely impair the performance of the other connection(s) with no
>compensating benefit.  (True, the link *could* peek at the TCP and IP
>headers and only enforce in-order delivery of packets belonging to the
>same TCP connection, but that would be an egregious layer violation --
>i.e., it wouldn't work with IPSEC).

Again, I agree, except for the layer violation concerns.

I would also like to strongly promote clean designs, i.e., without layer 
violations. But to implement what I call inter-flow out-of-order delivery 
(as opposed to intra-flow out-of-order delivery, i.e., o-o-o delivery 
within a single flow), you do not need to have the link layer inspect IP 
headers.

Instead, you could, e.g., run multiple logical link layers (one per flow) 
across a hop, and have IP distribute the packets onto those links based on 
port numbers. "One per flow" clearly does not scale, but for the first/last 
hop this shouldn't be a problem.
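
A minimal sketch of that idea, under assumptions: each flow gets its own
logical link that delivers in order, so a hole in one flow never delays
another. The flow key and the per-link sequence numbers are hypothetical
illustration only.

```python
class LogicalLink:
    """One logical link: in-order delivery within a single flow."""

    def __init__(self):
        self.next_seq = 0   # next link sequence number to release
        self.held = {}      # frames received ahead of a hole

    def receive(self, seq, payload):
        """Accept a frame and return the payloads now deliverable in order."""
        self.held[seq] = payload
        delivered = []
        while self.next_seq in self.held:
            delivered.append(self.held.pop(self.next_seq))
            self.next_seq += 1
        return delivered


class PerFlowReceiver:
    """Receiving side of a hop that runs one logical link per flow."""

    def __init__(self):
        self.links = {}     # flow key -> LogicalLink

    def receive(self, flow, seq, payload):
        # The flow key is chosen above the link layer, e.g. from port
        # numbers; the link itself never inspects IP headers.
        if flow not in self.links:
            self.links[flow] = LogicalLink()
        return self.links[flow].receive(seq, payload)
```

If frame 0 of one flow is still being retransmitted, frame 1 of that flow
is held back, while frames of any other flow are released immediately.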

Yes, as with so many things, this does not work with IPsec.


>I believe this is much the same reasoning that originally led to the
>Internet architects to say that IP need not deliver packets in
>sequence, that reordering is a transport layer responsibility.
>
>Transport protocols can compensate for many subnetwork and Internet
>ills, including packet loss, duplication and reordering. But high
>delay is one subnetwork ill that a transport protocol can *never*
>overcome.  That's why I think any low-level mechanism that
>deliberately introduces delay should be viewed with great skepticism.
>
>This also leads to the conclusion that if packet reordering spuriously
>triggers TCP's dup ack mechanism, then this is really TCP's fault and
>it should be fixed inside TCP. (I didn't say this would be easy, though.)

The ID I mentioned above is such a fix (inside TCP).


>This leaves one remaining argument for ordered delivery on links: VJ
>header compression. So we could simply say that in-order delivery is
>important only if you plan to implement VJ header compression.

The evolved VJ header compression, RFC2507, optionally uses sequence 
numbers within the compressed headers to deal with links that re-order. So, 
this would eliminate the "one remaining argument". However, there would be 
no benefit in having the link layer do o-o-o delivery in this case, since 
now we have the header decompressor delaying the packets to bring them back 
into order.


>Anyway, back to retransmission interactions.
>
> >>But whenever the link layer retransmits, the RTT spikes
> >>and there's a spurious TCP retransmission.
>
> >This is simply not true. You have to let things go really bad to force a
> >spurious timeout in TCP. Depending on your path characteristics especially
>
>You say it's rare, but it is something I've seen quite often in TCP
>traces. I think this difference is simply due to our having operated
>over different environments.

Plus, it very much depends on your TCP (timer granularity, restart of the 
timer after an ACK, ...). We see more spurious timeouts with Linux, for 
example. Still, with the current RTO calculation, spurious timeouts should 
be rare.
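
To make "the current RTO calculation" concrete, here is a minimal sketch
of a Jacobson-style estimator. The gains (1/8, 1/4), the factor of 4, and
the 64-second ceiling follow RFC 2988; the default initial and minimum
RTO values are assumptions, and real stacks differ in timer granularity
and clamping.

```python
class RtoEstimator:
    """Sketch of a Jacobson-style retransmission timeout estimator."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variation

    def __init__(self, initial_rto=3.0, min_rto=1.0, max_rto=64.0):
        # initial_rto and min_rto are assumed defaults, not measured values.
        self.srtt = None
        self.rttvar = None
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.rto = initial_rto

    def sample(self, rtt):
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated using the old SRTT, then SRTT is updated.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.rto = min(self.max_rto, max(self.min_rto, self.srtt + 4 * self.rttvar))
        return self.rto

    def spike_causes_timeout(self, delayed_rtt):
        """True if a segment delayed to delayed_rtt would outlast the RTO."""
        return delayed_rtt > self.rto
```

In this model, a link layer retransmission triggers a spurious timeout
only when it pushes a segment's delay past SRTT + 4*RTTVAR (or past the
minimum RTO, whichever is larger).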


>Our air links tend to have relatively
>high latencies even without link level retransmissions, and more
>significantly this latency tends to dominate that of the total
>Internet path.  So we probably see many more cases of spurious TCP
>retransmissions being triggered by link retransmissions than you do.

I don't know what you mean when you say "our air links", but I did quite a 
few measurements in GSM, which certainly qualifies as high latency and also 
dominates the e2e RTT. Still, spurious timeouts were extremely rare (in 
BSD), and when they occurred, it was always at the beginning of the 
connection, since the RTO had not yet been given enough time to adapt.


>Also, our link protocols have historically been implemented in cell
>phones with severely limited memory resources, and this has led us to
>implement non-persistent link ARQ schemes that don't require a lot of
>buffer memory.  Of course, this problem is going away over time.

I see highly persistent LL ARQ as independent of buffer memory. When some 
people hear "highly persistent LL ARQ", they often think that this equals 
"buffering forever". *No, it does not!* The queues need to remain 
appropriately small with highly persistent LL ARQ as well.


>But perhaps you're right that, overall, it's better to not limit the
>persistence of the link layer ARQ scheme.

I think it should be limited to 64 seconds, since this is TCP's maximum 
RTO. There is really no point in delaying a packet beyond that. In 
addition, should the system hosting the link layer run into memory 
problems, this might also be a reason to drop packets (from the front) 
before 64 seconds have passed.
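
A minimal sketch of such a policy, assuming a simple packet-count limit
as a stand-in for "memory problems"; the class and its methods are
hypothetical.

```python
from collections import deque

MAX_PERSISTENCE_S = 64.0  # TCP's maximum RTO, as argued above


class ArqQueue:
    """Sketch: persistent link ARQ queue with a 64 s cap and drop-from-front."""

    def __init__(self, max_packets):
        self.max_packets = max_packets  # assumed proxy for the memory limit
        self.queue = deque()            # (enqueue_time, packet), oldest first

    def enqueue(self, packet, now):
        """Queue a packet; return packets dropped from the front to make room."""
        dropped = []
        while len(self.queue) >= self.max_packets:
            dropped.append(self.queue.popleft()[1])
        self.queue.append((now, packet))
        return dropped

    def expire(self, now):
        """Drop and return packets held longer than MAX_PERSISTENCE_S."""
        dropped = []
        while self.queue and now - self.queue[0][0] > MAX_PERSISTENCE_S:
            dropped.append(self.queue.popleft()[1])
        return dropped
```

Dropping from the front discards the oldest packet, which is the one most
likely to have been retransmitted end-to-end already.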


>Fair queuing might help
>limit the effect of the spurious retransmissions on other competing
>flows, for example.

I agree. If you separate flow classes (UDP vs. TCP) at the link layer, 
there needs to be some scheme in place that divides the link's bandwidth 
between those classes. E.g., 30% for elastic TCP traffic, the rest for 
delay-sensitive traffic.
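
One way such a split could look is sketched below: a scheduler that
always serves the backlogged class that has sent the least relative to
its share. The class names and the 30/70 shares are only the example
from the text; the scheduler is a simplified stand-in, not a real fair
queuing implementation.

```python
from collections import deque


class ClassScheduler:
    """Sketch: divide link capacity between traffic classes by share."""

    def __init__(self, shares):
        self.shares = dict(shares)                       # class -> share
        self.queues = {c: deque() for c in self.shares}  # packet sizes in bytes
        self.sent = {c: 0 for c in self.shares}          # bytes sent so far

    def enqueue(self, cls, size):
        self.queues[cls].append(size)

    def dequeue(self):
        """Return (class, size) of the next packet to send, or None if idle."""
        backlogged = [c for c in self.queues if self.queues[c]]
        if not backlogged:
            return None
        # Serve the class that is furthest behind its share. The scheduler
        # is work-conserving: an idle class's capacity goes to the others.
        cls = min(backlogged, key=lambda c: self.sent[c] / self.shares[c])
        size = self.queues[cls].popleft()
        self.sent[cls] += size
        return cls, size
```

As a simplification, the byte counters are never reset, so a class that
was idle for a long time can briefly claim more than its share when it
becomes active again.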


>Also, having a highly persistent link layer ARQ
>might be the best way to help TCP recover quickly from sustained link
>outages.

Exactly.

///Reiner