Re: Comments on your SCPS-TP testing report

"Adrian J. Hooke" <> Thu, 03 October 2002 03:54 UTC

Message-Id: <>
X-Mailer: QUALCOMM Windows Eudora Version 5.1
X-Priority: 2 (High)
Date: Wed, 02 Oct 2002 20:54:42 -0700
To: William Ivancic <>
From: "Adrian J. Hooke" <>
Subject: Re: Comments on your SCPS-TP testing report
Cc: "Adrian J. Hooke" <>,,,,,,,
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="=====================_130789745==_.ALT"
Precedence: bulk
Status: RO
Content-Length: 20390
Lines: 373

Hi William:

Sorry for the delayed reply, but we've had some other year-end commitments 
to deal with during the past week.  I discussed your response with some of 
the SCPS team members in Houston this evening, over a beer, and I have 
consolidated their comments into this single message.

Best regards
Adrian Hooke, NASA-JPL

On 23 September 2002, William Ivancic wrote:
>We tested and set the optimum receiver buffer size starting at twice the 
>Bandwidth-Delay product (BDP) and found, for our test setup with the OS 
>used (which appeared to be the most stable OS), that the optimal setting 
>was slightly less than 1 BDP.  So, we could say that "in theory" the 
>receive buffer should be 2 x BDP, but the data did not support that on Solaris.

This is very curious.  You are stating that the optimal setting of the 
buffers is less than the bandwidth-delay product.  Are these results due to 
constraints placed by the operating system?  The fact that none of the 
rate-based protocols comes even close to line rate suggests that the 
limitation is not protocol-based but OS-based.  Have you tried running 
these protocols on other operating systems (besides NetBSD) to determine 
where the bottleneck occurs?  On even modest (by today's standards) 
Intel-based processors, throughput results using SCPS-TP that are much 
greater than yours can easily be obtained.
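For concreteness, the kind of a priori sizing being discussed is usually computed from the bandwidth-delay product. A minimal sketch follows; the 2 Mbit/s rate and 600 ms RTT are illustrative values, not figures from the test report:

```python
# Sketch of receive-buffer sizing from the bandwidth-delay product (BDP).
# Link rate and RTT below are illustrative, not taken from the test report.
import socket

def bdp_bytes(link_rate_bps: int, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes (bits in flight / 8)."""
    return int(link_rate_bps * rtt_s / 8)

# Example: a 2 Mbit/s GEO satellite link with ~600 ms round-trip time.
bdp = bdp_bytes(2_000_000, 0.6)   # 150,000 bytes
target = 2 * bdp                  # the "2 x BDP" rule of thumb

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, target)
# Many OSes clamp, round, or double the requested size, which is one way
# the effective buffer can end up differing from the theoretical value.
granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
```

Reading the value back with getsockopt shows what the OS actually granted, which is often not what was requested and may explain part of the discrepancy.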

User-space (application) protocols have issues, such as crossing the 
user/kernel boundary, that are not present in kernel-based 
protocols.  Developing tests to remove these biases from the results can be 
difficult.  Have you tested in-kernel implementations yet?  This might 
allow greater insight.

William continued:
>IMHO, protocol tuning doesn't scale well - particularly in non-gateway 
>implementations and for systems such as NASA's planned sensorWeb.   I 
>would really like to see better auto-tuning features developed.

Auto-tuning (i.e., autonomous protocol tuning) in dynamic environments such 
as space is pretty darned hard and there is no evidence supporting or 
refuting the notion that auto-tuning features will scale well either. While 
auto-tuning makes for an interesting research area, a priori protocol 
tuning seems to be the most pragmatic option for medium term deployment of 
anything in space.

William continued:
>The paper on performance for "SCPS-TP assumed corruption" is for a 
>gateway implementation.  All data is received at the gateway and 
>buffered.  Thus, all data is buffered at one source and ECN type 
>mechanisms can be implemented on the point-to-point link.  This is fine, 
>but is a very limited case.  I would not be comfortable extrapolating such 
>performance to general deployments of "SCPS-TP assumed corruption" over 
>fully meshed networks.  Am I missing something?

To what "paper on performance for SCPS-TP assumed corruption" are you 
referring? If it is our forthcoming MILCOM paper "TCP CONGESTION CONTROL IN 
SHARED SATELLITE ENVIRONMENTS" then we are a bit confused.  If a TCP Vegas 
implementation is started up in a congested environment and that 
implementation and the congested part of the network both support ECN, TCP 
Vegas can establish a suitable operating point.  If the Vegas congestion 
control runs over a single hop (as we described in the MILCOM paper) then 
ECN need only be implemented there.  If the congestion occurs past a 
'downstream' gateway then it will be dealt with by that downstream gateway 
using an appropriate congestion control mechanism (which may or may not be 
the same as is used between gateways).

Note that ECN was turned OFF in the MILCOM configuration. Subsequent flows 
were started up in environments that although not *congested* were 
certainly *occupied* - and performance was still excellent.
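For readers who want the mechanism spelled out: the delay-based operating point comes from the textbook Vegas congestion-avoidance rule, sketched below. The alpha/beta thresholds are common defaults, not necessarily the values used in the MILCOM configuration.

```python
# Sketch of the TCP Vegas congestion-avoidance step (Brakmo/Peterson style).
# Vegas estimates how many extra segments the flow has queued in the
# network and steers that estimate between two thresholds.

def vegas_update(cwnd: float, base_rtt: float, rtt: float,
                 alpha: float = 2.0, beta: float = 4.0) -> float:
    expected = cwnd / base_rtt             # throughput if nothing is queued
    actual = cwnd / rtt                    # measured throughput
    diff = (expected - actual) * base_rtt  # extra segments in the queue
    if diff < alpha:
        return cwnd + 1                    # too little queued: grow
    if diff > beta:
        return cwnd - 1                    # too much queued: back off
    return cwnd                            # suitable operating point found
```

Because the congestion signal is queueing delay rather than loss, corruption losses do not drive the window down, which is what makes an "assume corrupt" configuration viable on high-BER links.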

If "fully meshed networks" refers to a multiple access satellite system 
where multiple terminals share a single uplink/downlink then the channel 
access scheme, such as the military's SHF DAMA, would have to provide ECN. 

The only place that "assume corruption" should be in play is in 
environments where this is a credible assumption. If your fully meshed 
network only has a single subnet where this is true (the likely case), then 
one would isolate this subnet with a gateway. The intended purpose of 
gateways is to provide an impedance matching capability between two or more 
radically different network environments, e.g.,

   o  One would place a gateway between a tetherless network subnet
      and a more classic wired/fibered network subnet.

   o  One would place a gateway between a network subnet where you have
      complete control over resources (bandwidth allocation/reservation,
      etc.) and the rest of the world.

So, in your example, one would put a gateway between your ECN capable 
network subnet and the rest of the world. If your fully meshed network only 
exists in an environment where "assume corrupt" is a valid assumption, then 
it is also quite reasonable to engineer your infrastructure to enable ECN 
on the intermediate routers, right?

William continued:
>Gateway testing is great, but assuming everything will use a gateway is 
>..... maybe not too good.

Of course - but who makes that assumption?

William continued:
>If one encrypts prior to the gateway (very common in a military 
>environment), you're limited as to what you can do.

Actually, it depends on where, when and what you are encrypting.

William continued:
>  I'm always a bit fearful when I see testing performed using gateways, in 
> that decision makers may not understand that those results may only be 
> valid when used in an architecture that allows for deployment of a 
> gateway.

The same fears are applicable for testing performed using optimally tuned 
end-systems, as those results are generally unattainable for most 
deployments (which is the main reason why people explore the use of 
gateways in the first place). Gateway operation and use of encryption need 
not be orthogonal in any architecture.

Yes, encrypting prior to the gateway leaves little that a gateway can 
do.  We can't speak for all military environments, but there are a number 
of organizations who use a combination of gateways and military grade 
(type-1) encryption (e.g., TACLANE encryptors).  These organizations 
realize that the performance of TCP leaves much to be desired in military 
GEO satellite environments and they therefore place the encryptors after 
the TCP gateway to increase the performance.  In fact this architecture has 
an additional advantage.  Since the gateway is the single entry point to 
the encryptor, it can emit packets of the correct size to optimize the 
transmission between the encryptors (i.e., encryptors add additional 
protocol overhead due to IP encapsulation, which can cause IP fragmentation 
issues).  Otherwise you would have to change every end system to avoid this.
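The sizing arithmetic the gateway performs can be illustrated as follows; the 60-byte encapsulation overhead is a placeholder for illustration, not an actual encryptor datasheet figure:

```python
# Illustrative calculation: the largest TCP payload the gateway can emit so
# that the encryptor's IP-encapsulated packet still fits in the link MTU.
# ENCAP_OVERHEAD is an assumed placeholder, not a real encryptor figure.

LINK_MTU = 1500        # bytes, typical Ethernet-style MTU
ENCAP_OVERHEAD = 60    # assumed bytes added by the encryptor's encapsulation
IP_HEADER = 20         # bytes, IPv4 header without options
TCP_HEADER = 20        # bytes, TCP header without options

def max_safe_mss(link_mtu: int, encap_overhead: int) -> int:
    """TCP payload size that avoids fragmentation after encapsulation."""
    return link_mtu - encap_overhead - IP_HEADER - TCP_HEADER

mss = max_safe_mss(LINK_MTU, ENCAP_OVERHEAD)   # 1400 bytes in this sketch
```

An end system unaware of the encryptor would use the full 1460-byte MSS and its packets would be fragmented after encapsulation; the gateway, sitting at the single entry point, can apply this reduction for all flows.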

We're unsure if you were aware of this, but the MILCOM paper referenced 
above and included with our previous email described testing done in 
conjunction with the US Army Information Systems Engineering Command at Ft. 
Huachuca, AZ in which military grade (type-1) encryption was used AFTER 
gateways to separate different classification levels across the same 
physical medium (shared satellite channel).  That is, two TCP flows of 
different classification levels were separately encrypted and 
gatewayed.  The gateway outputs were then combined in a router connected to 
the satellite uplink.  If the TCP flows sharing the satellite medium were 
to use Van Jacobson congestion control, they would perform *very poorly* 
due to the high BER of the channel (and VJ's characteristic of repeatedly 
driving the link to congestion then backing off).  The "Vegas assume 
corrupt" SCPS-TP configuration, however, allowed multiple gateways to 
*efficiently* (i.e. with few retransmissions) and *fairly* (i.e. each flow 
acquired and held roughly the same share of the overall bandwidth) share 
the channel.  Similar lab testing has shown that the results extend to more 
than just two flows.

William continued:
>Assuming ECN in a fully meshed network may be wishful thinking.

Probably true unless you control the full mesh. However, the correct thing 
to do is to exploit ECN when it is available rather than ignoring it 
because it generally isn't. If you had a private network, would you deploy 
ECN? If not, why?

William continued:
>Although ECN concepts have been around and experimentally shown to be 
>advantageous in an ideal world, I believe that, to date, deployment of ECN 
>is limited at best.

But surely if your space network would benefit from ECN, then you could get 
it deployed? Remember, also, that the "assume corrupt" mechanisms do not 
*require* ECN, but they are able to take advantage of it.

William continued:
>I may be incorrect here, as I do not have complete knowledge of what 
>is deployed in the backbone.  A quick breeze through the QoS commands on 
>Cisco routers did not show ECN settings, but that was a very quick search.

A number of router vendors, including Cisco, have implemented ECN.  You may 
want to try Cisco's feature:

WRED Explicit Congestion Notification
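In outline, that feature combines WRED's probabilistic queue management with ECN: when the average queue depth is in the congested region and the packet is ECN-capable, the router marks it rather than dropping it. A simplified sketch, with illustrative thresholds rather than Cisco defaults:

```python
# Simplified RED/WRED decision with ECN support. The thresholds and the
# marking-probability ramp are illustrative, not Cisco default values.
import random

MIN_TH, MAX_TH, MAX_P = 20, 40, 0.1   # queue thresholds (pkts), max prob.

def red_ecn_action(avg_queue: float, ecn_capable: bool) -> str:
    """Return 'forward', 'mark', or 'drop' for one arriving packet."""
    if avg_queue < MIN_TH:
        return "forward"
    if avg_queue >= MAX_TH:
        return "drop"                 # past max threshold, drop regardless
    p = MAX_P * (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
    if random.random() < p:
        # Congestion signal: mark ECN-capable packets, drop the rest.
        return "mark" if ecn_capable else "drop"
    return "forward"
```

The marked packet carries the congestion signal to the receiver, which echoes it back to the sender, so the sender can reduce its rate without a packet ever being lost.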