IPAE problems

Eric Fleischman <ericf@atc.boeing.com> Fri, 11 June 1993 02:46 UTC

Received: from ietf.nri.reston.va.us by IETF.CNRI.Reston.VA.US id aa16027; 10 Jun 93 22:46 EDT
Received: from CNRI.RESTON.VA.US by IETF.CNRI.Reston.VA.US id aa16023; 10 Jun 93 22:46 EDT
Received: from Sun.COM by CNRI.Reston.VA.US id aa04250; 10 Jun 93 22:46 EDT
Received: from Eng.Sun.COM (zigzag-bb.Corp.Sun.COM) by Sun.COM (4.1/SMI-4.1) id AA11457; Thu, 10 Jun 93 19:35:29 PDT
Received: from sunroof2.Eng.Sun.COM by Eng.Sun.COM (4.1/SMI-4.1) id AA05103; Thu, 10 Jun 93 15:36:28 PDT
Received: from Eng.Sun.COM (engmail1) by sunroof2.Eng.Sun.COM (4.1/SMI-4.1) id AA26935; Thu, 10 Jun 93 15:39:49 PDT
Received: from Sun.COM (sun-barr) by Eng.Sun.COM (4.1/SMI-4.1) id AA13511; Thu, 10 Jun 93 15:36:10 PDT
Received: from atc.boeing.com by Sun.COM (4.1/SMI-4.1) id AA28973; Thu, 10 Jun 93 15:34:49 PDT
Received: by atc.boeing.com (5.57) id AA29171; Thu, 10 Jun 93 15:21:30 -0700
Date: Thu, 10 Jun 1993 15:21:30 -0700
Sender: ietf-archive-request@IETF.CNRI.Reston.VA.US
From: Eric Fleischman <ericf@atc.boeing.com>
Message-Id: <9306102221.AA29171@atc.boeing.com>
To: ip-encaps@sunroof.eng.sun.com, sip@caldera.usc.edu
Subject: IPAE problems

Dear IPAE/SIP working group,

Several weeks ago I distributed a list of what I perceived to be the
PROs and CONs of the various IPng approaches for feedback.  I received
a "fair bit" of feedback which led me to modify my lists.  One of the 
comments was a lengthy list of perceived IPAE flaws which were given in 
response to my positive PRO IPAE statement.  This response disturbed me 
since I am not able to evaluate the significance of these low-level 
criticisms -- and I want IPAE to work as advertised.  Because the 
author of the comments requested anonymity, I corresponded with
Dave Crocker asking for advice.  He suggested that I massage the
criticisms (to protect the author's anonymity) and then send them to
the SIP working group for evaluation and consideration.  The purpose
of this note is to do exactly that in the hopes that any problems
may be expeditiously identified and resolved while the protocol is
still new and pliable.

I wish you every success in resolving any existing technical problems.

Sincerely yours,

--Eric Fleischman

P.S.  Please do me the favor of not trying to guess who the anonymous
criticizer may be:  Should you correctly guess his/her identity,
you will embarrass me by pointing out my poor "massaging" ability.
Should you incorrectly guess the person's identity, then you may
perhaps anger the misidentified person.  

================ Anonymous IPAE Criticisms follow ================

The IPAE transition is *very* similar to the original transition scheme 
that was proposed as a predecessor to TUBA. However, it was eventually 
abandoned because of terminal complexity. Unfortunately, I don't know 
all of the problems that were found at the time. Some of the ones that 
I know of which apply to IPAE are listed below:

 - How do IPAE hosts whose local routers are IP-only find the IPAE 
   routers?

I think that this can be handled by having the IPAE routers announce
an IP route to some fictitious IP network number, and having the IPAE
hosts send a packet to that particular IP address. This will allow
IPAE hosts to send packets to the nearest IPAE router. 

 - How do IPAE routers find other IPAE routers?

This will probably require a lot of manual configuration. I haven't 
seen this written down.

 - How to deal with the "enormous branchiness" problem?

For a large IP network, there could be a very large number of IPAE
routers which are all direct neighbors (in the IPAE sense). This
causes a difficult routing problem. This might be handled by 
manually configuring only a subset of the possible encapsulated
links, but this leads to extra hops. 
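
To give a sense of scale, here is a minimal sketch that counts the
encapsulated links when every IPAE router treats every other IPAE
router as a direct neighbor. The function name and the router counts
are illustrative assumptions, not figures from any IPAE draft.

```python
def full_mesh_links(routers: int) -> int:
    """Number of router pairs, i.e. potential encapsulated links, in a full mesh."""
    return routers * (routers - 1) // 2


for n in (10, 100, 1000):
    print(n, "routers ->", full_mesh_links(n), "potential encapsulated links")
```

The count grows with the square of the number of routers, which is
why configuring only a subset of the links (at the cost of extra
hops) comes up as the likely workaround.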

 - How to deal with the "lost ICMP" problem?

ICMP error reports sent via IP-only routers do not contain enough
of the discarded IP packet to include all of the SIP header. This
means that the error report cannot be translated into a SIP error
report to be sent to the source. This in turn implies that ICMP
error reports are lost. 
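
The arithmetic behind this can be sketched as follows. RFC 792 has an
ICMP error quote the IP header plus the first 64 bits (8 bytes) of
the discarded datagram's data. The 24-byte SIP header size used here
is an assumption (8 bytes of fixed fields followed by two 64-bit
addresses); adjust it if the draft in question differs.

```python
ICMP_QUOTED_PAYLOAD_BYTES = 8   # RFC 792: IP header plus first 64 bits of data
ASSUMED_SIP_HEADER_BYTES = 24   # assumption: 8 bytes of fields + two 8-byte addresses


def sip_bytes_missing(quoted: int = ICMP_QUOTED_PAYLOAD_BYTES,
                      sip_header: int = ASSUMED_SIP_HEADER_BYTES) -> int:
    """Bytes of the encapsulated SIP header that the ICMP error does not carry."""
    return max(0, sip_header - quoted)


print(sip_bytes_missing(), "bytes of the SIP header are not quoted")
```

Under that assumed layout the quoted 8 bytes end before the SIP
source and destination addresses begin, so the receiving IPAE router
has nothing to address a SIP error report to.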

I have heard two possible solutions to this: 
(i) update all IP routers to send more data in error reports 
(which defeats part of the main purpose of encapsulation); 
(ii) Have IPAE routers maintain a cache of recently received 
IP-ICMP error reports, and return a SIP-ICMP error report to 
the source of any packet which would have the same IP appearance 
over the next hop. 
Naturally this latter solution is not real pleasant for a vendor 
which prides itself on its IP performance (since every SIP over IP 
packet would need to be looked up in the cache, and you might need 
to return error reports for packets which might not have been lost). 
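
A minimal sketch of what solution (ii) might look like is given
below. The class name, the choice of (outer source, outer
destination) as the key, and the 5 second lifetime are all
hypothetical; nothing here is taken from an IPAE document.

```python
import time


class IcmpErrorCache:
    """Hypothetical cache of recent IP-ICMP errors, keyed by outer IP appearance."""

    def __init__(self, lifetime_s: float = 5.0):
        self.lifetime_s = lifetime_s
        self._errors = {}  # (outer_src, outer_dst) -> (icmp_type, expiry time)

    def record(self, outer_src, outer_dst, icmp_type, now=None):
        """Remember an ICMP error received for this outer IP header."""
        now = time.monotonic() if now is None else now
        self._errors[(outer_src, outer_dst)] = (icmp_type, now + self.lifetime_s)

    def check(self, outer_src, outer_dst, now=None):
        """Return the cached ICMP type if a live entry matches, else None."""
        now = time.monotonic() if now is None else now
        entry = self._errors.get((outer_src, outer_dst))
        if entry is None:
            return None
        icmp_type, expiry = entry
        if now >= expiry:
            del self._errors[(outer_src, outer_dst)]  # entry has aged out
            return None
        return icmp_type


cache = IcmpErrorCache()
cache.record("192.0.2.1", "198.51.100.7", icmp_type=3, now=0.0)
print(cache.check("192.0.2.1", "198.51.100.7", now=1.0))  # live entry
print(cache.check("192.0.2.1", "198.51.100.7", now=6.0))  # expired entry
```

Note that check() has to run for every SIP-over-IP packet forwarded,
and that a hit only means an earlier packet with the same outer
header was reported, not that this packet will be dropped.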

 - How to make SIP MTU discovery work

One small nit is that SIP MTU discovery has to interwork with
IP MTU discovery (complex but do-able). However, if ICMP error
reports are lost, then MTU discovery doesn't work, and SIP
relies on MTU discovery.

 - When translating from IP to SIP, how do you set the high 
   order part of the address -- especially the C-bit?

The November 11th IPAE draft proposes using either a large 
table or a DNS lookup. However, if some hosts are old IP-
only hosts, and some are updated SIP/IPAE hosts, then the
first bit in the address will differ (the C bit) for adjacent
hosts on the same subnet, implying that the mapping table has 
to be maintained on a per-host basis. This implies that the 
table is potentially HUGE, and cannot be maintained in any 
one place (ie, *must* be maintained in a distributed fashion 
via DNS).

If you use DNS lookup for this, then what do you do with 
the IP packets in the mean time? The choices would seem to
be to discard them or cache them. However, several router
vendors (no names mentioned) have been beaten up emphatically 
in the past for discarding IP packets while waiting for ARP
replies, which implies that discarding is not acceptable to
customers. Thus we need to cache them. However, as we are
currently shipping routers which can handle more than 
400,000 packets per second, and are working on higher speed
routers, and given that DNS lookups tend to be slow, this
would seem to imply a very large cache and/or very optimistic
assumptions about how much of the traffic will be repeat
(and how often the cache will need to be flushed due to
routing changes). 
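
A back-of-the-envelope sketch of the buffering involved follows. The
400,000 packets per second figure is the one quoted above; the 100 ms
DNS round trip and the miss fractions are assumptions chosen only for
illustration.

```python
def packets_held(pps: int, dns_latency_s: float, miss_fraction: float) -> int:
    """Packets arriving during one DNS round trip that lack a cached mapping."""
    return round(pps * dns_latency_s * miss_fraction)


# Assumed: 100 ms DNS latency; 5% of traffic misses a warm cache,
# 100% misses after a complete flush.
print(packets_held(400_000, 0.1, 0.05))
print(packets_held(400_000, 0.1, 1.0))
```

Multiplying the result by an average packet size gives the buffer
memory needed, which is what drives the "very large cache" concern.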

Also, given that a complete flushing of the cache with 
this proposal would be a disaster (routers with 1,000,000
packet per second performance could drop hundreds of 
packets per second while trying to re-build their cache, 
with DNS straining mightily to try to keep up), it will be
necessary to put in partial cache flushing when routes 
change, which will be complex to make work. 

 - When translating from SIP to IP, how to set the DU ID

SIP does not have a data unit ID field. Thus, when translating
from SIP to IP, the router will have to pick a value. This
means that two packets which happen to take different paths
may be translated at different IPAE routers, and could get
the same data unit ID. 
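
The collision can be illustrated with a small sketch. The Translator
class and the counter-based ID assignment are hypothetical; the point
is only that two routers choosing values independently can emit the
same (source, destination, protocol, ID) tuple, which is the key IP
uses to match fragments for reassembly.

```python
from itertools import count


class Translator:
    """Hypothetical SIP-to-IP translator that assigns IP Identification values."""

    def __init__(self, start: int = 0):
        self._next_id = count(start)

    def translate(self, src: str, dst: str, proto: int):
        # The IP Identification field is 16 bits wide, so values wrap at 65536.
        ident = next(self._next_id) % 65536
        return (src, dst, proto, ident)


router_a = Translator()
router_b = Translator()

# Two different packets of one flow, translated on two different paths.
pkt1 = router_a.translate("192.0.2.1", "198.51.100.7", 17)
pkt2 = router_b.translate("192.0.2.1", "198.51.100.7", 17)
print(pkt1 == pkt2)  # True: both routers picked ID 0 for the same flow
```

If both packets are later fragmented, the destination could mix
their fragments during reassembly.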

 - Practicality of IP --> SIP/IPAE --> IP translation.

The IPAE and SIP documents both make a big deal about the
fact that translation can be used to improve the routing
problem in Internet backbones *before* any hosts are updated.
However, as mentioned above, this runs into a problem with 
the C bit implying that mapping tables need to be on a per-
host basis (ie, enormous). 

Even if there weren't a C bit problem, the mapping tables
would still probably need an entry for each customer of a
regional. This implies that the mapping tables will be 
substantially larger than the associated routing tables 
(the ones which are allegedly too large to hold). This in
turn will almost certainly require the tables to be held
in DNS, which implies packet caching and rather slow
lookups in the fast path in our fastest routers (the 
routers which go into backbones). 

The alleged advantage of this approach is that the mapping
tables will be very static (change only with administrative
changes, not as a result of routing dynamics). Thus it is 
feasible to have a huge static table plus a small dynamic
routing table, even when it is not feasible to have a huge
dynamic routing table. 

However, there is a better approach which can be done 
with straight IP packet forwarding. This approach has been 
discussed in the TUBA working group. Here the very static 
table maps from IP address to regional (and is thus some 
number of orders of magnitude smaller than the static table 
required for the SIP/IPAE proposed feature -- how many orders 
of magnitude smaller depends upon how the C bit problem is 
handled, and thus whether the IPAE table needs one entry per 
customer of a regional, or one entry per host). Dynamic 
routing is then done to the regional. Given that this table 
is much smaller than the corresponding table in the IPAE 
plan, it can be maintained in the router, which does away 
with the need for a DNS lookup. 
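
A minimal sketch of the two-table idea, as described here, is shown
below. The prefixes, regional names, and interface names are made up
for illustration, and the linear scan stands in for whatever lookup
structure a real router would use.

```python
import ipaddress

# Hypothetical static table: IP prefix -> regional (changes only administratively).
STATIC_REGIONAL = {
    ipaddress.ip_network("192.0.2.0/24"): "regional-A",
    ipaddress.ip_network("198.51.100.0/24"): "regional-B",
}

# Hypothetical dynamic table: regional -> next hop (maintained by routing).
DYNAMIC_NEXT_HOP = {"regional-A": "if0", "regional-B": "if1"}


def next_hop(dst: str):
    """Map the destination to its regional, then route to that regional."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in STATIC_REGIONAL if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return DYNAMIC_NEXT_HOP.get(STATIC_REGIONAL[best])


print(next_hop("192.0.2.55"))
print(next_hop("203.0.113.1"))
```

Only DYNAMIC_NEXT_HOP changes with routing events; the static table
stays in router memory, so no per-packet DNS lookup is involved.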

Thus this alleged advantage of IPAE is really an order of
magnitude worse than with the other approaches (ie, worse
than can be done with straight IP). 

There is another minor problem with the IPAE transition scheme:  it 
requires routers to reassemble fragmented packets. (When we were
doing the IP<-->CLNP transition with encapsulation a while ago we
thought of this problem and came up with a way to get around it.
However, that workaround would not be feasible with SIP/IPAE, and in
any case it required routers to deal with options efficiently.)

---------  the following is an "aside" in the message ----

Regarding the use of static tables to map from an IP address
to a topologically significant address (which may be a SIP 
address, used with IP --> SIP translation; or may be a
provider identifier, used with straight IP forwarding in 
order to extend the life of IP), please consider the following:

With the IPAE/SIP approach, if you ignore the C bit 
problem, then there is one entry in the 
static table for every customer of each regional. If CIDR 
is not religiously followed, then there will also need to
be an entry for every IP network number that does not
conform to CIDR. Thus if you ignore the C bit problem then 
the static table is exactly the same size as the Internet 
routing table would need to be (but is, of course, more 
static). However, due to the C-bit problem, (unless 
something major gets changed in the proposal, which is 
still possible) there will need to be one entry in the 
table for each host.

However, for the straight IP solutions the static table will be
exactly the same size as the IPAE table would be if there 
were no C bit problem (Note: I was mistaken when I previously
said that it would be better than this.)

Thus, the pure IP solution is still preferable to the IPAE
solution for three reasons: 
(i) It eliminates the C bit problem (thus the static table is two 
or three orders of magnitude smaller in the pure IP solution than 
in the IPAE solution); 
(ii) It eliminates the need for packet translation (not something that 
we really want to do with every packet at OC3 rates!); 
(iii) It maintains the independence of short term steps taken to extend 
the life of IP from long term steps taken to deploy a new protocol. 

------- end of the aside -----

 - Training costs for personnel

Given the above problems, any saving in personnel training costs
from an IPAE/SIP solution is highly doubtful.
I think that it is probably possible to find some 
solutions (however complex) to each of the aforementioned 
IPAE problems, and I expect that any additional problems
caused by these additional solutions will themselves be
fixable (by adding even more complexity?). Thus my guess 
is that IPAE *can* be made to work with sufficient effort. 
However, the difficulty of getting all of this to actually 
work will fall on network management personnel, which equates
to a considerable expense. 
 
 - Dependence between old and new protocols

I understand that one thing which really hurt in the DECnet phase 4 to 
phase 5 transition is the very close coupling of the old and new
protocol suite. Thus, for example, DECnet phase 4 hidden areas
work fine as a way to extend phase 4; Phase 4 to phase 5
packet translation works fine as a transition technique; The
phase 5 assumption that addresses are globally unique (implicit
in the name to address lookups) is also reasonable. However,
these three facts in combination mean that phase 4 to phase 5
transition is very difficult for customers which are already
using hidden areas.

IPAE similarly couples the old and new network layer protocols 
very closely (even more closely than the DECnet transition).
This is likely to constrain what we can do to try to keep
IP going as long as possible. However, whether this turns
out to be a problem is unclear until we see what folks do
to try to keep IP going. 

 - Basic Motivation behind SIP

The basic motivation behind SIP is to make a protocol which
is so simple that it is possible to build very high speed 
routers. However, the router vendors are not involved in
any significant way with SIP development. I have not heard
anyone from any router vendor suggest that SIP is really the
right way to build high speed routers. Most of the work in
forwarding at very high speed is outside of the network 
layer protocol in any case, and alternative means exist 
which will allow forwarding of protocols with more flexible
addresses at the same high speeds. 

 - SIP Address

All of this of course ignores the problems of the SIP 
address space. 

Clearly the whole point of transitioning to a new network layer
protocol is to come up with a network addressing scheme which
works. It is the height of folly to use incorrect assumptions
(about the need for small addresses to allow high speed 
forwarding) to force you to an address space which is not well
thought out, is not flexible, and is of dubious long-term 
sufficiency.

In 1980 a paper was published as an NBS proposal which proposed 
a 64-bit address space for an International version of IP 
(this sort of became input to what later became CLNP). Comments were
made that 64 bit addresses were too large (after all, 32 bit addresses 
are clearly large enough since IP uses them and they can support routing to 
2**32 hosts, which is much more than could ever exist in the
world), and comments were also made that 64 bits were not 
large enough. Thus, (to quote Yogi Berra) the current IP
address space discussions are a case of "deja vu all over
again".