Re: [rrg] IRON-RANGER scalability and support for packets from non-upgraded networks

Robin Whittle <rw@firstpr.com.au> Tue, 16 March 2010 11:40 UTC


Short Version:    Fred describes a method for handling packets sent
                  from non-upgraded networks - roughly similar to
                  Ivip's DITRs and LISP's PTRs.  However, there
                  are unresolved questions about the commercial
                  arrangements for running these.

                  With this arrangement, my prior assumption that
                  there would be 20 or so routers advertising a given
                  VP (because I had understood these were performing
                  the DITR/PTR functions) no longer applies.

                  This reduces the scaling difficulties I had
                  previously mentioned.

                  But there is not yet a clearly defined method of
                  registering with the 2 or 3 VP routers.  We discuss
                  some of the scaling problems and workarounds for
                  them - but a fuller discussion will depend on
                  Fred's choice of registration mechanism.  He is
                  contemplating replacing the overlay network's BGP
                  with OSPF, but maybe there's a way of doing it
                  while retaining BGP.



Hi Fred,

You wrote:


>> I think I-R needs to be described in a way that someone who is up to
>> speed on scalable routing in general can read one or perhaps two I-R
>> documents and have a good idea of how the whole thing is going to
>> work - including with respect to scaling and security.  This doesn't
>> require exact bits in headers, but that could be part of it.  I think
>>  it needs to be pretty-much self-contained rather than requiring
>> people to read other documents which are not part of I-R.
> 
> There is room in a future update to IRON to improve on this.

OK.


>>>> For instance, how many IRON routers are there in an IPv4 I-R system,
>>>> and how many individual EID prefixes?
>>>
>>> Let's suppose that each VP is an IPv6 ::/32, and that
>>> the smallest unit of PI prefix delegation from a VP is
>>> an IPv6 ::/56. In that case, there can theoretically be
>>> up to 4B VPs in the IRON RIB and 16M PI prefixes per VP.
>>> In practice, however, we can expect to see far fewer than
>>> that until the IPv6 address space reaches exhaustion
>>> which many believe will be well beyond our lifetimes.
>>
>> OK.  Still, depending on how the address space was allocated - or at
>> least that subset of the address space covered by I-R's VPs - there
>> could be high numbers, approaching 16M perhaps, of I-R PI prefixes
>> per VP.
> 
> Well, this is a tunable knob of course. We could for
> example set the length for VPs to ::/36, ::/40, etc.
> to reduce the number of PI prefixes per VP.
> 
> The tradeoff is in managing a RIB containing a large
> number of VPs (which are likely to be quite stable) vs.
> managing a large number of PI prefixes per VP (which
> require periodic keepalives to maintain). So, given a
> routing protocol that can maintain a large number of
> VPs in a relatively static topology it seems like a
> proper balance of PI prefixes per VP can be found.

OK.
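
To make this knob concrete: with ::/56 as the smallest delegation,
the split between VP count and prefixes-per-VP works out as follows
(a quick check of my own, in Python):

  # Prefixes per VP = 2**(56 - vp_len); possible VPs = 2**vp_len.
  for vp_len in (32, 36, 40):
      print(f"::/{vp_len}: {2 ** vp_len:,} possible VPs, "
            f"{2 ** (56 - vp_len):,} ::/56 prefixes per VP")
  # ::/32:     4,294,967,296 possible VPs, 16,777,216 prefixes per VP
  # ::/36:    68,719,476,736 possible VPs,  1,048,576 prefixes per VP
  # ::/40: 1,099,511,627,776 possible VPs,     65,536 prefixes per VP

The ::/32 row matches your "4B VPs ... 16M PI prefixes per VP".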

>>> Still thinking (very) big, let's try sizing the system
>>> for 100K VPs; each with 100K ::/56 delegated PI prefixes.
>>> That would give 10B ::/56 PI prefixes, or 1 PI prefix
>>> for every person on earth (depending on when you sample
>>> the earth's population). Let's look at the scaling
>>> considerations under these parameters:
>>
>> OK, I think this is a good scenario to discuss.  I assume that the
>> VPs can be of various sizes, so some VPs could be a longer prefix,
>> covering less space, if there are a larger number of I-R PI prefixes
>> within that part of the address space.
> 
> The length of the VPs is a tunable. It may be that there
> can be VPs of varying lengths, but I chose to discuss as
> all VPs having the same length for simplicity.

OK.


>> As far as I know, you don't need VPs covering the entire advertised
>> subset of global unicast address space.  However, for worst-case
>> scaling discussions I think it is good to assume this.
>>
>>>> Then, how do these IRON
>>>> routers, for each of these EID prefixes continually and repeatedly (I
>>>> guess every 10 minutes or less) securely inform a given number of VP
>>>> routers they are the router, or one of the routers, to which packets
>>>> matching a given EID prefix should be tunneled.  Since there could be
>>>> multiple VP routers for a given VP, and the IRON routers don't and (I
>>>> think) can't know where they are, how does this process work securely
>>>> and scalably?
>>>
>>> Each IRON router R(i) discovers the full map of VPs in
>>> the IRON through participation in the IRON BGP.
>>
>> I recall that some IRON routers handle VPs and others don't.  As I
> 
> Not quite. All IRON routers by definition connect to the
> IRON. So, all IRON routers discover all VPs in the IRON,
> and *some* IRON routers also connect to the DFZ. Those
> that connect to the DFZ advertise one or a few very short
> prefixes (e.g., 4000::/3) that cover the set of all VPs
> in the IRON.

OK - so these are the I-R equivalents of Ivip's DITRs (Default ITRs
in the DFZ) and LISP PTRs.  In my previous message, I assumed that VP
routers were also advertising their VPs in the DFZ.  I recall I got
this from something you wrote, but it doesn't matter now.

But what are the scaling properties of these routers, which I will
refer to as "DITR-like"?

Who runs them?  They do the work of handling packets addressed to
very large numbers of I-R end-user network prefixes, while it is
those end-user networks which benefit.  So I think there needs to be
an arrangement for money to flow from those end-user networks, in
rough proportion to the traffic each DITR-like router handles for
each end-user network.  Ivip handles this, but with DITRs which
advertise specific subsets of the MABs (Mapped Address Blocks):

  http://tools.ietf.org/html/draft-whittle-ivip-arch-04#section-8.1.2

I suggest you devise a business case for these "DITR-like" routers -
and give them a name.

They are going to be busy, depending on where they are located, the
traffic patterns, how many of them there are, etc.  So they need to
be able to cache the mappings of a potentially large number of I-R
end-user network prefixes - as sketched below.
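
A minimal sketch of the kind of bounded cache I have in mind
(Python; the structure and the size limit are my own illustration,
not anything from the I-R drafts):

  from collections import OrderedDict

  class MappingCache:
      """Prefix -> delivering-router cache with LRU eviction, so a
      busy DITR-like router can bound its memory use."""

      def __init__(self, max_entries=1_000_000):
          self.max_entries = max_entries
          self.entries = OrderedDict()

      def lookup(self, prefix):
          router = self.entries.get(prefix)
          if router is not None:
              self.entries.move_to_end(prefix)   # mark recently used
          return router

      def insert(self, prefix, router):
          self.entries[prefix] = router
          self.entries.move_to_end(prefix)
          if len(self.entries) > self.max_entries:
              self.entries.popitem(last=False)   # evict the oldest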


>> wrote earlier, assuming VP routers advertise the VP in the DFZ, not
>> just in the I-R overlay network, then they are acting like LISP PTRs
>> or Ivip DITRs.  In order for them to do this in a manner which
>> generally reduces the path length from sending host, via VP router to
>> the IRON router which delivers the packet to the destination, I think
>> that for each VP something like 20 or more IRON routers need to be
>> advertising the same VP.
> 
> No; those IRON routers that also connect to the DFZ
> advertise very short prefixes into the DFZ; they do
> not advertise each individual VP into the DFZ else
> there would be no routing scaling suppression gain.

I think there would be, since each VP covers multiple individual
end-user network prefixes.  If there are 10^7 of these prefixes, and
on average each VP covers 100 of them, then there are 10^5 VPs and we
have excellent routing scalability, saving 9.9 million prefixes from
being advertised in the DFZ while providing 10 million prefixes for
end-user networks who use them to achieve portability, multihoming
and inbound TE.
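
The arithmetic, for the record:

  prefixes = 10_000_000    # I-R end-user network prefixes
  per_vp   = 100           # average prefixes covered by each VP
  vps      = prefixes // per_vp          # 100,000 VPs
  print(f"{prefixes - vps:,} prefixes kept out of the DFZ")
  # 9,900,000 - the routing scaling gain I mean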


>> I interpret your previous sentence to mean that all the IRON routers
>> are part of the IRON BGP overlay network, and that each one will
>> therefore get a single best path for each VP.  That will give it the
>> IP address of one IRON router which handles this VP.  It won't give
>> it any information on the full set of IRON routers which handle this VP.
> 
> Here, it could be that my cursory understanding of BGP
> is not matching well with reality. Let's say IRON routers
> A and B both advertise VP1. Then, for any IRON router C,
> C needs to learn that VP1 is reachable through both A and
> B. I was hoping this could be done with BGP, but I believe
> this could only happen if BGP supported an NBMA link model
> and could push next hop information along with advertised
> VPs. Do you know whether this arrangement could be realized
> using standard BGP?

Sorry, I can't reliably tell you what can and can't be done with BGP
- I don't try to do anything special with it in Ivip.

Still, if you assume that something could be done with BGP, consider
the potential scaling problems.  Somehow, for every one of X VPs, and
for each of the Y IRON routers which handle a given VP, you want
each IRON router to learn via BGP the address of every one of these
VP-advertising routers, and which VPs each one advertises.  This is
(X * Y) items of information you are expecting BGP to deliver to
every IRON router - so every BGP router needs to handle this
information.

The scaling properties of this would depend on how you get BGP to do
it, and how many VPs there are, and how many IRON routers advertise
the same VP.
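
To put rough numbers on the (X * Y) load, using the 100K-VP scenario
from earlier in this thread:

  X = 100_000              # VPs
  for Y in (2, 3, 20):     # VP-advertising routers per VP
      print(f"Y = {Y:2}: {X * Y:,} (VP, router) items per IRON router")
  # Y =  2: 200,000
  # Y =  3: 300,000
  # Y = 20: 2,000,000 - the high end is several times today's
  #                     ~300k-prefix DFZ RIB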


> If we are expecting too much with BGP, then I believe we can
> turn to OSPF or some other dynamic routing protocol that
> supports an NBMA link model. In discussions with colleagues,
> we believe that the example arrangement I cited above can
> be achieved with OSPF.

OK . . . so you are considering using OSPF on the I-R overlay network
rather than BGP.  I can't discuss that without doing a lot of reading
- which I am not inclined to do.  But see below where I propose
methods of doing the registration within the limits imposed by BGP.


>>> That
>>> means that each R(i) would need to perform full database
>>> synchronization for 100K stable IRON RIB entries that rarely
>>> if ever change.
>>
>> I am not sure what you mean by "full database synchronization".  Only
>> a subset of IRON routers advertise a VP, and each IRON router would
>> get a best-path to a single IRON router out of potentially numerous
>> IRON routers which were advertising a given VP.  So any one IRON
>> router would not be able to use the IRON BGP overlay system to either
>> discover the IP addresses (or best paths) to all IRON routers, or to
>> all the IRON routers which advertise VPs, assuming that some VPs were
>> advertised by more than one IRON router.
> 
> What we need here is a dynamic routing protocol that
> supports an NBMA link model, and the IRON is treated
> as a gigantic NBMA link on which all IRON routers are
> attached. Maybe BGP won't fill the bill for that, but
> other dynamic routing protocols such as OSPF show some
> promise.


>>> This doesn't sound terrible even for existing
>>> core router equipment. As you noted, it is also possible that
>>> a given VP(j) would be advertised by multiple R(i)s - let's
>>> say each VP(j) is advertised by 2 R(i)s (call them R(x) and
>>> R(y)). But, since the IRON RIB is fully populated to all
>>> R(i)s, each R(i) would discover both R(x) and R(y) that
>>> advertise VP(j).
>>
>> I don't see how this would occur.  A given IRON router receives best
>> paths for each VP, so for VP(j) it will get a best path to (and IP
>> address of) either R(x) or R(y).
> 
> As above.
> 
>>> Now, for IRON router R(i) that is the provider for 100K PI
>>> prefixes delegated from VP(j), R(i) needs to send a "bubble"
>>> to both R(x) and R(y) for each PI prefix.
>>
>> It's no doubt a relief to less muscle-bound scalable routing
>> architectures that the routers of IRON-RANGER are hurling about
>> merely "bubbles" rather than something with greater impact!
> 
> No worries; they are harmless, and not at all weapons
> of war.

Good!


>>> That would amount to 200K bubbles every 600 sec, or 333
>>> bubbles/sec.  If each bubble is 100bytes, the total bandwidth
>>> required for updating all of the 100K PI prefixes is 260Kbps.
>>
>> I am not sure each registration "bubble" would only be 100 bytes of
>> protocol-level data.  You need to specify, for IPv6:
>>
>>   1 - The IP address of the IRON sending the registration (16 bytes).
> 
> You mean in the data portion of the bubble or in the header?
> For IPv6-over-IPv4, the bubble does not need to include an IPv6
> header; it need only include the IPv4 header, since VET stateless
> address mapping allows the IPv6 link-local address to be discovered
> by knowing only the IPv4 address. I can't see why an IPv6 address
> would also be required in the data portion of the bubble if it can
> already be inferred from the IPv4 header?

I am definitely not going to try to think about mixed IPv4/v6
implementations of I-R.  I can handle thinking about purely IPv4 and
purely IPv6.


>>   2 - The prefix the IRON router is registering (18 bytes).
> 
> Not necessarily 18 bytes; prefix plus length is all that
> is needed. For a ::/32, that would be 4 bytes of prefix
> plus 1 length byte = 5 bytes. Since IPv6 likes to do
> things in blocks of 8, however, let's round up to 16
> to be safe.

OK.

>>   3 - Nonces and other stuff which invariably accompany messages
>>       such as this (10 to 20 bytes?).
> 
> The SEAL header with a sequence number that also
> serves as a nonce is used for this - the SEAL
> header plus sequence number length is accounted
> for below:

OK.

>>   4 - Authentication material, such as a digital signature for the
>>       above, including the public key of the signer (the
>>       IRON router itself?) and a pointer to one or more PKI CAs or
>>       whatever so the VP router can ascertain that this really is
>>       the public key of the signer.  These will be FQDNs - lets
>>       say 50 bytes or so.
> 
> I honestly do not know how much this would be. I will
> take your 50 byte estimation.

OK.


>> Maybe you could get the whole thing into 100 bytes.  Then add the
>> IPv6 header - 40 bytes - and a UDP header 8 bytes - and we are up to
>> about 150 bytes already.
> 
> No IPv6 header; only an IPv4 header (20 bytes) plus a SEAL
> header (8 bytes) plus possibly also a UDP header (8 bytes)
> for a total of 36.
> 
>> Add in L2 headers - Ethernet is 46 octets -
> 
> I guess you are counting everything from the preamble to the
> end of the interframe gap? I come up with 42 (when 802.1Q header
> is added), but I'll use your 46 to be conservative.
> 
>>  and we are up to 200 bytes.  Multiply by 8 and this is 1600 bits.
> 
> I have (36 + 16 + 50 + 46) = 148. So, call it 150 to be
> safe, and the guesstimate is midway between your 200 and
> the 100 I said initially.

OK.

>>   1600 x 333 = 532,800 bits/sec ~=0.5Mbps
> 
> I get 1200 * 333 = 399,600 bps ~=0.4Mbps

OK.
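
Putting the agreed figures in one place, parametrised so we can also
see the effect of the much longer bubble intervals you suggest below
(my own framing of your numbers):

  def vp_router_load_bps(bubble_bytes=150, interval_s=600,
                         prefixes=100_000, routers_per_prefix=2):
      """Registration traffic arriving at one VP router, bits/sec."""
      bubbles_per_s = prefixes * routers_per_prefix / interval_s
      return bubbles_per_s * bubble_bytes * 8

  print(vp_router_load_bps())
  # 400000.0 ~= 0.4Mbps (your 399,600 used a rounded 333 bubbles/sec)
  print(vp_router_load_bps(interval_s=36_000))
  # ~6667 bps if bubbles go out every 10 hours instead of 10 minutes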

>> This is the bandwidth of incoming packets to R(x) and likewise for
>> R(y) in your description.   This is assuming two IRON routers
>> ("200k bubbles every 600 sec") per I-R PI prefix.
>>
>> But your description varies from mine already in two other important
>> respects.
>>
>> Firstly, if these VP-advertising routers are to operate properly like
>> DITRs or PTRs, there need to be a lot more than 2 of them per VP.
> 
> No, because all that needs to be injected into the DFZ is
> one or a few very short prefixes (e.g., 4000::/3). It doesn't
> matter then which IRON router is chosen as the egress to get
> off of the DFZ, since that router will also have visibility
> to all VPs on the IRON.

OK.

Since you have what is, to me, a new "DITR-like" router plan for
supporting packets sent from non-upgraded networks, there is no need
for the larger number of VP routers that I assumed in my previous
message.  As long as you have two or three, that should be fine, I think.

There are two reasons an IRON router M might need to know which
other IRON routers A, B and C advertise a given VP:

 1 - When M has a traffic packet.  (M is either an ordinary IRON
     router and advertises the I-R "edge" space in its own network
     or it is a "DITR-like" router advertising this space in the
     DFZ.)  M needs to tunnel the packet to one of these VP routers.

     The VP router will tunnel it to the IRON router Z it chooses as
     the best one to deliver the packet to the destination network
     and will send a "mapping" packet to M which will cache this
     information and from then on tunnel packets matching the
     end-user network prefix in the "mapping" to Z (or some other
     IRON router like Z, if there were two or more in the "mapping").

     In this case, M needs only the address of one of the A, B or C
     routers.  Ideally it would have the address of the closest one -
     but it doesn't matter too much if it has the address of a more
     distant one.  That would involve a somewhat longer trip to the
     VP router, and perhaps a longer or shorter trip from there to Z.
     (This would typically be shorter than the path taken through
     LISP-ALT's overlay network.)

     After M gets the "mapping", it tunnels traffic packets to Z - so
     the distance to the VP router no longer affects the path of
     traffic packets.

     In this case, BGP on the overlay would be perfectly good - since
     it provides the best path to one of A, B or C - typically that
     of the "closest" (in BGP terms).  (See the sketch of this
     redirect-and-cache flow after this list.)


 2 - When M is one of potentially multiple IRON routers which
     delivers packets to a given end-user network - packets whose
     destination address matches a given end-user network prefix P.

     M needs to "blow bubbles" (highly technical term from this
     R&D phase of IRON-RANGER) to A, B and C.  The most obvious
     way to do this is for M to be able to know, via the overlay
     network the addresses of all VP routers which advertise a given
     VP.  There may be two or three or a few more of these.  They
     could be anywhere in the world.

     BGP does not appear to be a suitable mechanism for this, since
     its "best path" basic functions would only provide M with
     the IP address of one of A, B and C.

     You could do it with BGP, by having A, B and C all know about
     each other, and with all three sending everything they get to
     the others.  This is not too bad in scaling terms for two,
     three or four such VP routers.

     Then, M sends its registration to one of them - whichever it
     gets the address of via the BGP of the overlay network - and
     A, B and C compare notes so they all get the registration.

     I will call this the "VP router flooding system".

     Later I suggest another alternative which would also work
     with BGP.
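
Here is how I picture case 1, as a sketch (Python; all names and the
"mapping" message format are my own illustration, not from the IRON
drafts):

  class IronRouterM:
      def __init__(self, bgp_best_path, tunnel):
          self.cache = {}                       # dst prefix -> router Z
          self.bgp_best_path = bgp_best_path    # prefix -> one of A/B/C
          self.tunnel = tunnel                  # callable(packet, next_hop)

      def forward(self, packet, dst_prefix):
          z = self.cache.get(dst_prefix)
          if z is None:
              # First packet: send via whichever VP router BGP gives
              # us; that router delivers via Z and sends back a
              # "mapping" (handled in on_mapping below).
              self.tunnel(packet, self.bgp_best_path(dst_prefix))
          else:
              self.tunnel(packet, z)            # direct to Z thereafter

      def on_mapping(self, dst_prefix, z):
          """Called when the VP router's "mapping" packet arrives."""
          self.cache[dst_prefix] = z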


If you adopted something like the above-mentioned "VP router flooding
system", I think you can retain BGP for the overlay network.  This
will tell each IRON router a best-path to one of the potentially
multiple routers which advertise a given VP.  If there are three such
routers for a given VP, known as A, B and C, and if for a given IRON
router, the BGP overlay network gives it a best path to B, then all
is well.  B will tend to be closer than the others.  If B dies or
becomes unreachable, this will cause the BGP overlay network to
withdraw the best path to B and then provide a best path to A or C
instead.
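
And a sketch of the "VP router flooding system" itself (again purely
illustrative):

  class VPRouter:
      def __init__(self, name):
          self.name = name
          self.registrations = {}   # PI prefix -> registering router
          self.vp_peers = []        # the other 1-2 routers for this VP

      def receive_bubble(self, prefix, iron_router, flooded=False):
          self.registrations[prefix] = iron_router
          if not flooded:           # flood only first-hand bubbles
              for peer in self.vp_peers:
                  peer.receive_bubble(prefix, iron_router, flooded=True)

  # M registers with B only; A and C learn of it by flooding:
  A, B, C = VPRouter("A"), VPRouter("B"), VPRouter("C")
  A.vp_peers, B.vp_peers, C.vp_peers = [B, C], [A, C], [A, B]
  B.receive_bubble("2001:db8:55::/56", "M")
  assert "2001:db8:55::/56" in A.registrations
  assert "2001:db8:55::/56" in C.registrations

The flooding cost is the number of peer VP routers times the bubble
rate - modest for the two to four routers per VP we are discussing.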



>> Let's say 20.  Maybe 10 would be acceptable, maybe more - but 20 will
>> do.  Let's call them RVP(j, 0) to RVP(j, 19) where, in your example:
>>
>>   R(x) == RVP(j, 0)
>>   R(y) == RVP(j, 1)
>>
>> Secondly, I don't see how R(i) could discover the IP addresses of
>> more than one of this set of 20 routers.
> 
> As above, it is only 2-3 IRON routers per VP; not 20.

OK.

>> In my model, if it could be shown how routers such as R(i) which
>> handle the 100k I-R PI prefixes in VP(j) could discover all the 20
>> routers RVP(j, 0) to RVP(j, 19), then each of these 20 routers has
>> this incoming bandwidth.
>>
>>> Now, let's say that each PI prefix is multihomed to 2 providers,
>>> then we get 2x the message traffic for 520Kbps total for the
>>> bubbles needed to keep the 100K PI prefixes refreshed.
>> You already assumed two IRON routers per I-R PI prefix in your
>> 260kbps figure above, so there's no need to double it again to 520kbps.
>>
>> 2 ISPs seems a reasonable figure, which was already part of my
>> calculations.
>>
>> Each provider has an IRON router which handles a given I-R IP prefix,
>> and each such IRON router is sending bubbles to all the VP routers
>> (though I don't yet understand how these VP routers would be
>> discovered - and I am assuming there are 20 of them while you are
>> assuming there will be 2 of them).
>>
>> My figure is 532kbps ~= 0.5Mbps incoming bandwidth per VP router.
>>
>>
>>>> If the VP routers act like DITRs or PTRs by advertising their VP in
>>>> the DFZ, then in order to make them work well in this respect - to
>>>> generally minimise the extra path length taken to and from them
>>>> compared to the path from the sending host to the proper IRON router
>>>> - I think you need at least a dozen of them.   This directly drives
>>>> the scaling problems in the process just mentioned where the IRON
>>>> routers continually register each of their EID prefixes with the
>>>> dozen or so VP routers which cover that EID prefix.
>>>
>>> I don't understand why the dozen - I think with IRON VP
>>> routers, the only reason for multiples is for fault tolerance
>>> and not for optimal path routing, since path optimization will
>>> be coordinated by secure redirection. So, just a couple (or a
>>> few) IRON routers per VP should be enough I think?
>>
>> Secure redirection works when an IRON router sends the initial packet
>> to a VP router, but it doesn't apply when the sending router is that
>> of a non-upgraded network.  To support generally low stretch paths
>> from those sending networks to the IRON router which is currently the
>> desired one for forwarding packets to the destination network, I
>> think you need a larger number.  20 is a rough figure, assuming a
>> global distribution of sending hosts and IRON routers which handle
>> the I-R PI prefixes - as is required for real portability.
> 
> Again, DFZ routers on the non-upgraded network would select
> the closest IRON router that advertises, e.g., 4000::/3 as
> the router that can get off the DFZ and onto the IRON. So,
> it would not be the case that all VPs would be injected into
> the DFZ.

OK - as per what to me is a new "DITR-like router" arrangement for
handling packets sent from non-upgraded networks.



>> If all the IRON routers for the I-R PI prefixes of a given VP were in
>> Europe, then it would suffice to have all the VP routers also in
>> Europe - so depending on the need for robustness and load sharing,
>> perhaps you wouldn't need 20 of them.  Maybe 5 would do.  But
>> generally, for this kind of scaling discussion, I think we need to
>> assume the goal of global portability of the new kind of address
>> space, with sending hosts likewise distributed globally.
>>
>> So I think that for a VP containing 100k I-R PI prefixes, there are
>> going to be 20 such VP routers, and each is going to get a continual
>> 1Mbps stream of registration packets.
> 
> Not 20; only 2 or 3. And, it would be less than 1Mbps per
> VP router.

OK.


>> This is not counting the work that VP router needs to do in order to
>> establish the authenticity of those registrations.  As far as I know,
>> it could only do this by looking up PKI CAs (Certification
>> Authorities) on a regular basis to ensure the signed registrations
>> were valid.
>>
>> There are serious scaling problems per VP router in handling 333
>> signed registrations per second. That's a lot of crypto stuff to do
>> just to check the signatures - and a lot more work and packets going
>> back and forth for regularly checking that the public keys provided
>> are still valid.
> 
> Crypto overhead can be greatly relaxed if the IRON router
> performs crypto only for the initial prefix registration
> then accepts bubbles without performing the crypto for
> subsequent prefix refreshes. This is because, using
> SEAL, there are synchronized sequence numbers for blocking
> off-path injections of bogus bubbles.

   (I had to Google "bogus bubbles" - what a great little phrase!
    There's no obvious domain name or rock-band or DJ using it.)

I agree, there may be some way of reducing the crypto overhead with
these sequence numbers.  But sooner or later, there would need to be
a check of the PKI to ensure the signature was made with a public key
which is still valid.
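
If I follow you correctly, the check at the VP router would look
something like the sketch below.  verify_signature() and
key_still_valid() stand in for whatever PKI machinery is actually
used, and PKI_RECHECK is my own parameter - the periodic
re-validation I am arguing is still needed:

  PKI_RECHECK = 24 * 3600    # seconds between re-validations (assumed)

  def accept_bubble(state, bubble, now, verify_signature, key_still_valid):
      entry = state.get(bubble["prefix"])
      if entry is None:
          if not verify_signature(bubble):   # full crypto, once only
              return False
          state[bubble["prefix"]] = {"seq": bubble["seq"], "checked": now}
          return True
      if bubble["seq"] <= entry["seq"]:      # off-path "bogus bubble"
          return False
      entry["seq"] = bubble["seq"]           # SEAL sequence number check
      if now - entry["checked"] > PKI_RECHECK:
          if not key_still_valid(bubble["signer"]):
              return False                   # key revoked or expired
          entry["checked"] = now
      return True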


>> There is also the scaling problem of there being 20 or so of these VP
>> routers, so the entire Internet needs to handle 20 x 0.5Mbps = 10Mbps
>> continually just to handle the registration of these 100k I-R PI
>> prefixes.  Each such prefix requires 100 bits per second in continual
>> registration activity - 5 bits per second per VP router per I-R PI
>> prefix.  For each VP router, these 5 bits per second come from the
>> typically two IRON routers which are registering a given I-R PI
>> prefix - about 2.7 bps from each.
>>
>> Checking this: If there was a single VP router and a single IRON
>> router registering an I-R PI prefix, the IRON router would send 1600
>> bits every 600 seconds. This is 2.66 bits a second.  Since there are
>> 20 VP routers, the figure per IRON router per I-R PI prefix is 53bps.
>>  Since there are two such IRON routers per I-R PI prefix, together
>> they send 106bps per I-R PI prefix.  With 100k of these I-R
>> PI prefixes per VP, this is about 10Mbps.  This checks out OK.
> 
> You are off by a factor of 10 here, because there only needs
> to be 2 VP routers per VP.

Yes - with my new understanding of the "DITR-like" routers.


>> I think this is an unacceptable continual burden of registration traffic.
>>
>> Also, this is just for 10 minute registrations.  I recall that the 10
>> minute time is directly related to the worst-case (10 minute) and
>> average (5 minute) multihoming service restoration time, as per our
>> previous discussions.  I think that these are rather long times.
> 
> Well, let's touch on this a moment. The real mechanism
> used for multihoming service restoration is Neighbor
> Unreachability Detection. Neighbor Unreachability
> Detection uses "hints of forward progress" to tell if
> a neighbor has gone unreachable, and uses a default
> staletime of 30sec after which a reachability probe
> must be sent. This staletime can be cranked down even
> further if there needs to be a more timely response to
> path failure. This means that the PI prefix-refreshing
> "bubbles" can be spaced out much longer - perhaps 1 every
> 10hrs instead of 10min. (Maybe even 1 every 10 days!)

OK, I am not sure if I ever knew the details of "Neighbor
Unreachability Detection" - but shortening the time for these
mechanisms raises its own scaling problems.

Can you give some examples of how this would work?
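
From your description, my rough reading of the staletime logic is
something like this (parameter names are mine; RFC 4861 has the real
state machine, which I have not re-read):

  import time

  STALETIME = 30.0    # the default you cite; can be "cranked down"

  def neighbor_usable(neighbor, send_probe):
      """Before tunnelling via this neighbor, check for recent 'hints
      of forward progress'; probe if the evidence has gone stale."""
      if time.time() - neighbor["last_progress"] <= STALETIME:
          return True
      send_probe(neighbor)    # a reply, or any forward progress,
      return False            # refreshes neighbor["last_progress"]

Presumably a False here is what triggers failover to the other
provider's IRON router - which is where I suspect the new scaling
problems I mentioned would arise.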


> In this way, the PI prefix registration process begins
> to very much resemble DHCP prefix delegation.

I will pass on this for the moment.

At present, I can see these choices for this registration mechanism:

  1 - Keep BGP as the overlay protocol and use my proposed "VP router
      flooding system".

  2 - Retain your current plan of each IRON router like M needing to
      know the addresses of all the routers handling a given VP (A, B
      and C) - which BGP can't do.  So you could:

      2a - keep BGP and add some other mechanism.  Maybe M sends a
           message to the one of A, B or C it has a best path to,
           requesting the full list of all routers A, B and C which
           handle a given VP.  When M gets the list, it sends
           registration "bubbles" to the routers on the list.  This
           needs to be repeated from time to time to discover new
           VP routers.  (See the sketch after this list.)

      2b - use something different from BGP which provides all the
           A, B and C router addresses to every IRON router, such as
           M.  This needs to dynamically change as A, B and C die and
           are restarted, or joined by others.
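
For 2a, the sketch I have in mind (message names invented):

  def refresh_registrations(vp, bgp_best_path, request_router_list,
                            send_bubble):
      """Ask the one VP router BGP names for the full list, then
      register with each; repeat periodically to find new VP routers."""
      known = bgp_best_path(vp)                      # one of A, B or C
      for router in request_router_list(known, vp):  # hypothetical query
          send_bubble(router, vp)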



>>>> Your IDs tend to be very high level and tend to specify external RFCs
>>>> for how you do important functions in I-R.
>>>
>>> You may be speaking of IRON/RANGER, but the same is not
>>> true of VET/SEAL. VET and SEAL are fully functional
>>> specifications from which real code can be and has been
>>> derived.
>>
>> Yes - SEAL is a self-contained protocol, but I still found it hard to
>> navigate my way within the one document.
> 
> The IRON document has a lot of room to add more
> descriptive text on the architecture. But, the
> mechanisms are already specified in VET and SEAL.

OK.


>>>> Yet those RFCs say
>>>> nothing about I-R itself.  I think your I-Ds generally need more
>>>> material telling the reader specifically how you use these processes
>>>> in I-R.   Then, for each such process, have a detailed discussion
>>>> with real worst-case numbers to show that it is scalable at every
>>>> level for some worst-case numbers of EID prefixes, IRON routers etc.
>>>> - as well as secure against various kinds of attack.
>>>
>>> Does the analysis I gave above help? If so, I can put
>>> it in the next version of IRON.
>>
>> This is the sort of example I am hoping you will add.  But first I
>> think there are two questions I raised which would need to be
>> resolved before your example would be realistic according to my
>> understanding of I-R:
>>
>>   1 - How does an IRON router discover all the IRON routers
>>       advertising a VP?  The I-R BGP overlay network does not
>>       provide this, as far as I know.
> 
> We believe that OSPF with NBMA link model (or equivalent)
> could be used.

OK.


>>   2 - Allow for 20 or so routers each advertising the one VP,
>>       for the purposes of supporting packets from non-upgraded
>>       networks.
> 
> We don't need 20; we only need 2-3. And, the bubble
> interval (aka the "lease lifetime") can probably be
> pushed out by a factor of ~100.

OK.


>> Assuming 2 is accepted, and 1 is somehow achieved, we now have, for
>> each of the 20 VP routers, 0.5Mbps of registration traffic.  That's a
>> lot of traffic and a lot of crypto processing to do.
> 
> Crypto is not needed on each and every bubble;
> only on the first bubble.

OK.


>> It is no doubt more efficient than the ~100k or so extremely
>> expensive BGP routers of today's DFZ fussing around comparing notes
>> about 300k prefixes.  However, I don't think it scales as well as an
>> alternative:
>>
>>   http://tools.ietf.org/html/draft-whittle-ivip-arch
>>   http://tools.ietf.org/html/draft-whittle-ivip-drtm
>>
>> which doesn't have such continual flows of registration, mapping etc.
>> data, unrelated to the traffic flowing to a given micronet, or to
>> changes in the ETR to which the micronet is mapped.
> 
> I think we have learned a few things about the scaling,
> and there are solutions. Consider now the bubble interval
> as being analogous to the DHCP lease lifetime, and scaling
> can be greatly improved for (much) longer bubble intervals.

OK - but you still need to design a registration mechanism before we
can think in detail about scaling.


>> I am not suggesting you adopt "ITR" and "ETR" instead of "ITE" and
>> "ETE" - which I agree are more apt terms.  I was just explaining why,
>> for now, I will stick with "ITR" and "ETR" for Ivip.
> 
> OK - Fred

OK.

   - Robin