Re: [rrg] Pumping IRON

Robin Whittle <rw@firstpr.com.au> Sun, 11 July 2010 17:28 UTC

Short version:   Fred's IRON proposal has changed a lot from the
                 06 revision.  The new 07 & 08 versions are
                 functionally the same.

                 I hope other RRG folks will read and comment on
                 this work.  This is a promising Core-Edge
                 Separation architecture being developed right now
                 - with a design which differs significantly from LISP
                 and Ivip.

                 With the 07-08 version, the whole system - not just
                 its mobility aspect - has some elements in common
                 with TTR Mobility.


Hi Fred,

Here are some comments on part of your latest I-D:

  http://tools.ietf.org/html/draft-templin-iron-08

I got up to the end of section 6.1.  If you can respond to this, then
I will continue reading and commenting on the rest of the I-D.

Thanks very much for acknowledging the influence of TTR Mobility in
your new design.


  - Robin



Page 6:

  At the top of the page, I suggest you mention that VET and SEAL
  - or is it just SEAL - do encapsulation in a manner which handles
  Path MTU Discovery problems.  I am not sure I fully understand or
  agree with how this is done, but at least you do attempt to handle
  it.  I think that here, or somewhere later in the document, you
  should describe what the challenges are, and how VET+SEAL handles
  each challenge.

  For instance, does IRON for IPv4 handle DF=0 (fragmentable)
  packets - and if so, of what maximum length?  How does it handle
  them?  It is easy to imagine a host emitting 9k byte DF=0 packets
  and IRON-VET-SEAL dutifully chopping each packet into ~1400 byte
  pieces and reassembling them at the destination network's IR(CP)
  router.  This would be very inefficient and fragile, compared to
  having the host itself do DF=1 PMTUD and so craft its packets to
  a length with which they could be encapsulated and tunnelled
  across the world without fragmentation.

  With Ivip's approach (there's a lot of work to do on this):

    http://www.firstpr.com.au/ip/ivip/pmtud-frag/

  I plan not to handle DF=0 packets longer than some globally
  agreed length, such as 1200 bytes or so, which we think every ITR
  can tunnel to every ETR, without exceeding an MTU limit.  DF=0
  packets longer than this will be dropped.  Google and anyone
  else which habitually uses DF=0 packets longer than this will be
  expected to use shorter than 1200 byte packets or do what
  almost everyone else does - use socially responsible DF=1 packets.
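
  As a rough illustration of this DF=0 policy, here is a minimal
  Python sketch.  The function name, the return strings and the
  exact 1200 byte figure are assumptions made for the example;
  nothing here is taken from an Ivip specification, and the
  handling of DF=1 packets is deliberately left out.

```python
# Hypothetical sketch of the DF=0 length policy described above.
# MAX_DF0_LEN is an assumed value; the text only says "1200 bytes or so".
MAX_DF0_LEN = 1200


def itr_df0_policy(packet_len: int, df_set: bool) -> str:
    """Return what an ITR would do with a traffic packet under this policy."""
    if df_set:
        # DF=1 packets are subject to PMTUD handling, not covered here.
        return "pmtud"
    if packet_len <= MAX_DF0_LEN:
        # Short fragmentable packets are assumed to fit in any ITR -> ETR tunnel.
        return "tunnel"
    # Long fragmentable packets are dropped rather than fragmented.
    return "drop"
```

  With these assumptions, a 9000 byte DF=0 packet would be dropped
  at the ITR instead of being fragmented and reassembled.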

  I guess most readers won't know what a "VET virtual tunnel
  interface" is.  I understand it is some kind of software construct
  in every IR router.  The tunnels are not previously established,
  and involve no set-up overhead.  It is not clear yet what packets
  get to this internal interface to be tunneled and which don't.

  Yet, below, it becomes clear that there is an important role for
  a 2-way tunnel, established by the IR(CP) router to an IR(BR)
  router, where the IR(BR) router sends packets to the IR(CP)
  router only after the tunnel has been established, and any NAT
  state established - with the destination address for the
  IR(BR) -> IR(CP) tunnel being that of the NAT box, not the
  behind NAT address of the IR(CP).


Page 7:

  I think section 3.2 on the IR(BR) IRON Border Router is too vague.

  Why is it called a "Border Router"?  A BR, I understand,
  traditionally means an ISP's router which connects to other ISPs -
  they are BGP routers.

  The term doesn't make proper sense as a "border" router for
  the IRON system, since IR(GW) and IR(CP) routers are also
  part of the IRON system and are not IR(BR) routers - and
  IR(GW) and IR(CP) routers do connect to non-IRON routers.

  The description:

        An "IR(BR)" is a Border Router that is managed by a VPC
        and that provides forwarding and mapping services for
        the EPs owned by their customer IR(CP)s.

  leaves the reader to guess that these IR(BR) routers are
  traditional ISP Border Routers, which are BGP routers - but
  this is not confirmed or denied by the diagram.

  Are they BGP (AKA "core") routers?  Later (page 10) it emerges
  they are not.

  What do they connect to?  The diagram gives no indication, other
  than the reader assuming they have at least one ordinary, non-EPA,
  global unicast address so they can communicate with other IR(BR),
  IR(CP) and IR(GW) routers.

  Since they require only a single physical connection, they are
  not routers in terms of handling two or more physical links.

  Instead of "commodity general purpose processors" it might be
  better to use "COTS (Commercial Off The Shelf) servers".


Page 9:

  Perhaps, if it is true, after:

      ". . . where each patch can be discussed independently of
       all others."

  add something like:

      "Each patch - each VP overlay network, with its constituent
       IR(CP/BR/GW) routers - can operate independently of the other
       patches. "

  (But later I found this is not entirely true - since each VPC
  system needs to know the total set of all VPs of all the VPCs.)

  If this is true, then does this mean that each EUN has its IR(CP)
  routers working with a single VPC company's IR(BR/GW) routers?

  If so, then this means that an EUN would get all its EP address
  space from a single VPC (Virtual Prefix Company) - though these
  EP prefixes would not all need to be in the same VP.

  Even then, an EUN could use EP space from two or more VPCs via
  the one physical IR(CP) router, if it was programmed suitably
  and so behaved as an independent IR(CP) router to each of the
  two or more VPC "patches" which it works with.

  Maybe there needs to be a better name for "patch" to describe these
  generally independent VPC-owned subsets of the IRON system.

  At the end of the second paragraph in section 4, perhaps it would
  be good to state, if it is true, something like:

      "A VPC might handle IPv4 only, IPv6 only or both IPv4 and IPv6
       protocols.  If the VPC handles both, then its systems for
       handling these two protocols are logically independent.  That
       is, they could in principle be implemented with completely
       separate sets of IR routers.  In practice, it seems more
       likely that some, many or perhaps all IR(GW/BR) routers
       operated by such a VPC would handle both protocols.  An
       EUN's IR(CP) router would connect to other IR(CP/GW/BR)
       routers by one or both protocols."

  However, I note in page 10 ("or, when the IR(CP) does not configure
  a locator of the same protocol version of its EPs") that you can
  have IPv6 packets tunnelled over IPv4 and vice-versa.  So the
  separation between IPv4 and IPv6 is not necessarily complete.


Page 10:

  Para 1:

     "a dynamic routing protocol".

     I think it would be good to be more specific about this, or
     to note where later in the document this is discussed.

     "Each IR(BR) will therefore commonly maintain only partial
      topology information representing the EPs in its working
      set, . . . "

     "Topology" doesn't seem the right word to me.  Each IR(BR)
     will have a "mapping" for a subset of the EPs in each of its
     VPC's VPs.  The mapping will be a single global unicast non-
     EP address of the IR(CP) which this EP is currently mapped to.


  The definition of EUN seems awkward at this point.

  If I had a company XYZ with 5 branches, each in a different
  city or country, then each of these 5 sites might be an "EUN"
  according to the current definition.  Each such site would have
  at least one - or only one? - IR(CP) router.

  However, perhaps "EUN" refers to the administrative object of my
  XYZ address space management system, where I get one or more
  "chunks" (I need a fresh word here) of space in one or more VPs
  (for simplicity of discussion) from a single VPC.  Then I split
  those chunks up into EPs as I wish, and map at least one, or
  perhaps more, of these EPs to one of the IR(CP) routers at my
  sites.

  In Ivip, the VP is called a MAB (Mapped Address Block).  The
  "chunk" is a UAB (User Address Block) and the contiguous
  range of addresses with a single mapping is a "Micronet".

  So in Ivip there are separate terms for the UAB and micronet
  while in IRON, the term EP seems to correspond to "micronet"
  while there is no term equivalent to "UAB".  This may be OK,
  since a UAB is an administrative arrangement between the EUN
  (the customer organization, not one of its sites or mobile
  devices) and the MABOC (MAB Operating Company), which is
  closely analogous to the IRON VPC.  In Ivip, the ITRs and
  ETRs know about micronets, but have no knowledge of UABs.

  In IRON, since the VPC and its VPs need to delegate specific
  EPs (like micronets) to specific IR(CP) routers, I guess there
  is a way the end user organization can request its overall
  "chunk" of address space be split up into these EPs as it
  desires, and also to specify which IR(CP) each EP is
  mapped to.

  The second paragraph on page 10 is a doozy.  Here are my
  thoughts, for the moment assuming purely IPv4 or purely IPv6.

         Customers establish IR(CP)s to connect their EUNs to the VPC
         overlay network.  Unlike IR(GW)s and IR(BR)s, IR(CP)s may use
         private addresses behind one or several layers of NATs.  The
         IR(CP) initially discovers a list of nearby IR(BR)s through an
         exchange with its VPC.  It then forms tunnels with one or more
         of the IR(BR)s through initial exchanges followed by periodic
         keepalives, and adds each IR(BR) to its default routers list.

  So no matter whether the IR(CP) router is on a global unicast
  ordinary IP address (not an EPA address) or whether it is behind
  one or more layers of NAT, with the outermost NAT box on such
  an address, it creates two-way tunnels from itself to one or
  more ideally nearby IR(BR) routers.

  When the IR(CP) router has a traffic packet to forward to the
  rest of the Net, all those packets go out via encapsulation
  to one of various possible IR(BR) routers.  This means that
  the ISP which provides connectivity used by this IR(CP) router
  doesn't need to alter anything.  Specifically, if the ISP
  drops any packets emerging from the IR(CP) router which do not
  have their source address within the subnet the ISP provides to
  that router, then the IRON system will still work fine, because
  the IR(CP) router never forwards packets directly to the ISP's
  network when their source address is that of the EUN: an EPA
  address within one of the EUN's EPs.

         When the IR(CP) configures a locator address behind a NAT
         (or, when the IR(CP) does not configure a locator of the
         same protocol version of its EPs), it uses encapsulation
         to forward all packets from its EUNs via one of the IR(BR)s
         in its default router list.  The IR(BR)s in turn will forward
         the packets further toward their final destination.  When the
         IR(CP) configures a locator on the public Internet with the
         same protocol version of its EPs, however, it can forward
         packets with EPA destination addresses directly to the
         IR(BR)s of its correspondents via encapsulation without
         involving one of the IR(BR)s in its default router list.
         (The IR(CP) must instead forward packets with non-EPA
         destination addresses to an IR(BR) in its default router
         list via encapsulation to avoid ISP ingress filtering.)

  Here, for simplicity, I assume a single "nearby" IR(BR) router
  which the IR(CP) router has already established a two-way tunnel
  to.

  If the IR(CP) router is on an ordinary, global unicast, non-EPA,
  non-NAT, address, here are the methods of handling outgoing
  packets:

    1 - Packet is addressed to an ordinary non-EPA address:

          Tunnel it to the nearby IR(BR) - which will
          forward it via the ordinary Internet routing
          system to the destination.

          This implies something important about the
          placement of each IR(BR) which has not so far
          been stated: It can't be on an ordinary address
          provided by an ISP, because the ISP is assumed to
          be potentially dropping packets emitted from any
          such address whose source address does not match
          that address.

          Therefore, I think this implies the IR(BR) routers
          are located in ISP networks somewhere where they are
          allowed to emit packets with any source address -
          in this case with source addresses of the EUN's
          EPA addresses.  They are not necessarily located in
          the same ISP network as the IR(CP).

    2 - Packet is addressed to an EPA address:

         "it can forward packets with EPA destination
          addresses directly to the IR(BR)s of its
          correspondents via encapsulation without
          involving one of the IR(BR)s in its default
          router list."

          I think this use of "forward" is confusing - only later
          in the sentence do we read "via encapsulation".  I think
          that you really mean "tunnel".

          The IR(CP) router tunnels the packet to an IR(BR)
          router somewhere which is a "nearby" IR(BR) router
          of the IR(CP) router of the EUN which is using this
          prefix of EPA address space.

          Therefore, (I thought at first, but read on . . . )
          the IR(CP) router must be able to do a lookup from the
          destination EPA address to get the address of this
          typically remote IR(BR) router.

          Later, I figured that this mapping was learnt by
          route redirection.


  If the IR(CP) router is behind one or more layers of NAT, here
  are two methods of handling outgoing packets:

    3 - Packet is addressed to an ordinary non-EPA address:

          Tunnel it to the nearby IR(BR) - which will
          forward it via the ordinary Internet routing
          system to the destination.

          This is identical to 1 above.

    4 - Packet is addressed to an EPA address:

          Tunnel it to the nearby IR(BR) - which will
          tunnel it to an IR(BR) router which is "nearby"
          the destination network's IR(CP) router.

          As far as the IR(CP) router is concerned, this
          is identical to 1 and 3 above.

          It implies that this IR(CP) router's nearby
          IR(BR) router does a lookup (or by some other
          means - route redirection - gains mapping) on
          the EPA destination address to determine the
          address of the IR(BR) router to tunnel it to.

  From the point of view of the IR(CP) router, there are only two
  approaches to handling outgoing packets:

     1, 3 & 4:  Tunnel it to the nearby IR(BR) router.

     2:         Tunnel it to whichever IR(BR) router is "nearby"
                to the destination network's IR(CP) router.
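
  The two approaches above can be sketched in Python as follows.
  This is only my reading of the text, not the I-D's algorithm: the
  function name, its parameters and the dictionary of redirects are
  hypothetical, and falling back to the nearby IR(BR) when no
  redirect has been learnt yet is an assumption.

```python
import ipaddress

# Hypothetical sketch of the IR(CP) outgoing-packet choice described above.
# all_vps:   every VP in the IRON, which the IR(CP) is pre-provisioned with.
# redirects: EP prefix -> remote IR(BR) locator learnt by route redirection.


def next_tunnel_endpoint(dst, all_vps, behind_nat, default_br, redirects):
    """Return the locator of the IR(BR) to which the packet is tunnelled."""
    addr = ipaddress.ip_address(dst)
    is_epa = any(addr in ipaddress.ip_network(vp) for vp in all_vps)
    if not is_epa or behind_nat:
        # Cases 1, 3 and 4: always use the nearby IR(BR).
        return default_br
    # Case 2: use the correspondent's IR(BR), preferring the longest match.
    best = None
    for prefix, locator in redirects.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, locator)
    # Assumption: with no redirect learnt yet, use the nearby IR(BR).
    return best[1] if best else default_br
```

  The addresses used to exercise such a function would be
  documentation prefixes, for example a VP of 192.0.2.0/24 with a
  redirect for 192.0.2.16/28.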

  Since the IR(CP) router receives all packets via its nearby
  IR(BR) router (or perhaps there are multiple of them, which I
  will ignore for the moment) then I think the first approach -
  1, 3 & 4 - is identical to TTR Mobility.

  There, the TTR performs the role of the nearby IR(BR) router.
  The micronet is mapped to the ETR part of this TTR, and the MN
  (Mobile Node) - which is like the IR(CP) router - establishes a
  2-way tunnel from its CoA (Care of Address) in the access
  network, which could be behind one or more layers of NAT.  Then
  the MN sends and receives all packets via this TTR.

  (There can be two or more TTRs, but at any one time, the micronet
  is mapped to one TTR.  Two TTRs are useful when switching mapping
  from one to the other, so there are two separate tunnels, and the
  MN gets incoming packets no matter whether they are tunneled to
  the old TTR or the new one.)


  For simplicity, I am ignoring IPv4 packets being tunneled to and
  from an IPv6-connected IR(CP) for its IPv4 IRON EPA-using EUN.


  Previously, my understanding of VET-SEAL tunneling was that it
  could - and perhaps would always - be based on a zero-set-up
  arrangement.  Device A could tunnel a traffic packet to device B
  without any prior arrangement.  All it needs is B's IP address
  and it encapsulates the traffic packet, and forwards it.  There
  is no acknowledgement - no flow control etc.  However, there
  could be a route redirection message, secured with a nonce which
  was sent in the header which accompanied the encapsulated traffic
  packet.  This is an entirely one-way process, devoid of any
  requirement for a packet flowing from B to A.

  However, with approaches 1, 3 & 4 above, something else must be
  happening.  Initially, the IR(BR) can't tunnel to the IR(CP)
  router, partly because it doesn't know its address until the
  IR(CP) tunnels to it - but also it would be impossible if the
  IR(CP) is behind NAT.

  The IR(CP) establishes an implicitly 2-way tunnel from its address,
  which could be behind one or more layers of NAT.

  That's fine for IR(CP) to the nearby IR(BR) router.  But when the
  IR(BR) router needs to tunnel a packet to the IR(CP) router, it
  can only do so after the IR(CP) router has sent it a packet first.

  Assuming the IR(CP) router is behind NAT and the NAT box's
  public IP address is PPPP, then the nearby IR(BR) router sees
  the initial packet from the IR(CP) router arriving with a source
  address of PPPP.  Therefore, it must be programmed to use this
  address, rather than whatever address the IR(CP) router is actually
  on, to tunnel packets to this IR(CP) router.  As long as the NAT
  box is working properly, and the IR(CP) router sends keepalives
  to ensure the NAT box retains its state, then the NAT box will
  dutifully rewrite the destination address PPPP of the packet and
  address it to the IR(CP) router.
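
  A minimal Python sketch of this behaviour at the IR(BR) follows.
  The class, its method names and the 30 second timeout are all
  hypothetical, not taken from VET or SEAL.  A real implementation
  would probably also need to track the transport port chosen by
  the NAT box, which is omitted here.

```python
# Hypothetical sketch: an IR(BR) remembering where to tunnel packets for
# each IR(CP), based on the outer source address it observes.
KEEPALIVE_TIMEOUT = 30.0  # seconds; an assumed value, not from the I-D


class TunnelTable:
    def __init__(self):
        # IR(CP) identifier -> (observed source address, time last seen)
        self._bindings = {}

    def packet_from_cp(self, cp_id, observed_src, now):
        # observed_src is PPPP when the IR(CP) is behind NAT, not the
        # private address the IR(CP) itself is configured with.
        self._bindings[cp_id] = (observed_src, now)

    def tunnel_destination(self, cp_id, now):
        """Return the address to tunnel to, or None if no live binding."""
        entry = self._bindings.get(cp_id)
        if entry is None or now - entry[1] > KEEPALIVE_TIMEOUT:
            return None
        return entry[0]
```

  The point of the sketch is that the IR(BR) has nothing to tunnel
  to until the IR(CP) has sent it a packet, and that the binding
  lapses if the keepalives stop.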

  I don't recall much about VET and SEAL, but this is all feasible.
  I just want to note that in other circumstances I thought of these
  tunnels as being not necessarily 2-way - but in this case, it
  must be 2-way and it must be established by the IR(CP) router.
  This is identical to TTR Mobility.  This 2-way tunnel material
  seems to be new in version 07 and 08.

  The last paragraph of section 4 concerns protocols other than
  IP. I am definitely going to skip this stuff!


Section 5.1 IR(GW) Initialization:

   This is completely new material in versions 07-08.

         Before its first operational use, each IR(GW) in the VPC
         company's overlay network is pre-provisioned with the list
         of VPs that it will serve as well as the locators for all
         IR(BR)s that also serve the VPs.

  I understand the concept of the VPC's set of IR(GW) routers
  "serving" its one or more VPs, in that each such router is
  a BGP router and advertises the VP in the BGP DFZ.

  Perhaps I am being overly fussy about terminology, but I don't
  understand how there is a set of IR(BR) routers which "serve"
  one or more VPs.

  Again, for simplicity, I will consider a pure IPv4 or pure
  IPv6 system.  I assume the VPC runs:

     1 - A set of routers which perform IR(GW) functions for its
         one or more VPs.  These are all BGP routers.

     2 - A set of routers which perform IR(BR) functions.  Some
         or in principle perhaps all of these could be the same
         routers as the IR(GW) routers.

         An IR(BR) router is not required to be a BGP router.
         However, like an IR(GW) router, it must have a stable
         ordinary (non-EPA) global unicast address.  It must
         be able to emit packets whose source addresses are EPA
         addresses, at least of VPs belonging to this VPC,
         and have those forwarded to their destinations.  Since
         you assume that ISPs will generally prevent this
         from occurring on their ordinary customer services,
         it follows that these IR(BR) routers must be placed
         in parts of ISP networks where such source address
         filtering is not imposed.

   So I guess you mean the complete set of IR(BR) routers
   this VPC runs.

   Each such IR(BR) doesn't do anything like "serve a VP".
   It handles one or more particular subsets of a VP -
   one or more EP prefixes.


         Upon startup, the IR(GW) engages in BGP routing exchanges
         with its peers in the IPv4 and/or IPv6 Internets the same
         as for any BGP router.  It then connects to all of the
         IR(BR)s that service its VPs for the purpose of
         discovering EP->IR(BR) mappings.

   OK.  "connects to" means some kind of communication, perhaps
   in a tunnel to that IR(BR) router - which can also be used
   bidirectionally for traffic packets?

         After the IR(GW) has thus fully populated its EP->IR(BR)
         mapping information database, it is said to be "synchronized"
         wrt its VPs.  The IR(GW) then advertises its synchronized VPs

   I think the second "synchronized" is redundant and perhaps confusing.

         into the IPv4 and/or IPv6 Internet BGP routing systems and
         engages in ordinary packet forwarding operations.

   OK - section 6.3 describes how IR(GW) handles traffic packets.
   It is not exactly an ordinary router, so "ordinary packet
   forwarding" doesn't convey anything useful, I think.



Section 5.2: IR(BR) Initialization

   This is completely new material in versions 07-08.

         Before its first operational use, each IR(BR) in the VPC company's
         overlay network is pre-provisioned with the list of VPs that it will
         serve as well as the locators for all IR(GW)s that also serve the
         VPs.

   OK, except that I think an IR(BR) doesn't "serve" VPs as such.

   Maybe when I read further I will find that it advertises the VPs inside
   whatever ISP network it resides in, like an Ivip DITR or LISP PTR.
   IR(BR)s can't advertise VPs in the DFZ, because they are not DFZ routers
   - or at least need not be DFZ routers.

   I want to know where the DITR/PTR equivalents are in IRON, but there is
   a lot more text to go - so I figure all will be revealed in due course.

         In order to support route optimization, the IR(BR) must also be
         pre-provisioned with the list of all VPs in the IRON (i.e., and not
         just the VPs of this VPC) so that it can discern EPA and non-EPA
         addresses.

   OK - the various "patches", one patch for each VPC, are not
   totally independent, since the IR(BR) routers of one VPC's patch
   need to know about the VPs of every VPC.

         Upon startup, the IR(BR) connects to all of the IR(GW)s that service
         its VPs for the purpose of reporting its EP->IR(BR) mappings.

   OK.  There's no detail of the protocols, but it is easy to imagine
   this being done.

         The IR(BR) then actively listens for IR(CP) customers which will create
         a two-way tunnel while registering its EP prefixes.  When a new IR(CP)
         registers its EP prefixes, the IR(BR) informs all IR(GW)s of the new
         EP additions; when an existing IR(CP) unregisters its EP prefixes,
         the IR(BR) informs all IR(GW)s of the deletions.

   OK.  There are no details of exactly how the IR(CP) router securely
   tells the IR(BR) router which EP prefixes it is authoritative for -
   but there would be a way of implementing this and we don't really need
   to know the details yet.
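
   To make the bookkeeping concrete, here is a small Python sketch
   of the register / unregister flow.  The class and method names
   are hypothetical, the "informing" of IR(GW)s is modelled as a
   direct update of each gateway's table rather than any real
   protocol, and the authentication step is omitted entirely.

```python
# Hypothetical sketch of EP registration at an IR(BR).


class Gateway:
    """Stand-in for an IR(GW): holds the EP -> IR(BR) mapping database."""

    def __init__(self):
        self.ep_to_br = {}


class BorderRouter:
    """Stand-in for an IR(BR) that reports its EPs to every IR(GW)."""

    def __init__(self, locator, gateways):
        self.locator = locator
        self.gateways = gateways
        self.ep_to_cp = {}  # EP prefix -> locator of the registering IR(CP)

    def register(self, cp_locator, eps):
        for ep in eps:
            self.ep_to_cp[ep] = cp_locator
            for gw in self.gateways:
                # Inform each IR(GW) of the new EP -> IR(BR) mapping.
                gw.ep_to_br[ep] = self.locator

    def unregister(self, eps):
        for ep in eps:
            self.ep_to_cp.pop(ep, None)
            for gw in self.gateways:
                # Inform each IR(GW) of the deletion.
                gw.ep_to_br.pop(ep, None)
```

   Each registration or deletion costs one update per IR(GW), so the
   number of IR(GW) routers a VPC runs directly affects how this
   scales.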

   While I think there has been no mention of the number of IR(GW)
   routers each VPC runs, I guess the number is between 3 and 10.

   I suggest giving some indication of these numbers - since it helps the
   reader build their mental model.

   If it is 3 to 10 or so then this looks like it will scale well.


Section 5.3: IR(CP) Initialization

   This is substantially changed from version 06 to versions 07-08.

         Before its first operational use, each IR(CP) must obtain one or more
         EPs from a VPC along with a certificate and a public/private key pair
         from the VPC that it can later use to prove ownership of its EPs.
         This implies that each VPC must run its own key infrastructure to be
         used only for the purpose of verifying a customer's claimed right to
         use an EP.  Hence, the VPC need not coordinate its key infrastructure
         with any other organizations.

   OK.  The IR(CP) will need to be configured with some kind of secret
   or whatever so it can authenticate itself to one or more of the
   IR(GW) routers, via IR(BR) routers, for each of the one or more EP
   prefixes it claims to be responsible for.  However, I see below that
   the IR(CP) router doesn't deal directly with the IR(GW) routers - but
   with one or a few ideally nearby IR(BR) routers.  The IR(BR) routers
   would presumably act as intermediaries and involve one or more IR(GW)
   routers or some other VPC server to check each claim by an IR(CP)
   router to be handling some EP prefix.  This should scale OK.

                                         In order to support route
         optimization, the IR(CP) must also be pre-provisioned with the list
         of all VPs in the IRON (i.e., and not just the VPs of this VPC) so
         that it can discern EPA and non-EPA addresses.

   OK - just as is the case for IR(BR) routers.  We are yet to learn what
   "route optimization" means - but I already know it involves initial
   packets taking a path via an IR(GW) router to the destination network's
   currently mapped IR(BR) router, and subsequent packets taking a
   more direct path to it, after the IR(GW) router sends a route
   optimization message, as part of the VET/SEAL protocol, to whichever
   router is the start of the tunnel.


         Upon startup, the IR(CP) contacts its VPC (e.g., via a simple client/
         server exchange) to discover a list of locators of the company's
         nearby IR(BRs).  (This list is analogous to the ISATAP Potential
         Router List (PRL) [RFC5214].)  The IR(CP) then selects a subset of
         IR(BR)s from this list and tests them through a qualification
         procedure.  The IR(CP) then registers its EP prefixes with one or
         more qualified IR(BR)s and adds them to a default router list.

   OK.  There's no information on how the VPC control system (perhaps
   its IR(GW) routers) figures out a subset of its IR(BR) routers which
   may be close to this IR(CP) router.  I am not sure how this could
   be achieved reliably and/or without a lot of trouble.

   You could do a rough system based on the IP addresses of the
   IR(BR) routers and the source address of a packet received by
   an IR(GW) router when the packet is sent from the IR(CP) router.
   That source address is either the actual address of the IR(CP)
   router, which can easily be known by the IR(CP) router reporting
   its own address in the packet, or it is the public address of the
   NAT box (or outermost NAT box of several) which the IR(CP) router
   is behind.

   Then you could do a rough geographical test, based on some pretty
   tricky to maintain knowledge of where these IP addresses are.
   Then, if the IR(CP)'s address seemed to be in North America, the
   system could send it a list of IR(BR) routers which seemed to be
   in North America.

   If that is too many for the IR(CP) router to try tunneling to, or to
   ping or whatever, then the system could try a smaller subset based
   on what it thinks is close.  If the IR(CP) router pings these and
   chooses two or more or whatever which are sufficiently "close"
   by some metric (RTT and hop-count, perhaps - maybe hopcount of BGP
   routers, by some tricky algorithm) then all would be well.

   If the IR(CP) router tries them all and finds the "closest" doesn't
   appear to be very close, then the system could send it another list,
   or a bunch of addresses of IR(BR) routers all around the world, and
   get a report back about the apparent closeness of these.  This could
   iterate until the IR(CP) router got one or more IR(BR) routers
   which the system decided were as close as could be found.
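
   The iterative selection I have in mind could be sketched in
   Python like this.  It is a sketch of my suggestion, not of
   anything in the I-D: the function name, the use of RTT alone as
   the closeness metric, and the max_rtt and wanted parameters are
   all assumptions.  The probe function is passed in, so no real
   network access is implied.

```python
# Hypothetical sketch of iterative "closeness" selection by an IR(CP).


def choose_close_brs(candidate_lists, probe_rtt, max_rtt, wanted):
    """Probe successive candidate lists until enough close IR(BR)s are found.

    candidate_lists: lists of IR(BR) locators, in the order the VPC sends them.
    probe_rtt:       callable returning an RTT in seconds, or None if unreachable.
    """
    measured = {}
    for candidates in candidate_lists:
        for br in candidates:
            if br not in measured:
                rtt = probe_rtt(br)
                if rtt is not None:
                    measured[br] = rtt
        close = sorted((rtt, br) for br, rtt in measured.items() if rtt <= max_rtt)
        if len(close) >= wanted:
            return [br for _, br in close[:wanted]]
    # Nothing close enough in any list: settle for the best found so far.
    best = sorted((rtt, br) for br, rtt in measured.items())
    return [br for _, br in best[:wanted]]
```

   If the first list contains only distant IR(BR)s, the function
   moves on to the next list, and only settles for a distant one
   when every list has been exhausted.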

   DRTM (Ivip's Distributed Real Time Mapping) involves a similar
   "closeness" finding algorithm to find nearby authoritative query
   servers.  Likewise TTR Mobility requires some fancy stuff so an
   MN can find one or perhaps more close TTRs, which is directly
   analogous to these IRON operations.  I expect all three sets of
   operations will need some work to make them operate correctly and
   scale well.


   My description involves both the IR(GW) router and the IR(CP)
   router trying to find the closest one or a few IR(BR) routers to
   the IR(CP) router.  But re-reading your text, I see all the
   closeness selection is done by the IR(GW) router, presumably
   based on the source address as mentioned above.  The IR(CP)
   router simply selects one or more IR(BR) routers from this list
   and in some way qualifies them - I guess makes a tunnel to them
   and checks with them that it is OK to use them for traffic.

   Because I think the IR(GW) router will have a difficult time
   choosing sufficiently close IR(BR) routers, I think there needs
   to be a second stage, as I described, with the IR(CP) router
   actively testing each provided IR(BR) router for closeness, and
   picking the one or two or three or whatever which appear to be
   "closest".  As I noted, I think there needs to be provision for
   refining this process if the original list doesn't have any
   sufficiently close IR(BR) routers.

   *** Note added later:  I see in later paragraphs that the
       IR(CP) router's "qualification procedure" does involve
       measuring closeness - so in the current paragraph, I
       suggest you mention this here, since "qualification
       procedure", to me, doesn't imply selection of the
       closest.


   So at the end of this process, each IR(CP) router has in
   its "default router list" one or several genuinely close
   IR(BR) routers which are ready to handle traffic to and from
   it.  In so doing, the IR(CP) router has set up a 2-way
   tunnel with each such IR(BR) router.

   This would be a good place to mention how many IR(BR) routers
   would be tunneled to and selected for ongoing traffic in
   this way.  Also, to mention the purpose of having two or more.

   I guess it is for redundancy if one IR(BR) dies, or can't be
   relied upon - the mapping of the EP would be changed to one of
   the other IR(BR)s which the IR(CP) router already has a 2-way
   tunnel to.



Section 6. IRON Operation

         Following this IRON initialization, IRs engage in the steady-state
         process of receiving and forwarding packets.  All IRs forward
         encapsulated packets over the IRON using the mechanisms of VET
         [I-D.templin-intarea-vet] and SEAL [I-D.templin-intarea-seal], while
         IR(GW)s and IR(BR)s additionally forward packets to and from the
         native IPv6 and IPv4 Internets.

   OK - more details are below, and I see that the IR(CP) routers do
   neither of the following:

     1 - Receive traffic packets directly from the IPv4/6 Internet.

     2 - Send traffic packets directly to the IPv4/6 Internet.

   So in this respect each IR(CP) router resembles a MN in TTR
   Mobility and the IR(BR)s resemble the TTRs - acting as both ETRs
   and as routers for all outgoing packets, often with an ITR
   function built in.  A significant difference between TTR Mobility
   and IRON is that if the IR(CP) router is not behind NAT, its
   outgoing packets to EPA addresses do not go via this TTR-like
   IR(BR), but are (ultimately, but perhaps not the initial packets
   in a new flow) tunneled directly to the ETR-like IR(BR)s which
   are close to the destination EUNs and their IR(CP) routers.

   So for traffic going to the EUN, there is double handling, as
   there is in TTR Mobility.  The packet is tunneled to the IR(BR)
   [TTR, acting as an ETR to the rest of the Ivip system] which
   detunnels it and then tunnels it to the IR(CP) [MN].

   From the point of view of the IR(CP) router, all outgoing
   packets to non-EPA addresses, and all outgoing packets to
   EPA addresses if the IR(CP) is behind NAT, are also subject
   to double handling, as in TTR Mobility:

      The IR(CP) [MN] tunnels the packet to a nearby IR(BR)
      [TTR] which then either detunnels and forwards the non-EPA
      packet or tunnels the EPA packet to another IR(BR) router
      which is close to the destination EUN's IR(CP) router, and
      already has a 2-way tunnel from that IR(CP) router.

   EPA-addressed outgoing packets from an IR(CP) not behind NAT
   are not double-handled - at least after route redirection takes
   place.  (But I am going from memory of previous versions - so
   the ordinary reader doesn't know this yet.)  In this case, the
   IR(CP) behaves rather like an Ivip ITR and tunnels the packets
   to a distant IR(BR) as just mentioned.

   This TTR-like arrangement means you can definitely do all this
   even when the IR(CP) is behind NAT and when the ISP won't allow
   outgoing packets with EPA source addresses.

   This TTR approach could be used for some or all non-mobile
   networks with Ivip, but I tend to assume that it won't be,
   because it will be better to have ETRs definitely on ordinary non
   Ivip-mapped addresses, and for ISPs to suspend any source address
   filtering they are now doing.

   Generally, as more and more networks use Ivip or IRON, the concept
   of an ISP dropping outgoing packets due to the source address being
   from Ivip-mapped or IRON-mapped (EPA) addresses will seem old-hat.

   I am assuming the ISPs will get hip to Ivip or to IRON, which
   removes the need for double-handling of packets to and from
   non-mobile devices/subnetworks.  If so, then for any EUN whose
   Internet connection is behind NAT, or where the ISP is still doing
   source address filtering, this TTR Mobility approach would still
   be needed.


   I think it is vital that the one or more IR(BR) routers chosen by
   each IR(CP) router be genuinely close, and also have sufficient
   capacity to handle its traffic, considering all the other
   traffic they need to handle.

   This raises many questions about where these IR(BR) routers are.

   The VPC company either needs to buy and install its IR(BR)
   routers, all over the Net, or it needs to hire the services of
   such routers from some other company, including perhaps another
   VPC, which already has a bunch of these routers all over the Net.
   Also, it could do some mixture of both.

   Presumably there will be a way the IR(BR) routers can go out of
   service and/or shed some of their load - so at any time an IR(CP)
   router might be notified that it needs to stop using one, and find
   another, or use another which it already has a 2-way tunnel to.

         IRs also use the SEAL Control Message Protocol (SCMP) to
         coordinate with other IRs, including the process of sending
         and receiving redirect messages for route optimization.  Each
         IR operates as specified in the following sub-sections.

  OK - Route Optimization by way of route redirection messages is a
  central part of IRON, and is unlike anything in LISP or Ivip.


Section 6.1. IR(CP) Operation

         During its initialization phase, the IR(CP) qualifies candidate
         IR(BR)s by sending SEAL Control Message Protocol (SCMP) Router
         Solicitation (SRS) test messages to elicit SCMP Router Advertisement
         (SRA) messages.  The IR(BR) will include the header of the soliciting
         SRS message in its SRA message so that the IR(CP) can determine the
         number of hops along the forward path.  The IR(BR) also includes a
         metric in its SRA messages indicating its current load average so
         that the IR(CP) can avoid selecting IR(BR)s that are overloaded.

   OK - this looks good.  You seem to have the IR(CP) routers deciding which
   IR(BR) routers to use, rather than this being explicitly controlled by
   central VPC servers.  Perhaps the latter approach would enable better
   real-time balancing of the workload of IR(BR) routers.
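
   To make the qualification step concrete, here is a minimal sketch
   of how an IR(CP) might rank the SRA replies it gets.  The record
   layout, the 0.0 to 1.0 load scale, the 0.8 cut-off and the choice
   to sort on RTT and then hop count are all assumptions of mine; the
   draft only says that hop count, delay and load are available.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SraReport:
    """What the IR(CP) learned from one SRS/SRA exchange (assumed layout)."""
    br_addr: str
    hops: int       # from the SRS header reflected in the SRA
    rtt_ms: float   # measured between sending the SRS and receiving the SRA
    load: float     # assumed scale: 0.0 (idle) to 1.0 (saturated)


def rank_candidates(reports: List[SraReport], max_load: float = 0.8) -> List[SraReport]:
    """Drop overloaded IR(BR)s, then order the rest closest-first."""
    usable = [r for r in reports if r.load <= max_load]
    return sorted(usable, key=lambda r: (r.rtt_ms, r.hops))
```

   With central control by the VPC's servers, the same ranking could
   be computed there from reports sent in by the IR(CP).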

                                                                         The
         IR(CP) can also measure the round trip time between sending the SRS
         and receiving the SRA as an indication of round-trip delay.  Finally,
         the IR(CP) examines the SRA messages to determine whether there is a
         NAT on the path

   At least one NAT - I am not sure if it can detect the fact that there
   are two or more layers of NAT, or whether it matters that there are
   two or more.

                           to its candidate IR(BR)s via each of its ISP
         connections.

   This is the first mention of the IR(CP) having two or more ISPs.

   Perhaps this needs to be explained earlier, but it is hard to know
   what order to present this stuff.

   If the IR(CP) has two ISPs, ISP-A and ISP-B, with a separate
   physical link to each, then it can send out SCMP messages to each
   candidate IR(BR) router via each link, using the address it has
   from each of these ISPs.

   Each reply will come back through the same link it was sent out of.

   All this looks good to me.

   The IR(CP) has to choose not only one or more IR(BR)s to use, but
   which of its two or more ISP links to use when making the 2-way
   tunnel to it.

                      The IR(CP) determines whether there is a NAT on the
         path by examining the address and UDP port number in the header of
         the soliciting SRS message that the IR(BR) will reflect in its SRA
         messages.  If the locator and port number reflected in the SRA
         messages does not match the locator and port number the IR(CP) uses
         for tunneling, then the IR(CP) can know that it is behind a NAT.

   Yes - one or more layers of NAT.
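
   The check itself is tiny.  A sketch, with a function name and
   argument layout of my own choosing (the addresses in the usage
   note are documentation examples):

```python
import ipaddress


def behind_nat(local_addr: str, local_port: int,
               reflected_addr: str, reflected_port: int) -> bool:
    """True if the address/port the IR(BR) saw differs from what we sent from."""
    same_addr = ipaddress.ip_address(local_addr) == ipaddress.ip_address(reflected_addr)
    return not (same_addr and local_port == reflected_port)
```

   For example, behind_nat("10.0.0.6", 4000, "203.0.113.7", 61234)
   is True.  The comparison can only say that at least one NAT
   rewrote the packet, not how many layers there are.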

         After the IR(CP) determines one or more preferred IR(BR)s, it
         registers its EP-to-locator bindings with the IR(BR)s by sending SRS
         messages with signed certificates and prefix information to prove
         ownership of its EPs.

   I think the registration process needs to be more complex than this.

   Sending a "signed certificate" is equivalent to sending a password in
   the clear.  If an attacker can somehow get a copy of this, then they
   can hijack the EP.

   I think there needs to be a challenge response arrangement where the
   IR(BR), or the IR(GW) via an IR(BR) sends some kind of number or
   message to the IR(CP) and the IR(CP) sends back some function of
   this which proves to the VPC company routers (either the IR(BR), the
   IR(GW) or some other router or server) that the IR(CP) really has
   whatever secret key it needs to be granted "ownership" of the
   EP prefix.


   CHAP uses such a protocol, but there are plenty of others.  It's late
   and I won't think about it in detail - but I believe you need a
   cryptographically secure challenge-response protocol so the VPC
   system can prove the device which initiated the 2-way tunnel to
   one of its IR(BR) routers really has the secret key material, or
   the digital certificate or whatever.  This needs to be done in a
   way that the secret material never leaves the IR(CP), including
   via an encrypted link.
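
   Just to show the shape of such an exchange, here is a minimal
   sketch using an HMAC over a random challenge.  The function names
   and message layout are hypothetical, and HMAC with a shared key is
   used only to keep the example within the standard library.  With a
   shared key the verifier also holds the secret, so a signature over
   the challenge, made with a private key held only by the IR(CP),
   would match the requirement above more closely.

```python
import hashlib
import hmac
import secrets


def make_challenge() -> bytes:
    """IR(BR) side: a fresh random value, never reused."""
    return secrets.token_bytes(32)


def respond(secret_key: bytes, challenge: bytes, ep_prefix: str) -> bytes:
    """IR(CP) side: bind the answer to both the challenge and the EP claimed."""
    return hmac.new(secret_key, challenge + ep_prefix.encode(), hashlib.sha256).digest()


def verify(secret_key: bytes, challenge: bytes, ep_prefix: str, response: bytes) -> bool:
    """VPC side: recompute the answer and compare in constant time."""
    expected = respond(secret_key, challenge, ep_prefix)
    return hmac.compare_digest(expected, response)
```

   An attacker who records one exchange gains nothing, because the
   next challenge is different and the recorded response will not
   verify against it.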


                                  The SRS message will elicit an SRA message
         from the IR(BR) that includes a non-zero default router lifetime and
         that signifies the establishment of a two-way tunnel.  (A zero
         default router lifetime on the other hand signifies that the IR(BR)
         is currently unable to establish a two-way tunnel, e.g., due to heavy
         load.)

  OK.

                 The IR(CP) should send separate beacons to each IR(BR) via
         each of its ISP connections in order to establish multiple two-way
         tunnels for multihoming purposes.

   "Beacons" is a new term - but I guess you mean the above process of
   securely establishing a 2-way tunnel to the chosen IR(BR).

   So if the IR(CP) has two ISPs, then it has two independent 2-way
   tunnels to the one IR(BR).

   There is also the possibility of doing this with more than one IR(BR),
   but so far it's not clear how many, or why.

         After the initial EP-to-locator registrations, the IR(CP) sends
         periodic SRS beacons to its IR(BR)s to keep its two-way tunnels
         alive.


   This is confusing.  The first use of "beacon" seems to refer to
   the establishment of a 2-way tunnel from the IR(CP) to the IR(BR).
   The second usage seems to refer to a keepalive message.

         These beacons need not include signed certificates since
         prefix proof of ownership was already established in the initial
         exchange and the SEAL ID in the SEAL header can be used to confirm
         that the beacon was sent by the correct tunnel far end.

   I think that if you rewrite this to use challenge-response
   authentication, you will have two separate concepts with two
   separate names - one for the tunnel establishment, which follows
   or is an optional extension of the closeness measurement stuff -
   and the other which is simply a keepalive, perhaps with some
   exchange of status information.

   I suggest not using the term "beacon" for either of these.

                                                                 If the
         IR(CP) ceases to receive SRA messages from an IR(BR) via a specific
         ISP connection, it marks the IR(BR) as unreachable for that locator.

   Perhaps, "from that locator" where the locator is the IR(CP)'s address
   it gets from this ISP.  But IR(CP)'s addresses are not really "locators"
   since they could be behind NAT - and no address behind a NAT functions
   as a "locator" from outside the NATed network.  So I think "locator"
   in this sentence is not right.  Maybe:

                                                                 If the
         IR(CP) ceases to receive SRA messages from an IR(BR) via a specific
         ISP connection - via the 2-way tunnel it established to that IR(BR)
         from the address which the ISP gave it - it marks the IR(BR) as
         unreachable from that address, and therefore over that ISP connection.



         If the IR(CP) ceases to receive SRA messages from multiple IR(BR)s
         via a specific ISP connection, it marks the ISP connection as failed/
         failing.

   OK - but so far there is no explanation of why multiple IR(BR)s are
   needed, or how many there should be.

   Also, every 2-way tunnel to IR(BR)s via a given ISP link must fail
   before the IR(CP) should conclude the link itself is unusable.
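
   In other words, something like this sketch, where the dictionary
   keyed by (ISP link, IR(BR)) and the function name are my own
   invention:

```python
from typing import Dict, Tuple


def link_failed(tunnel_up: Dict[Tuple[str, str], bool], isp: str) -> bool:
    """An ISP link is unusable only if every tunnel over it is down."""
    states = [up for (link, _br), up in tunnel_up.items() if link == isp]
    # A link with no tunnels at all is treated as "unknown", not failed.
    return bool(states) and not any(states)
```

   So with tunnels to br1 and br2 over ISP-B, losing only br2 marks
   that one IR(BR) as unreachable via ISP-B but leaves the link itself
   in service.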

                   The IR(CP) also uses the same SRS/SRA beaconing procedure
         to inform its IR(BR)s of a change in locator, e.g., due to changing
         to a new ISP connection during a mobility event.

   OK - this is the same as TTR Mobility.  I think the "beacon" term should
   be avoided, or used in a more specific and well explained manner.

   The concept of tunnel establishment to an already used TTR
   or IR(BR) is totally different from the keepalives and exchange of
   status information.  So "beacon" or any other single term can't be
   used to refer to both of these types of operations.

   From the point of view of the IR(BR) [TTR], the tunnel establishment
   phase involves it getting a packet from some novel address, claiming
   to be from an IR(CP) [MN] which is authorized to receive packets
   addressed to a particular EP prefix [micronet].

   The IR(BR) [TTR] needs to respond with a challenge, and the device
   at the other end - presumably the IR(CP) [MN] needs to send back a
   message, based on this challenge, which proves it has the secret
   key or whatever it needs to have in order to be authenticated as
   the proper destination for packets addressed to a given EP [micronet].

   There needs to be such an exchange for each EP [micronet] the
   IR(CP) [MN] is handling.

   For TTR Mobility, I have assumed the MN is most likely handling
   a single micronet.  If TTR Mobility was used for more micronets
   per MN, or it was not an MN, but a router for a largish mobile or
   non-mobile network, which handled dozens of micronets then it
   might be better to have a short-cut to replace these multiple
   authentication sessions for each micronet, in the event that the
   MN had already established its credentials with the TTR for all
   these via one CoA, and now has another CoA which it needs to
   establish a two-way tunnel from to the TTR.

   For instance, the TTR [IR(BR)] could have done an initial
   authentication for each micronet [EP] and then securely sent
   the MN [IR(CP)] a nonce.

   Then, when the MN [IR(CP)] finds it has another CoA [new ISP
   and therefore new address for its interface which is making
   the new 2-way tunnel] the challenge response system only
   needs to show the device at the end of this new 2-way tunnel
   has this nonce.  Then there is no need to individually authenticate
   it for each of its micronets [EPs].
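
   A sketch of this short-cut, with hypothetical class and method
   names.  I assume the nonce was handed to the MN over the tunnel
   which was already authenticated per micronet, and that possession
   is shown by an HMAC over a fresh challenge so the nonce itself is
   never sent again.

```python
import hashlib
import hmac
import secrets
from typing import FrozenSet, Iterable, Optional


def prove(nonce: bytes, challenge: bytes) -> bytes:
    """MN [IR(CP)] side: show possession of the nonce without sending it."""
    return hmac.new(nonce, challenge, hashlib.sha256).digest()


class TtrSession:
    """State a TTR [IR(BR)] keeps for one already-authenticated MN [IR(CP)]."""

    def __init__(self, authenticated_eps: Iterable[str]) -> None:
        self.eps: FrozenSet[str] = frozenset(authenticated_eps)
        self.nonce = secrets.token_bytes(32)  # sent securely to the MN once
        self._pending: Optional[bytes] = None

    def new_challenge(self) -> bytes:
        self._pending = secrets.token_bytes(16)
        return self._pending

    def accept_new_tunnel(self, response: bytes) -> FrozenSet[str]:
        """Return the micronets to serve over the new tunnel, or none."""
        if self._pending is None:
            return frozenset()
        expected = prove(self.nonce, self._pending)
        self._pending = None  # each challenge is single-use
        return self.eps if hmac.compare_digest(expected, response) else frozenset()
```

   One successful exchange then covers all the micronets which were
   authenticated individually via the first CoA.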

   This is an elaboration.  Generally I think it is sufficient to
   have a separate challenge-response authentication exchange for
   each micronet [EP] when the MN [IR(CP)] establishes a 2-way
   tunnel to the TTR [IR(BR)].


         When an end system in an EUN has a packet to send, the packet is
         forwarded through the EUN via normal routing until it reaches the
         IR(CP), which then encapsulates the packet and forwards it either to
         one of its serving IR(BR)s or directly into the public Internet.

   I guess the choice between forwarding it to its serving IR(BR) or "to
   the public Internet" is similar to the choice between, as mentioned above:

     1, 3 & 4:  Tunnel it to the nearby IR(BR) router.

     2:         Tunnel it to whichever IR(BR) router is "nearby"
                to the destination network's IR(CP) router.

   This doesn't look right to me:

     "encapsulates the packet and forwards it . . . into the public Internet."

   unless by this you mean tunneling to IR(BR) routers of the destination
   networks, which only occurs when the destination address is an EPA
   address which is handled by some other IR(BR) than the nearby "serving"
   IR(BR).

   But this may be because you haven't yet explained how the whole thing
   works - and explanations need to be done in some kind of order, not all
   at once.

   Because you assume the ISP won't forward packets with such source
   addresses, I understood, as noted above, that the IR(CP) needed
   to tunnel the outgoing packets to its nearby serving IR(BR), or
   to a distant IR(BR), irrespective of whether the IR(CP) is behind NAT
   or not.

   The above description accomplishes this, since the packet is encapsulated
   and the outer destination address is, I think, the same as the inner
   destination address of the traffic packet.  But the outer source
   address is that of the outgoing interface of the IR(CP), which could be
   behind NAT.  That outer source address will not fall foul of any ISP
   source address filtering, so this will be fine.

   What seemed odd about this, is that you are tunneling a packet to an
   ordinary Internet address . . . but checking back, this is case 2 only
   which is when the packet is addressed to an EPA address and the IR(CP)
   router's address it is using for this is on a global public address, not
   behind NAT.

   So you are tunneling the packet "to" a host address which is an EPA
   address, which is only reachable via some IR(CP) router and its one
   or more serving IR(BR) routers . . .  yet the sending IR(CP) router
   so far has no idea where these are.

   Based on my prior understanding of IRON-RANGER, I understand that this
   tunneled packet is forwarded out to the DFZ or wherever, and towards
   the nearest IRON router which advertises the VP.

   I haven't read much further yet, but let's assume this is an IR(GW)
   router - not necessarily of the VPC which handles the sending IR(CP)
   router, or its nearby serving IR(BR) router(s).

   From what I read earlier, the packet will go to the nearest of these
   IR(GW) routers, which will look up the mapping *** which it already
   has stored inside itself *** (no fancy lookups to other servers
   as in LISP or Ivip!) and will tunnel the packet to that IR(BR)
   router.  That IR(BR) router will tunnel the packet to the appropriate
   IR(CP) router, which will detunnel it and forward the traffic packet
   to the EUN.

   Meanwhile, the IR(GW) router will send a route redirection message
   to the sending IR(CP) router, which will subsequently tunnel packets
   with the same destination address directly to the same IR(BR)
   router.

   All this is from my memory of previous versions.  If I am right
   about this, this explains the guts of the IRON system - there is
   no separate mapping system.  Initial packets go a longer path, but
   because there are multiple IR(GW) routers around the Net, this
   extra path length is probably not going to be unreasonably far.

   Typically, within a fraction of a second, any further such packets
   are tunneled directly to the destination EUN's currently mapped
   IR(BR).


                                                                        In
         particular, if the IR(CP) is located behind a NAT, if the IR(CP) does
         not configure a locator of the same protocol version as the packet's
         destination, or if the destination address is a non-EPA address, the
         IR(CP) encapsulates the packet in an outer header with its locator as
         the source address and the locator of one of its serving IR(BR)s as
         the destination address then forwards the encapsulated packet to the
         IR(BR).

   OK.  These are cases 1, 3 & 4.


   In the remaining two sentences of this paragraph, three things
   are true:

   1 - The IR(CP)'s address ("locator") it gets from the ISP is not behind
       NAT - it is an ordinary global public unicast address which is not
       an EPA address: not mapped as part of the IRON system.

   2 - There is no tricky cross-protocol IPv4/v6 translation.

   3 - The destination address of the packet is an EPA address -
       one which is mapped by the IRON system.
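
   As I read the draft's rule, the choice of outer destination
   address comes down to the following sketch.  The function name and
   arguments are mine, and I am assuming the IR(CP) has some list of
   EPA prefixes it can test the destination against - the text so far
   does not say how it knows.

```python
import ipaddress
from typing import Iterable


def outer_destination(inner_dst: str, epa_prefixes: Iterable[str],
                      serving_br: str, locator: str, behind_nat: bool) -> str:
    """Pick the outer-header destination for a packet leaving the EUN."""
    dst = ipaddress.ip_address(inner_dst)
    # Membership is False when address and prefix are different IP versions.
    is_epa = any(dst in ipaddress.ip_network(p) for p in epa_prefixes)
    same_version = ipaddress.ip_address(locator).version == dst.version
    if behind_nat or not same_version or not is_epa:
        return serving_br   # tunnel to a nearby serving IR(BR)
    return inner_dst        # copy the inner destination to the outer header
```

   Only when all three conditions above hold does the inner
   destination get copied to the outer header; in every other case
   the packet goes to a serving IR(BR).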


                 Otherwise, the IR(CP) encapsulates the packet in an outer
         header with its locator as the source address and the destination
         address of the inner packet copied into the destination address of
         the outer packet, then forwards the packet into the public Internet
         via a default or more-specific route.  This arrangement will ensure
         that the encapsulated packet is forwarded toward the final
         destination while bypassing the IR(CP)'s default routers in order to
         reduce path stretch.

   I don't clearly understand this.  How does this bypass anything?
   What are these "default routers"?  What path stretch is being avoided?

   This seems to be contrary to what I previously understood - that these
   packets would be tunneled to the IR(BR) router serving the EP which the
   destination address is within.

   So I am getting lost with the above sentences.


         The IR(CP) uses the mechanisms specified in VET and SEAL to
         encapsulate each forwarded packet.  The IR(CP) further uses the SCMP
         protocol to coordinate with other IRs, including accepting redirect
         messages that indicate a better next hop.  When the IR(CP) receives
         an SCMP redirect, it checks the identification field of the
         encapsulated message to verify that the redirect corresponds to a
         packet that it had previously sent and accepts the redirect if there
         is a match.  Thereafter, subsequent packets forwarded by the source
         IR(CP) will follow a route-optimized path.

   OK - but I am trying to imagine the mechanisms inside the IR(CP)
   which do all this.
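
   My guess at the minimum state involved, as a sketch: the IR(CP)
   remembers the identification value of each packet it recently
   sent, and only installs a better next hop when a redirect quotes
   one of them.  The class, its method names, the per-destination
   (rather than per-prefix) cache and the 1024-entry limit are all
   assumptions of mine.

```python
from collections import OrderedDict
from typing import Dict


class RedirectTable:
    """Tracks recently sent packets and the next hops learned from redirects."""

    def __init__(self, max_pending: int = 1024) -> None:
        self._sent: "OrderedDict[int, str]" = OrderedDict()  # ID -> inner destination
        self._next_hop: Dict[str, str] = {}
        self._max = max_pending

    def note_sent(self, ident: int, inner_dst: str) -> None:
        self._sent[ident] = inner_dst
        if len(self._sent) > self._max:
            self._sent.popitem(last=False)  # forget the oldest entry

    def on_redirect(self, ident: int, better_next_hop: str) -> bool:
        """Accept the redirect only if it matches a packet we actually sent."""
        dst = self._sent.get(ident)
        if dst is None:
            return False
        self._next_hop[dst] = better_next_hop
        return True

    def next_hop(self, inner_dst: str, default: str) -> str:
        return self._next_hop.get(inner_dst, default)
```

   Packets sent before any redirect arrives use the default passed to
   next_hop(); later packets to the same destination go straight to
   the IR(BR) named in the redirect.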