Re: shim6 @ NANOG (forwarded note from John Payne) (fwd)

marcelo bagnulo braun <marcelo@it.uc3m.es> Thu, 02 March 2006 09:05 UTC

Cc: shim6-wg <shim6@psg.com>
From: marcelo bagnulo braun <marcelo@it.uc3m.es>
Subject: Re: shim6 @ NANOG (forwarded note from John Payne) (fwd)
Date: Thu, 02 Mar 2006 11:05:42 +0200
To: Igor Gashinsky <igor@gashinsky.net>

Hi Igor,

It is very nice of you to give us feedback about this...

I will try to comment on some of the issues that you mention below...

On 01/03/2006, at 10:10, Igor Gashinsky wrote:
>
>
> 1) Most connections to content providers (with the exceptions of
> long-lived streaming sessions, but those sessions are fairly "few" per
> server) are very short-lived http (think about 15 packets in each
> direction including the setup/teardown). Since, shim6 (as designed
> right now) does not initiate from the first packet(s), it might not
> take effect for these short-lived sessions, and therefore will not
> help in case of failure, so in effect, *does not work* at all for fast
> http transactions
>

First of all, I think it is important to remember that the goal of the
SHIM6 protocol is to _preserve_ established sessions through outages.

However, there are other tools being discussed that would allow new
communications to be established _after_ an outage.

The rationale behind the different tools, as I understand it, would be
something like the following:
- If two hosts have a long-lived session (that could be a long TCP
session, many short TCP sessions, or a long UDP exchange), then it is
likely that it is important for them to preserve this session through
outages. In addition, since the probability of an outage affecting the
communication rises with the lifetime of the communication, it seems
reasonable to try to protect a long-lived session. Moreover, as the
session is long lived, the number of packets will be large enough to
reduce the effect of the overhead introduced by the shim context
establishment.
- However, if two hosts have a short-lived session, like a short TCP
connection, then the above conditions do not hold. Since the session is
short, the probability of an outage affecting it during its lifetime is
reduced. Moreover, if an outage does affect a session that has just been
established, the assumption is that the host will be willing to retry
establishing the session. For this, there are mechanisms being proposed
that allow hosts to establish new connections in the case that a failure
is affecting one of the available addresses. In other words, the
rationale here is that since the session is short lived, the host will
prefer to take the risk of having to reestablish the session in the case
of an outage rather than paying the shim6 overhead in all its
communications (when it is likely that no outage will affect them). It
should also be noted that, as you mention, the patience of users is
quite limited and they are likely to retry if the connection takes too
long, which seems in line with the above case for retrying to establish
the connection. In addition, I would like to point out that, because of
the time it may take to reconverge, a BGP-based solution for multihoming
does not preserve established communications through all outages,
especially when you have anxious users who are willing to hit the reload
button.

So the effort for this case, imho, is put into enabling the capacity of
establishing new sessions after an outage rather than into preserving
established connections. Does this make sense to you?
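
The trade-off between session lifetime, outage probability and setup
overhead can be illustrated with a back-of-the-envelope sketch. It rests
on assumptions that are not part of this discussion: outages are modelled
as a Poisson process, and the outage rate and the number of context
establishment packets (setup_packets) are hypothetical placeholders, not
measured or specified values.

```python
import math


def outage_probability(session_seconds, outages_per_day):
    """Probability that at least one outage starts during the session.

    Assumes outages arrive as a Poisson process (an illustrative
    assumption, not a measured property of any network).
    """
    rate_per_second = outages_per_day / 86400.0
    return 1.0 - math.exp(-rate_per_second * session_seconds)


def relative_overhead(data_packets, setup_packets=4):
    """Fraction of all packets spent on context establishment.

    setup_packets=4 is a hypothetical placeholder value.
    """
    return setup_packets / (data_packets + setup_packets)


if __name__ == "__main__":
    # Hypothetical sessions: (label, duration in seconds, data packets).
    sessions = [("short http", 2, 30), ("long stream", 3600, 100000)]
    for label, seconds, packets in sessions:
        p = outage_probability(seconds, outages_per_day=1.0)
        o = relative_overhead(packets)
        print(f"{label}: P(outage)={p:.6f} overhead={o:.4%}")
```

Under these assumptions the short session sees a much smaller outage
probability and a much larger relative overhead than the long one, which
is the asymmetry the two bullets describe.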

> 1) In order to "fix" #1, shim6 has the potential to put a sizable (over
> 10%) state penalties on our servers (to service end-sites w/ shim6),
> something which is arguably the most painful thing for those servers,
> which can translate into millions of dollars of additional hardware,
> and many more millions of dollars per year to power/cool that
> hardware.
>

Well, the good thing about mechanisms to establish new communications
through outages is that they are located in the client only and have no
effect on the server.


> 3) While TE has been discussed at length already, but it is something
> which is absolutely required for a content provider to deploy shim6.
> There has been quite a bit of talk about what TE is used for, but it
> seems that few people recognize it as a way of expressing
> "business/financial policies". For example, in the v4 world, the
> (multi-homed) end-user maybe visible via both a *paid* Transit path
> (say UUNET), and a *free* peering link (say Cogent), and I would wager
> that most content providers would choose the free link (even if
> performance on that link is (not hugely) worse). That capability all
> but disappears in the v6 world if the Client ID was sourced from their
> UUnet ip address (since that's who they chose to use for outbound
> traffic), and the (web) server does not know that that locator also
> corresponds to a Cogent IP (which they can reach for free).

I fail to understand the example that you are presenting here...

Are you considering the case where both the client and the server are
multihomed to Cogent and UUnet?
Something like:

      UUnet
     /     \
    C       S
     \     /
      Cogent

I mean, in this case the selection of the server's provider is
determined by the server's address, not by the client's address, right?
The server can influence this decision using SRV records in the DNS,
but I am not sure yet whether this is the case you are considering.
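
To make the SRV idea concrete, here is a minimal sketch of how a client
might choose among SRV targets, in a simplified form of the RFC 2782
selection rule: lowest priority first, then a weighted random choice
within that priority. The function name and the host names in the usage
note are hypothetical, and this is not a complete RFC 2782
implementation.

```python
import random


def pick_srv_target(records, rng=random):
    """Pick one target from (priority, weight, target) tuples.

    Simplified RFC 2782 selection: only the lowest priority value is
    considered; within it, targets are chosen in proportion to weight.
    """
    if not records:
        raise ValueError("no SRV records")
    best = min(priority for priority, _, _ in records)
    candidates = [(w, t) for p, w, t in records if p == best]
    total = sum(w for w, _ in candidates)
    if total == 0:
        # All weights are zero: fall back to a uniform choice.
        return rng.choice(candidates)[1]
    point = rng.uniform(0, total)
    running = 0
    for weight, target in candidates:
        running += weight
        if point <= running:
            return target
    return candidates[-1][1]
```

A server that prefers to be reached through its free peering link could,
for example, publish (10, 1, "www-cogent.example.net") and
(20, 1, "www-uunet.example.net"); clients following the rule above would
then pick the first target whenever it is listed.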



>  This change alone would add millions to the bw bills of said
> content providers, and well, reduce the likelyhood of adoption of the
> protocol by them. Now, if the shim6 init takes place in the 3way
> handshake process, then the servers "somewhat" know what all possible
> paths to reach that locator are, but then would need some sort of a
> policy server telling them who to talk to on what ip, and that's
> something which will not simply scale for 100K+ machines.
>

I am not sure I understand the scaling problem here.
Suppose that you are using a DHCP option for distributing the SHIM6
preferences of the RFC3484 policy table. Are you saying that DHCP does
not scale for 100K+ machines? Or is there something else other than
DHCP that



> 4) As has also been discussed before, the initial connect time has to
> be *very* low. Anything that takes longer then 4-5 seconds the
> end-users have a funny way of clicking "stop" in their browser,
> deeming that "X is down, let me try Y", which is usually not a very
> acceptable scenario :-) So, whatever methodology we use to do the
> initial set-up has to account for that, and be able to get a
> connection that is actually starting to do something in under 2
> seconds, along with figuring out which sourceIP and destIP pairs
> actually can talk to each other.

As I mentioned above, we are working on mechanisms other than the shim6
protocol itself that can be used for establishing new communications
through outages.

you can find some work in this area in

ftp://ftp.rfc-editor.org/in-notes/internet-drafts/draft-bagnulo-ipv6-rfc3484-update-00.txt

If you have comments, and especially improvements on the ideas of this
draft or other ideas of how to tackle this problem of initial contact,
they would be really useful.
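
As an illustration of the initial-contact problem (and not the mechanism
specified in the draft), a client-side retry loop over source and
destination address pairs could be sketched as below. The function
names, the per-attempt timeout and the sequential ordering are
assumptions made for this example; the 2-second overall budget simply
echoes the figure mentioned in point 4.

```python
import itertools
import socket
import time


def try_connect(src, dst, port, timeout):
    """Attempt one TCP connection from src to dst; return a socket or None."""
    try:
        return socket.create_connection((dst, port), timeout=timeout,
                                        source_address=(src, 0))
    except OSError:
        return None


def first_working_pair(sources, destinations, port, budget=2.0,
                       per_try=0.5, connect=try_connect):
    """Walk (source, destination) pairs in the given order until one
    connects or the overall time budget (in seconds) is spent.

    budget and per_try are hypothetical tuning values.
    """
    deadline = time.monotonic() + budget
    for src, dst in itertools.product(sources, destinations):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        conn = connect(src, dst, port, min(per_try, remaining))
        if conn is not None:
            return (src, dst), conn
    return None, None
```

All of this logic runs in the client, so the server keeps no extra
state. A real implementation would probably order the pairs according
to the address selection policy and might try them in parallel rather
than one at a time.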


>
> I hope this gives people some visibility as to what some content
> providers think about shim6, and why deploying it is, well, not
> something that people will scramble (or very possibly chose) to do,
> unless those are addresses. And, yes, everyone understands that it's
> all about making trade-offs, but if you make the wrong trade-offs, and
> not enough people deploy the protocol, it's simply not going to fly,
> and people will just go back to de-aggregating in v6 and let Moore's
> Law deal with the issue (and anyone who thinks that people will
> prevent paying customers from deagregating has not seen how many hoops
> ISP's will jump through for that extra revenue, or how fast customers
> will jump to other ISP's which will allow them to do just that). I
> don't know if more work on shim6 is the answer, or GSE/8+8 is a better
> alterntive, but it sure looks like what we have in shim6 today (and
> it's current direction) isn't going to cut it.
>
> Just my $0.02
>

Yes, your feedback is very welcome.

thanks, marcelo


> Thanks,
> -igor
>
>
>