Re: soft state (was Re: shim6 and bit errors in data packet headers

Erik Nordmark <erik.nordmark@sun.com> Wed, 11 May 2005 18:15 UTC

To: marcelo bagnulo braun <marcelo@it.uc3m.es>
CC: shim6 <shim6@psg.com>

marcelo bagnulo braun wrote:

>> Why would we want to couple the state management aspects of shim6 with 
>> the shim6 test protocol? To me any such coupling seems undesirable, 
>> especially since the parameters for the test protocol (how quickly to 
>> detect failures) might be a function of upper layer advice, as well as 
>> upper layer hints of "working" or "not working".
>>
> 
> Well, i guess that the situation when one of the nodes has lost the shim 
> state can be seen as a form of failure and my assumption is that failure 
> detection mechanisms will likely detect it first

But that's a circular argument for including the context state in the 
failure detection mechanism. You are in effect saying that the test 
protocol should test whether the context has been lost on the peer since 
it can be made to test for a lost context on the peer.

FWIW the outline of a test protocol in section 5.4 of 
draft-arkko-multi6dt-failure-detection-00.txt doesn't assume such a 
thing. (But it does assume that B remembers something about previously 
received probes, so there are some issues about DoS opportunities.)

> I think that the protocol behaviour would be something like this.
> 
> A communication is established between node A and node B
> Later on, a shim context is created between those two nodes.
> The parameters for that context are:
>   ULIDs: IPA1 and IPB1
>   Locators: for IPA1 (IPA1,...,IPAn)
>             for IPB1 (IPB1,...,IPBm)

And a context tag presumably.

> Suppose that for some reason node B loses the shim context (and only 
> the shim context, i.e. the application and transport state about ongoing 
> communications is preserved)
> 
> I guess that at this point we have several scenarios to consider:
> 
> Scenario a): the communication between A and B is still using IPA1 and 
> IPB1 as locators.
> This scenario has two subcases:
> Scenario a.1) The communication is bidirectional and e.g.
>               TCP is providing ack of the progress of the communication
>               this means that no periodic reachability test
>               nor any other shim signaling is being exchanged.
>               In this scenario, a loss of shim context would remain
>               undetected until there is a failure and node A detects it
>               and tries to explore alternative paths. This is so because
>               data packets will carry ULIDs and will be passed successfully
>               to the upper layers. 

If we assume that B (as well as A) will have a heuristic to create shim6 
contexts (e.g. based on having received 50 packets for a locator pair), 
then this heuristic might be triggered and cause B to try to establish a 
context with A, at which point in time A will see that it already has a 
context with B.
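
A minimal sketch of such a heuristic, assuming a simple per-locator-pair packet counter. The class and method names are hypothetical, and the threshold of 50 is only the example figure used above, not a specified constant:

```python
from collections import defaultdict

PACKET_THRESHOLD = 50  # example figure from the text, not a specified constant


class ContextHeuristic:
    """Hypothetical per-host heuristic for deciding when to set up a context."""

    def __init__(self, threshold=PACKET_THRESHOLD):
        self.threshold = threshold
        self.counts = defaultdict(int)  # packets seen per (src, dst) locator pair
        self.contexts = set()           # locator pairs that already have a context

    def on_packet(self, src, dst):
        """Return True exactly when this packet should trigger context setup."""
        pair = (src, dst)
        if pair in self.contexts:
            return False
        self.counts[pair] += 1
        return self.counts[pair] == self.threshold

    def lose_state(self):
        """Model a host that forgets all shim state but keeps transport state."""
        self.counts.clear()
        self.contexts.clear()


b = ContextHeuristic()
b.contexts.add(("IPA1", "IPB1"))  # B had a context with A
b.lose_state()                    # ... and then lost it
triggers = [b.on_packet("IPA1", "IPB1") for _ in range(PACKET_THRESHOLD)]
print(triggers[-1])  # the 50th packet after the loss triggers a new setup attempt
```

In this sketch B restarts its count after the loss, so its setup attempt is what lets A notice the mismatch, without any help from the test protocol.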

> Once there is a failure, then
>               reachability test packets won't be recognized as belonging
>               to any existent shim context and the problem can be detected.

Here you are already assuming that reachability test packets will not be 
recognized, i.e. presupposing a particular interaction between the state 
management and the test protocol.

> Scenario a.2) The communication is unidirectional
>               In this case, periodic reachability tests need to be
>               performed in order to verify that the path is still working.
>               If node B loses its shim state, it won't recognize
>               the reachability test packets, and the loss of context can
>               be detected

Again, here you are presupposing a particular interaction.

> Scenario b) the communication between A and B is using alternative 
> locators.
> In this case, when node B loses the context, data packets won't be 
> properly delivered in node B, because they won't be properly demuxed.
> At this point, the reachability test will be performed to verify the 
> locator pair being used

If you are using alternate locators and the working locator pair is 
unidirectional, then it seems like you'd need to be able to re-discover 
that working unidirectional locator pair, before you can re-establish 
the context state on B.
Thus if A is sending using IPA1->IPB2 and B was replying using 
IPB1->IPA2, and B loses the context state, what do you do?
Seems like solving this case requires that the test protocol is not tied 
in with the state management.
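
To make the unidirectional case concrete, here is a small sketch that treats reachability as a set of directed (source, destination) locator pairs. The function name and the WORKING set are made up for illustration; a real test protocol would learn this by probing:

```python
from itertools import product

# Hypothetical ground truth: only these directed paths currently work.
WORKING = {("IPA1", "IPB2"), ("IPB1", "IPA2")}


def working_pairs(src_locators, dst_locators, path_ok):
    """Return every (src, dst) locator pair for which path_ok reports success."""
    return [(s, d) for s, d in product(src_locators, dst_locators) if path_ok(s, d)]


def path_ok(src, dst):
    return (src, dst) in WORKING


a_locators = ["IPA1", "IPA2"]
b_locators = ["IPB1", "IPB2"]

a_to_b = working_pairs(a_locators, b_locators, path_ok)
b_to_a = working_pairs(b_locators, a_locators, path_ok)
print(a_to_b)  # pairs A can send on
print(b_to_a)  # pairs B can send on
```

Neither working pair is the reverse of the other, so B cannot infer its own sending pair from the packets it receives. It has to run the exploration itself, which it can only do if probing does not require the context it has just lost.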

> I don't know if i am missing something, but AFAICS, all the situations 
> when the shim context is lost result in a reachability test exchange, 
> and that is why i was wondering if it wouldn't make sense to define a 
> "no-context" error message as a reply to a reachability test request packet.

That is one particular solution with strong coupling between the test 
protocol and the state management.

But don't we want to retain the possibility to test locator pairs for 
initial contact, i.e. before a context is established between the peers? 
And handle the above case of unidirectional locator pairs?


> But i fail to understand how the node that has lost the state can 
> identify that a data packet belongs to a non existent shim state....

By seeing that the <source locator, destination locator, context tag> 
doesn't match any existing context?
I suspect we want that capability for robustness in any case.
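
A rough sketch of that lookup, assuming contexts are stored in a table keyed by the triple. All names here are hypothetical and the tag values are arbitrary:

```python
from typing import NamedTuple, Optional, Tuple


class ContextKey(NamedTuple):
    src_locator: str
    dst_locator: str
    context_tag: int


class ContextTable:
    """Hypothetical receiver-side table of established shim contexts."""

    def __init__(self):
        self._table = {}

    def add(self, key: ContextKey, ulids: Tuple[str, str]) -> None:
        self._table[key] = ulids

    def lookup(self, src: str, dst: str, tag: int) -> Optional[Tuple[str, str]]:
        """Return the ULID pair for a packet, or None if no context matches."""
        return self._table.get(ContextKey(src, dst, tag))

    def clear(self) -> None:
        self._table.clear()


table = ContextTable()
table.add(ContextKey("IPA2", "IPB2", 0x1234), ("IPA1", "IPB1"))
print(table.lookup("IPA2", "IPB2", 0x1234))  # matching context: ULIDs returned
table.clear()                                # B loses its shim state
print(table.lookup("IPA2", "IPB2", 0x1234))  # no match: the loss is detectable
```

The None result is the signal of interest: the receiver can tell that the packet claims a context it does not have, rather than misdelivering it.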

> I mean, i guess that a first element that is relevant here is where are 
> we going to carry the context tag.
> If the context tag is carried in an extension header or dest option, then 
> i can see that a node that receives a packet with one of those can 
> easily detect that there is no context associated. (note that in this 
> case, the context loss is only detected in the case where the locators 
> used for the communication differ from the ULIDs, i.e. the extension 
> header dst option is included in the packet)
> 
> If the context tag is included in the flow label, then i don't see how a 
> node that receives the data packet can determine that the packet is 
> associated to a shim context that is no longer there. At this point, i 
> guess that as you mentioned in a previous mail, the data packet would be 
> silently discarded, right?

If the context tag is carried as a flow label, I still think we need a 
way to tell the receiver "this is a shim6 packet". For robustness 
reasons I think the fact that the packet needs shim6 processing should 
be explicit.
There have been proposals in multi6 which suggested doing this without 
making the packets larger by defining a set of new nexthdr values with 
meaning like
	shim6+tcp
	shim6+udp
	...
	shim6+esp

Not having that "shim6" bit when the flow label is used as a context tag 
can easily result in hard to diagnose errors. We might have errors due 
to some middlebox messing with the data packets (a TCP relay for 
instance), but that leaves the shim6 test packets alone. If the TCP 
relay doesn't preserve the flow label, then the packets would be dropped 
due to TCP checksum errors (since the ULID rewrite didn't happen), but 
the test protocol would say that everything is fine.
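
The checksum failure can be illustrated with a short sketch. It assumes the sender's TCP computes its checksum over the ULIDs while the wire packet carries alternate locators, and it uses the standard IPv6 pseudo-header layout. The addresses are documentation-prefix examples and the helper names are my own:

```python
import ipaddress
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum, as used by the Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src: str, dst: str, length: int, next_header: int = 6) -> bytes:
    """IPv6 pseudo-header: src, dst, 32-bit length, 3 zero bytes, next header."""
    return (ipaddress.IPv6Address(src).packed
            + ipaddress.IPv6Address(dst).packed
            + struct.pack("!I3xB", length, next_header))


def build_segment(src: str, dst: str, payload: bytes) -> bytes:
    """Build a minimal 20-byte TCP header plus payload with a valid checksum."""
    header = struct.pack("!HHIIBBHHH", 12345, 80, 1, 1, 5 << 4, 0x18, 65535, 0, 0)
    segment = header + payload
    csum = ~ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment) & 0xFFFF
    return segment[:16] + struct.pack("!H", csum) + segment[18:]


def checksum_ok(src: str, dst: str, segment: bytes) -> bool:
    """A segment verifies when the sum over pseudo-header and segment is 0xFFFF."""
    return ones_complement_sum(pseudo_header(src, dst, len(segment)) + segment) == 0xFFFF


ULID_A, ULID_B = "2001:db8:a::1", "2001:db8:b::1"    # what TCP sees
LOC_A, LOC_B = "2001:db8:a2::1", "2001:db8:b2::1"    # what is on the wire

seg = build_segment(ULID_A, ULID_B, b"hello")
print(checksum_ok(ULID_A, ULID_B, seg))  # receiver shim rewrote locators to ULIDs
print(checksum_ok(LOC_A, LOC_B, seg))    # rewrite skipped: TCP drops the segment
```

The second check fails only at the transport layer, which is why a test protocol that exchanges its own packets would not notice anything wrong.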

> I think that at this point it is clear to me that if we define a no-context 
> error message, this message should be defined as a reply to a packet 
> that refers to that context and it should include enough information 
> about this initial packet to verify that it is a reply to that packet.
> 
> The no-context error message cannot be issued spontaneously by a node.

Agreed.
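
One way to sketch that verification, assuming the error message echoes the offending packet (in the style of ICMP errors) and the sender keeps digests of what it recently sent. The message format, class name, and log size are all hypothetical:

```python
import hashlib
from collections import deque


def digest(packet: bytes) -> bytes:
    return hashlib.sha256(packet).digest()


class SentLog:
    """Hypothetical log of recently sent packets, used to validate error replies."""

    def __init__(self, max_entries: int = 64):
        # Bounded so that the log itself cannot grow without limit.
        self._digests = deque(maxlen=max_entries)

    def record(self, packet: bytes) -> None:
        self._digests.append(digest(packet))

    def accept_no_context_error(self, echoed_packet: bytes) -> bool:
        """Accept the error only if it echoes a packet we actually sent."""
        return digest(echoed_packet) in self._digests


log = SentLog()
probe = b"probe: IPA1->IPB2 tag=0x1234 nonce=42"
log.record(probe)
print(log.accept_no_context_error(probe))               # genuine reply
print(log.accept_no_context_error(b"forged: tag=0x1"))  # spontaneous or forged
```

An error that does not echo a recently sent packet is simply ignored, so an off-path attacker cannot tear down a context by sending unsolicited no-context messages.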

   Erik