Re: Minutes of the Interim SHIM6 WG meeting

Geoff Huston <gih@apnic.net> Tue, 11 October 2005 12:17 UTC

Envelope-to: shim6-data@psg.com
Delivery-date: Tue, 11 Oct 2005 12:19:08 +0000
Message-Id: <6.2.0.14.2.20051011221419.02b71c48@localhost>
Date: Tue, 11 Oct 2005 22:17:56 +1000
To: shim6@psg.com
From: Geoff Huston <gih@apnic.net>
Subject: Re: Minutes of the Interim SHIM6 WG meeting
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"

At 06:14 PM 11/10/2005, Geoff Huston wrote:
>We know a number of folk are keen to see these minutes, so here's the initial announcement
>
>The minutes of the interim meeting, in pdf format, are at http://www.potaroo.net/shim6/Shim6wg-Interim-Meeting-Minutes.pdf
>
>We'll convert these minutes to html and plain text, and also upload the presentations used at the meeting, in the coming hours, and post a followup note and the minutes in plain text once we've got all that done.
>
>Thanks to those who were able to attend,
>
>regards,
>
> Kurtis & Geoff

The presentation packs used at the SHIM6 interim WG meeting, and pdf and html markup versions of the minutes are now available at http://www.potaroo.net/shim6

The text version of the minutes is appended to this note.

thanks,

Geoff

----------------------------

SHIM6 Interim Working Group Meeting

Hotel Krasnapolsky, Amsterdam, The Netherlands
8th - 9th October 2005

The meeting logistics were generously supported by the RIPE NCC, and
the SHIM6 co-chairs thank the RIPE NCC for their support.

In accordance with an announcement to the SHIM6 Working Group Mailing
List, an interim meeting of the SHIM6 Working Group was held at the
Hotel Krasnapolsky on the 8th and 9th October, 2005.

Agenda

The agenda for the meeting was as follows:

1. Review of current status
2. Presentation by lead authors on working documents:
Protocol
Crypto Locators
Triggers
Applicability
3. Issue identification
Potential areas:
locator pairing discovery
locator pairing state maintenance / cache management
upper level protocol API / signalling
mobility implications
packet header format / content
shim equivalence state behaviour
4. Functional decomposition
5. Next steps (deliverables for Vancouver)
6. AOB

Participants

Jari Arkko
Marcelo Bagnulo
Pierre Baume
Iljitsch van Beijnum
Spencer Dawkins
Geoff Huston
Kurtis Lindqvist
Pekka Nikander
Ronald van der Pol

Minute Takers

Iljitsch van Beijnum, Spencer Dawkins and Geoff Huston took notes
of the meeting. The minutes were assembled by Geoff Huston.

1. Review of Current Status

Protocol Specification
[1]draft-ietf-shim6-proto-00

This is intended to be the core specification document for SHIM6.
draft-ietf-shim6-l3shim-00.txt will not be further revised, and
the introductory text will be moved to the shim6 architecture
document. Section 18 (Design Alternatives) of the -00 protocol
draft will be placed into an appendix to the document. It is
undecided at this point whether to keep this appendix in the final
version of the WG protocol specification document, or whether to
publish the appendix as a separate informational document at the
same time as the protocol specification document. The l3shim-00
document is effectively replaced by this document.
The lead author of this document, Erik Nordmark, has requested
some assistance with the message diagrams and associated protocol
interaction descriptions.

Functional Decomposition
[2]draft-ietf-shim6-functional-dec-00

The questions raised about this document concerned the specific
purpose of maintaining it as a standalone document, and whether
parts of it should be folded into the protocol specification and
architecture documents. The current version of the document
concentrates on consideration of various design
alternatives. At this stage it is proposed that the documentation
of design alternatives and specific design decisions taken within
the SHIM6 specification shall be included in the design
alternatives appendix of the protocol specification document, and
material related to the architectural description be folded into
the architecture draft.

Hash Based Addresses (HBA)
[3]draft-ietf-shim6-hba-00

This document is ready for WG Last Call, and is used by the
protocol specification (which is based on HBA). The HBA draft
describes how the hash algorithm works, and it is noted that we
can WG Last Call the current draft and then bundle it with the
rest of SHIM6 output for the IESG review and IETF Last Call. It
was proposed to consult the ADs regarding the cross-area review
this document needs, specifically including security community
review.

Action: Chairs to perform a Working Group Last Call on the HBA Draft

Action: Chairs to refer the document to the ADs for cross-area review,
with specific request for security review.

Ingress Filtering
[4]draft-huitema-shim6-ingress-filtering-00

It was commented that ingress filtering is operational practice
(BCP), not a particular protocol standard.

[5]RFC2827 Network Ingress Filtering: Defeating Denial of Service
Attacks which employ IP Source Address Spoofing

There are many ways that the issue of potential ingress filter-based
packet drop based on a source address filter match at the site
boundary interface could be addressed. It is also noted that the
issue here is not limited to packet discard through ingress
filtering: selection of a specific source address by the host
may also be used as a mechanism for a host to select a specific site
egress path where there are multiple egress paths offering
equivalent or overlapping destination reachability. The question of
whether source address selection can be used for egress path
selection is an open one, and the mechanisms proposed in the draft
are not in common use at present.

The question considered was whether the draft contained material
that was considered critical for the protocol draft. It was noted
that the decision to use source address selection as the signalling
mechanism between the host and the site's packet forwarding framework
is not one that the WG has made. If it's a precondition for SHIM6
that source addresses matter, the WG needs to sanity-check this
decision really soon, because the impact on routing and forwarding
systems within sites is very significant. It is also noted that the
site's internal source-address forwarding mechanism is not required
in all cases (if you have full BGP at a single site edge router, for
example). Multihoming may not be comprehensive in terms of traffic
survivability: for example, active SHIM6 traffic may fail over
but default-routed non-SHIM6 traffic would not. If detection and
repair aren't unidirectional, the "other end" gets hints that things
need to change, but the new source address doesn't "steer" the other
end traffic to take a particular incoming path. While it was
considered that these concerns are relevant to SHIM6, they can be
considered to be orthogonal to the protocol specification.

The question of adoption of this draft as a WG document was
considered. It was noted that there are a lot of good ideas
discussed in this draft, but it is unclear whether this document needs to
be part of the SHIM6 document collection. The central SHIM6 concern
is that of hosts selecting among site exit paths, rather than the
general ingress filtering problem. One conventional approach would
be to consider and possibly document the requirements first, and
then solicit proposals that meet the requirements. It was also noted
that the SHIM6 approach is to propose a minimal set of changes here.
It was felt that more information would assist the WG in considering
whether to adopt this document as a WG document.

Action: Marcelo Bagnulo to draft a matrix of scenarios and solutions
relating to this ingress filtering and site exit path selection,
intended for presentation at Vancouver.

Address Selection
[6]draft-bagnulo-shim6-addr-selection-00

This draft notes shortcomings in RFC3484 in terms of trying all
source/destination combinations, not just all destinations, in a
multi-addressed local context, irrespective of local SHIM6
capability. The context in SHIM6 is the proposed operation of
initial contact, where the current specification refers to RFC3484.
It was considered that if the major difference between the current
version of RFC3484 and this draft is the consideration of source
address ordering, as distinct from just source address selection,
then this would be a relatively small textual amendment to RFC3484.
Implementing this change may be challenging, however. It was noted
that this proposed change to RFC3484 would address the SHIM6 charter
item that refers to "solutions to establish new communications after
an outage has occurred that do not require shim support from the
non-multihomed end of the communication". The discussion in this
draft will be used to support the case for proposing changes to
RFC3484, but it was considered that as this was the output of the
IPv6 WG, further changes to this specification should be considered
by the IPv6 WG. Determination of this question was considered to be
a decision for the ADs.

Design Note: The specification for initial contact will use RFC3484,
as modified by this draft in terms of source address ordering.
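As a rough illustration of the ordering difference discussed above, the following Python sketch contrasts trying one preferred source per destination with trying every source/destination combination. It is a simplification under stated assumptions: both locator lists are assumed to be pre-sorted by RFC3484 preference, the function names are hypothetical, and neither function implements the full RFC3484 rule set.

```python
from itertools import product
from typing import List, Tuple

Pair = Tuple[str, str]  # (source locator, destination locator)


def per_destination_pairs(sources: List[str], destinations: List[str]) -> List[Pair]:
    """Simplified unmodified behaviour: one (most preferred) source per destination."""
    return [(sources[0], dst) for dst in destinations]


def all_pairs(sources: List[str], destinations: List[str]) -> List[Pair]:
    """Every source/destination combination, destination-major.

    Hypothetical ordering: all sources are tried against the most preferred
    destination before moving on to the next destination.
    """
    return [(src, dst) for dst, src in product(destinations, sources)]
```

With two locators at each end the first function yields 2 candidate pairs while the second yields all 4, which is what allows initial contact to succeed when only a non-preferred source address can reach the peer.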

Action: Marcelo Bagnulo to draft a note outlining a set of specific
text changes to RFC 3484 to support the approach proposed in the
address selection draft.

Action: Chairs to pass this note and the address selection draft to
the ADs, with the recommendation that changes to RFC3484 be considered
by a WG in the Internet Area.

Application Referral
[7]draft-ietf-shim6-app-refer-00

It was noted that this draft is not on the critical path in terms
of completing the initial core protocol specification. It was
noted that there are a large number of issues with application
referrals involving Site-Local addresses, and this has the potential
for similar characteristics. The draft notes that if an FQDN is not
being used as the referral object, the options include passing a
ULID as the referral object or passing the entire locator set
as the referral object. The locator set object could be useful to
assist with initial contact, but this has some API implications
for upper level protocols. The intent was that the locator set be
an opaque object: applications are not expected to understand the
semantics of these locator sets.

Applicability
[8]draft-ietf-shim6-applicability-00

The applicability document is currently a placeholder, which
will be revised once the core protocol specification has been
stabilised.

Failure Detection and Reachability Exploration
[9]draft-ietf-shim6-failure-detection-00
[10]draft-ietf-shim6-reach-detect-00

As the topics of failure and reachability are very closely related
the immediate steps are proposed to be merging these two drafts
into a single document that describes the state conditions where a
failure condition will be triggered, and also describes the
exploration procedure that attempts to recover connectivity through
a structured search of the locator pair space to discover locator
pairs that offer host-to-host reachability. The decision whether
to further fold this combined failure detection and reachability
exploration specification into the core SHIM6 protocol document
will be considered at the SHIM6 WG meeting at IETF-64 (November
2005).
It was noted that there is reference to IPv4 and RFC1918/NAT
conditions in the current document, and it was proposed that this
text be removed from the SHIM6 documents on the basis that the WG
has no charter beyond specification of IPv6 mechanisms. However it
was noted that the HIP WG and HIP RG may find the extensions of
this approach to IPv4 and RFC1918/NAT contexts useful in terms of
avoiding reinvention in this area.
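The exploration procedure described above, a structured search of the locator pair space, can be sketched as a bounded sequential probe over candidate pairs. This is a minimal sketch, assuming a hypothetical probe() callback that reports whether a test exchange over a given pair succeeded; the merged draft may well probe in parallel, pace its probes, or use a different search order.

```python
from typing import Callable, Iterable, Optional, Tuple

Pair = Tuple[str, str]  # (local locator, peer locator)


def explore(
    pairs: Iterable[Pair],
    probe: Callable[[Pair], bool],
    max_probes: int = 16,
) -> Optional[Pair]:
    """Return the first locator pair whose probe succeeds, or None.

    max_probes is a hypothetical budget so that a large locator pair space
    cannot generate unbounded probe traffic.
    """
    for count, pair in enumerate(pairs):
        if count >= max_probes:
            break
        if probe(pair):
            return pair
    return None
```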

L3-SHIM
[11]draft-ietf-shim6-l3shim-00

As this document is effectively replaced by the shim6-proto
document, it will not be further revised by the WG.

Architecture
[12]draft-ietf-shim6-arch-00

As discussed in the previous WG meeting in August 2005, this
document will be revised once the core specification is
stabilized.

Upper level Protocol API

A document describing the forms of interaction with upper layer
protocols has yet to be drafted. Pekka Nikander has sent some
thoughts to the mailing list on this topic, but these have not
been submitted as a draft as yet. A January / February 2006
submission time is anticipated. The perspective proposed is to
consider this from a SHIM6-centric perspective (what signalling
from the upper protocol layers would SHIM6 consider a helpful
indication?). The larger topic of locator pair selection as an
indirect method of host-based traffic engineering and
performance / service quality selection would be held over for now
and the initial API effort would concentrate on an API that
supports signalling to a core SHIM6 specification. The WG
considered that when a richer API is taken up by the SHIM6
WG, assistance from the Transport Area (TSV) would be of
significant benefit to the consideration of this topic.

2. Presentations by Lead Authors on Working Documents
3. Issue Identification
4. Functional Decomposition

These three items were considered jointly by the WG.

Protocol Specification
Erik Nordmark

* The specification has made a number of arbitrary design choices
based on alternatives described in the prior l3shim document in
order to make progress.
* The approach of placing the context tag in the flow label field was
considered - the MTU problem isn't really a big problem, so why
are we hacking in the flow label to solve it?
* Do uncoordinated state removal with error message when peer
removed/lost state is detected.
* Use of Flow Label and modified Protocol numbers as an encoded
SHIM6 header in the IPv6 packet header. New protocol number for
control protocol (similar to ICMP), overloading of port number
semantics for TCP, UDP, etc. Need to tell receiver "this packet
needs special handling". Other protocols are also reusing flow
label (NSIS, etc.).
* Related topic - ordering of processing for multiple items in
headers.
* Is this a candidate for optimization (after we get the base
protocol working)? If most conversations don't fail, we usually
don't need this optimization anyway.
* We're concerned about possible "flow label collisions" between
protocol mechanisms that are trying to reuse the same header
field.
* We're concerned about bouncing back and forth between IP code and
TCP code in some implementations. Why are we trying to avoid using
a small number (8) of packet header bytes? Upper layer protocols
already have to deal with changes in MTU size due to routing
changes, etc.
* The architectural intent for IPv6 is to use a specific header
for such signalling between endpoints. Can we optimize later?
Would need to be a stronger case to optimize later. If we think
people are going to use SHIM6 for something like traffic
engineering, we'll do this more often than just in failure modes.
Are we still using MPLS for traffic engineering, too? If so, we
would have an end-to-end signaling mechanism to negotiate stuff
like this anyway. Draft doesn't consider exhaustion of flow
labels, using multiple flow labels, etc. which would add a lot
more complexity. Could have an extension protocol that says "these
upper-layer identifiers are being SHIM6ed", so packets don't have
to carry any information at all - there's a whole capability
negotiation mechanism that would be useful. Indications of
"willingness to SHIM6" are unidirectional, so don't have to
coordinate between endpoints. Can we start out with an explicit
SHIM6 header and return to this later? Is this research now
anyway? We could use the approach of an experimental extension
protocol to deal with a number of things that we don't understand
now.
* Sense of the room is to use the explicit headers in the base
protocol specification and go on. This also allows us to use a
larger/better context identifier (we had picked 20 bits because it
fits in the flow label, in the previous draft). It also means that
SHIM6 control packets and data packets will tend to pass or be
blocked by firewalls together (instead of passing control protocol
packets and then failing data packets).

Design Note: Use an 8 byte IP SHIM6 header in the base protocol
specification for packets that require specific SHIM6 processing by
the receiver, and allow optimizations on this, including that of a
zero-length header, to be an experimental protocol extension.

* Do we need a checksum in the control packets? Certainly not in
every SHIM6 packet, but checksums are more valuable in the control
packets.

Design Note: Use a header checksum on the SHIM6 control packets but not
in the SHIM6 packet header.
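The minutes do not name the checksum algorithm. Assuming it is the 16-bit one's complement Internet checksum of RFC 1071 (the algorithm HIP uses for its header checksum), a control packet checksum could be computed as below. The pseudo-header that HIP includes in its checksum is deliberately omitted, so this is a sketch of the arithmetic only.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's complement checksum over data (RFC 1071 style, no pseudo-header)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF
```

A receiver can verify a packet by running the same function over it with the checksum field filled in; a result of 0 means the covered bytes are intact.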

* How "unique" are context tags? We are thinking about context tags
as a way of resisting injection attacks, so longer is more
attractive. We will end up with a 32-bit context field and about
15 bits as "reserved".

Design Note: Use a 32 bit context field with no checksum, and 15
reserved bits and a 1 bit flag to indicate control / payload. Note
potential DOS risks.
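One possible way to fit the design notes above into 8 bytes is sketched below. The 32-bit context tag, 15 reserved bits and 1-bit control/payload flag account for 48 bits; the remaining 16 bits are assumed here to hold the usual IPv6 extension header Next Header and Hdr Ext Len octets. The field order, and the choice that a set flag means "control", are assumptions for illustration rather than the agreed format.

```python
import struct
from typing import Tuple

CONTROL_FLAG = 0x8000  # assumption: top bit of the 16-bit flag/reserved word


def pack_shim6_header(next_header: int, context_tag: int, is_control: bool) -> bytes:
    """Pack a hypothetical 8-byte SHIM6 header:
    Next Header (8) | Hdr Ext Len (8) | flag (1) + reserved (15) | context tag (32)
    """
    if not 0 <= context_tag <= 0xFFFFFFFF:
        raise ValueError("context tag must fit in 32 bits")
    flags = CONTROL_FLAG if is_control else 0  # reserved bits are sent as zero
    # Hdr Ext Len 0 follows the IPv6 convention of 8-octet units beyond the first 8.
    return struct.pack("!BBHI", next_header, 0, flags, context_tag)


def unpack_shim6_header(data: bytes) -> Tuple[int, bool, int]:
    next_header, _ext_len, flags, context_tag = struct.unpack("!BBHI", data[:8])
    return next_header, bool(flags & CONTROL_FLAG), context_tag
```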

* Failure in initial contact? The direction here is to refer to an
updated RFC3484 (source address set), and may ask that a
bidirectional ULP (such as TCP) retry multiple initial SYNs with
various source/destination locators. Is this sufficient? One other
possibility - "is this locator already part of another locator set
that has been previously established?" And how do we cache this?
Is this an API construct (ULP sends down the locator set and gets
back a ULID from SHIM6 if a SHIM6 context already exists for this
locator set)? Building on existing SHIM6 context state is similar
to TCP sharing congestion information across TCP connections, so
we can anticipate that SHIM6 API support at initial contact time
would get used by upper layers.
* What about exchanging SHIM6 information that allows SHIM6 peers to
exchange ULIDs before initial SHIM6 context is established?

Design Note: For initial contact use RFC3484 (bis)

Design Note: Possible experimental ULP API extensions to initial
contact:
1. Enhanced Contact would result in searching existing SHIM state
based on the initial locator set. (This may return a ULID pair that
was not in the ULP's locator set; is this a problem?)
2. SHIM6 Contact would perform the contact step above and, where
there was no SHIM6 context, trigger SHIM6 state
initialization and return to the ULP the ULID pair with SHIM6
state set up.
3. Ordered Locator Set (getaddr_pair_set_info()) returns an RFC3484
ordering of locators based on local SHIM6 state information. This
could be used to construct a connect_by_name() approach.
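The first experimental extension, Enhanced Contact, amounts to a lookup of existing SHIM6 state keyed by the peer's locators. A minimal sketch follows, assuming a hypothetical in-memory context table; the names enhanced_contact and Shim6Context are illustrative and not part of any specified API. It also shows the open question from item 1: the returned ULID pair can contain a ULID that the ULP never supplied.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Shim6Context:
    local_ulid: str
    peer_ulid: str
    peer_locators: FrozenSet[str]  # locator set learned for the peer


def enhanced_contact(
    contexts: List[Shim6Context], candidate_locators: Iterable[str]
) -> Optional[Tuple[str, str]]:
    """Return the ULID pair of an existing context that already covers any of
    the ULP's candidate locators, or None so the ULP falls back to RFC3484."""
    wanted = set(candidate_locators)
    for ctx in contexts:
        if ctx.peer_locators & wanted:
            return ctx.local_ulid, ctx.peer_ulid
    return None
```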

* We're not using ICMP messaging to indicate loss of state because
of our concerns about ICMP filtering in today's Internet. If we
use the same protocol for messaging and error notification, that
protocol either makes it through firewalls or it doesn't, instead
of failing during error notification, for instance.
* Can a SHIM6 peer decline SOME of the locators offered by another
SHIM6 peer? HIP experience was that a general update/ack protocol
was useful to update locators, SPIs, etc. MOBIKE also has a
similar underlying protocol as well. May include locators and
preferences between locators.
* How many offers/counteroffers do we need to support?

Design Note: For SHIM6 control messages use a unidirectional
acknowledged information transfer UPDATE and ACK message transaction
as the base protocol, and then specify control messages in terms of
control message types and attributes.

Open Issue: Do we need a simple NOTIFY (un-ACKed UPDATE) message type
as well?
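The unidirectional acknowledged transfer in the design note above can be pictured as a small retransmitting sender plus a receiver that applies an UPDATE and answers with an ACK. This sketch assumes a hypothetical per-transaction identifier so that an ACK can be matched to its UPDATE (a matching handle, not the ordering sequence numbers discussed elsewhere), a dictionary message encoding, and an arbitrary retry limit; none of these details come from the specification.

```python
from typing import Callable, Dict


class UpdateSender:
    """Sends UPDATE messages and retransmits them until ACKed or retries run out."""

    def __init__(self, transmit: Callable[[dict], None], max_retries: int = 3):
        self.transmit = transmit
        self.max_retries = max_retries
        self.next_id = 1
        self.pending: Dict[int, dict] = {}  # transaction id -> UPDATE message
        self.retries: Dict[int, int] = {}  # transaction id -> retransmissions so far

    def send_update(self, msg_type: str, attributes: dict) -> int:
        xid = self.next_id
        self.next_id += 1
        msg = {"kind": "UPDATE", "id": xid, "type": msg_type, "attrs": attributes}
        self.pending[xid] = msg
        self.retries[xid] = 0
        self.transmit(msg)
        return xid

    def on_ack(self, xid: int) -> None:
        # The ACK completes the transaction; duplicate or late ACKs are ignored.
        self.pending.pop(xid, None)
        self.retries.pop(xid, None)

    def on_timeout(self, xid: int) -> bool:
        """Retransmit an unacknowledged UPDATE; return False once it is abandoned."""
        if xid not in self.pending:
            return False
        if self.retries[xid] >= self.max_retries:
            del self.pending[xid]
            del self.retries[xid]
            return False
        self.retries[xid] += 1
        self.transmit(self.pending[xid])
        return True


def handle_update(msg: dict, apply: Callable[[str, dict], None],
                  transmit: Callable[[dict], None]) -> None:
    """Receiver side: apply the typed attributes, then acknowledge."""
    apply(msg["type"], msg["attrs"])
    transmit({"kind": "ACK", "id": msg["id"]})
```

In this model a NOTIFY, if adopted, would simply be an UPDATE for which the sender keeps no pending state.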

Placeholders in the specification:

* CUD/FBD? Locator pair test/reply (Erik suggests that we drop
this), and context explore messages.
* Locator pair test/reply (need to be independent of ULIDs, and note
that there may be multiple ULID pairs associated with the same
host pair)
* Reachability exploration for a working locator pair
* What are privacy requirements for locator lists? Also integrity -
this protocol is currently "in the clear".
* "Forking the context state" for preferring different locator pairs
for different ULP communications? How close is this to policy? Is
this unidirectional or bidirectional? How much information does
SHIM6 get from the ULPs that would go into this decision? This is
similar to MONAMI6 work previously done. Preference for viewing
this as a unidirectional rather than bidirectional search,
although in a destination-based hop-by-hop forwarding environment
without source-address routing considerations a pair of source-
address locators in each direction is functionally equivalent to a
single bidirectional pair.

Design Note: Locator pairs are considered as unidirectional locator
pairs, and there is no assumption that these must map into a
bi-directional pair.

* Locator list option has all locators, but HBA parameter set has
prefixes that reflect all locators, too. Is this needed in both
places?
* Do we need a generation/version number for the locator list? This
isn't the same as transport sequence numbers that are used for
reliability. Could we recover more rapidly if we know what version
number we are current with? But we're sending entire sets now, not
sending deltas. Don't want to list entire IPv6 address locators in
order to change preferences? But it is simpler to send the
addresses than to send the addresses and then send preferences by
index.

Design Note: Do not use locator ordering and index references in SHIM6
control messages in the initial base spec

* Detecting loss of context doesn't work while the ULID pair works
as the locator pair, so the peer may have garbage-collected the
context and you didn't notice until there is a failure.
* What do you do if you receive contexts that you don't understand?
Send an error, if it's a control packet, or silently discard it,
if it's a data packet? Eventually you notice because of
reachability detection anyway - do we need to notice more quickly
than that?

Design Note: We need to indicate which LLU locators should be verified
with HBA, CGA, or some future mechanism.

* 32-bit contexts could be DOSed - do we need more bits?
* Which SHIM6 control messages need sequence numbers?
* Remaining Design Alternatives...
* Need to make a decision on state cleanup, choosing uncoordinated
cleanup.

Sharing base packet format with HIP for SHIM6 Control messages

Pekka Nikander

* One perspective on HIP and SHIM6 is that SHIM6 is a semantic
subset of the HIP approach (Assertion - SHIM6 is a subset of the
problems HIP is trying to solve)
* This is not thinking about "same state machine and same
semantics".
* But a common packet format would help with areas of potential
experimental protocol extension
* Current HIP packet layout is pretty different from SHIM6 packet
layout, but (ignoring HITs) the contents are pretty similar.
* Option format - is 256 bytes enough for CGA signatures? If not, we
have 16-bit length, so having 16-bit type makes more sense, and we
may end up with something that is a close approximation of the HIP
parameter format in any case
* Not proposing a single shared parameter space until we know a lot
more about HIP than we know today.
* Why did we use 8-bit options, and would 16-bit options be a
problem?
* Our biggest expressed concern was a perception problem: SHIM6
is intending to generate a proposed standard, while HIP is
experimental. That would imply a position that any resemblance to
the current HIP packet format is entirely coincidental, but useful
in various experimental contexts.

Design Note: The SHIM6 packet formats have been updated to
* have a 32 bit context tag
* have a checksum in the same place as in the HIP header
* a 1/0 bit to distinguish the payload vs. control messages
* have a 16 bit option type and length

For most control messages this results in 7+16 reserved bits. Most
of the fields are 32 bit so they can't fit in here.
* Adopted HIP parameter format for options; HIP parameter format
defines length in bytes but guarantees 64-bit alignment.
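The HIP-style option layout adopted above (16-bit type, 16-bit length counting only the contents, then zero padding so that each option occupies a multiple of 8 bytes) can be encoded and decoded as follows. This is a sketch: it treats the type as an opaque 16-bit value and ignores any type-range semantics that a real specification would define.

```python
import struct
from typing import List, Tuple


def padded_length(content_len: int) -> int:
    """Total option size: 4-byte type/length header plus contents, rounded up to 8 bytes."""
    return 11 + content_len - (content_len + 3) % 8


def encode_option(opt_type: int, contents: bytes) -> bytes:
    header = struct.pack("!HH", opt_type, len(contents))  # length counts contents only
    padding = b"\x00" * (padded_length(len(contents)) - 4 - len(contents))
    return header + contents + padding


def decode_options(buf: bytes) -> List[Tuple[int, bytes]]:
    options = []
    offset = 0
    while offset + 4 <= len(buf):
        opt_type, length = struct.unpack_from("!HH", buf, offset)
        options.append((opt_type, buf[offset + 4 : offset + 4 + length]))
        offset += padded_length(length)  # skip contents and padding
    return options
```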

[Meeting adjourned for dinner, restarted on Sunday morning.]

Protocol Specification Placeholders (review)

* Locator pair test and response

Design Note: Proposed to drop specific mechanism for locator test and
response

* Reachability exploration: what locator pairs are working after a
failure? (In practice: find the first locator pair that works.)
Refer to the failure and reachability work.
* Locator list option has all locators, but HBA parameter set has
prefixes that reflect all locators, too. Is this needed in both
places? We think it's OK to duplicate locators and prefixes in
our messages.

Design Note: Allow both locator set enumeration and HBA parameter set
in an UPDATE message

* What are privacy requirements for locator lists? Also integrity -
this protocol is currently "in the clear".

Design Note: Place this topic into the larger item of possible areas
of protocol extension, and note in the Security Considerations of the
protocol specification that we have considered this and are advising
that this falls into an area of potential protocol extension activity.

Action: Pekka Nikander and Marcelo Bagnulo to work on a draft of
Guidelines for potential protocol extensions for SHIM6, including (but
not limited to)

* flow label use / header compression,
* privacy,
* hash chains and security,
* initial contactless SHIM6 context establishment,
* API interaction for initial contactless SHIM6 context
establishment
* Locator pair selection based on signalled preferences
* Return path locator preference signalling

* Forking of context state - is this unidirectional or
bidirectional? Strong preference voiced for a unidirectional
forked state. Two goals here - traffic engineering for a site, and
different traffic types between the same two hosts. Traffic
engineering seems closer to what we know about at IP level -
"different traffic" may be a lot more open-ended. Mobile IP and
HIP have similar issues. One proposal advanced was to schedule a
joint working session at IETF-65 with TSV and RTG. We won't know
enough to meet on this subject in IETF-64 in November 2005. Can we
require that this be done at ULID selection time? SCTP has similar
problems (but SCTP is closer to the application than SHIM6). SHIM6
is providing a hook for something finer than host-to-host
granularity, without trying to solve all conceivable problems.
Bidirectional context state forking is seen as a ULP signalled
outcome.

Design Note: View forking as a unidirectional context state fork
(based on a ULP signal) that assumes that the forked context state may
then use a different outgoing locator pair.
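A unidirectional fork as described in the design note only changes which outgoing locator pair is used for traffic the ULP has tagged; the context, its ULID pair and whatever the peer sends back are untouched. The sketch below models that with a hypothetical fork identifier supplied by the ULP; how the ULP signals the fork is not specified in these minutes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Pair = Tuple[str, str]  # (local locator, peer locator)


@dataclass
class Context:
    ulid_pair: Pair
    default_outgoing: Pair  # outgoing locator pair for unforked traffic
    forks: Dict[int, Pair] = field(default_factory=dict)  # hypothetical fork id -> outgoing pair

    def fork(self, fork_id: int, outgoing: Pair) -> None:
        """Create or update a unidirectional fork; the peer is not consulted."""
        self.forks[fork_id] = outgoing

    def outgoing_pair(self, fork_id: Optional[int] = None) -> Pair:
        """Locator pair for sending; absent or unknown fork ids use the default."""
        if fork_id is None:
            return self.default_outgoing
        return self.forks.get(fork_id, self.default_outgoing)
```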

* Run with a version number for a locator set?
* Detecting context problems while the original ULID pair works as a
locator pair? Need to detect the problem before a failure happens.
Ping periodically? If we send R1 as a context error message, we're
already starting to re-establish the context state. Why would any
host that was SHIMming decide to stop doing so? We need to make
sure that we don't require continuing packet exchanges without
advancing to context established state. The R1 values are slightly
different (we don't have an initiator context tag from a request,
we are using the context tag that we believe the peer thinks we
have). We think that trying to return to non-SHIMmed operation
when a host garbage-collects context is probably a mistake - we'll
just "die".
* What happens if the A end garbage-collects its state and later
reuses the same context number with the same B end host? Should
the B end have the new context replace old B end context state and
just go on? There is a race condition if the remote end is trying
to reestablish the context that has already been locally
garbage-collected, and the remote end is trying to send using the
old context. There's a concern with forged packets that try to
reestablish the context resulting in a DOS. Can we include in the
context tag generation algorithm some bits from the sender of the
packet, as well as the receiver of the packet who chooses (most
of) the context tag value, so the context tag has bits from both
ends and we can tell context 3.1 from context 3.2? Context numbers
that are pseudo-random would help, but we can't prevent collisions
completely. If applications can provide hints to SHIM6 that the
application is still alive ("so don't garbage-collect"), that
would help. A usage counter can tell you if garbage collection of
the context state at this point in time would be a bad idea (as
it's still active), but not if it's a good idea. If we can get
unwedged, that's the important thing - being wedged less often is
an optimization.
* What do you do if you receive contexts that you don't understand?
Send an error, if it's a control packet, or silently discard it,
if it's a data packet? Eventually you notice because of
reachability detection anyway - do we need to notice more quickly
than that?

Design Note: On receipt of a SHIM6 payload packet where there is no
current SHIM6 context at the receiver, the receiver is to respond with
an R1* packet in order to re-establish SHIM6 context. The R1* packet
differs from the R1 packet in that an R1 packet echoes the I1 fields,
while this R1* offers state back to the sender. Either way the next
control packet is an I2 in response. The senders previous context
state is to be flushed in receipt of the R2 packet following the R1*,
I2 exchange
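The recovery exchange in the design note above can be traced with two small state holders: the peer that lost its context answers an unknown payload with R1*, the peer that still holds state answers with I2, and only on R2 does that peer flush its previous context state. This is a minimal sketch; the message dictionaries, class names and state labels are hypothetical, and a real R1*/I2 would carry locator, HBA and nonce information that is omitted here.

```python
from typing import Callable, Dict, Optional


class Receiver:
    """Peer that has lost (e.g. garbage-collected) its SHIM6 context state."""

    def __init__(self, transmit: Callable[[dict], None]):
        self.transmit = transmit
        self.contexts: Dict[int, dict] = {}  # context tag -> context state

    def on_payload(self, context_tag: int, payload: bytes) -> Optional[bytes]:
        if context_tag in self.contexts:
            return payload  # normal delivery
        # Unknown context: offer state back with R1* rather than discarding silently.
        self.transmit({"kind": "R1*", "context_tag": context_tag})
        return None

    def on_i2(self, msg: dict) -> None:
        self.contexts[msg["context_tag"]] = {"state": "established"}
        self.transmit({"kind": "R2", "context_tag": msg["context_tag"]})


class Sender:
    """Peer that still holds context state the other end no longer has."""

    def __init__(self, transmit: Callable[[dict], None], context_tag: int):
        self.transmit = transmit
        self.context = {"tag": context_tag, "state": "established"}

    def on_r1_star(self, msg: dict) -> None:
        # Keep the old state for now and answer with I2, as after a normal R1.
        self.context["state"] = "re-establishing"
        self.transmit({"kind": "I2", "context_tag": msg["context_tag"]})

    def on_r2(self, msg: dict) -> None:
        # Only now flush the previous context state and install the new one.
        self.context = {"tag": msg["context_tag"], "state": "established"}
```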

Action Item: Marcelo Bagnulo to review this and consider possible
issues with this form of SHIM6 protocol response.

Action Item: Erik Nordmark to document, for WG consideration, the
alternative SHIM6 context setup where each side offers one half of
the context value, so that unnecessary context destruction is avoided.
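The alternative in the action item above, where each end contributes part of the context tag, can be illustrated with a 16/16 split of the 32-bit tag. The split, with the receiver choosing the high half, is a hypothetical choice for illustration. The point is that a tag reused after garbage collection (context 3.1 versus 3.2 in the earlier discussion) remains distinguishable because the other end's half changes.

```python
import secrets
from typing import Tuple


def new_half() -> int:
    """Pick a pseudo-random 16-bit half tag."""
    return secrets.randbits(16)


def combine(receiver_half: int, sender_half: int) -> int:
    """Build a 32-bit context tag, receiver half in the high bits (hypothetical split)."""
    if not (0 <= receiver_half < 1 << 16 and 0 <= sender_half < 1 << 16):
        raise ValueError("each half must fit in 16 bits")
    return (receiver_half << 16) | sender_half


def split(tag: int) -> Tuple[int, int]:
    """Recover (receiver half, sender half) from a combined tag."""
    return tag >> 16, tag & 0xFFFF
```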

* Are four packets really necessary in the SHIM6 context
establishment? IKEv2 doesn't require a cookie to be present in all
packets, only when we suspect we're under attack. But this could
be an experimental extension. SYN flooding is still incredibly
difficult to deal with operationally (because each packet is just
a normal packet). We are in better shape than IKEv2 because
packets are still flowing "normally" while we are setting up SHIM6
context. This could be a potential experimental protocol
extension.

Action Item: Marcelo Bagnulo to document a shorter context
establishment protocol exchange based on the IKEv2 approach (as a
potential experimental protocol extension).

* Which SHIM6 control messages need sequence numbers?

Design Note: SHIM6 control message sequence numbers are not needed
here.

Reachability and Failure Detection
Jari Arkko
Iljitsch van Beijnum

Failure

* "Primary" isn't quite the right term (it's mostly "the locator we
started with")
* We won't reinvent DHCP, and we will believe what ND tells us.
* SHIM6 is only expected to be used in failover scenarios; SHIM6
only works as a failover mechanism.
i.e. different hosts may have different locator sets for the same
remote host
i.e. a pair of communicating hosts can have multiple contexts with
independent locator sets.
Right now the hint is the ULID pair differentiation
* Different contexts do not necessarily imply different ULID pairs
* FBD is chosen for simplicity

Design Note: Use FBD as the reachability algorithm.

* Sender chooses outgoing address pair (independently of the choices
made by the remote host)
* Failure Detection:
1. If you receive anything when you are sending packets, assume
that all is well.
2. If you aren't sending or receiving packets, assume that all
is well.
3. If you are receiving packets and don't need to send payload
packets, send some form of keepalive.
4. If you are sending payload packets and aren't receiving
anything (payload or keepalives), assume that something is
broken after time interval T.
* We need a time base in order to send keepalives, and an associated
timebase for non reception of in-coming packets within the SHIM6
context.
* Peers need a shared understanding of how long this timeslot is.
We need to understand the relationship between timeslots and RTTs
(and need to avoid reinventing TCP within SHIM6 by focusing on
RTTs). We would prefer not to initiate an exhaustive locator
exploration just because SHIM6 is confused about the peer's
timeslot choice. We need to think about how aggressive we want to
be about failure detection. Exploring this further, it was observed
that 10 seconds is fast compared with current BGP4 practice (1.5-3
minutes). There is also a startup transient that is critical here.
Should the initial specification use a statically defined time
interval, or should SHIM6 learn it adaptively? Is there a
difference between symmetric idle and asymmetric idle? We have some
concerns about interaction with higher-level protocols that may
also be trying to do recovery asynchronously (and with applications
that may differ in their goals for failover). Should this detection
and recovery mechanism be faster than a TCP ULP? Should the routing
state timers in OSPF and BGP be a factor here? The TCP timeout is
an upper limit. Within that constraint, we have three choices for
the timeout: slower than BGP (so we give BGP the chance to repair
the failure: more than 90 (RFC) or 180 (Cisco) seconds), between
OSPF and BGP (giving OSPF the chance to repair: more than 40 but
less than 90 or 180 seconds), or faster than OSPF (less than 40
seconds).
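The three timeout bands above can be expressed as a simple classification. In this hypothetical helper the thresholds are the figures quoted in the discussion (40 seconds for OSPF; 90 seconds per the RFC or 180 seconds on Cisco for BGP). The function name and the handling of exact boundary values (assigned to the faster band) are assumptions.

```python
# Repair windows as quoted in the meeting discussion (seconds).
OSPF_REPAIR_S = 40
BGP_REPAIR_RFC_S = 90
BGP_REPAIR_CISCO_S = 180


def classify_timeout(timeout_s: float, bgp_repair_s: float = BGP_REPAIR_RFC_S) -> str:
    """Hypothetical helper: say which routing layer gets a chance to repair
    a failure before a SHIM6 timeout of `timeout_s` seconds fires.

    Exact boundary values are assigned to the faster band (an assumption).
    """
    if timeout_s > bgp_repair_s:
        return "slower than BGP"        # BGP may repair the failure first
    if timeout_s > OSPF_REPAIR_S:
        return "between OSPF and BGP"   # OSPF may repair it, BGP probably not
    return "faster than OSPF"           # SHIM6 reacts before either protocol
```

With these thresholds, a 10-second timeout falls in the "faster than OSPF" band under either BGP figure. A 120-second timeout is "slower than BGP" against the 90-second RFC value but only "between OSPF and BGP" against the 180-second Cisco value.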

Design Note: Use a statically specified timeout of 10 seconds in the
initial protocol specification.

The idle keepalive trigger is statically specified to be 3 seconds.
As an experimental protocol extension, this value may be negotiated
at SHIM6 context startup. As an experimental extension, it may also
be altered dynamically during the lifetime of the SHIM6 context.
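The four failure detection rules, combined with the 10-second failure timeout and the 3-second idle keepalive trigger from the design notes, can be sketched as a small per-context monitor. This is a minimal sketch under stated assumptions. The class and method names are hypothetical, and the exact point at which each timer starts is one plausible reading of the rules, not something the minutes specify. It also reflects the point made below that a keepalive must never be answered with another keepalive.

```python
from dataclasses import dataclass
from typing import Optional

# Values from the design notes; all names below are hypothetical.
FAILURE_TIMEOUT_S = 10.0  # declare failure after 10 s of unanswered sending
KEEPALIVE_IDLE_S = 3.0    # idle keepalive trigger


@dataclass
class FbdMonitor:
    """Per-context failure detection state (one plausible reading of rules 1-4)."""
    # Earliest payload send not yet followed by any packet from the peer.
    unanswered_send_since: Optional[float] = None
    # Earliest payload receive not yet followed by any packet from us.
    unanswered_recv_since: Optional[float] = None

    def payload_sent(self, now: float) -> None:
        if self.unanswered_send_since is None:
            self.unanswered_send_since = now
        self.unanswered_recv_since = None  # our payload also shows the peer we are alive

    def keepalive_sent(self) -> None:
        self.unanswered_recv_since = None

    def packet_received(self, now: float, is_keepalive: bool = False) -> None:
        # Rule 1: anything received (payload or keepalive) means all is well.
        self.unanswered_send_since = None
        # Only payload creates an obligation to answer; a keepalive must not
        # trigger another keepalive.
        if not is_keepalive and self.unanswered_recv_since is None:
            self.unanswered_recv_since = now

    def poll(self, now: float) -> str:
        """Return 'failure', 'send_keepalive' or 'ok' for the current time."""
        if (self.unanswered_send_since is not None
                and now - self.unanswered_send_since >= FAILURE_TIMEOUT_S):
            return "failure"         # rule 4: sending payload, hearing nothing for T
        if (self.unanswered_recv_since is not None
                and now - self.unanswered_recv_since >= KEEPALIVE_IDLE_S):
            return "send_keepalive"  # rule 3: receiving, nothing of our own to send
        return "ok"                  # rules 1 and 2
```

For example, a monitor that receives payload at t = 0 and sends nothing returns `send_keepalive` when polled at t = 3. If, after sending that keepalive, it sends payload at t = 4 and hears nothing back, `poll(14.0)` reports `failure`.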

* The meeting noted other candidate timer values, including a value
between 24 and 36 seconds.

Reachability Exploration

* Exploration may be a unidirectional discovery, but it is a
bidirectional shared computation.
* Exploration attempts to synchronize on a state, using a format in
which each sent probe carries information about all probes
received so far.
* Exploration also uses timers to decide when a probe is assumed to
have failed.
* When exploring for a viable locator pair, note that only one end
may know there's a problem, and knowing when to STOP exploring is
really hard.
* Note that FBD only detects failures in the incoming path.
* Consider using a quick check as the initial response before
launching into a full exploration.
* Must SHIM6 recognize a keepalive as a keepalive? This is not
strictly required by FBD, as it is SHIM6 packets, rather than
specific packet content or type, that count here. However, we have
to be able to recognize keepalives as keepalives to avoid sending
keepalives in response to keepalives.
* It is also an issue to determine when to STOP sending keepalives
when neither peer has traffic to send.
* How firewalls react to keepalives is also a relevant
consideration. They will probably react badly to IP packets with no
payload whatsoever (header only), so keepalives should probably use
the SHIM header with a keepalive option.
* The concept of a host ID was considered as a way of identifying a
host across multiple ULIDs. This needs an algorithm to make sure
all hosts choose a unique host ID (the same theory as router IDs).
How do we change host IDs if the chosen locator was deprecated?
The alternative is to work with sets of locators instead of host
IDs ("this is the set of locators I think you have").
* How dynamic are locator sets (with CGA, etc.)? ULIDs don't change
as long as any session is active.
* A common probe data structure is proposed, to be reused in several
packet layouts.
* Quick check request/reply mechanism (we think this path will work;
we're just making sure), plus full exploration with context
reference. There are some concerns about the DoS potential of
including context information as part of reachability (reflection
attacks, etc.). Including the "last few" probes in each direction
allows you to detect relatively slow locator paths. Does this
extend into sending complete metrics for all locators on each
probe? We could send the last 3 successful probes, the last 3
failed probes, etc., balancing the amount of path selection hints
against the amount of information sent. Unsuccessful probe
information could be really useful ("move these locators to the end
of the list"). Information that is 30-40 seconds old is ancient
history.
* Start full exploration when the quick check times out.
"Exponential backoff" means sending probes to more locators over
greater intervals. There was some discussion about choosing the
"best" or "first as good as previous" path based on RTT versus
simply choosing a working path, with concerns about minimizing RTT
versus other QoS values (jitter, etc.). Moving beyond "works" as a
discriminator should be an experimental protocol extension. Moving
"back" to the original locator pair should also be an experimental
protocol extension. The subtext is "getting off my GPRS backup
ASAP", and that's really hard to generalize.
* Should this probe structure be used in all SHIM6 packets (which
would give us better RTT measurements)?
* When should exploration stop? When you have any candidate locator
pair? Or continue to see if there is a better candidate pair?
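The idea that each probe carries a bounded history can be sketched as follows. In this sketch the history holds the last 3 probes received from the peer and the last 3 of our own probes that timed out, and recently failed pairs move to the end of the candidate list. The field names, the history length and the `reorder` helper are hypothetical. The minutes propose a common probe structure but do not define its layout.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Tuple

LocatorPair = Tuple[str, str]  # (source locator, destination locator)
HISTORY_LEN = 3  # assumed: "last 3 successful, last 3 failed" from the discussion


@dataclass(frozen=True)
class ProbeRef:
    seq: int           # hypothetical sender-assigned probe sequence number
    pair: LocatorPair  # locator pair the probe travelled on


@dataclass(frozen=True)
class Probe:
    seq: int
    pair: LocatorPair
    recently_received: Tuple[ProbeRef, ...]  # peer probes that reached us
    recently_failed: Tuple[ProbeRef, ...]    # our own probes that timed out


@dataclass
class ProbeState:
    next_seq: int = 0
    received: Deque[ProbeRef] = field(default_factory=lambda: deque(maxlen=HISTORY_LEN))
    failed: Deque[ProbeRef] = field(default_factory=lambda: deque(maxlen=HISTORY_LEN))

    def on_peer_probe(self, probe: Probe) -> None:
        # Older entries fall off automatically once HISTORY_LEN is reached.
        self.received.append(ProbeRef(probe.seq, probe.pair))

    def on_probe_timeout(self, ref: ProbeRef) -> None:
        self.failed.append(ref)

    def make_probe(self, pair: LocatorPair) -> Probe:
        # Every outgoing probe carries the bounded history of both lists.
        probe = Probe(self.next_seq, pair, tuple(self.received), tuple(self.failed))
        self.next_seq += 1
        return probe


def reorder(candidates: List[LocatorPair], failed: Tuple[ProbeRef, ...]) -> List[LocatorPair]:
    """Move recently failed locator pairs to the end, keeping relative order."""
    bad = {ref.pair for ref in failed}
    return [p for p in candidates if p not in bad] + [p for p in candidates if p in bad]
```

Bounding the history keeps probes small, which addresses the concern about balancing path selection hints against the amount of information sent. Old entries also age out naturally, in line with the remark that 30-40 second-old information is ancient history.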

Design Note: Continued exploration to see whether a better locator
pair is available, after a viable locator pair has been identified,
is considered an experimental protocol extension. Exploration in the
base protocol specification terminates once a viable
(reachability-confirmed) locator pair has been discovered.
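A base-protocol exploration loop consistent with this design note stops at the first reachability-confirmed pair. This is a minimal sketch. `probe` stands in for sending a probe and waiting for confirmation, and the initial wait, the doubling backoff, the cap and the pass limit are all assumed values. The minutes leave open how many probes or passes are appropriate. This version probes every pair on each pass and doubles the wait between passes. The phrase "sending probes to more locators over greater intervals" could also mean widening the candidate set on each pass, which this sketch does not model.

```python
import itertools
from typing import Callable, List, Optional, Tuple

LocatorPair = Tuple[str, str]  # (local locator, remote locator)


def explore(local: List[str], remote: List[str],
            probe: Callable[[LocatorPair, float], bool],
            initial_wait_s: float = 0.5,   # assumed starting wait
            max_wait_s: float = 60.0,      # assumed cap on the wait
            max_passes: int = 4) -> Optional[LocatorPair]:  # assumed pass limit
    """Return the first locator pair that confirms reachability, or None.

    `probe(pair, wait)` is a hypothetical callback that sends one probe on
    `pair` and returns True if a confirmation arrives within `wait` seconds.
    """
    candidates = list(itertools.product(local, remote))  # all M x N pairs
    wait = initial_wait_s
    for _ in range(max_passes):
        for pair in candidates:
            if probe(pair, wait):
                return pair  # base protocol: stop at the first viable pair
        wait = min(wait * 2, max_wait_s)  # exponential backoff between passes
    return None  # complete failure after max_passes passes
```

Continuing past the first viable pair to look for a better one would be the experimental extension described in the design note. It is deliberately absent here.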

Reachability (v2)

* Reachability, version 2 (REAP): using the same messages for four
complementary functions (direct reachability, reverse-path
reachability, checking different return paths, and return
routability checking), including a mechanism for not having to
probe in both directions simultaneously.
* How long do we continue to probe? Keep the context state around and
wait for upper-layer applications to "try again". We may wish to
remember paths that previously failed so that we try them LAST
during the next failure.
* Some discussion of how close REAP comes to being the STUN protocol
when we have unidirectional path failures. Is this ALSO an
experimental extension?
* Some discussion of how long we have to deliver SHIM6 baseline
functionality (time and energy), and how our charter maps to
"providing IPv4-equivalent multihoming in IPv6".
* Some discussion of unidirectional path failures: isn't this
usually due to ingress filtering? Can we assume that we don't have
to recover from unidirectional path failures in SHIM6? We do have
to detect this condition, even if we don't recover from it. We
think we can assemble two unidirectional paths into a single
bidirectional path, but we don't know all the implications. We need
to be able to steer traffic based on source addresses to
accomplish this. Exploration should continue until we identify a
working bidirectional path. RFC 3484bis and SHIM6 setup and
recovery will all require bidirectional paths, so don't perform
SHIM6 setup over unidirectional paths (RFC 3484bis already has a
working path, so there is no reason to try to improve it in the face
of failures). Should we allow setting up a SHIM before there is a
context? Making this work would need an I1bis that says "send this
back using a different locator". There is concern about creating
state, but allowing state on the INITIATOR may be OK, as long as we
don't require state on the RESPONDER. We may have to do an M x N
scan to find something that works, and may need an "API on
steroids" to initiate this process.
* REAP - Four functions:
1. Direct reachability: the message can get to the receiver.
2. Indirect reachability: earlier messages got to the sender.
3. Redirected response: request that the receiver use a different
(specified) locator pair for the response.
4. Return routability: perform a return routability test.
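One way to picture the four REAP functions is as fields of a single probe message, with the responder choosing the locator pair for its reply. The message layout, field names and the nonce-echo form of the return routability test below are hypothetical and only illustrate functions 1-4. They are not the REAP wire format.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

LocatorPair = Tuple[str, str]  # (source locator, destination locator)


@dataclass(frozen=True)
class ReapProbe:
    """Hypothetical probe layout covering the four REAP functions."""
    # 1. Direct reachability: the pair this probe arrived on works inbound.
    arrived_on: LocatorPair
    # 2. Indirect reachability: pairs on which the receiver's earlier
    #    messages reached the sender of this probe.
    heard_pairs: Tuple[LocatorPair, ...] = ()
    # 3. Redirected response: pair the responder should reply on,
    #    expressed from the responder's point of view.
    reply_via: Optional[LocatorPair] = None
    # 4. Return routability: a value the responder echoes back (assumed form).
    nonce: Optional[int] = None


def build_reply(probe: ReapProbe) -> Tuple[LocatorPair, Optional[int]]:
    """Return (reply locator pair, nonce to echo) for an incoming probe."""
    if probe.reply_via is not None:
        return probe.reply_via, probe.nonce
    # Default: answer on the reverse of the pair the probe arrived on.
    initiator_src, responder_dst = probe.arrived_on
    return (responder_dst, initiator_src), probe.nonce
```

The redirected response path is what makes the STUN-like behaviour discussed above possible. A probe arriving on (A1, B1) can ask for the answer on (B2, A2), which lets the initiator test a return path it cannot probe directly.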

Another way to look at exploration is as an exploration of a matrix
of locator pairs, with the sender using locator pair (a,b) and the
receiver using the locator pair (a,b). Each cell of the matrix is
itself a 3x3 set of possible information states: whether traffic
has been sent on the sending locator pair, whether traffic has been
received on this pair, and what the local (sender) end thinks it
knows about the other (receiver) end.
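A minimal sketch of this matrix uses one cell per (local, remote) locator pair, which also shows why a full scan is M x N. The minutes describe a "3x3" set of states per cell without spelling it out. The encoding below (two booleans plus a three-valued view of what the peer has seen) and the status labels are assumptions made for illustration.

```python
import itertools
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple

LocatorPair = Tuple[str, str]  # (local locator, remote locator)


class PeerView(Enum):
    UNKNOWN = "unknown"                      # no report from the peer yet
    PEER_RECEIVED = "peer received"          # peer reported our traffic on this pair
    PEER_NOT_RECEIVED = "peer not received"  # peer reported it saw nothing


@dataclass
class Cell:
    sent: bool = False      # we have sent traffic on this pair
    received: bool = False  # we have received traffic on the reverse of this pair
    peer_view: PeerView = PeerView.UNKNOWN

    def status(self) -> str:
        """Hypothetical summary of what this cell tells us about the pair."""
        outbound_ok = self.peer_view is PeerView.PEER_RECEIVED
        if outbound_ok and self.received:
            return "bidirectional"
        if outbound_ok or self.received:
            return "unidirectional"
        return "no evidence" if self.sent else "untested"


def new_matrix(local: List[str], remote: List[str]) -> Dict[LocatorPair, Cell]:
    """Create one empty cell for each of the M x N locator pairs."""
    return {pair: Cell() for pair in itertools.product(local, remote)}
```

Distinguishing "unidirectional" from "bidirectional" per cell matches the design question of whether failure recovery should keep exploring for a bidirectional pair when a unidirectional one already works.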

Related issues: how many probes are needed before the algorithm can
consider a locator pair unusable? How many passes across the full
exploration space are needed before the algorithm terminates with
complete failure?

Design Note: Where we are:
* Initial contact: 3484 (bis) is bidirectional
* Shim setup is bidirectional based on initial locator set
* Recovery from failure is potentially unidirectional

Design Note: Questions:
* Should shim setup allow unidirectional paths? There is no point
per se unless you have shim6 setup WITHOUT context.
* Should 3484 be extended to allow unidirectional paths? No.
* Should shim6 be allowed to set up without an initially established
context? If yes, should it include unidirectional discovery?
* Should failure recovery continue to see if there is a
bidirectional locator pair even though there is already a
unidirectional path?

* Note that the modified I1 in this case would include a STUN-like
request to send a packet back using a different locator pair from
the one on which the I1 packet was received.

Action: Document the simple cases

Action: Document this concept of a shim6 context without initial
bidirectional contact (i.e. shim6 initial context passes into an
initial walkthrough), and the API considerations.

Action: Initial reachability detection: aim to get a version with
unidirectional support drafted by November. Pekka Nikander and
Iljitsch van Beijnum to do an individual submission for working
group review.

5. Next Steps

* Produce -01 of the protocol draft immediately following this
interim WG meeting.
* Perform a WG Last Call of the HBA draft.
* Reachability detection: aim to get a version with unidirectional
support drafted by Vancouver.
* 3484bis: pass the documented requirement to the ADs to see where
it is actually reviewed.
* There may be impacts on the HBA draft if an option structure is
added to CGA. We think Marcelo can handle this before IETF-64.
* Want to make sure everyone agrees that the base protocol
specification represents a conservative but workable approach, and
that at this stage further refinements are considered experimental
extensions. Check with the working group whether unidirectional
support goes in the base protocol. (Does the WG agree that STUN-like
response redirection is a good thing to include in the base
protocol? We're saying that the API must allow applications to
specify ULIDs, and that I and R packets must support response
redirection; maybe this is a reply redirection AND a reply that
says what the original addresses were.)

References

1. draft-ietf-shim6-proto
2. draft-ietf-shim6-functional-dec
3. draft-ietf-shim6-hba
4. draft-huitema-shim6-ingress-filtering
5. RFC 2827
6. draft-bagnulo-shim6-addr-selection
7. draft-ietf-shim6-app-refer
8. draft-ietf-shim6-applicability
9. draft-ietf-shim6-failure-detection
10. draft-ietf-shim6-reach-detect
11. draft-ietf-shim6-l3shim
12. draft-ietf-shim6-arch
