[dna] Review of draft-ietf-dna-simple-11

Ted Lemon <mellon@fugue.com> Tue, 24 November 2009 17:52 UTC

From: Ted Lemon <mellon@fugue.com>
Date: Tue, 24 Nov 2009 12:52:26 -0500
Message-Id: <B3D250DF-D059-48D3-8867-CB1645038382@fugue.com>
To: dna@ietf.org

Ralph asked participants in the DHC working group to review the draft, so I gave it a read. I have some comments. I realize that some of these comments are partially addressed in the summary message Jari sent to the list recently, but it's difficult to know what to include and what to leave out, so I just included it all.

From section 2.1:

   o  False positives are not acceptable.  A host should not conclude
      that there is no link change when there is one.

   o  False negatives are acceptable.  A host can conclude that there is
      a link change when there is none.

This seems backwards, unless there is no cost to a false negative. If
a host concludes that there was a link change when there was none, and
as a consequence abandons the IP addresses it was using on the
previous link, that's going to create a (potentially noticeable)
hiccup while it reacquires its address. Depending on the IP stack,
it's possible that connections might also be broken, e.g. if a syscall
returns EHOSTUNREACH due to a transient routing outage.

More, from section 4.3:

   It SHOULD also set all Neighbor
   Cache entries for the routers on its Default Router List to STALE.
   This is done to speed up the acquisition of a new default router when
   link change has occurred.

It's my impression that by default, a stale routing table entry won't
be used, so if an application writes to an open TCP connection while
the routing table entry is stale, it will get an EHOSTUNREACH, and the
connection will be dropped. I have had this exact experience with
IPv4 DNA implementations when my WiFi router signal is flaky. This is
much worse than taking a long time (500ms isn't really that long) to
identify a new link.

This would also be a problem in the case where you are on a WiFi
network with more than one device advertising the same SSID - when you
switch to a new beacon, arguably this is a DNA event, but it would clearly
be a mistake to break existing connections in this case.

In section 4.6, the draft seems to be saying that the DHCPv6 client
should start soliciting before it receives an RA. This does actually
change the DHCPv6 state machine. In fact, the DHCPv6 protocol has a
specific message for checking in the event of a DNA event: the Confirm
message. I think Ralph already brought this up; I mention it because
it appears to be a problem to me as well.

I also think that this section goes into a lot of extra detail that is
unnecessary and probably harmful to interoperability, since what it
says doesn't really match what RFC3315 says. It would be better to
just refer to RFC3315 here.

Section 4.7.1 also seems to have a problem in that in this case we
have started the Confirm process before we receive an RA. If the RA
conflicts with the address we are attempting to confirm, we may know
this to be the case before we get a reply (or no reply) from a DHCPv6
server. This could be because the RA comes before the DHCPv6 Reply
message, or it could be because the link has in fact changed, and
there is no DHCPv6 server on the new link. In either case, there is
no need to wait for the response from the DHCPv6 server.
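
To illustrate the check I have in mind, here is a minimal Python sketch. It
assumes that "the RA conflicts with the address" means that no prefix
advertised in the RA covers the address being confirmed; that reading, and the
function names, are my own and are not taken from the draft or from RFC3315.

```python
import ipaddress

def ra_covers_address(address, ra_prefixes):
    """True if any prefix advertised in the RA contains the address."""
    addr = ipaddress.IPv6Address(address)
    return any(addr in ipaddress.IPv6Network(prefix) for prefix in ra_prefixes)

def can_stop_waiting_for_reply(address, ra_prefixes):
    """True if the RA already shows the address does not belong on this link.

    Assumption: a "conflict" is simply the absence of a covering prefix.
    """
    return not ra_covers_address(address, ra_prefixes)

# Address from the old link, RA from a link that advertises a different prefix.
print(can_stop_waiting_for_reply("2001:db8:1::10", ["2001:db8:2::/64"]))  # True
```

If this returns True, the outcome of the Confirm exchange no longer matters and
the host can proceed without it.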

Also, 4.7.1 doesn't talk about what to do if the NS and RS results
differ, and the NA is secure whereas the RA is not. In that case it
seems to me that it would make sense to trust the NA rather than the
RA (presumably in that case a trustworthy RA would eventually follow,
since the router that responded to the NS with a secure NA will
presumably send a secure RA in response to the RS). But there's a
window of opportunity here for a rogue RA to unseat a secure router.
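
A rough sketch of the rule I am suggesting, in Python. The Hint type and its
fields are hypothetical and exist only for illustration; the draft defines
nothing like them. The sketch only covers the case I describe above and
deliberately leaves the equal-security case undecided.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Hint:
    kind: str           # "NA" or "RA" (hypothetical label)
    secure: bool        # True if the message was protected by SEND
    link_changed: bool  # what this message implies about the link

def pick_hint(na: Hint, ra: Hint) -> Optional[Hint]:
    """Choose which message to believe when an NA and an RA both arrive."""
    if na.link_changed == ra.link_changed:
        return na  # no disagreement, either message will do
    if na.secure and not ra.secure:
        return na  # trust the secure NA over the unsecured RA
    if ra.secure and not na.secure:
        return ra
    return None    # same security level: not addressed here, keep probing

na = Hint("NA", secure=True, link_changed=False)
ra = Hint("RA", secure=False, link_changed=True)
print(pick_hint(na, ra).kind)  # NA
```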

There's a similar problem in section 4.7 because section 4.7 instructs
the implementor to test whether an NA comes from the same router as an
RA, but doesn't talk about what to do if the security of the two
messages differs. So this is another opportunity for an attacker to
get in a quick DoS.

The same problem exists in section 4.9, where receipt of an NA, RA or
DHCP Reply will terminate other DNA operations, without regard to the
security level of the response that terminates the other state
machines.
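
One possible policy, sketched in Python under my own assumptions: when the host
is using SEND, only a secured response may terminate the other operations, and
when it is not, the first response wins as in the draft. The function name and
the (kind, secure) tuples are hypothetical. Note that a DHCP Reply is not
protected by SEND, so under this policy it would have to be marked secure by
some other means in order to terminate anything.

```python
def first_terminating_response(responses, send_in_use):
    """Return the kind of the first response allowed to stop the other probes.

    `responses` is a list of (kind, secure) tuples in arrival order.
    Returns None if nothing seen so far is trusted enough.
    """
    for kind, secure in responses:
        if secure or not send_in_use:
            return kind
    return None

arrivals = [("RA", False), ("NA", True)]
print(first_terminating_response(arrivals, send_in_use=True))   # NA
print(first_terminating_response(arrivals, send_in_use=False))  # RA
```

With SEND in use, the quick unsecured RA no longer cuts the secured NA off.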

Section 5 provides pseudocode for this process. Thread C of the
pseudocode is not asynchronous, and hence not correct. Even if it
could be made asynchronous, the protocol spec should be fully
descriptive; if it is not, conflicts between the spec and the
pseudocode will create interoperability problems, and being sure that
the two are the same is an unsolved computer science problem. So
section 5 should be deleted.

Section 7 doesn't mention what to do in the dual-stack case. Maybe
this is too big a can of worms...

I see that some of the issues I mention above are addressed in the
security considerations section. This is the wrong place for this -
the handling of SEND needs to be wrapped into the spec in section 4.
Security Considerations is the wrong place to put more of the protocol
spec.
