Re: [BEHAVE] Review of the NAT64 document?

Bryan Ford <baford@mpi-sws.org> Sun, 20 December 2009 16:32 UTC

Message-Id: <345ECBBB-D91E-4E9A-8E94-FD4083F9697B@mpi-sws.org>
From: Bryan Ford <baford@mpi-sws.org>
To: marcelo bagnulo braun <marcelo@it.uc3m.es>
In-Reply-To: <4B2B5C6E.3060805@it.uc3m.es>
Date: Sun, 20 Dec 2009 08:32:21 -0800
References: <4B2B5C6E.3060805@it.uc3m.es>
Cc: IETF BEHAVE WG <behave@ietf.org>
Subject: Re: [BEHAVE] Review of the NAT64 document?

Hi Marcelo,

Here are some comments on
draft-ietf-behave-v6v4-xlate-stateful-07.txt.  This message is divided
into three sections: (a) technical issues that may require some list
discussion; (b) clarity issues with what exactly is specified and how,
that you can probably just fix yourself; (c) minor editorial/writing
nitpicks that can certainly be fixed unilaterally.

---
Technical issues (apologies if any of these have already been  
discussed extensively and I just don't know it because I haven't been  
keeping up with the list):

* First, the specified rules for generating ICMP packets look to me to  
be dangerous in that they're liable to interfere with hole punching.   
For example, see the paragraph on the border between page 25 and 26:

"If the security policy requires silently dropping externally  
initiated TCP connections, then the packet is silently discarded,  
else, If the destination transport address contained in the incoming  
V4 SYN (i.e. X,x) is not in use in the TCP BIB, then the packet is  
discarded and an ICMP Port Unreachable error (Type 3, Code 3) is sent  
back to the source of the v4 SYN. The state remains unchanged in CLOSED"

This situation (the dest transport addr not in the BIB) will be
commonplace in hole punching scenarios, in which one client's outgoing
SYN gets to the far NAT before the other client's SYN has gotten to
its own NAT.  If that NAT returns an ICMP Port Unreachable error, that
could kill the hole punching attempt.  Unless it is somehow known that
all or at least most existing TCP stacks don't give up when they get
an ICMP Port Unreachable, I would say that a NAT64 MUST or at least
SHOULD just silently discard "unsolicited" packets like these
independent of security policy.
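
To make the difference concrete, here is a minimal sketch of the two
behaviors for an incoming V4 SYN.  It is only an illustration, not text
from the draft: the names `Action` and `handle_v4_syn`, the
`silent_for_unsolicited` flag, and the modelling of the TCP BIB as a
set of (address, port) tuples are all assumptions made for this
example.

```python
from enum import Enum


class Action(Enum):
    CONTINUE_PROCESSING = "continue normal processing"
    DROP_SILENTLY = "drop silently"
    SEND_ICMP_PORT_UNREACHABLE = "drop and send ICMP Type 3, Code 3"


def handle_v4_syn(dst, tcp_bib, silent_for_unsolicited=True):
    """Decide what to do with an incoming IPv4 SYN addressed to dst = (X, x).

    tcp_bib is modelled as a set of (address, port) tuples (an assumption).
    silent_for_unsolicited=True is the behavior suggested in this message;
    False is the ICMP-generating behavior quoted from the draft for the
    case where the security policy does not already require a silent drop.
    """
    if dst in tcp_bib:
        return Action.CONTINUE_PROCESSING
    if silent_for_unsolicited:
        # A peer's hole-punching SYN may simply have arrived early,
        # so say nothing and let the peer retransmit.
        return Action.DROP_SILENTLY
    return Action.SEND_ICMP_PORT_UNREACHABLE
```

With the flag set, an early SYN from the remote peer is simply dropped,
and a retransmission can succeed once the local client's own SYN has
created the BIB entry.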

* Second issue is about TCP lifetime expiry during the ESTABLISHED  
state, page 26.  The draft suggests (requires?) that the session table  
merely be silently deleted when the 2-hour timer expires, which I  
think is far from ideal.  BEHAVE-TCP explicitly suggests that NATs  
send keepalives to the endpoints at the expiry of this timer to  
"double-check" whether the connection is really still alive or not.   
The keepalives will hopefully solicit either an ACK or a RST,  
providing a definite indication.  So the procedure I think this  
standard should specify, with at least a SHOULD, might go something  
like this:

1. When the 2:40:00 timer expires, send TCP keepalives to at least the
internal (typically IPv6) endpoint, and possibly to both endpoints; then
start a shorter 4-minute timer.

2. If within that 4-minute grace period the NAT64 sees a regular ACK,  
go back to the ESTABLISHED state with another 2:40:00 timer.  If the  
NAT64 sees a RST, go to RST RCV state.

3. If the 4-minute grace period expires, drop the session state.

Actually, it might be sufficient simply to make this "grace period"  
state behaviorally identical to the "RST RCV" state - i.e., the NAT64  
goes into the RST RCV state when either it sees an explicit RST or  
when the ESTABLISHED timer expires, and in either case it can get back  
out of that state to ESTABLISHED if it sees more normal traffic.
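
The three steps above can be sketched as a small transition function.
This is a hedged illustration under stated assumptions, not the
draft's state machine: the state name GRACE, the `Event` names, and the
`step` function are hypothetical, and giving RST_RCV the same 4-minute
timer as the grace period is an assumption.

```python
from enum import Enum, auto

EST_TIMEOUT = 2 * 3600 + 40 * 60  # the "2:40:00" timer as written above, in seconds
GRACE_TIMEOUT = 4 * 60            # the 4-minute grace period, in seconds


class State(Enum):
    ESTABLISHED = auto()
    GRACE = auto()      # hypothetical name for the keepalive grace period
    RST_RCV = auto()
    CLOSED = auto()


class Event(Enum):
    TIMER_EXPIRED = auto()
    ACK_SEEN = auto()
    RST_SEEN = auto()


def step(state, event):
    """Return (next_state, new_timer_seconds, send_keepalive).

    new_timer_seconds is None when the running timer is left untouched,
    or when the session is gone and no timer is needed.
    """
    if state is State.ESTABLISHED:
        if event is Event.TIMER_EXPIRED:
            return State.GRACE, GRACE_TIMEOUT, True       # step 1: probe the endpoints
        if event is Event.RST_SEEN:
            return State.RST_RCV, GRACE_TIMEOUT, False    # assumed 4-minute RST_RCV timer
        return State.ESTABLISHED, EST_TIMEOUT, False      # ordinary traffic refreshes the timer
    if state in (State.GRACE, State.RST_RCV):
        if event is Event.ACK_SEEN:
            return State.ESTABLISHED, EST_TIMEOUT, False  # step 2: connection is alive
        if event is Event.TIMER_EXPIRED:
            return State.CLOSED, None, False              # step 3: drop the session state
        if state is State.GRACE:
            return State.RST_RCV, GRACE_TIMEOUT, False    # step 2: RST during the grace period
        return State.RST_RCV, None, False                 # repeated RST: keep the running timer
    return State.CLOSED, None, False
```

Note that GRACE and RST_RCV differ only in how they are entered, which
is the point of the simplification suggested above.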

* Third issue is with session timers for UDP.  (I think I brought this
up on this list some time back, but I don't know if there's been any
discussion of it since then.)  While just having a constant UDP
session timeout of 5 minutes should be permitted behavior, what would
be even better and perhaps RECOMMENDED is if the NAT increases the
session timeout as the session's lifetime increases.

For example, if the session survives its first 5-minute timeout (i.e.,  
the NAT sees traffic during that time), it doubles the next period to  
10 minutes, then to 20 minutes, etc., until some maximum, e.g., the  
2:40:00 timeout specified for TCP in the ESTABLISHED state.
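
As a rough sketch of that doubling rule (the function and constant
names are hypothetical, and the maximum is simply the 2:40:00 figure
used above as an example):

```python
UDP_MIN_TIMEOUT = 5 * 60              # initial 5-minute timeout, in seconds
UDP_MAX_TIMEOUT = 2 * 3600 + 40 * 60  # example maximum of 2:40:00, in seconds


def next_timeout(current, saw_traffic, maximum=UDP_MAX_TIMEOUT):
    """Return the next timeout period, or None if the session should be removed."""
    if not saw_traffic:
        return None                   # idle for a whole period: garbage-collect
    return min(current * 2, maximum)  # survived: double, up to the maximum


def timeout_schedule(periods, first=UDP_MIN_TIMEOUT):
    """Timeout periods for a session that sees traffic in every period."""
    schedule, timeout = [], first
    for _ in range(periods):
        schedule.append(timeout)
        timeout = next_timeout(timeout, saw_traffic=True)
    return schedule
```

For a session that stays active, `[t // 60 for t in timeout_schedule(7)]`
gives `[5, 10, 20, 40, 80, 160, 160]` minutes, i.e. the cap is reached
after five doublings.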

This behavior still ensures NAT session state gets garbage collected
before its lifetime exceeds about twice the time the UDP session is
actually in-use; it won't affect UDP apps that naively just send
keepalives at <5min periods; but it may greatly benefit smarter UDP
apps on power-constrained mobile devices that want to reduce power
consumption.  In particular, if the UDP app implements a
binding-timeout-testing algorithm that starts with frequent keepalives
and gradually decreases the keepalive frequency, then on a "naive" NAT
with a 5-min timeout the app will learn that it has to send keepalives
every 5min and do that; but on a NAT with doubling session timeouts,
the session will appear to the smart app to have an "infinite" (or a
2:40:00) session timeout, and the smart app will be able to gradually
reduce its keepalives to a very low rate and conserve power even while
holding a long-term UDP session open.
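
A toy simulation of this interaction is sketched below.  Everything in
it is an assumption made for illustration: the function name, the
60-second starting interval, the app doubling its interval after every
successful keepalive, and the NAT evaluating "did I see traffic?" once
per timeout period.

```python
def largest_working_interval(nat_doubles, first_interval=60,
                             base=5 * 60, cap=2 * 3600 + 40 * 60):
    """Simulate a keepalive-probing app against one NAT model.

    Returns the last keepalive interval (in seconds) that kept the
    binding alive, or None if even the first keepalive arrived too late.
    """
    period_start, period_len, saw_traffic = 0, base, False
    now, interval, last_good = 0, first_interval, None
    while True:
        now += interval
        # Close every NAT timeout period that ended before this keepalive.
        while now >= period_start + period_len:
            if not saw_traffic:
                return last_good       # binding expired: the probe failed
            period_start += period_len
            if nat_doubles:
                period_len = min(period_len * 2, cap)
            saw_traffic = False
        saw_traffic = True             # keepalive arrived within an open period
        last_good = interval
        interval *= 2                  # probe a longer interval next time
```

Under these assumptions the app settles on a keepalive interval below 5
minutes against the constant-timeout NAT, and on an interval of more
than two hours against the doubling NAT.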

---
Next, technical clarity issues with the draft:

Section 3.1 - some of this text looks in need of a bit more precision,  
including use of the standard requirements terms MAY/SHOULD/MUST.   
Especially, e.g., the end of para 5 - "the incoming IP packet is  
silently discarded" - does that mean it SHOULD be silently discarded,  
or MUST be, or what?

Paragraph starting "For incoming packets...", last sentence "In the
latter case, subsequent fragments may arrive before the first." - what
is this supposed to be saying?  Yes, fragments may arrive out of order
- but then what?  SHOULD or MUST the NAT64 try to hold onto the
out-of-order fragments until the first fragment arrives, before which
it won't know the full tuple and thus won't be able to forward those
out-of-order fragments?  Or MAY a NAT silently drop out-of-order
fragments that it receives before the first fragment?  Seems like that
behavior could be dangerous - some existing hosts are known to send
packets out of order out of habit; such a host's fragmented packets
would always be black-holed by such a NAT.
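
For illustration, the "hold onto out-of-order fragments" option could
look roughly like the sketch below.  The class and its interface are
hypothetical and not taken from the draft, and the sketch deliberately
omits the timeouts and memory limits that a real NAT64 would need.

```python
from collections import defaultdict


class FragmentHolder:
    """Hold non-first fragments until the first fragment reveals the ports."""

    def __init__(self):
        self.pending = defaultdict(list)  # key -> fragments waiting for the first one
        self.ports = {}                   # key -> (src_port, dst_port), once known

    def receive(self, key, offset, payload, ports=None):
        """Return the list of (offset, payload) fragments forwardable now.

        key:    (src_addr, dst_addr, protocol, fragment_id)
        offset: fragment offset; 0 marks the first fragment
        ports:  transport ports, carried only by the first fragment
        """
        if offset == 0:
            self.ports[key] = ports
            # The full tuple is now known: release everything held so far.
            return [(offset, payload)] + self.pending.pop(key, [])
        if key in self.ports:
            return [(offset, payload)]    # tuple already known: forward at once
        self.pending[key].append((offset, payload))
        return []                         # hold until the first fragment arrives
```

A NAT that instead drops fragments arriving before the first one would
return an empty list in the last case and never store anything, which is
exactly the black-holing risk described above.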

next para - what does "has enough resources" mean?  Are NAT64s  
required to have any resources at all for this purpose, or is "always  
zero resources for reassembling zero-checksum packets" a legitimate  
design point?  Seems the standard should specify this. Also, "will  
silently discard" - is that a MUST?

3.2.1, para containing "By default, the maximum session lifetime is 5  
minutes." - what does this mean precisely?  Does it mean the maximum  
session lifetime MUST be 5 minutes?  Or MUST be _at least_ 5 minutes?   
Or it SHOULD be configurable and the default MUST (or SHOULD) be [at  
least] 5 minutes?  or what?

Later, in the first subpara of IPv4 processing part - "packet is  
dropped" - MUST be dropped?  "has a type of 3" - MUST have a type of 3?

Session timer para at end of section - same issue as earlier.

3.2.1.1. first para - "If the rules specify" - what/whose rules?  the  
rules specified right here in the standard?  Shouldn't the standard  
"know" what its own rules specify? :)  Or are you referring to some  
rules defined by the NAT64 implementor?  Please clarify.

page 26, first para - "is silently discarded" - MUST be?

page 26, last para - the parenthetical expression about the NAT being  
able to move from RST RCV back to ESTABLISHED if the endpoints don't  
accept the RST is very important for DoS protection, I believe, and:  
(a) shouldn't be a parenthetical expression (makes it sound  
unimportant), and (b) this behavior needs to be specified with MUST  
language.

3.2.2.1, first para - another "If the rules specify" whose meaning I  
don't understand.  (what rules?)

Section 3.5 on hairpinning is inadequate, both in terms of  
specification and in terms of description.  In terms of specification:  
"This step handles hairpinning if necessary."  What does "if  
necessary" mean?  Existing BEHAVE documents dictate that NATs MUST  
support hairpinning, don't they?  Shouldn't the first paragraph of 3.5  
reiterate this, instead of just reiterating what it means for a NAT to  
support hairpinning?

In terms of description, I think the second paragraph should be  
expanded to a more precise, step-by-step description, like the rest of  
the processes described in this section.

sec 4, "Attacks" section, first para - "NAT64 devices should  
implement" -> "MUST implement"?

---
The rest are mostly writing/editorial nitpicks:

1. Intro, first para:

(a) "a mechanism for IPv6-IPv4 transition" - makes it sound like we're  
transitioning from IPv6 to IPv4! Don't you mean "IPv4-to-IPv6  
transition"?
(b) "to an IPv4-only server, They also enable" - change ',' to '.'

Second para is written assuming IPv4-side node is a server and
IPv6-side node is a client, implicitly contradicting what the first
para just said. Please clarify - is this para only supposed to apply to
the "primary usage" of IPv6-client-to-IPv4-server, or does it apply to
all supported communication modes?

1.1 - I don't understand what the second bullet is saying:
    o  In the absence of any state in NAT64 regarding a given IPv6 node,
       only said IPv6 node can initiate sessions to IPv4 nodes.  This
       works for roughly the same class of applications that work
       through IPv4-to-IPv4 NATs.

1.1, third bullet - isn't the capitalized "MAY" a misuse of standards  
terminology, given that its use here is just to state a fact about  
NAT64 and not to specify required or allowed behavior in an  
implementation of a protocol?  Decapitalize.

Also, "via one of the following mechanism" - last word should be plural.

Also, third sub-bullet is missing a period at the end.

1.2, para 2 - "an NAT64 box" - change "an" to "a"

1.2.1, para 3 - change "addresses i.e. an" to just "addresses: an"

1.2.1, para starting "the IPv4 address pool is": change ", which  
enable" in last sentence to ", enabling".

next para - delete "it is easy to understand that", and change "which  
address" to "whose address"

1.2.2, first para - "a IPv4 node" -> "an IPv4 node"

sec. 3 - the text before the start of 3.1 is very long and should  
probably be split up a bit.  For example:

- start a new subsection titled "Binding Information Bases" just  
before the para beginning "A NAT64 has three Binding Information  
Bases..."

- start a new subsection titled "Session Tables" just before the para  
beginning "A NAT64 also has three session tables..."

- Start a new subsection titled, e.g., "Packet Processing" or "Packet  
Processing Overview", just before the para beginning "The NAT64 will  
receive packets through its interfaces..."

3.1, a few paragraphs in: "even if the arrive" -> "even if they  
arrive"; and "conditioned to" -> "conditioned on"

3.2, para 2: capitalize "must" in first line.  Also, "will only"  
appears twice later in the para - should those be MUSTs?

3.2.2, V4 SYN RCV para: "waiting for a matching IPv4 packet" - don't  
you mean IPv6 here?

3.2.2, under *** CLOSED ***, bullet 1. - "does not exists" -> "does  
not exist"

page 25, first para - "unchanged in CLOSED" needs a period.

page 25, para "If the NAT64 is performing..." - mangled text; not sure  
how it's supposed to parse.  At least eliminate the doubly nested  
parentheses. :)

3.2.2 - this is a very long section, and could be easily broken up a  
bit for example by adding a subsubsection for each state instead of  
the *** STATE *** thing.  Then that structure would be reflected in  
the table of contents, too, which I think would be a good thing.

sec 4, para "Any protocol...": "that protect" -> "that protects"; "are  
essentially" -> "is essentially"; delete "So,"; "inherent to" ->  
"inherent in".

This section has a bunch of subsections (e.g., "Implications on...",
"Filtering", "Attacks"...) that need to be made into "real" subsections
- i.e., numbered and included in the table of contents.

"Attacks to NAT64" -> "Attacks on NAT64"

first para of that section: delete "It should be noted that".

"launch a DoS attack to" -> "launch a DoS attack on".

"Avoiding hairpinning loops", first para: "will loop" -> "could  
loop".  (Will NOT if this spec is implemented correctly, right!?)  And  
same for next instance of "will loop".  And delete another instance of  
"It should be noted that".  "your changes" -> "the attacker's  
chances".  "the NAT64 drops" -> "the NAT64 MUST drop ... as described  
in Section XXX"?

---
END of comments.

Thanks,
Bryan
