[lisp] Warren Kumari's Discuss on draft-ietf-lisp-rfc6830bis-26: (with DISCUSS and COMMENT)

Warren Kumari <warren@kumari.net> Wed, 06 February 2019 02:21 UTC

From: Warren Kumari <warren@kumari.net>
To: The IESG <iesg@ietf.org>
Cc: draft-ietf-lisp-rfc6830bis@ietf.org, Luigi Iannone <ggx@gigix.net>, lisp-chairs@ietf.org, ggx@gigix.net, lisp@ietf.org
Message-ID: <154941971479.32132.7227582520612116720.idtracker@ietfa.amsl.com>
Date: Tue, 05 Feb 2019 18:21:54 -0800
Archived-At: <https://mailarchive.ietf.org/arch/msg/lisp/jC9v0ZqA26oANIP4FxWyJGwLYJY>

Warren Kumari has entered the following ballot position for
draft-ietf-lisp-rfc6830bis-26: Discuss

When responding, please keep the subject line intact and reply to all
email addresses included in the To and CC lines. (Feel free to cut this
introductory paragraph, however.)


Please refer to https://www.ietf.org/iesg/statement/discuss-criteria.html
for more information about IESG DISCUSS and COMMENT positions.


The document, along with other ballot positions, can be found here:
https://datatracker.ietf.org/doc/draft-ietf-lisp-rfc6830bis/



----------------------------------------------------------------------
DISCUSS:
----------------------------------------------------------------------

I read much of this on a plane while overtired, so it is entirely possible /
probable that I've completely misunderstood something obvious. Many of the
points below are probably simple to address: either I simply need to be
educated, or a bit more text / detail needs to be provided.

1: "3.  The ITR sends a LISP Map-Request as specified in
[I-D.ietf-lisp-rfc6833bis].  Map-Requests SHOULD be rate-limited." What does
the ITR do with the packet while waiting for the Map-Request to complete? Must
it buffer the packets or can it discard them? If the former, for how long must
it buffer? When you say "SHOULD be rate-limited", can you provide guidance on
rates? 1 request per second? 1 million per second? Is this rate-limit per
destination or per device? Apologies if this is clearly stated in RFC6833(bis)
- I only skimmed it, and didn't see an answer there.
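
To make the per-destination versus per-device question concrete, here is a
minimal sketch of an ITR-side limiter that applies both. It is not taken from
the draft: the class names, the token-bucket approach, and the rates used are
all assumptions chosen for illustration only.

```python
class TokenBucket:
    """Allow at most `rate` events per second, with bursts of up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class MapRequestLimiter:
    """Hypothetical limiter: one device-wide bucket plus one bucket per EID."""

    def __init__(self, device_rate, per_eid_rate):
        self.device = TokenBucket(device_rate, device_rate)
        self.per_eid_rate = per_eid_rate
        self.per_eid = {}

    def may_send(self, eid_prefix, now):
        if eid_prefix not in self.per_eid:
            self.per_eid[eid_prefix] = TokenBucket(self.per_eid_rate, 1, now)
        # The per-EID check runs first, so a single busy destination is
        # refused before it can drain the device-wide budget.
        return self.per_eid[eid_prefix].allow(now) and self.device.allow(now)
```

With a per-EID limit alone, traffic to many distinct EIDs can still generate
an unbounded number of Map-Requests; with a per-device limit alone, one busy
destination can starve all the others. Which of these the draft intends is
exactly what is unclear.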

2: "6. ... Note that the Map-Cache is an on-demand cache. An ITR will manage
its Map-Cache in such a way that optimizes for its resource constraints."
Presumably I could cause this cache to thrash / overflow by looking at the RLOC
database, and choosing EIDs to send traffic to which all require different
cache entries, causing the cache to overflow (or, at least, causing maximum
cache pressure). This seems like an ideal DoS vector. It seems that there
should be more guidance provided on how to size the Map-Cache / the expected
order of the cache size, even if it is ultimately an implementation issue (e.g:
is a Map-Cache of 100 entries OK for an ITR? or should it be O(1000)? Or
roughly size(database)/2? Having multiple devices with small caches, and a bot
which does the above seems like a global risk).
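
The cache-pressure concern can be sketched with a toy model. The draft does
not specify an eviction policy, so the LRU policy, the capacity, and the
prefix names below are assumptions; the point is only to show how traffic to
many distinct EIDs can push legitimate entries out of a small cache.

```python
from collections import OrderedDict


class LruMapCache:
    """Toy on-demand Map-Cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, eid_prefix):
        """Return True on a hit; on a miss, install an entry and return False."""
        if eid_prefix in self.entries:
            self.entries.move_to_end(eid_prefix)
            return True
        # A miss stands in for a Map-Request followed by a cache install.
        self.entries[eid_prefix] = "rloc-set-placeholder"
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return False


def legit_hit_rate(capacity, legit_prefixes, attack_prefixes, rounds):
    """Fraction of legitimate lookups served from the cache."""
    cache = LruMapCache(capacity)
    hits = total = 0
    for _ in range(rounds):
        for prefix in legit_prefixes:
            hits += cache.lookup(prefix)
            total += 1
        for prefix in attack_prefixes:
            cache.lookup(prefix)
    return hits / total


legit = [f"legit-{i}" for i in range(10)]
attack = [f"attack-{i}" for i in range(200)]
quiet = legit_hit_rate(100, legit, [], rounds=5)
under_attack = legit_hit_rate(100, legit, attack, rounds=5)
```

In this model the legitimate hit rate is 0.8 with no attack traffic (only the
first round misses) and drops to 0.0 once each round includes 200 distinct
attack prefixes against a 100-entry cache, so every legitimate packet then
triggers a fresh Map-Request.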

I'm quite confused by much of the MTU / Fragmentation stuff -- I did read the
documents on a plane after not getting much sleep, and so it is entirely
possible / probable that I'm just being stupid, but there are bits which don't
seem to make sense to me.

3: "2.  Define L to be the size, in octets, of the
maximum-sized packet an ITR can send to an ETR without the need for the ITR or
any intermediate routers to fragment the packet." How do I know what L is? The
document "RECOMMENDS that L be defined as 1500" -- but 1500 isn't universally
true (if it were, we would never have to do Path MTU Discovery). What happens
when the *actual* MTU on the path is e.g., 1476 because there is a tunnel on
the path? The text also mentions "which is less than the ITR’s estimate of the
path MTU between the ITR and its correspondent ETR" - this implies that the ITR
is tracking / estimating the MTU, which a: doesn't align with the rest of the
text, or b: sounds like the stateful solution below. I have reread this
multiple times, but it still feels like it is avoiding the issue by defining it
to not exist.
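
The gap between an assumed L of 1500 and an actual path MTU of 1476 can be
worked through numerically. This sketch assumes an IPv4 outer header with no
options (20 octets) plus the 8-octet UDP and 8-octet LISP headers; the
function names are illustrative and not from the draft.

```python
# Outer header sizes in octets: IP + UDP (8) + LISP (8). Assumes no IPv4
# options and no IPv6 extension headers on the outer header.
H_IPV4 = 20 + 8 + 8   # 36
H_IPV6 = 40 + 8 + 8   # 56


def encapsulated_size(inner_size, h=H_IPV4):
    """Size of the packet on the wire after LISP encapsulation."""
    return inner_size + h


def max_inner_size(limit, h=H_IPV4):
    """Largest inner packet whose encapsulated form fits in `limit` octets."""
    return limit - h


def itr_forwards_unfragmented(inner_size, assumed_l, h=H_IPV4):
    """Stateless check: the ITR compares only against its configured L."""
    return encapsulated_size(inner_size, h) <= assumed_l


def path_needs_fragmentation(inner_size, actual_path_mtu, h=H_IPV4):
    """What actually happens on a path whose MTU is smaller than L."""
    return encapsulated_size(inner_size, h) > actual_path_mtu


inner = 1450
on_wire = encapsulated_size(inner)                    # 1486
ok_for_itr = itr_forwards_unfragmented(inner, 1500)   # True
too_big = path_needs_fragmentation(inner, 1476)       # True
```

With these numbers, any inner IPv4 packet between 1441 and 1464 octets passes
the ITR's check against L = 1500 but exceeds a 1476-octet path once
encapsulated, so it is fragmented or dropped somewhere the ITR cannot see.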

4: "Note that reassembly can happen at the ETR if the encapsulated packet was
fragmented at or after the ITR." - I think that there needs to be more text /
description about resource constraints on routers performing reassembly of
fragments - in most cases a router doesn't have to / isn't expected to have to
reassemble transit packets from arbitrary sources on the Internet (things where
routers may reassemble are aimed at the control plane which can be
rate-limited, or are from expected source addresses). It seems that spoofing
lots of initial fragments without the final one will be a tax on the router.
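
The reassembly cost can be sketched as a bounded state table. The table size,
the timeout, and the drop-when-full policy are assumptions made for
illustration; the draft does not describe how an ETR should bound this state.

```python
class ReassemblyTable:
    """Bounded table of in-progress reassemblies, keyed by (src, dst, ident)."""

    def __init__(self, max_entries, timeout):
        self.max_entries = max_entries
        self.timeout = timeout
        self.started = {}  # key -> time the first fragment arrived

    def _expire(self, now):
        # Discard reassemblies that never completed within the timeout.
        stale = [k for k, t in self.started.items() if now - t >= self.timeout]
        for key in stale:
            del self.started[key]

    def accept_first_fragment(self, key, now):
        """Return True if state was allocated, False if the fragment is dropped."""
        self._expire(now)
        if key in self.started:
            return True
        if len(self.started) >= self.max_entries:
            return False
        self.started[key] = now
        return True


table = ReassemblyTable(max_entries=3, timeout=30.0)
# Spoofed first fragments from made-up sources, never followed by a final one.
spoofed = [table.accept_first_fragment(("spoofed-src", "etr", i), 0.0)
           for i in range(3)]
legit_during = table.accept_first_fragment(("real-itr", "etr", 7), 1.0)
legit_after = table.accept_first_fragment(("real-itr", "etr", 7), 30.0)
```

Here three spoofed initial fragments are enough to fill the table, and the
legitimate fragment is refused until the spoofed entries time out. A real ETR
would have a much larger table, but the attacker's cost per entry is a single
packet.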

5: "Instead of using the Map-Cache or mapping system, RLOC information MAY be
gleaned from received tunneled packets or Map-Request messages. A "gleaned"
Map-Cache entry, one learned from the source RLOC of a received encapsulated
packet, is only stored and used for a few seconds, pending verification." - it
seems that this is ripe for abuse (or I'm missing something in the cache
expiration). I want to hijack traffic from Site X to well known Service Y, so I
look up Service Y and save the TTL from the Map-Reply. I then start spoofing
packets listing myself as the ETR - eventually Site X will glean from my
spoofed packets, and start sending traffic to me - yes, this will only work for
a few seconds -- but as soon as I stop getting packets from Site X, I know
Site X has verified the entry and discovered it is wrong... and that the TTL is
now counting down. I start a timer, and a second or two less than the TTL later
I start spoofing packets again, knowing that Site X will soon expire the cache
entry and will once again be willing to accept mine. A: I get some Site X to
Service Y traffic for a few seconds every TTL seconds, and B: the loss of this
traffic is a signal that in TTL seconds the entry will need to be refreshed
again.

6: "10.1.  Echo Nonce Algorithm" -- If I spoof lots of packets with the N- and
E-bits set, the receiving ETR will need to keep false state, and presumably I
can overfill a cache. This will cause the ETR to not be able to include the
received nonce on legitimate traffic, and so the ITR on the far side will
think this ETR is down. This seems like a fairly easy DoS. I'm guessing that
this can be worked around by not setting the E bit in the RLOC-probe Map-Reply
message, but this feels like a dangerous foot gun, and should at least be
noted. Note that this is different to the "Note the attacker must guess a
valid nonce the ITR is requesting to be echoed within a small window of time.
The goal is to convince the ITR that the ETR’s RLOC is reachable even when it
may not be reachable." attack listed in the document in that a: it doesn't
require any guessing, and b: makes an ETR appear down, not up.

The document does mention "... attack can be mitigated by preventing RLOC
spoofing in the network by deploying uRPF BCP 38 [RFC2827]." - while that may
be true for many of the above, BCP38 is far from being universally deployed,
and this feels similar to solving world hunger by saying everyone must have
enough food. :-)

Again, apologies if I've completely misunderstood something, clue-bat gladly
accepted...


----------------------------------------------------------------------
COMMENT:
----------------------------------------------------------------------

Comments:
1: "LISP Locator-Status-Bits (LSBs):  ... The field is 32 bits when the I-bit
is set to 0 and is 8 bits when the I-bit is set to 1." - I think I'm missing
something fairly fundamental here (and in Section 10, 13.1, and others) - if
I'm using the I bit, it sounds like I can only have 8 ETRs? And with this clear
I can only have 32? I feel like I must have missed something....
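
The field-width question can be illustrated by decoding the 32-bit word of
the LISP header that carries the LSBs. This sketch assumes the RFC 6830
layout: with the I-bit set, the high-order 24 bits carry the Instance ID and
the low-order 8 bits carry the LSBs, and bit n (counting from the least
significant bit) reports the status of the locator with ordinal n. The
function name is illustrative.

```python
def parse_lsb_word(word, i_bit):
    """Split the 32-bit Instance ID / Locator-Status-Bits word.

    Returns (instance_id, up_ordinals): instance_id is None when the I-bit
    is clear, and up_ordinals lists the locator ordinals whose bit is set.
    """
    if i_bit:
        instance_id = (word >> 8) & 0xFFFFFF
        lsbs, width = word & 0xFF, 8
    else:
        instance_id = None
        lsbs, width = word & 0xFFFFFFFF, 32
    up_ordinals = [n for n in range(width) if (lsbs >> n) & 1]
    return instance_id, up_ordinals


without_iid = parse_lsb_word(0x80000005, i_bit=0)   # (None, [0, 2, 31])
with_iid = parse_lsb_word(0x00012305, i_bit=1)      # (291, [0, 2])
```

Under this reading the bits can report status for at most 32 locators of a
mapping, or 8 when an Instance ID is present, which is the limit the comment
above is asking about.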

2: "When the ’DF’ field of the IP header is set to 1, or the packet is an IPv6
packet originated by the source host, the ITR will drop the packet when the
size is greater than L and send an ICMPv4..." I think this needs to say when
the resulting (or encapsulated) packet is greater than L (otherwise it is
unclear if you are referring to the original or resulting packet).

3: "The server-side sets a Weight of zero for the RLOC subset list. In this
case, the client-side can choose how the traffic load is spread across the
subset list." -- please insert a reference to Section 12 here. I wrote up a
long comment on what happens if the load sharing delivers packets to different
ETRs, before finding this section later on in the document.

Nits:
1: "LISP does not require changes to either host protocol stack or to underlay
routers." -- I think this should be either "to either the host protocol stack"
or "to either host protocol stacks".

2: "As an exmple of such attacks an off-path attacker can" -- typo for example.

3: "it can protect itself from erroneious reachability attacks" -- typo for
erroneous.