Re: [Ila] [lisp] LISP for ILA

On Tue, Mar 13, 2018 at 10:46 PM, Richard Li <renwei.li@huawei.com> wrote:
>> enlightening or convincing. I am really hoping we can get something
>> more concrete for dealing with DOS threats in a control plane for ILA.
>
> Isn’t DOS a data plane problem?
>
Richard,

The potential attack is on the mapping cache that needs to be
maintained by the control plane. It's really the cache that is being
attacked via the packets sent by an attacker. The goal of the
attacker is something like exhausting the cache or other resources
such that legitimate traffic is blocked or severely degraded. The
recent Meltdown and Spectre exploits on CPU caches are a good reminder
of how hard it generally is to make caches resilient to attack and
how the problem is never completely solved!
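
To make the cache-exhaustion concern concrete, here is a minimal sketch. It assumes a fixed-size LRU mapping cache in which every miss allocates an entry (for example a pending or negative entry). The class and function names are hypothetical and are not taken from any ILA or LISP implementation.

```python
from collections import OrderedDict


class MappingCache:
    """Fixed-size LRU cache of identifier -> locator mappings (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, identifier):
        locator = self.entries.get(identifier)
        if locator is not None:
            self.entries.move_to_end(identifier)  # mark as recently used
        return locator

    def insert(self, identifier, locator):
        self.entries[identifier] = locator
        self.entries.move_to_end(identifier)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


def legit_hits_after_flood(capacity, legit_count, bogus_count):
    """Count legitimate mappings that survive a flood of bogus destinations."""
    cache = MappingCache(capacity)
    for i in range(legit_count):
        cache.insert(f"legit-{i}", f"loc-{i}")
    # Assumption: each attacker packet to a new destination creates an entry.
    for i in range(bogus_count):
        cache.insert(f"bogus-{i}", "loc-unknown")
    return sum(
        cache.lookup(f"legit-{i}") is not None for i in range(legit_count)
    )
```

Under these assumptions, a flood at least as large as the cache evicts every legitimate mapping, which is the degradation described above.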

Tom

> Richard
>
> From: Tom Herbert
> To: Florin Coras;
> Cc: ila@ietf.org; lisp@ietf.org;
> Subject: Re: [lisp] [Ila] LISP for ILA
> Time: 2018-03-13 22:25:44
>
>
> On Tue, Mar 13, 2018 at 6:37 PM, Florin Coras <fcoras.lists@gmail.com>
> wrote:
>> Not sure about ILA-R but typically when deploying LISP, RTR/Proxy-ITRs
>> have enough memory to store most, if not all, of the identity-to-location
>> mappings. Therefore, once in steady state, most of the requests to the
>> mapping system are triggered by edge devices ITR/ILA-N.
>>
> ILA-Rs contain all the mappings for the shard they serve. If they
> don't have a mapping for a packet, then the packet is dropped.
>
>> This then means that just rate limiting ITRs should be enough to avoid
>> DOS-ing the control plane and the problem converts into one of trying to
>> avoid providing sub-optimal paths to legitimate traffic due to attacker
>> pressure. As Alberto mentioned, there are a number of solutions for
>> determining both the attackers and the set of destinations that should be
>> protected against cache evictions. The former can be used to determine the
>> set of requests that should not be punted, while the latter ensures that
>> mappings for popular destinations cannot be evicted by attacks.
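
The two mitigations named above (rate limiting punted requests and protecting popular destinations from eviction) might be sketched as follows. This is only an illustration under stated assumptions: a token bucket stands in for the rate limiter, the set of protected destinations is supplied by the caller, and all names are hypothetical. It does not show how attackers or popular destinations would be identified.

```python
from collections import OrderedDict


class TokenBucket:
    """Allow about `rate` punts per second, with bursts up to `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now):
        # Refill in proportion to elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PinnedLRU:
    """LRU mapping cache whose pinned identifiers are never evicted."""

    def __init__(self, capacity, pinned):
        self.capacity = capacity
        self.pinned = set(pinned)
        self.entries = OrderedDict()

    def insert(self, identifier, locator):
        self.entries[identifier] = locator
        self.entries.move_to_end(identifier)
        while len(self.entries) > self.capacity:
            # Evict the oldest entry that is not pinned.
            victim = next(
                (k for k in self.entries if k not in self.pinned), None
            )
            if victim is None:
                break  # everything is pinned; nothing can be evicted
            del self.entries[victim]
```

For example, with `PinnedLRU(capacity=2, pinned={"popular"})`, inserting any number of other identifiers after `"popular"` leaves `"popular"` in the cache.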
>>
> Okay, but I still don't know where the details and analysis of these
> solutions are. It's not enough to simply say that rate limiting is the
> solution to the DOS threat. I looked at RFC7835, for instance, which
> gives a nice analysis of the threat, but the suggested mitigations are
> "careful deployment and configuration" and "Systematically applying
> filters and rate limitation"-- that guidance is not particularly
> enlightening or convincing. I am really hoping we can get something
> more concrete for dealing with DOS threats in a control plane for ILA.
>
> Thanks,
> Tom
>
>> Florin
>>
>> On Mar 13, 2018, at 4:27 PM, Tom Herbert <tom@quantonium.net> wrote:
>>
>> On Tue, Mar 13, 2018 at 3:50 PM, Alberto Rodriguez Natal (natal)
>> <natal@cisco.com> wrote:
>>
>>
>>
>> On 3/13/18, 1:05 PM, "Tom Herbert" <tom@quantonium.net> wrote:
>>
>>
>>    This is reflected below in: "While the mapping is being resolved via
>>    the Map-Request/Map-Reply process, the ILA-N can send the data
>>    packets to the underlay using the SIR address."
>>
>>    I think it should be assumed in ILA that not queuing packets and not
>>    dropping packets because of resolution are requirements (too much
>>    latency hit).
>>
>> IMHO, these should not be hard requirements. Leveraging ILA-Rs for mapping
>> resolution has another set of tradeoffs to be considered. An operator
>> should be able to decide which set of tradeoffs makes sense for his/her
>> particular scenario.
>>
>>    This is a hard requirement because caches are explicitly not required
>>    for ILA to operate. They are *only* optimizations. If there is a cache
>>    hit then packets presumably get an optimized path; on a cache miss they
>>    might take a suboptimal route-- but packets still flow without being
>>    blocked! This means that the worst-case DOS attack on the cache might
>>    cause suboptimal routing; however, if resolution is required then the
>>    worst-case attack becomes that packets don't flow and it's a much more
>>    effective attack.
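
The "cache is only an optimization" behavior described above can be sketched as a forwarding decision that never queues or drops on a miss. This is a minimal sketch: the cache is modeled as a plain dict, and the `ila_r_address` parameter is a hypothetical stand-in for the default path via an ILA-R.

```python
def forward(dst_identifier, cache, ila_r_address):
    """Return (next_hop, optimized) for a packet at an ILA-N.

    A miss never blocks the packet; it only costs a less direct path.
    """
    locator = cache.get(dst_identifier)
    if locator is not None:
        return locator, True       # cache hit: direct, optimized path
    return ila_r_address, False    # cache miss: default path via an ILA-R


# Usage: a hit goes straight to the locator, a miss still flows via the ILA-R.
cache = {"id-a": "loc-1"}
hit = forward("id-a", cache, "ila-r")
miss = forward("id-x", cache, "ila-r")
```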
>>
>> Performing the mapping resolution at the ILA-N doesn't mean that you can't
>> send the packets to the ILA-R to avoid the first-packet-drop. Those are
>> two different things. Traditionally in LISP, a possible deployment model
>> is to have a couple of RTRs with all the mappings in the site, so xTRs can
>> use them as default path while they are resolving mappings. In this
>> scenario, all the mapping resolution is done at the xTRs while the RTRs
>> are only forwarding "first-packets". We have seen this model working
>> really well even for large LISP deployments.
>>
>>    In ILAMP, a redirect method is defined. On a cache miss the packet is
>>    forwarded and no other action is taken. If an ILA-R does
>>    transformation it may send back a mapping redirect informing the ILA-N
>>    of a transformation. The redirects must be completely secure (one
>>    reason I'm partial to TCP) and are only sent to inform an ILA-N about
>>    a positive response. To a large extent this neutralizes the above
>>    random address DOS attack. There are other means of attack on the
>>    cache, but the exposure is narrowed I believe.
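
A rough sketch of why positive-only redirects limit the random-address attack follows. It assumes the ILA-N adds cache entries only when a redirect arrives, and that the ILA-R sends redirects only for destinations it actually has a mapping for. The class names are hypothetical, and redirect security (for example the TCP transport mentioned above) is not modeled.

```python
class IlaRouter:
    """Holds all mappings for its shard; redirects only on a positive match."""

    def __init__(self, mappings):
        self.mappings = dict(mappings)

    def handle(self, dst):
        locator = self.mappings.get(dst)
        if locator is None:
            return None            # no mapping: packet dropped, no redirect
        return (dst, locator)      # packet forwarded; redirect sent back


class IlaNode:
    """Caches mappings learned only from redirects."""

    def __init__(self, router):
        self.router = router
        self.cache = {}

    def send(self, dst):
        if dst in self.cache:
            return "direct"
        redirect = self.router.handle(dst)  # miss: just forward the packet
        if redirect is None:
            return "dropped-at-router"
        identifier, locator = redirect
        self.cache[identifier] = locator
        return "via-router"


node = IlaNode(IlaRouter({"id-a": "loc-1"}))
for i in range(1000):
    node.send(f"bogus-{i}")   # bogus destinations never create cache entries
```

In this model the flood of bogus destinations leaves `node.cache` empty, while a packet to `"id-a"` goes via the router once and directly afterwards.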
>>
>> That model is supported in LISP via the use of Map-Notifies. However,
>> moving the mapping resolution to the ILA-R comes at a cost. It's putting
>> more load (in terms of both data and control plane) into an architectural
>> component that is not easy to scale out, since it requires (for instance)
>> reconfiguring the underlay topology.
>>
>>
>>    I don't see how this creates more load (i.e. the need for map request
>>    packets is eliminated), but I really don't understand what
>>    "reconfiguring the underlay topology" means!
>>
>> Happy to try to clarify this. I'm talking about the load in the ILA-R.
>> With a "redirect" model, the ILA-R has to (1) serve as the data-plane
>> default path and (2) provide control-plane mapping resolution. This is
>> centralizing the data-plane and control-plane into a single component,
>> the ILA-R. Moreover, this will also require a lot of punts from the fast
>> path to the slow path in the ILA-R, which also has implications. With a
>> request/reply model, the control-plane resolution is performed at the
>> edges in a distributed fashion and the ILA-R only serves as data-plane
>> default path to avoid dropping traffic. The latter model alleviates the
>> load in the ILA-Rs, which reduces the need to scale them out.
>>
>> Yes, but you are ignoring the load on the mapping servers which also
>> needs to scale. Additionally, if ILA-N is both forwarding a packet and
>> sending a map request then this potentially doubles the packet load on
>> the network and exacerbates the potential DOS attack where someone
>> floods an ILA-N with packets having bogus destinations. There might be
>> mitigations to this DOS attack, like heavy-hitters you mentioned, but
>> we really need the details to see exactly how this works and how
>> effective they are. On the surface of it, it looks like the
>> request/response model is susceptible to DOS especially when third
>> parties are allowed to drive the process.
>>
>> Tom
>>
>> _______________________________________________
>> lisp mailing list
>> lisp@ietf.org
>> https://www.ietf.org/mailman/listinfo/lisp