Re: [sip-overload] AD review: draft-ietf-soc-overload-control-13 (MAJOR ISSUES)

Richard Barnes <rlb@ipv.sx> Tue, 29 October 2013 18:33 UTC

In-Reply-To: <520564B6.7040304@bell-labs.com>
References: <520564B6.7040304@bell-labs.com>
Date: Tue, 29 Oct 2013 14:32:58 -0400
Message-ID: <CAL02cgSov68ub=Djrst=V+=f0oPHY_HFMEb6jOuerWYAaxtRxQ@mail.gmail.com>
From: Richard Barnes <rlb@ipv.sx>
To: "Vijay K. Gurbani" <vkg@bell-labs.com>
Cc: "sip-overload@ietf.org" <sip-overload@ietf.org>
Subject: Re: [sip-overload] AD review: draft-ietf-soc-overload-control-13 (MAJOR ISSUES)

Hey Vijay,

Thanks for following up in detail.  Catching up with this now.  Responses
inline.


On Fri, Aug 9, 2013 at 5:52 PM, Vijay K. Gurbani <vkg@bell-labs.com> wrote:

> Richard: Thank you for a close read of the draft as part of the
> AD review (cf. [1]).  I must apologize for the delay in
> responding, but I was on vacation when your email arrived and could not
> attend to it until now.
>
> I will break my response to your review into two parts.  This email
> addresses the major issues.  I will send a separate email for the
> minor issues.
>
> So, let's take a look at the major issues.
>
>  1. The fourth paragraph of Section 4.4 does not allow clients to detect
>> when overflow has happened.  It seems like there are two options: (1)
>> Define the timestamp so it cannot overflow (e.g., as a sufficiently long
>> timestamp), or (2) define what an "appropriate base value" is so that a
>> client can recognize when the "oc-seq" has been reset.  It seems
>> like it would be simplest to say that "oc-seq" MUST be an N-bit
>> counter value or a timestamp.  (Or better, choose one.)
>>
>
> The client will detect overflow when it receives an "oc-seq" parameter
> whose value is less than the previously seen value (because successive
> "oc-seq" parameter values are defined to be in an increasing order).
> When this happens there are guidelines on what the client
> implementation can do (last indented paragraph of S4.4).
>
> Regarding the choice between a timestamp and a counter, the WG prefers to
> use a timestamp, primarily because of re-entrancy issues in a multi-core
> or multi-threaded environment when using a common variable (the counter)
> [2].  However, we did not mandate a timestamp using RFC 2119
> language, to allow flexibility for implementations that may want to use
> something else.
>
> The earliest we will overflow using a signed 32-bit representation
> of a timestamp will be January 2038; using a signed 64-bit
> representation gets us about 290 billion more years.  I
> understand the need to design robust protocols, but I am not
> sure what to mandate here besides perhaps exhorting implementations
> to use an unsigned 32-bit integer, move to a 64-bit machine, or use
> a time library that handles large time values on a 32-bit
> architecture.
>
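To make the overflow horizons above concrete: assuming "oc-seq" carries whole seconds since the Unix epoch (the draft does not pin down the unit, so this is an assumption), the last representable instant for each integer width can be computed directly. A short Python sketch, with helper names that are purely illustrative:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 24 * 3600  # mean Gregorian year


def last_representable(bits: int, signed: bool) -> int:
    """Largest seconds-since-epoch value an integer of this width can hold."""
    return 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1


def overflow_date(bits: int, signed: bool) -> datetime:
    """Last instant before overflow (only for widths datetime can represent)."""
    return EPOCH + timedelta(seconds=last_representable(bits, signed))


print(overflow_date(32, signed=True))   # 2038-01-19 03:14:07+00:00
print(overflow_date(32, signed=False))  # 2106-02-07 06:28:15+00:00
years_64 = last_representable(64, signed=True) / SECONDS_PER_YEAR
print(f"signed 64-bit: ~{years_64 / 1e9:.0f} billion years")  # ~292
```

So an unsigned 32-bit value buys roughly another 68 years over a signed one, while a 64-bit value is effectively unbounded for this purpose.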

In retrospect, I may have been a little persnickety about this.  What I
meant was that simply having a timestamp with a value less than a previous
one is not sufficient to detect overflow, because messages can arrive out
of order (as the document notes).  It seems like for overflow detection,
you could just require that the oc-seq value be *significantly* below
recent values, or below all values used in a time window.

Also, if overflow can be detected, then shouldn't the new message be
accepted and used, even though the oc-seq value is lower?

It's a little more troubling that the field is not actually
defined to be an unsigned integer AFAICT.  Could we define this to be a
32-bit or 64-bit unsigned integer?  Otherwise, talking about comparisons is
kind of ill-defined.

Suggested text:
OLD: "This parameter contains a value..."
NEW: "This parameter contains an unsigned integer value..."

OLD:
"""
Due to an overflow, client implementations should be prepared to receive an
"oc-seq" parameter whose value is less than the previous value.  Client
implementations can handle this by continuing to perform overload control
until the "oc-validity" related to the previous value of "oc-seq" parameter
expires.
"""
NEW:
"""
A client implementation can recognize that an overflow has occurred when
it receives an "oc-seq" parameter whose value is significantly less than
the last several previous values.  If an overflow is detected, then the
client should use the overload parameters in the new message, even though
the sequence number is lower.  The client should also reset any internal
state to reflect the overflow so that future messages (following the
overflow) will be accepted.
"""
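A minimal Python sketch of the client-side check in the NEW text above, under stated assumptions: "oc-seq" is an unsigned 32-bit integer (as proposed), and "significantly less" means more than half the 32-bit space below the most recently accepted value (a hypothetical threshold; the sketch also simplifies "the last several previous values" to that single highest value). It additionally rejects values that jump more than half the space upward, reading them as messages sent before the overflow and delivered late; that extra rule is borrowed from RFC 1982 serial number arithmetic and goes beyond the suggested text. The class and method names are illustrative only.

```python
MAX_OC_SEQ = 2 ** 32 - 1  # assumption: oc-seq is an unsigned 32-bit integer


class OcSeqTracker:
    """Hypothetical per-server tracker deciding which "oc-seq" values to act on."""

    def __init__(self, wrap_gap: int = 2 ** 31):
        self.highest = None       # most recently accepted oc-seq value
        self.wrap_gap = wrap_gap  # a drop larger than this counts as "significantly less"
        self.overflows = 0

    def accept(self, oc_seq: int) -> bool:
        """Return True if the overload parameters carrying oc_seq should be used."""
        if not 0 <= oc_seq <= MAX_OC_SEQ:
            raise ValueError("oc-seq outside the unsigned 32-bit range")
        if self.highest is None:
            self.highest = oc_seq
            return True
        diff = oc_seq - self.highest
        if 0 < diff < self.wrap_gap:
            self.highest = oc_seq  # ordinary newer value
            return True
        if diff < -self.wrap_gap:
            self.highest = oc_seq  # overflow: reset state so later values are accepted
            self.overflows += 1
            return True
        # Small drop or duplicate (reordering), or a huge jump upward, which is
        # read as a message sent before the overflow and delivered late.
        return False
```

Reordered messages only ever produce small drops, so they are ignored, while a wrap from near 2^32 back to a small value is both detected and acted on.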



>  2. "oc=0" seems like a bad way to signal that there's no overload going
>> on.  "oc-validity=0" does that.  And if there's no overload condition,
>> no algorithm is being applied, so no "oc" parameter value needs to be
>> supplied.  So it seems like you should signal support with no overload
>> with "oc;oc-validity=0".
>>
>
> The reason "oc=0" was chosen to denote that a server was not under overload
> control was that we wanted to treat the "oc" parameter in the same
> manner as the SIP "rport" parameter in RFC 3581.  Like "rport", when a
> server filled in a value for "oc", it indicated support for the overload
> extension.  A value of "0" for the "oc" parameter indicated that the
> server supported overload control but was not interested in using it
> right now.
>
> The problem that arose when using only "oc=0" to indicate support
> for overload control was that if the rate-based algorithm was chosen to
> perform overload control, then the value "oc=0" was taken as an indication
> not to send any requests to the server.  Hence we added "oc-validity=0" as
> well.
>
> Ostensibly, as you state, a server could insert "oc;oc-validity=0" to
> indicate to a client that it supports overload control.  But being
> more explicit with "oc=0;oc-validity=0" does not seem too onerous.
> Unless you feel strongly about this, I'd rather leave the current
> text as is.
>

Ok, this explanation makes sense.  I can live with the current text.
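For reference, a client-side reading of the two signals discussed above might look like the following Python sketch. The Via parsing is deliberately simplified (no quoted strings, no comma-joined Via values), and the returned state names are hypothetical labels, not terms from the draft.

```python
def parse_via_params(via: str) -> dict:
    """Return the ';'-separated parameters of one Via value as a dict.

    Simplified: no quoted strings and no comma-joined Via values.
    """
    params = {}
    for part in via.split(";")[1:]:  # skip "SIP/2.0/UDP host:port"
        name, _, value = part.partition("=")
        params[name.strip().lower()] = value.strip()  # flags like "rport" map to ""
    return params


def overload_state(via: str) -> str:
    """Classify what a server signalled in the Via of a response (hypothetical labels)."""
    params = parse_via_params(via)
    if "oc" not in params:
        return "no-support"  # the server did not mark the Via at all
    validity = params.get("oc-validity")
    if validity is not None and validity.isdigit() and int(validity) == 0:
        return "supported-idle"  # e.g. "oc=0;oc-validity=0": support, no overload
    # Otherwise "oc" must be applied.  Under the rate-based algorithm a bare
    # "oc=0" would mean "send nothing", which is why oc-validity=0 is needed.
    return "throttle"
```

With the explicit form, "oc=0;oc-validity=0" lands in the idle state regardless of which algorithm the client would otherwise apply to the "oc" value.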



>  3. As written, it seems like Section 5.9 could be read to require a
>> behavior that would exacerbate overload conditions with liveness checks.
>> (For example, "periodically" with period 500ms.)  Suggest that this
>> section recommend a back-off algorithm, possibly with some concrete
>> timing parameters.  E.g., start at whatever interval you like,
>> exponential back-off to some maximum interval (say 10sec).
>>
>
> I believe this issue is a bit more complex than suggesting an exponential
> backoff algorithm.  The intent is that a SIP client has a plurality of
> downstream servers to choose from, and that one of these servers is not
> currently responding.  Therefore, requests are sent to the other (N-1)
> servers in some fashion.  It is not the case that the same request is
> continuously sent to the non-responding server.
>
> Instead of mandating any specific behaviour, the current text leaves it
> to implementations to determine whether, and how often, they want to
> contact the non-responding server.  Should we mandate
> more?
>

We probably don't need to be as specific as my initial suggestion, but I
think it would be good to note that liveness checks are a source of load.
Suggested text:

OLD:
"""
The SIP client SHOULD periodically probe if the downstream server is alive
using any mechanism at its disposal.
"""

NEW:
"""
The SIP client SHOULD periodically probe if the downstream server is alive
using any mechanism at its disposal.  Clients should be conservative in
their probing (e.g., using an exponential back-off) so that their liveness
probes do not exacerbate an overload situation.
"""
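One way a client could follow the suggested NEW text is a capped exponential back-off between probes. The Python sketch below is illustrative only: the 500 ms starting interval and 10 s cap echo the figures in the original review comment, and the doubling factor is an assumption, not something the draft specifies.

```python
from itertools import islice


def probe_intervals(initial: float = 0.5, factor: float = 2.0, maximum: float = 10.0):
    """Yield successive waits (seconds) between liveness probes to one server.

    Hypothetical parameters: start at `initial`, multiply by `factor` after
    each unanswered probe, and never exceed `maximum`.
    """
    interval = initial
    while True:
        yield interval
        interval = min(interval * factor, maximum)


print(list(islice(probe_intervals(), 7)))  # [0.5, 1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

A client would take the next wait from the generator after each unanswered probe and start a fresh generator once the server responds again, so a persistently unreachable server sees at most one probe per `maximum` seconds from that client.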



>  4. The Security Considerations do a good job of describing the threats
>> that this mechanism introduces, but less so the mitigations to these
>> threats.  In particular, there is no mitigation provided for the
>> multi-hop attack, and the mitigation for a malicious client is not
>> clearly stated.  Suggest:
>> -- Recommending that clients enforce a maximum validity period (e.g.,
>> 3600s) in order to limit the scope of spoofing attacks (off-path or
>> multi-hop)
>> -- "Servers SHOULD monitor client behavior to determine whether they are
>> complying with overload control policies.  If a client is not
>> conforming, then the server SHOULD treat it as a non-supporting client
>> (Section 5.10.2)."
>>
>
> Thanks for the suggestions.  I re-read the section and, in light of your
> suggestions, I think the section may benefit from rewording to
> articulate the security threats better.  The mitigation strategies you
> outline then become more apparent.  Let me suggest new wording for
> the section, including your mitigation strategies, that will completely
> replace the current S10.  Let me know what you think.
>
>
Overall, pretty good.  Couple of minor things inline.


> --- Begin new text
>
> 10. Security Considerations
>
> Overload control mechanisms can be used by an attacker to conduct a
> denial-of-service attack on a SIP entity if the attacker can pretend
> that the SIP entity is overloaded.  When such a forged overload
> indication is received by an upstream SIP client, it will stop
> sending traffic to the victim.  Thus, the victim is subject to a
> denial-of-service attack.
>
> To better understand the threat model, consider the following diagram:
>
>    Pa -------                    ------ Pb
>              \                  /
>    :  ------ +-------- P1 ------+------ :
>              /    L1        L2  \
>    :  -------                    ------ :
>
>    -----> Downstream (requests)
>    <----- Upstream (responses)
>
> Here, requests travel downstream from the left-hand side, through Proxy
> P1, towards the right-hand side, and responses travel upstream from the
> right-hand side, through P1, towards the left-hand side.  Proxies Pa, Pb
> and P1 support overload control.  L1 and L2 are labels for the links
> connecting P1 to the upstream clients and downstream servers.
>
> If an attacker is able to modify traffic between Pa and P1 on link L1,
> it can mount a denial-of-service attack on P1 by causing Pa not to send
> any traffic to P1.  Such an attack can proceed by the attacker modifying
> the response from P1 to Pa such that Pa's Via header is changed to
> indicate that all requests destined towards P1 should be dropped.
> Conversely, the attacker can simply remove any "oc", "oc-validity" and
> "oc-seq" markings added by P1 in a response to Pa.  In such a case, the
> attacker will force P1 into overload control by denying request
> quenching at Pa even though Pa is capable of performing overload
> control.
>
> Similarly, if an attacker is able to modify traffic between P1 and Pb
> on link L2, it can change P1's Via header in a response from Pb to
> P1 such that all subsequent requests destined towards Pb from P1 are
> dropped.  A denial-of-service attack is thus mounted on Pb.  Note that
> it is immaterial whether Pb supports overload control or not; the
> attack will succeed as long as the attacker is able to control L2.
> Conversely, an attacker can simply remove any "oc", "oc-validity"
> and "oc-seq" markings added by Pb in a response to P1.  In such a case,
> the attacker will force P1 into sending requests to Pb even under
> overload conditions because P1 would not be aware that Pb
> supports overload control.
>
> P1 can prevent these types of attack by using TLS on links L1 and L2.
>

Couple observations:
-- You might phrase this as (1) indicating false oc, and (2) suppressing
genuine oc
-- To do a DoS by indicating false oc over UDP, you could even be off-link
=> mitigate by using TCP/TLS/WS and/or applying BCP 38
-- To suppress genuine OC, you need to hijack the connection => Use TLS


> Yet another type of attack could be mounted by a malicious proxy
> (say Pb in the above figure) changing Via headers not corresponding
> to its immediate neighbor such that the overload control parameters
> of these Via headers would cause the SIP client identified in the
> Via header not to send requests to its neighbor.  Such a multi-hop
> attack can be prevented by ensuring that a SIP client removes "oc",
> "oc-validity" and "oc-seq" parameters from all Via headers of a
> response received, except for the topmost Via header.  This prevents
> overload control parameters that were accidentally or maliciously
> inserted into Via headers by a downstream SIP server from traveling
> upstream (Section 5.4).
>

I actually think the original text conveys this better.  The mitigation
described above helps if the request only goes through proxies that support
this spec.  But the mention of non-supporting proxies has been lost, and
that's the case where you need some other mitigation, e.g., the max
oc-validity.
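The two client-side safeguards discussed here can be combined when processing a received response: strip "oc", "oc-validity" and "oc-seq" from every Via except the topmost one (the Section 5.4 rule), and cap the "oc-validity" that remains, so a forged value injected via a non-supporting hop cannot lock out a neighbor indefinitely. A Python sketch, assuming "oc-validity" is expressed in milliseconds and using the 3600 s example from the review as a hypothetical default cap; the Via handling is simplified (no quoting or comma-joined Via values).

```python
OC_PARAMS = {"oc", "oc-validity", "oc-seq"}
MAX_OC_VALIDITY_MS = 3_600_000  # hypothetical cap: 3600 s, the review's example


def sanitize_vias(vias: list) -> list:
    """Return the response's Via values (topmost first) with overload
    parameters removed from all but the topmost Via and the topmost
    "oc-validity" capped at MAX_OC_VALIDITY_MS."""
    cleaned = []
    for index, via in enumerate(vias):
        head, *params = via.split(";")
        kept = []
        for param in params:
            name, sep, value = param.partition("=")
            key = name.strip().lower()
            if key in OC_PARAMS and index > 0:
                continue  # not addressed to this client: drop it
            if key == "oc-validity" and value.strip().isdigit():
                value = str(min(int(value), MAX_OC_VALIDITY_MS))  # bound spoofed values
            kept.append(name + sep + value)
        cleaned.append(";".join([head] + kept))
    return cleaned
```

The stripping step only helps when every hop runs it; the cap is what still limits the damage when a non-supporting proxy passes a forged parameter through.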



> A malicious SIP entity could gain an advantage by pretending to
> support this specification but never reducing the amount of traffic
> it forwards to the downstream neighbor.  If its downstream neighbor
> receives traffic from multiple sources which correctly implement
> overload control, the malicious SIP entity would benefit since all
> other sources to its downstream neighbor would reduce load.
>
> The solution to this problem depends on the overload control
> method.  For rate-based and window-based overload control, it is
> very easy for a downstream entity to monitor whether the upstream
> neighbor throttles the traffic it forwards as directed.  For percentage
> throttling this is not always obvious, since the load forwarded
> depends on the load received by the upstream neighbor.
>
> To prevent such attacks, servers should monitor client behavior to
> determine whether they are complying with overload control policies.
> If a client is not conforming to such policies, then the server should
> treat it as a non-supporting client (Section 5.10.2).
>

This part looks fine.
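For the rate-based case, the monitoring could be as simple as counting each client's requests over a sliding window and comparing the count against the advertised rate. The Python sketch below assumes the "oc" value sent to a client is a rate in requests per second; the window length, the tolerance factor, and all names are hypothetical. It does not attempt the harder percentage-throttling case.

```python
from collections import defaultdict, deque


class RateComplianceMonitor:
    """Hypothetical server-side check that clients honor an advertised rate."""

    def __init__(self, window_s: float = 5.0, tolerance: float = 1.2):
        self.window_s = window_s             # length of the sliding window (seconds)
        self.tolerance = tolerance           # slack for bursts and timing jitter
        self.allowed = {}                    # client -> advertised requests per second
        self.arrivals = defaultdict(deque)   # client -> request times inside the window
        self.non_conforming = set()

    def set_allowed_rate(self, client: str, rate: float) -> None:
        """Record the rate sent to `client` and restart its measurement."""
        self.allowed[client] = rate
        self.arrivals[client].clear()

    def on_request(self, client: str, now: float) -> None:
        """Note a request from `client` at time `now` (seconds)."""
        if client not in self.allowed:
            return  # no rate advertised to this client; nothing to check
        window = self.arrivals[client]
        window.append(now)
        while window and window[0] <= now - self.window_s:
            window.popleft()  # forget requests older than the window
        budget = self.allowed[client] * self.window_s * self.tolerance
        if len(window) > budget:
            self.non_conforming.add(client)
```

Clients that end up in `non_conforming` would then be handled as non-supporting clients (Section 5.10.2). Clearing the window whenever a new rate is advertised gives a client time to react before it is judged.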

Thanks for the edits.  It looks like we should be able to converge pretty
quickly.

--Richard



> --- End new text
>
> Please let me know what you think on these issues and I will close
> any resolutions as expeditiously as possible.
>
> [1] http://www.ietf.org/mail-archive/web/sip-overload/current/msg00988.html
> [2] http://www.ietf.org/mail-archive/web/sip-overload/current/msg00307.html
>
> Thanks, Richard.
>
> - vijay
> --
> Vijay K. Gurbani, Bell Laboratories, Alcatel-Lucent
> 1960 Lucent Lane, Rm. 9C-533, Naperville, Illinois 60563 (USA)
> Email: vkg@{bell-labs.com,acm.org} / vijay.gurbani@alcatel-lucent.com
> Web: http://ect.bell-labs.com/who/vkg/ | Calendar: http://goo.gl/x3Ogq
>