[sip-overload] AD review: draft-ietf-soc-overload-control-13 (MAJOR ISSUES)

"Vijay K. Gurbani" <vkg@bell-labs.com> Fri, 09 August 2013 21:55 UTC

Message-ID: <520564B6.7040304@bell-labs.com>
Date: Fri, 09 Aug 2013 16:52:54 -0500
From: "Vijay K. Gurbani" <vkg@bell-labs.com>
Organization: Bell Laboratories, Alcatel-Lucent
To: Richard Barnes <rlb@ipv.sx>
Cc: "sip-overload@ietf.org" <sip-overload@ietf.org>

Richard: Thank you for a close read of the draft as part of the
AD review (cf. [1]).  My apologies for the delay in responding;
I was on vacation when your email arrived and could not attend to
it until now.

I will split my response to your review into two parts.  This email
addresses the major issues; I will send a separate email for the
minor issues.

So, taking a look at the major issues.

> 1. The fourth paragraph of Section 4.4 does not allow clients to detect
> when overflow has happened.  It seems like there are two options: (1)
> Define the timestamp so it cannot overflow (e.g., as a sufficiently long
> timestamp), or (2) define what an "appropriate base value" is so that a
> client can recognize when the "oc-seq" has been reset there.  It seems
> like it would be simplest to just say that "oc-seq" MUST be an N-bit
> counter value or a timestamp.  (Or better, choose one.)

The client will detect overflow when it receives an "oc-seq" parameter
whose value is less than the previously seen value (because successive
"oc-seq" parameter values are defined to be in increasing order).
When this happens, the last indented paragraph of S4.4 gives guidelines
on what the client implementation can do.
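Below is a minimal sketch of that client-side check. It assumes integer "oc-seq" values. The class and method names are hypothetical, and the draft defines no such API. The sketch only flags a decrease. What the client does next is governed by S4.4.

```python
from typing import Optional


class OcSeqTracker:
    """Remembers the last "oc-seq" value seen from one downstream server.

    Hypothetical helper for illustration; the draft defines no such API.
    """

    def __init__(self) -> None:
        self.last_seen: Optional[int] = None

    def observe(self, oc_seq: int) -> str:
        """Classify a newly received "oc-seq" value against the stored one."""
        if self.last_seen is None or oc_seq > self.last_seen:
            # First value, or a normal increase: adopt the new overload state.
            self.last_seen = oc_seq
            return "increase"
        if oc_seq == self.last_seen:
            # The same overload state repeated in another response.
            return "same"
        # The value went backwards: a possible wrap-around or reset. The
        # stored value is kept; the S4.4 guidance decides what happens next.
        return "decrease"
```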

Regarding the choice between a timestamp and a counter, the WG prefers a
timestamp, primarily because of re-entrancy issues in a multi-core or
multi-threaded environment when using a common variable (the counter)
[2].  However, we did not mandate a timestamp using RFC 2119 language,
in order to allow flexibility for implementations that may want to use
something else.
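The following rough sketch contrasts the two approaches and shows why the shared counter raises the re-entrancy concern. The class and function names are assumptions for illustration only, as is the choice of millisecond resolution for the timestamp.

```python
import threading
import time


class CounterOcSeq:
    """Counter-based "oc-seq" source: one shared variable guarded by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def next(self) -> int:
        # Without the lock, two threads could read the same value and
        # emit duplicate "oc-seq" values.
        with self._lock:
            self._value += 1
            return self._value


def timestamp_oc_seq() -> int:
    """Timestamp-based "oc-seq" source: no state shared between threads.

    Milliseconds since the Unix epoch (an assumed resolution).
    """
    return time.time_ns() // 1_000_000
```

With the counter, every thread contends for the same lock. The timestamp needs no shared state. The cost is that two overload-state updates within the same millisecond would carry equal values.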

The earliest we will overflow using a signed 32-bit representation
of a timestamp is January 2038; a signed 64-bit representation
gets us about 290 billion more years.  I understand the need to
design robust protocols, but I am not sure what to mandate here
beyond perhaps exhorting implementations to use an unsigned 32-bit
integer (which defers overflow to 2106), move to a 64-bit machine,
or use a time library that handles large time values on a 32-bit
architecture.
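The rollover dates above can be checked quickly. The check treats each representation as a count of seconds since the Unix epoch. That "oc-seq" carries seconds in this form is an assumption made for the arithmetic only.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 86400  # mean Gregorian year


def rollover(max_seconds: int) -> datetime:
    """UTC instant at which a seconds-since-epoch value reaches its maximum."""
    return EPOCH + timedelta(seconds=max_seconds)


print(rollover(2**31 - 1))  # signed 32-bit:   2038-01-19 03:14:07+00:00
print(rollover(2**32 - 1))  # unsigned 32-bit: 2106-02-07 06:28:15+00:00
# Signed 64-bit is far beyond datetime's range, so just count years:
print(f"{(2**63 - 1) / SECONDS_PER_YEAR:.3e} years")  # 2.923e+11 years
```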

> 2. "oc=0" seems like a bad way to signal that there's no overload going
> on.  "oc-validity=0" does that.  And if there's no overload condition,
> no algorithm is being applied, so no "oc" parameter value needs to be
> supplied.  So it seems like you should signal support with no overload
> with "oc;oc-validity=0".

The reason "oc=0" was chosen to denote a server that was not under
overload control was that we wanted to treat the "oc" parameter in the
same manner as the SIP "rport" parameter in RFC 3581.  As with "rport",
when a server filled in a value for "oc", it indicated support for the
overload control extension.  A value of "0" for the "oc" parameter
indicated that the server supported overload control but was not
interested in using it right now.

The problem that arose when using only "oc=0" to indicate support
for overload control was that if the rate-based algorithm was chosen to
perform overload control, the value "oc=0" was taken as a rate of zero,
i.e., as an indication not to send any requests to the server.  Hence we
added "oc-validity=0" as well.
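Below is a sketch of how a client might read these parameters from the topmost Via of a response. It assumes the semantics described above. An absent or valueless "oc" means the server did not fill it in, so there is no support. "oc-validity=0" means overload control is supported but not in force. Anything else means overload control is active. The Via parsing is simplified, and the function names and host names are hypothetical.

```python
from typing import Dict, Optional


def via_params(via: str) -> Dict[str, Optional[str]]:
    """Split the ';'-separated parameters of a single Via header value.

    Simplified: ignores quoting, IPv6 brackets and comma-joined Via values.
    """
    params: Dict[str, Optional[str]] = {}
    for item in via.split(";")[1:]:
        name, sep, value = item.strip().partition("=")
        params[name.lower()] = value if sep else None
    return params


def overload_state(via: str) -> str:
    """Classify the overload signalling a server left in one Via header."""
    p = via_params(via)
    if p.get("oc") is None:
        # Absent, or echoed back without a value: the server did not fill it in.
        return "no support advertised"
    if p.get("oc-validity") == "0":
        # Supported but not in force, so a rate-based client must not
        # read "oc=0" here as "send no requests".
        return "supported, not active"
    return f"active, oc={p['oc']}"


print(overload_state("SIP/2.0/UDP p1.example.net;branch=z9hG4bK87a;oc=0;oc-validity=0"))
```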

As you state, a server could instead insert "oc;oc-validity=0" to
indicate to a client that it supports overload control.  But being
more explicit with "oc=0;oc-validity=0" does not seem too onerous.
Unless you feel strongly about this, I'd rather leave the current
text as is.

> 3. As written, it seems like Section 5.9 could be read to require a
> behavior that would exacerbate overload conditions with liveness checks.
> (For example, "periodically" with period 500ms.)  Suggest that this
> section recommend a back-off algorithm, possibly with some concrete
> timing parameters.  E.g., start at whatever interval you like,
> exponential back-off to some floor (say 10sec).

I believe this issue is a bit more complex than suggesting an exponential
backoff algorithm.  The intent is that a SIP client has a plurality of
downstream servers to choose from, and that one of these servers is not
currently responding.  Therefore, requests are sent to the other (N-1)
servers in some fashion.  It is not the case that the same request is
continuously sent to the non-responding server.
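One possible reading of "in some fashion" is sketched below, using plain round-robin over the servers that are still responding. Round-robin is an assumed policy, not one the draft prescribes, and the function name is hypothetical.

```python
import itertools
from typing import Iterator, List, Set


def next_hops(servers: List[str], unresponsive: Set[str]) -> Iterator[str]:
    """Round-robin over the downstream servers that are currently responding."""
    live = [s for s in servers if s not in unresponsive]
    if not live:
        raise RuntimeError("no downstream server is responding")
    return itertools.cycle(live)


hops = next_hops(["s1.example.net", "s2.example.net", "s3.example.net"],
                 {"s2.example.net"})
print([next(hops) for _ in range(4)])
```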

Instead of mandating any specific behavior, the current text leaves
it to implementations to determine how often, and indeed whether,
they contact the non-responding server.  Should we mandate more?
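If more were to be mandated, the review's suggestion could look like the capped exponential back-off for liveness probes sketched below. The 0.5 s starting interval and the 10 s ceiling are taken from the examples in the review. The function itself is hypothetical.

```python
from typing import List


def probe_intervals(initial: float, cap: float, attempts: int) -> List[float]:
    """Delays between successive liveness probes to an unresponsive server.

    The delay doubles after each failed probe and never exceeds `cap` seconds.
    """
    delays = []
    delay = min(initial, cap)
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, cap)
    return delays


print(probe_intervals(0.5, 10.0, 7))  # [0.5, 1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```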

> 4. The Security Considerations do a good job of describing the threats
> that this mechanism introduces, but less so the mitigations to these
> threats.  In particular, there is no mitigation provided for the
> multi-hop attack, and the mitigation for a malicious client is not
> clearly stated.  Suggest:
> -- Recommending that clients enforce a maximum validity period (e.g.,
> 3600s) in order to limit the scope of spoofing attacks (off-path or
> multi-hop)
> -- "Servers SHOULD monitor client behavior to determine whether they are
> complying with overload control policies.  If a client is not
> conforming, then the server SHOULD treat it as a non-supporting client
> (Section 5.10.2)."

Thanks for the suggestions.  I re-read the section and, in light of your
suggestions, I think it would benefit from rewording to articulate the
security threats more clearly; the mitigation strategies you outline
then become more apparent.  Below is suggested new wording for the
section, including your mitigation strategies, that would completely
replace the current S10.  Let me know what you think.

--- Begin new text

10. Security Considerations

Overload control mechanisms can be used by an attacker to conduct a
denial-of-service attack on a SIP entity if the attacker can pretend
that the SIP entity is overloaded.  When such a forged overload
indication is received by an upstream SIP client, it will stop
sending traffic to the victim.  Thus, the victim is subject to a
denial-of-service attack.

To better understand the threat model, consider the following diagram:

    Pa -------                    ------ Pb
              \                  /
    :  ------ +-------- P1 ------+------ :
              /    L1        L2  \
    :  -------                    ------ :

    -----> Downstream (requests)
    <----- Upstream (responses)

Here, requests travel downstream from the left-hand side, through Proxy
P1, towards the right-hand side, and responses travel upstream from the
right-hand side, through P1, towards the left-hand side.  Proxies Pa, Pb
and P1 support overload control.  L1 and L2 are labels for the links
connecting P1 to the upstream clients and downstream servers.

If an attacker is able to modify traffic between Pa and P1 on link L1,
it can mount a denial-of-service attack on P1 by having Pa not send
any traffic to P1.  Such an attack can proceed by the attacker modifying
the response from P1 to Pa such that Pa's Via header is changed to
indicate that all requests destined towards P1 should be dropped.
Alternatively, the attacker can simply remove any "oc", "oc-validity"
and "oc-seq" markings added by P1 in a response to Pa.  In such a case,
the attacker will push P1 into overload, because Pa never learns that
it should quench requests even though Pa is capable of performing
overload control.

Similarly, if an attacker is able to modify traffic between P1 and Pb
on link L2, it can change P1's Via header in a response from Pb to
P1 such that all subsequent requests destined towards Pb from P1 are
dropped.  A denial-of-service attack is thus mounted on Pb.  Note that
it is immaterial whether Pb supports overload control or not; the
attack will succeed as long as the attacker is able to control L2.
Alternatively, an attacker can simply remove any "oc", "oc-validity"
and "oc-seq" markings added by Pb in a response to P1.  In such a case,
the attacker will force P1 into sending requests to Pb even under
overload conditions, because P1 would not be aware that Pb
supports overload control.

P1 can prevent these types of attacks by using TLS on links L1 and L2.

Yet another type of attack could be mounted by a malicious proxy
(say Pb in the above figure) modifying Via headers that do not
correspond to its immediate neighbor, such that the overload control
parameters in these Via headers would cause the SIP client identified
in each such Via header not to send requests to its neighbor.  Such a
multi-hop attack can be prevented by ensuring that a SIP client removes
the "oc", "oc-validity" and "oc-seq" parameters from all Via headers of
a received response except the topmost Via header.  This prevents
overload control parameters that were accidentally or maliciously
inserted into Via headers by a downstream SIP server from traveling
upstream (Section 5.4).

A malicious SIP entity could gain an advantage by pretending to
support this specification but never reducing the amount of traffic
it forwards to the downstream neighbor.  If its downstream neighbor
receives traffic from multiple sources which correctly implement
overload control, the malicious SIP entity would benefit since all
other sources to its downstream neighbor would reduce load.

The solution to this problem depends on the overload control
method.  For rate-based and window-based overload control, it is
very easy for a downstream entity to monitor whether the upstream
neighbor throttles the traffic it forwards as directed.  For percentage
throttling this is not always obvious, since the load forwarded
depends on the load received by the upstream neighbor.

To prevent such attacks, servers should monitor client behavior to
determine whether they are complying with overload control policies.
If a client is not conforming to such policies, then the server should
treat it as a non-supporting client (Section 5.10.2).

--- End new text
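The sketch below is for illustration only and is not part of the proposed Section 10 text. It shows two of the mitigations above. First, a client strips "oc", "oc-validity" and "oc-seq" from every Via of a received response except the topmost one. Second, a server checks whether a client under a rate-based limit actually stays under it. The function names, the simplified Via parsing and the 10% tolerance are all hypothetical; the draft does not specify them.

```python
from typing import List, Tuple

OC_PARAMS = {"oc", "oc-validity", "oc-seq"}


def strip_oc_below_top(vias: List[str]) -> List[str]:
    """Multi-hop mitigation: drop oc params from every Via except the topmost.

    `vias` lists single Via header values, topmost first; parsing is simplified.
    """
    cleaned = vias[:1]
    for via in vias[1:]:
        parts = via.split(";")
        kept = [parts[0]] + [
            p for p in parts[1:]
            if p.strip().partition("=")[0].lower() not in OC_PARAMS
        ]
        cleaned.append(";".join(kept))
    return cleaned


def is_conforming(arrivals: List[float], window: Tuple[float, float],
                  allowed_rate: float, tolerance: float = 1.1) -> bool:
    """Rate-based compliance check for one upstream client.

    Counts requests that arrived in [start, end) and compares the observed
    rate (requests/s) with the rate the server asked for, plus some slack.
    """
    start, end = window
    if end <= start:
        raise ValueError("window end must be after its start")
    count = sum(1 for t in arrivals if start <= t < end)
    return count / (end - start) <= allowed_rate * tolerance
```

A client that fails `is_conforming` repeatedly would then be treated as a non-supporting client (Section 5.10.2).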

Please let me know what you think about these issues, and I will close
out any resolutions as expeditiously as possible.

[1] http://www.ietf.org/mail-archive/web/sip-overload/current/msg00988.html
[2] http://www.ietf.org/mail-archive/web/sip-overload/current/msg00307.html

Thanks, Richard.

- vijay
-- 
Vijay K. Gurbani, Bell Laboratories, Alcatel-Lucent
1960 Lucent Lane, Rm. 9C-533, Naperville, Illinois 60563 (USA)
Email: vkg@{bell-labs.com,acm.org} / vijay.gurbani@alcatel-lucent.com
Web: http://ect.bell-labs.com/who/vkg/  | Calendar: http://goo.gl/x3Ogq
