Re: [OPSEC] review for draft-ietf-opsec-ip-security-01

Andrew Yourtchenko <ayourtch@cisco.com> Wed, 16 September 2009 10:49 UTC

Return-Path: <ayourtch@cisco.com>
X-Original-To: opsec@core3.amsl.com
Delivered-To: opsec@core3.amsl.com
Received: from localhost (localhost [127.0.0.1]) by core3.amsl.com (Postfix) with ESMTP id 8E7793A69E6 for <opsec@core3.amsl.com>; Wed, 16 Sep 2009 03:49:52 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -1.675
X-Spam-Level:
X-Spam-Status: No, score=-1.675 tagged_above=-999 required=5 tests=[AWL=-0.564, BAYES_05=-1.11]
Received: from mail.ietf.org ([64.170.98.32]) by localhost (core3.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id MWnzPzYxhtKK for <opsec@core3.amsl.com>; Wed, 16 Sep 2009 03:49:50 -0700 (PDT)
Received: from av-tac-bru.cisco.com (weird-brew.cisco.com [144.254.15.118]) by core3.amsl.com (Postfix) with ESMTP id 4DFCA3A696E for <opsec@ietf.org>; Wed, 16 Sep 2009 03:49:50 -0700 (PDT)
X-TACSUNS: Virus Scanned
Received: from strange-brew.cisco.com (localhost.cisco.com [127.0.0.1]) by av-tac-bru.cisco.com (8.13.8+Sun/8.13.8) with ESMTP id n8GAh5PO018863; Wed, 16 Sep 2009 12:43:05 +0200 (CEST)
Received: from kk-son (dhcp-peg3-vl30-144-254-7-191.cisco.com [144.254.7.191]) by strange-brew.cisco.com (8.13.8+Sun/8.13.8) with ESMTP id n8GAh4ic018437; Wed, 16 Sep 2009 12:43:05 +0200 (CEST)
Date: Wed, 16 Sep 2009 12:43:16 +0200
From: Andrew Yourtchenko <ayourtch@cisco.com>
X-X-Sender: ayourtch@zippy.stdio.be
To: Fernando Gont <fernando@gont.com.ar>
In-Reply-To: <4A988FE4.8090909@gont.com.ar>
Message-ID: <Pine.LNX.4.64.0909111815530.5203@zippy.stdio.be>
References: <20090129201501.C10D83A687E@core3.amsl.com> <Pine.LNX.4.64.0902131108350.5865@zippy.stdio.be> <4A8E6144.6040700@gont.com.ar> <Pine.LNX.4.64.0908211326190.5148@zippy.stdio.be> <4A988FE4.8090909@gont.com.ar>
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"; format="flowed"
Cc: opsec@ietf.org
Subject: Re: [OPSEC] review for draft-ietf-opsec-ip-security-01
X-BeenThere: opsec@ietf.org
X-Mailman-Version: 2.1.9
Precedence: list
Reply-To: ayourtch@cisco.com
List-Id: opsec wg mailing list <opsec.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/listinfo/opsec>, <mailto:opsec-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/opsec>
List-Post: <mailto:opsec@ietf.org>
List-Help: <mailto:opsec-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/opsec>, <mailto:opsec-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 16 Sep 2009 10:49:52 -0000

Hi Fernando,

sorry for the delay with the reply...
comments inline.

On Fri, 28 Aug 2009, Fernando Gont wrote:

> Hello, Andrew,
>
>>>> "3.3. TOS ..." -> RFC1349 is obsoleted by the RFC2474 (DSCP), so this
>>>> chapter needs a rewrite.
> [....]
>> If I understand the relation of the RFCs correctly, we should be
>> discussing the semantics of this byte in terms of RFC2474, not RFC1349.
>> However, from the security point of view, since we do not know how a
>> particular end node interprets it, it may be even a good idea to look at
>> the relationship between both of the semantics.
>>
>> The current text implies the ToS/Precedence is the current standards
>> definition - which is AFAIK not the case.
>
> Agreed. I will swap the DSCP and TOS sections... and will think a little
> bit about the relationship of these two semantics.
>

Thanks! And maybe cross-reference them somehow, if it's not possible to put 
them closer?

>
>
>>>> 9kb limit - the document should have some specific references to the
>>>> stacks that have it. Also, probably the reference to the RFC1122 (at
>>>> least for the terminological definition of the EMTU_R?)
>>>
>>> Rather than stating that there are stacks that *have* this limitation,
>>> the I-D states that virtually all stacks *can* reassemble datagrams of
>>> at least 9KB.
>>
>> Re-reading it, it looks ok - it sounds more like a back-up for the
>> reasoning for a given stack to be prepared to receive the datagrams of
>> 9K that were fragmented by the other side, so it's ok.
>
> Good. No changes here, then.
>
>
>
>>>> [Page 11]:
>>>>
>>>> "3.5.2. Possible security improvements"
>>>>
>>>> TODO - discussion of a quality of IP ID of being able to provide a
>>>> "fingerprint" of the remote host.
>>>
>>> You mean you could fingerprint the host by "figuring out" the algorithm
>>> used to set the IP ID, or what?
>>
>> I used this technique several times to understand whether the particular
>> packet was sent by a host, or by someone sending a packet on its behalf.
>
> Isn't this already addressed by the "count number of systems behind a
> NAT" and "fingerprinting physical devices" stuff?
>

yeah, though for everyday diagnostics it's a bit too much math in 
those. It may be good to say that for diagnostic purposes it should be 
possible to revert to "sequential" IP ID allocation, but 
that's not anything strong from my side - after all, it is more of a 
parlor trick to avoid looking at the traffic at two points at once.
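The "parlor trick" could be sketched roughly like this - a hypothetical check (the threshold is made up, not from any spec) that a run of IP IDs observed from one source looks like sequential allocation, i.e. a single host incrementing a global 16-bit counter:

```python
# Hypothetical sketch: small, monotonic deltas between successive IP IDs
# suggest a sequential per-host counter; large jumps suggest randomization
# or packets injected on the host's behalf. max_step is an assumption.

def looks_sequential(ip_ids, max_step=16):
    """True if successive IP IDs increase by small steps (mod 2**16)."""
    if len(ip_ids) < 2:
        return False
    for prev, cur in zip(ip_ids, ip_ids[1:]):
        delta = (cur - prev) % 65536      # the IP ID field wraps at 16 bits
        if not (0 < delta <= max_step):
            return False
    return True

print(looks_sequential([100, 101, 103, 104]))    # True: small increments
print(looks_sequential([100, 40000, 7, 65000]))  # False: looks randomized
```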

>
>>>> [Page 24]:
>>>>
>>>> "Enforce a limit on the maximum number of options" - to me this looks a
>>>> bit of a dangerous recommendation in this form, as it would cause
>>>> arbitrary hard limits set by middle devices, resulting in creation of
>>>> "IP Option MTU". (Not that anyone is adding a lot of IP options anyway
>>>> these days, so it is more a theoretical nitpick :-)
>>>
>>> Well, just pick a very large limit that will never limit anything, and
>>> that's it.
>>
>> "never limit anything" holds for the particular environment that I see and
>> pick the limit for. And if you send me the options, you can never be sure
>> whether I am going to process them or persistently drop them.
>
> Well, this limit is not meant to knock down legitimate uses. e.g., do
> you see any real traffic with more than four IP options?

Realistically, I don't think IP options are a huge innovation vehicle, 
because of the performance hit they impose in a lot of implementations.

Having the limit implies the overhead of processing the options, whereas 
rate-limiting the processing of those packets cuts them off earlier. You 
don't have to first do all the work and only then say "oops, too much".

>
> That aside, this may already be the case with packet scrubbers deployed
> on the path to destination...
>

With packet scrubbers, all bets are off. I'm talking about conventional 
routers.

>
>
>> Rate-limiting alone should be fine - as it allows the options still to
>> be processed when not under load, whereas the upper bound puts a hard-stop.
>
> You mean you'd rate limit the processing of packets that contain options?
> I'm not necessarily arguing against this, but this would sort of open
> the door to a new attack: just send packets with options to DoS any
> traffic that carries options. Whereas imposing a limit on the number of
> options would hurt the traffic that's using a "surprisingly" large
> number of options.
>
> I might argue that one might want to do both.
>

see above - figuring out the number of options is in itself work. 
Though, theoretically, it might also be performed in the accelerated path.

Basically it all boils down to how the implementation processes the 
options in the first place - if processing a packet with options is as 
efficient as processing traffic without them, there is no reason either to 
rate-limit or to limit the number of options. You do need to rate-limit 
the flow to the components that have limited computing power.
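For illustration, the "cut them off earlier" idea could be a plain token bucket applied before any option parsing. This is a sketch with made-up rates, not anything from the draft:

```python
import time

class TokenBucket:
    """Illustrative token bucket for rate-limiting packets that carry IP
    options; the class name and the rates are assumptions, not from the
    draft."""

    def __init__(self, rate, burst):
        self.rate = rate                  # tokens added per second
        self.burst = burst                # bucket capacity
        self.tokens = float(burst)        # start with a full bucket
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # under load: drop before doing any option parsing

# e.g. accept bursts of 10 option-bearing packets, refill 1 token/second
bucket = TokenBucket(rate=1, burst=10)
results = [bucket.allow() for _ in range(15)]
print(results.count(True))  # 10: the burst passes, the rest are dropped early
```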

>
>
>>>> [Page 31]:
>>>>
>>>> LSRR, SSRR and RR options - can their restrictions be combined as much
>>>> as possible ? To me they look largely similar, and the repetition is a
>>>> cause for potential mistakes, IMHO.
>>>
>>> I agree that repetition is a potential source for mistakes, and that
>>> conceptually speaking, they *could* be combined. However, I think it's
>>> useful to be able to read whatever you need to know about each option in
>>> a single section. And having "repeated" stuff allows me to use
>>> descriptive "acronyms" for each option ("LSRR.length" vs.
>>> "SSRR.length", etc.). Please let me know if you feel strongly about this
>>> change.
>>>
>>
>> Ok, let's see. Copypasting from the draft.
>>
>> valid LSRR:
>>   LSRR.Length >= 3
>>   LSRR.Offset + LSRR.Length < IHL *4
>>   LSRR.Pointer >= 4
>>   LSRR.Pointer % 4 == 0
>> empty LSRR:
>>   LSRR.Pointer > LSRR.Length
>> can write LSRR:
>>   LSRR.Length - LSRR.Pointer >= 3
>>
>> valid SSRR:
>>   SSRR.Length >= 3
>>   SSRR.Offset + SSRR.Length < IHL *4
>>   SSRR.Pointer >= 4
>>   SSRR.Pointer % 4 == 0
>> empty SSRR:
>>   SSRR.Pointer > SSRR.Length
>> can write SSRR:
>>   SSRR.Length - SSRR.Pointer >=3
>>
>> valid RR:
>>   RR.Length >= 3
>>   RR.Offset + RR.Length < IHL *4
>>   RR.Pointer >= 3       <------------- wrong ?
>
> Yes, this is a bug. It should be:
> RR.Pointer >= 4
>
>
>
>>   RR.Pointer % 4 == 0
>> full RR:
>>   RR.Pointer > RR.Length
>> additional RR validity check:
>>   RR.Pointer - RR.Length >= 3   <---- valid if Pointer is bigger than
>> length + 3 ?
>
> Good grief! It seems I screwed up when I converted the nice PDF into the
> IETF I-D.
>
> It should be RR.Length - RR.Pointer >=3
>
> It means that there is room to store a 4-byte IP address. (The check
> looks ugly because RR.Pointer starts at "1": "1" when it points to the
> beginning of the option, and 4 when it points to the beginning of the
> option area.)
>

From what I read in RFC 791, the smallest possible value for the 
pointer is 4; quoting from page 17:

        "The option begins with the option type code.  The second octet
         is the option length which includes the option type code and the
         length octet, the pointer octet, and length-3 octets of route
         data.  The third octet is the pointer into the route data
         indicating the octet which begins the next source address to be
         processed.  The pointer is relative to this option, and the
         smallest legal value for the pointer is 4."



>
>
>
>> The RR ones that I marked with the arrow are interesting:
>>
>> "The pointer field is relative to this option, with the minimum legal
>> value being 4.  Therefore, the following check should be performed:
>>
>> RR.Pointer >= 3
>> "
>>
>> -----> So this looks like a bug, the value should be 4, not 3 ?
>
> Yes.
>
>
>
>> and then call this from the option processing where needed.
>> Currently the reader has to first spot that the checks are pretty much
>> the same, then write them down side by side to verify this - extra cognitive
>> load for the reader. And, it appears, for the writer as well :-)
>>
>> So, yes, I'm rather strongly in favour of joining whatever can be joined.
>
> Ok. how do you propose to do this? For LSRR and SSRR, this would be
> easy:  just document the checks for one of them, and include a pointer
> to those checks in the other option.
> For the RR option, one could say "implement the same checks, but do not
> enforce the SSRR.Length - SSRR.Pointer >= 3 check". But that would look
> ugly.
>
> (and, if you had to code these, you'd need to indicate whether its a
> XSRR option or a RR option, because of this last check that applies only
> to XSRR options).
>
> Thoughts?
>

As per the equations I wrote above, there is a higher-order operation that 
is common and can be abstracted - the checks look pretty much the 
same for the most part, and only tiny pieces differ. So, describe the 
constraints once, include the common check algorithm, and show how it is 
invoked for each of the options.
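To make the idea concrete, here is a rough sketch of such a shared check, with the one per-option difference from the thread as a parameter. The names and structure are mine, not the draft's; the values follow the corrected checks discussed above:

```python
# Sketch of a validator shared by LSRR, SSRR and RR. Per the thread, the
# "room for a 4-byte address" check is enforced as a validity condition
# for LSRR/SSRR only.

def validate_route_option(length, pointer, offset, ihl, is_source_route):
    """length/pointer: option fields; offset: the option's byte offset in
    the header; ihl: IP header length in 32-bit words."""
    # checks common to LSRR, SSRR and RR
    if not (length >= 3
            and offset + length < ihl * 4
            and pointer >= 4             # RFC 791: smallest legal pointer is 4
            and pointer % 4 == 0):
        return False
    if pointer > length:
        return True    # route empty (LSRR/SSRR) or record full (RR)
    # is there room to read/store one more 4-byte IP address?
    if length - pointer >= 3:
        return True
    return not is_source_route           # enforced only for LSRR/SSRR

print(validate_route_option(length=7, pointer=3, offset=20, ihl=8,
                            is_source_route=True))   # False: pointer < 4
```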

>
>
>>>> [Page 42]:
>>>>
>>>> Man in the middle threat mention for DSCP: is this the only field for
>>>> which the MITM attacks are a concern ?
> [....]
>> http://tools.ietf.org/html/draft-ietf-opsec-ip-security-00:
>>
>> "   However, if this moderate congestion turned into heavy congestion,
>>    routers should switch to drop packets, regardless of whether the
>>    packets have the ECT codepoint set or not.
>>
>>    A number of other threats could arise if an attacker was a man in the
>>    middle (i.e., was in the middle of the path the packets travel to get
>>    to the destination host).  For a detailed discussion of those cases,
>>    we urge the reader to consult Section 16 of RFC 3168.
>> "
>>     ^------
>>
>> ECN/DSCP and ToS (if the ToS is there at all) should definitely sit
>> somewhere together IMHO.
>
> The MITM attack is mentioned only for ECN because MITM attacks *are*
> mentioned in the ECN spec. I'm more of the idea to delete MITM
> discussions, rather than adding more of it.
>
> e.g., we'd have to add stuff on the IP ID discussion (if there's a MITM,
> an attacker could spoof packets with the same IP ID, and cause
> reassembly to fail), an attacker could rewrite the contents of the IP
> source route options, etc. I'd personally avoid getting into this. :-)
> Thoughts?

if there's a MITM, all bets are off with IP, imho :-) Similar to vanilla TCP.

>
>
>
>>>> [Page 51]:
>>>>
>>>> The sequence# check: should here be a reference to the section 3.4.3 of
>>>> rfc2406 as a pointer to what constitutes the "valid sequence#" ?
>>>
>>> Would just a pointer (reference) address your comment?
>>
>> yes, reference is ideal.
>
> Good. Will do.
>
>
>
>>>> Step three: should there be more specifics about some randomization for
>>>> the process of dropping ?
>>>
>>> Hints? (Argue something like "it's recommended that fragments are
>>> flushed on a random basis" or the like?)
>>
>> Probably. One question about this algorithm in general - is this
>> something that works and is tested in real life with code, or is it
>> something we think may work if implemented? If the former, there should
>> be references to the implementation; if the latter - maybe this deserves
>> a separate doc, and a good scrutiny by the IPSEC people?
>
> The separation of reassembly into separate protocols has not been
> implemented, AFAIK. So it would be "the latter". However, this document
> *has* been reviewed by Ran Atkinson (author of RFC 2401). Further thoughts?
>

well, if the IPSEC folks don't have any comments - up to them. I still 
think that putting the big lock on the door (AH/ESP) while leaving the 
window (UDP/4500) wide open is more security theatre than real 
protection. Combined with the bugs which will inevitably result from the 
separate codepaths (read: $$$ for development and testing), I think
the collateral damage is worse than the possible wins.

>
>>>> Also - this algorithm omits the frequently used NAT-T (essentially IPSEC
>>>> over UDP/4500)
>>>
>>> Could this really be incorporated in the algorithm?
>>
>> That's what I wonder. :-)
>
> I would argue against this one. You cannot separate this traffic into a
> different queue, because you might receive a non-first fragment first.
> And, IIRC, linux sends the last fragment *first*.
>

right.

>
>
>> I really think now that this piece should be
>> looked at by the IPSEC people, even if to say "We don't need this
>> because we never fragment".
>>
>> Realistically, that's pretty much the case, because if you need to
>> reassemble the fragments before decryption, you're playing in a much
>> lower league performance-wise. So, this text might be redundant altogether.
>
> Well, the bottom-line of this document is "you should not rely on
> fragmentation.... but if you do, here's how you could improve the
> security of the process".
>

As I said - I think the penalty that it brings in terms of additional 
costs weighs more than its positive sides.

>
>
>>>> [Page 52]:
>>>>
>>>> "virtually impossible" - I'd replace this with "harder" :) And, as the
>>>> MITM is mentioned for DSCP, I'd not make the task easier for IPSEC
>>>> specifically :) the algorithm which is supposed to protect IPSEC should
>>>> be able to resist the on-path attacker.
>>>
>>> Well, I don't think you could protect IPsec traffic that relies on
>>> fragmentation. If I have access to the IPsec fragments, I could simply
>>> forge fragments that would lead to a failure in the reassembly process.
>>
>> Ok. So I have a growing conclusion that the more complex reassembly
>> algorithm would be useless. Either use the system-standard one, or
>> don't fragment, period. Fewer bugs :)
>
> One might argue that you could allow IPsec traffic from a single host
> (by means of a fw) and allow non-IPsec traffic from anywhere else. In
> this case, the algorithm might be of help.
>

So this implies a wide change in the fragment handling for a very specific 
scenario. And in that case one might apply other situation-specific tweaks 
- if you know your peer and you know its behaviour, most of the time you 
can tune it to avoid fragmentation altogether.

>
>
>
>>>> [Page 53]:
>>>>
>>>> "Additionally, given that many middle-boxes such as firewalls create
>>>>    state according to the contents of the first fragment of a given
>>>>    packet, it is best that, in the event an end-system receives
>>>>    overlapping fragments, it honors the information contained in the
>>>>    fragment that was received first."
>>>>
>>>> received first if there was a box behind the middlebox that has
>>>> reordered the packets afterwards, or if there was no such box ? :-)
>>>
>>> Not sure what you mean.
>>>
>>
>> I meant that I do not really see the logic here. You have a middlebox
>> that forwards the "fragmentA", then "fragmentB". On the way to you they
>> get reordered. You keep the "fragment B". But according to your
>> statement the firewall keeps the state according to "fragmentA". So
>> we're doing exactly the opposite of what you intended to say ?
>
> If a host receives fragmentA (first) and fragmentB (second) and they
> overlap, they should use the contents of fragmentA.
>
> (the referenced RFCs show how you could bypass a firewall if the host
> uses fragmentB instead of fragmentA).
>
> Does this make sense now?
>

Yes.
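For what it's worth, "honor the fragment received first" can be sketched as a reassembly buffer where a byte, once written, is never overwritten. A toy model only - no timers, no MF flag or total-length handling:

```python
# Toy model of first-fragment-wins reassembly: later, overlapping fragments
# cannot overwrite bytes already placed in the buffer, matching the state
# a first-fragment-inspecting firewall would have created.

def reassemble(fragments, total_len):
    """fragments: list of (byte_offset, payload), in order of *arrival*."""
    buf = bytearray(total_len)
    filled = [False] * total_len
    for off, data in fragments:
        for i, byte in enumerate(data):
            if not filled[off + i]:       # first-received data wins
                buf[off + i] = byte
                filled[off + i] = True
    return bytes(buf)

# fragmentB (offset 4) arrives first, then an overlapping fragmentA:
arrivals = [(4, b"WXYZ"), (0, b"abcdEFGH")]
print(reassemble(arrivals, 8))  # b'abcdWXYZ': the overlap keeps B's bytes
```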

>
>
>
>>>> If the two received fragments contain conflicting information, we do not
>>>> have enough info to discern which of the two is "correct", IMHO. So, we
>>>> should mark the corresponding packet as "bad",
>>>
>>> Isn't this more aggressive than what the existing text recommends?
>>
>> to an extent. But it's also more predictable - you get a packet lost, not
>> corrupted. Otherwise, if you reassemble, either the upper layer detects the
>> error (so we still have the same problem, but at an upper layer), or it
>> does not - at which point we are unsure whether there is no error
>> because the upper-layer check is weak, or because we actually
>> reconstructed the original packet.
>>
>> OTOH with the "drop if suspicious" idea, there's an easy DoS vector...
>> You're right. Scratch this.
>
> So... should keep the next "as is"?
>

yup. Maybe just include some discussion that if we have two fragments, 
in the absence of a stronger error check for the contents of the entire 
PDU that those fragments belong to, it is impossible to say which one is 
"good" and which one isn't.

>
>
>>>> treat the reassembly as
>>>> usual, and then drop it with an auditable event. (not drop immediately
>>>> in order to avoid the DoS where colliding fragments would be dropped and
>>>> cause the accumulation of the remaining fragments till the max
>>>> reassembly timeout occurs).
>>>
>>> I agree with this "remaining fragments thing". However, obvious
>>> question: how long should we wait? This might close one door of attack,
>>> but open another one....
>>
>> same as for regular reassembly. The fact that you have two colliding
>> fragments, does not matter - you could as well receive two spoofed
>> fragments for different IP IDs, and there you can't be sure if they are
>> spoofed or not. So I'd treat these cases functionally the same.
>
> Thinking out loud: One might argue that if there are too many incomplete
> fragments in the queue, one should flush them all (or at least quite a
> few of them) -- if not, an attacker could simply trash the whole IP ID
> space. Then legitimate fragmented traffic would complete the reassembly
> process with the spoofed fragments, thus leaving a (later-arriving)
> legitimate fragment in the fragment queue, which would keep the IP-ID
> space trashed.
>
> Thoughts?
>
>

Again, because there is no way to tell "legit" from "non-legit" 
fragments, it is technically impossible to distinguish an attack from just 
a higher rate of fragments arriving due to some other anomaly. If it was 
a network anomaly as opposed to malicious activity, then we end up 
throwing away packets that would have been correctly reassembled.
(Read: the applications people will get unhappy :-)
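As a sketch of the "flushed on a random basis" idea mentioned earlier in the thread - cap and fraction are made-up parameters, and the queue keys merely model (src, dst, proto, ip_id) tuples:

```python
import random

# Illustrative eviction policy: when the number of incomplete reassembly
# queues exceeds a cap, flush a random subset rather than all of them.
# Since legit and spoofed fragments are indistinguishable, random eviction
# at least avoids a deterministic bias an attacker could game.

def evict_random(queues, max_queues, fraction=0.5, rng=random):
    """Mutates `queues` in place; returns the number of queues flushed."""
    if len(queues) <= max_queues:
        return 0
    victims = rng.sample(list(queues), int(len(queues) * fraction))
    for key in victims:
        del queues[key]
    return len(victims)

queues = {("10.0.0.%d" % i, "10.0.1.1", 17, i): [] for i in range(100)}
dropped = evict_random(queues, max_queues=64)
print(dropped, len(queues))  # 50 50: half the queues were flushed
```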


>
>
>>>> [Page 54]:
>>>>
>>>> sending with high precedence values: ingress filtering ?
>>>
>>> ingress-filtering based on the IP addresses, you mean?
>>
>> Yes, I mean that it is not the business of the host to deal with this
>> - especially as there's no real recommendation: "yeah, they say you
>> should not drop this, but bad stuff is easy to do. So, beware!"
>
> I will add a note on ingress filtering here.
>
>
>
>
>>>> (Also, with the DSCP/ToS duality, it's worth double-checking how it
>>>> translates to real forwarding devices?)
>>>
>>> mmm... not sure what you mean.
>>
>> i.e. how much of a problem it really is, and whether it is something
>> that was shown to be practically possible, or is more of a theory.
>
> Are you referring to TOS (specifically), or TOS+Precedence?
>

either/both.

>
>
>>>> [Page 55]:
>>>>
>>>> Address resolution: the passage about storing the packets for a long
>>>> time on the router - isn't it something that should be directly
>>>> discouraged, precisely because of its big impact ?
>>>
>>> What specific behavior would you recommend?
>>
>> This is not about recommending, but rather about the practical
>> behaviour. AFAIK, the router would in practice throw such packets down
>> the drain. Or does there exist a real-world routing device that actually
>> holds all the packets for an unresolved L2 adjacency?
>
> A unix-like box operating as a router would do this, IIRC. Nevertheless,
> there are too many devices around, and some might be doing this -- hence
> the discussion. (it *is* noted that it's a usual implementation approach
> for the device to drop the packet that elicited the address resolution
> before engaging in ARP).

Ok, let's leave this one as is.

thanks,
andrew