Re: [bmwg] IPsec drafts - comments

Merike Kaeo <kaeo@merike.com> Wed, 12 March 2008 17:48 UTC

In-Reply-To: <47D7EC39.7090304@checkpoint.com>
References: <47D43A7D.6000206@checkpoint.com> <44E9F3B3-1C85-4C26-BC0B-D6C70BA1C1D6@merike.com> <47D7EC39.7090304@checkpoint.com>
Message-Id: <D69225E9-AC3C-436A-A7D6-D813717561AD@merike.com>
From: Merike Kaeo <kaeo@merike.com>
Date: Wed, 12 Mar 2008 10:52:06 -0700
To: Yaron Sheffer <yaronf@checkpoint.com>
Cc: bmwg@ietf.org
Subject: Re: [bmwg] IPsec drafts - comments

Further discussion embedded....

On Mar 12, 2008, at 7:44 AM, Yaron Sheffer wrote:

>
>>> - IPsec "gateways" are mentioned multiple times but are not defined.
>>> They are another synonym to "IPsec server".
>>
>> We assumed that this would be a commonly known term as defined by the
>> IPsec standard.  Specifically,
>> from RFC 4301 in section 3. System Overview:  "An IPsec  
>> implementation
>> operates in a host, as a security gateway
>> (SG), or as an independent device, affording protection to IP  
>> traffic.
>> (A security gateway is an intermediate system implementing
>> IPsec, e.g., a firewall or router that has been IPsec-enabled.)"
> This is not a very good answer since you do define things like ISAKMP,
> IKE, IPsec, SA etc. :-) And I think the definition is material to
> understanding the methodology.

I see your point.  I propose that we change the definition of IPsec Server to IPsec Gateway, and state in the issues section: "IPsec Gateways are also sometimes referred to as 'IPsec Servers' or 'VPN Concentrators'."


>>
>>> - I would expect the "security context" to indicate NAT  
>>> traversal, since
>>> it has performance implications.
>>
>> I agree that this should be added.  I think it would fit best under
>> the IKE context at the end of the section and would be an additional
>> MUST.
>> Will this be acceptable?
>>
> Yes. By the way, doesn't the detailed definition of the Context belong
> in the normative Methodology draft?

No, the terminology doc is where terms get defined.....I'd prefer to keep this in the terminology document to clarify exactly what the collection of security parameters is.


>>> - IPsec Tunnel Capacity: it is implicit here that each IPsec SA is
>>> associated with exactly 1 IKE SA. Please make it explicit.
>>
>> I agree.  So is it sufficient to change the discussion section to:
>>
>> "  This metric will represent the quantity of IPsec Tunnels
>>       that can be established on an IPsec Device that can forward  
>> traffic
>>       i.e.  Active Tunnels.  It will be a measure that indicates how
>>       many remote peers an IPsec Device can establish a secure
>>       connection with.  For IPsec Tunnel Capacity, each IPsec SA is
>> associated
>>       with exactly 1 IKE SA. "
>>
> Sounds good.
>>> - IPsec throughput: the definition is unclear. The units are pps,  
>>> but
>>> then there is reference to the packet size.
>>
>> I don't understand what is unclear?!?  What would you change?
>>
> The text is a bit obscure, in that you first describe two measurement
> options and then say the 2nd one is the "right" one. But more
> importantly, when you say "This resulting rate can be recalculated  
> with
> an encrypted framesize to represent the encryption throughput rate"  
> this
> seems to mean that the rate somehow depends on the framesize, which is
> incorrect - you only measure PPS. Note that RFC 1242 actually mentions
> *both* PPS and BPS as the measures of throughput - which again is  
> rather
> confusing.

Ahh...I see.  OK.  The second paragraph can just be deleted, I think, and the definition will be sufficient.  Do you agree?

Re pps vs bps....I compared with what we have in the methodology.  In section 9.1 (throughput baseline) reporting, we simply refer to rate [which would be pps according to the terminology], although we do state that if a single value is desired then pps MUST be used and bps MAY be used.  I think keeping pps for the definition is OK since that is the preferred reporting rate.  (I also checked RFC 1242 again and there is no MUST anywhere between pps and bps....we felt it was best to pick one to avoid confusion.)

>>
>>> - 10.8.1: there are DOS resistance solutions where the device  
>>> *never*
>>> stops accepting all valid attempts, it just accepts fewer and  
>>> fewer. I
>>> would suggest to qualify the measure, e.g. "the rate of invalid or
>>> malicious IKE tunnels that can be directed at a DUT before the  
>>> Responder
>>> ignores or rejects more than 90% of valid tunnel attempts".
>>
>> There is no rate.  We are testing how many failed attempts are
>> possible while still being able to establish
>> the valid IKE message.  Need to think on this a bit......your
>> suggestion would change the nature of this test.
>>
> My point here (and elsewhere) is that DOS protection is typically not
> all-or-nothing. You can measure the rate where the DUT *starts*  
> failing
> valid requests, but it's not very interesting.

Why is it not interesting?  For me as an end user it would give an indication of where the device starts to have issues while nothing else is going on with the box.  I can plan for capacity this way, with a rough idea of what can realistically be deployed.

But let's make sure this test is valid and we come to agreement.  When you say that there are DoS resistance solutions where the device 'never' stops accepting all valid attempts, are you referring to rate limiting?

The only other test I can think of that would be useful is if we extended it so that we get a graph of the rate of IKE tunnel attempts compared to the % of failures.  So we keep repeating the test and get a graph of where we have 1% - 100% failure of valid IKE Phase 1 attempts (increments to be determined).
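A rough sketch of what that sweep could look like (run_trial below is a toy stand-in for the tester harness, since the real counts would come from test equipment; everything here is illustrative, not a proposed API):

```python
# Sketch of the proposed sweep: offer invalid IKE Phase 1 attempts at
# increasing rates and record what fraction of a fixed batch of valid
# attempts fails at each rate. run_trial is a toy placeholder.

def run_trial(invalid_rate, valid_attempts):
    """Toy model: valid-attempt failures grow with the invalid load."""
    return min(valid_attempts, invalid_rate // 10)

def failure_curve(invalid_rates, valid_attempts=100):
    """Return (invalid attempt rate, % of valid IKE attempts failing)."""
    return [(rate, 100.0 * run_trial(rate, valid_attempts) / valid_attempts)
            for rate in invalid_rates]

curve = failure_curve([0, 250, 500, 1000])  # points for the 1%-100% graph
```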

I'd love more input on this.....


>>
>>> - 10.8.2: why is this called "Phase 2"? It is simply DOS  
>>> resistance at
>>> the ESP/AH level. It could even apply to manually configured ESP/ 
>>> AH. In
>>> addition, the measure's definition is far from precise.
>>
>> You are right that it should not be Phase 2.  However, what would you
>> propose to make definition more precise?
> Again, you imply in the discussion that once the first packet is lost,
> the DOS attack is deemed successful. Moreover, you don't mention  
> whether
> this test should be done at the top throughput of the device.

In the methodology document we do mention that "The aggregate rate of both microflows MUST be equal to the IPsec Throughput and should therefore be able to pass the DUT."

> I suggest to change the definition to: The ESP/AH Hash Mismatch Denial
> of Service (DoS) Resilience Rate quantifies the rate of invalid ESP/AH
> packets that a DUT can drop without affecting the traffic flow of  
> valid
> ESP/AH packets.

Agree

> "Affecting the flow" is defined as dropping more than 1%
> of valid packets. The test report should include (1) the total
> throughput of traffic, which should be as close as possible to the  
> DUT's
> IPsec throughput, and (2) the rate of invalid packets out of the  
> total.

OK....let Tim and me reword both the term and meth parts of this test, and we'll send them as a separate email to ensure we have consensus on this test.  (Maybe others will chime in as well :))
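As one concrete reading of the 1% criterion in Yaron's wording, the reworded test could be evaluated along these lines (a sketch assuming the tester reports valid-packet counts per trial; the function names are made up for illustration):

```python
# A trial "affects the traffic flow" when more than 1% of valid ESP/AH
# packets are dropped; the resilience rate is then the highest
# invalid-packet rate whose trial still passes. Counts would come from
# the tester.

LOSS_THRESHOLD = 0.01  # the 1% figure from the proposed wording

def flow_affected(valid_offered, valid_forwarded):
    """True if more than 1% of valid packets were dropped."""
    return (valid_offered - valid_forwarded) / valid_offered > LOSS_THRESHOLD

def resilience_rate(trials):
    """trials: (invalid_pps, valid_offered, valid_forwarded) tuples in
    increasing invalid_pps order. Return the highest passing rate."""
    best = None
    for invalid_pps, offered, forwarded in trials:
        if flow_affected(offered, forwarded):
            break  # first rate that disturbs the valid flow
        best = invalid_pps
    return best
```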

>>>
>>> Methodology draft
>>> - NAT traversal is mentioned, but there is no requirement to  
>>> actually
>>> measure it.
>>
>> Not everyone necessarily supports NAT traversal.  It was an extension
>> for IKEv1 so not really a part of the protocol.  Also note that for
>> IKEv2 the NAT Traversal support is optional.  (and there will be a 00
>> rev of IKEv2 draft soon - there is a new IKEv2 clarification doc so
>> I'm trying to be thorough and get nuances between IKEv1 and IKEv2  
>> that
>> are relevant to performance)
> Formally you are right, but NAT traversal is widely supported and of
> course is essential for remote access VPNs. So can we have an optional
> test to cover it?

NAT-T is relevant for all tests dealing with IKE.  What I propose is that under section 7.6.7 we list what is needed to perform NAT-T tests: the DUT must have NAT capability, and the traffic selectors for the test traffic need to use addresses that are different from the actual IP addresses used to create the SAs.

I would hope you are not proposing adding separate NAT boxes, since that would create an unknown factor affecting performance.  If you are thinking that separate NAT boxes need to be added to the test, then how do we ensure we are testing IPsec performance using NAT-T?  Testing using only NAT and then subtracting that time from the IPsec-related tests?  If so....I'm starting to wonder whether NAT-T should be a separate document on its own.  Of course someone would need to be willing to author it....


>>
>>
>>> - Why is support of AH a MUST?
>>
>> Well.....this is something we actually want input on.   While the
>> IPsec docs say AH MAY be supported, there is in reality little to no
>> testing done with AH.  Some folks in v6 community are trying to bring
>> back AH.  And RFC4305 (cryptographic algorithms for ESP and AH)
>> specifies algorithms for both protocols.
>>
>> Do you think it would be better to say ' If AH is supported by the
>> DUT/SUT  testing of AH Transforms 1 and 2 MUST be supported.' ?  We
>> don't say anywhere else that you MUST test AH, just that in any test
>> you must have either an AH an/or ESP transform.
> I like the new wording much better.
>>
>>
>>> - Why is ESP Transport mode a MUST? It is inapplicable to IPsec
>>> gateways.
>>
>> An IPsec gateway can act as an IPsec end node.  To make it easier for
>> everyone here's the text from section 3.3 of RFC4301:
>>
>>  " A host implementation of IPsec may appear in devices that might  
>> not
>>    be viewed as "hosts".  For example, a router might employ IPsec to
>>    protect routing protocols (e.g., BGP) and management functions  
>> (e.g.,
>>    Telnet), without affecting subscriber traffic traversing the  
>> router.
>>    A security gateway might employ separate IPsec implementations to
>>    protect its management traffic and subscriber traffic.  The
>>    architecture described in this document is very flexible.  For
>>    example, a computer with a full-featured, compliant, native OS  
>> IPsec
>>    implementation should be capable of being configured to protect
>>    resident (host) applications and to provide security gateway
>>    protection for traffic traversing the computer. "
> This is true. But even if it's implemented, it's not a core function.
> And the text you're quoting doesn't mandate it. So let's make it  
> optional.

I disagree.  Mainly because IPv6 will use more transport mode, and while there are factions trying to avoid using IPsec for IPv6 (they want to now make it a MAY implement in the IPv6 node requirements doc rather than a MUST), I feel very strongly that since transport mode is part of the standard, it should be tested.

I am hoping others will chime in here.  I will bow to consensus, but transport mode testing in gateways is something I feel strongly about.

>>
>>> - The purpose of Table 3 is unclear, given that results are reported
>>> separately for ESP and AH.
>>
>> This is a scenario for nested tunnels which may be used (and are used
>> in some environments in practice).  Note that
>> in v6 it is less likely to use NAT traversal and therefore some folks
>> are trying to argue for using AH.  Note that this is
>> just a RECOMMENDED test but we as authors felt reasonable to mimick
>> what folks in real world are doing.  if your
>> experience is different, please let's discuss.
> We are seeing little use for this kind of tunnels. But if it's  
> optional,
> fine.
>>
>>> - 9.1: why does the procedure for the baseline test refer to  
>>> IPsec SAs,
>>> where the test is only of cleartext traffic?
>>
>> The IPsec SA selectors refers to IP addresses and port
>> numbers.......so want to ensure same type of traffic is sent for  
>> baseline
>> as for tests utilizing IPsec.
>>
> Makes sense. Could you add this clarification to the document?

Yes...will do.

>>
>>> - 9.1: "advertising copy" and "product datasheet" are clearly out of
>>> scope. Vendors are free to publish results in whatever way they  
>>> choose,
>>> including shouting them down a well.
>>
>> I'd like to hear other comments on this.  Is there anything wrong  
>> with
>> how the reporting format is written?  The intent was to
>> make sure reports from different vendors and/or testing environments
>> can accurately be compared.
>>
> I would just say, you have two reporting options and here they are  
> (and
> maybe that one is preferred). Realistically, the IETF cannot mandate
> publication of anything.

True about the IETF not being able to mandate..... I'll wait to see if I hear from others on this point.  Rewording to get rid of 'advertising copy' and 'product datasheet' is probably a good idea regardless.

>>
>>> - 11.2: measuring IPsec frame loss implies that we allow frame  
>>> loss. So
>>> what percentage loss do we allow in e.g. throughput tests?
>>
>> Huh?  The throughput is the fastest rate at which the count of test
>> frames transmitted by the DUT is equal to the number of test
>> frames sent to it by the test equipment.  This is not necessarily the
>> maximum rate for the frame size on the input media.  Should be for
>> wire rate performance but may not be true for all frame sizes :)
>>
>> Or am I misunderstanding this question?  The procedure looks fine  
>> to me.
>>
> Now I understand better but I still have an issue: a software IPsec
> device might be running on a box with 10Gb interfaces, but still be
> specified for lower throughput. I see no reason to measure packet loss
> for a level of throughput that you don't support. I suppose the  
> wording
> can be changed to mention "nominal device throughput" instead of  
> "frame
> rate on input media".

Makes sense...will look to reword as appropriate and send to the list.....
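For reference, the "fastest rate with zero loss" definition restated above is the usual iterative search.  A self-contained sketch against a toy DUT model (the capacity figure is invented for illustration; a real test would drive actual test equipment):

```python
# Sketch of the throughput search implied by the definition: the
# throughput is the fastest offered rate at which frames forwarded by
# the DUT equal frames offered. send_at_rate models a toy DUT with a
# fixed capacity.

DUT_CAPACITY_PPS = 14880  # invented figure for the sketch

def send_at_rate(rate_pps, frames=1000):
    """Frames the toy DUT forwards when offered `frames` at rate_pps."""
    if rate_pps <= DUT_CAPACITY_PPS:
        return frames
    return int(frames * DUT_CAPACITY_PPS / rate_pps)  # excess is dropped

def throughput_search(lo=1, hi=1000000, frames=1000):
    """Binary-search the highest rate with zero frame loss."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if send_at_rate(mid, frames) == frames:
            lo = mid       # no loss at mid: try higher
        else:
            hi = mid - 1   # loss at mid: try lower
    return lo
```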

>
>>
>>> - 11.2: assuming DUTa and DUTb are identical, this measures some
>>> function of a single device's loss rate, maybe simply twice the  
>>> rate.
>>
>> Can you elaborate?   What is missing in procedure or reporting format
>> from your perspective?
>>
> What I'm saying is that you cannot calculate packet loss on *one*  
> device
> from the packet loss measured on a chain of two identical devices,  
> both
> of which may be dropping packets. On a related theme, I would  
> appreciate
> a paragraph added near Figure 2 on whether and when you can replace  
> DUTb
> by a more powerful device (where appropriate of course), which  
> seems to
> be a common practice. E.g. you encrypt on a small home router and
> decrypt on a large enterprise router, and expect all the  
> bottlenecks to
> come from the smaller device.

OK...I get where you're coming from.  We want to introduce as few variables as possible, so both DUTa and DUTb need to be the same box, and the test looks at them as a system.  Only if you have an IPsec-aware tester can you reasonably look at a single device (although you have to take into account that the IPsec-aware tester can itself introduce variables).

Of course you can aggregate on a bigger box, but I don't see why this test needs to change.

I expect some vendors will definitely run this test and halve the number in marketing blurbs.....but what I would hope is that there are numbers for the baseline and then numbers utilizing IPsec, which tell you how the DUT acts as a system.
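On the two-identical-DUT point, a small numeric illustration of why the single-device loss cannot simply be read off the chain measurement (this assumes independent drops, which itself need not hold in practice):

```python
# Loss across two identical devices in series, assuming each drops a
# fraction p of packets independently. The chain loss is close to, but
# not exactly, twice the single-device loss, and inverting it back to a
# per-device figure relies on the independence assumption.

def chain_loss(p_single):
    """Fraction of packets lost across two devices in series."""
    return 1 - (1 - p_single) ** 2

# e.g. 1% per-device loss -> 1.99% across the chain
```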

>>
>>> - 11.5: this test only generates 2-3 rekey events if using a single
>>> (IKE) SA, why not define it with a lower SA lifetime or a higher  
>>> number
>>> of SAs?
>>
>> The intent was to only have a few rekey events during the testing.
>> Why would you want more?
>> Is there a test scenario that you want that we are not covering?
> I just expect packet loss to be typically zero with such a low rekey
> rate. But maybe I'm being optimistic...

I will leave this as is unless I hear differently.  However, I will also ask some IPsec implementors for comment off-line.  (I'll forward replies.)

>>
>>
>>> - 15.1: as noted for the terminology draft, this DOS test will  
>>> not work
>>> well with randomized anti-DOS mechanisms.
>>
>> So is it enough to have the statement in the terminology doc or did
>> you want something added in this section?
>>
> I suggest to also have it here.

OK....

>>
>>> - 15.2: this test may completely fail if the maximum throughput is
>>> unstable, in the sense that *any* extra CPU activity would cause  
>>> packet
>>> loss. Maybe it should be tried first at 90% of max throughput.
>>
>> I think it will be important in real life to see what happens at
>> throughput rates.  However, if a device completely
>> fails then the test MAY also be repeated at various Tunnel  
>> scalability
>> data points.   Do you think we need to make the
>> scalability points more an exact statement (i.e. 90%)?   Although  
>> this
>> makes me think we also need to change
>> the reporting format to:
>>
>> " The results shall be listed as PPS (of invalid traffic) . The
>> Security Context Parameters defined in Section 7.6
>>   and utilized for this test MUST be included in any statement of
>> performance.  The aggregate rate of both microflows
>>   which act as the offered testing load MUST also be reported."
>>
> The new text is fine, also please make scalability points (e.g. 90%)
> explicit.

OK....

- merike
_______________________________________________
bmwg mailing list
bmwg@ietf.org
https://www.ietf.org/mailman/listinfo/bmwg