Re: [radext] WGLC #2 for draft-ietf-radext-dtls-04

Peter Deacon <peterd@iea-software.com> Fri, 05 April 2013 07:24 UTC

Date: Fri, 05 Apr 2013 00:24:28 -0700
From: Peter Deacon <peterd@iea-software.com>
To: Alan DeKok <aland@deployingradius.com>
In-Reply-To: <515C3604.3040406@deployingradius.com>
Message-ID: <alpine.WNT.2.00.1304042021411.3988@SMURF>
References: <1A5FDF7C-9E93-447E-A103-9700349CB2F5@gmail.com> <alpine.WNT.2.00.1304021450180.3988@SMURF> <515C3604.3040406@deployingradius.com>
User-Agent: Alpine 2.00 (WNT 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset="US-ASCII"; format="flowed"
Cc: radext@ietf.org, radext-chairs@tools.ietf.org
Subject: Re: [radext] WGLC #2 for draft-ietf-radext-dtls-04

On Wed, 3 Apr 2013, Alan DeKok wrote:

>> 2.2.2 "We re-iterate that much of [RFC6614] applies to this document.
>>    Specifically, Section 4 and Section 6 of that document are applicable
>>    in their entirety to RADIUS/DTLS."

>> RFC 6614 section 6 dedicates a few sentences to TCP specific properties
>> which do not apply to RADIUS/DTLS.

>  Do you have suggested text for the draft?

Recommend removing "in their entirety"

>> 4. "Adding these
>>    parameters means that the client MUST start using DTLS to the server
>>    for all new requests.  The client MUST, however, accept RADIUS/UDP
>>    responses to any outstanding requests."

>> MUST does not seem appropriate.  We have no business in what client
>> elect to do with outstanding requests after a security configuration
>> change.

>  Yes, we do.  We're writing the standards here, which means we must
> address security, migration, implementation cost, etc.

What security, migration or implementation concern does this "MUST" 
address?

I can think of a couple of drawbacks.

An operator who decides to improve the security of their system would be
*required* to accept responses with lower security after declaring
otherwise.

Client implementation complexity also increases: a short-term security
exception has to be made for receiving non-DTLS packets, including a
protocol disambiguation procedure for clients, which the draft does not
specify.

>  In this case, there are no known issues with a client accepting
> responses to packets it previously sent.  The goal here is to ensure a
> safe and productive transition between RADIUS/UDP and RADIUS/DTLS.

I am looking for specific technical justifications.  What safety or
productivity benefit does the MUST requirement provide?  If it did not
exist, how would safety or productivity be negatively impacted?

>> 5. "We note that [RFC5080] Section 2.2.2 already mandates a duplicate
>>    detection cache.  The connection tracking described below can be seen
>>    as an extension of that cache, where entries contain DTLS sessions
>>    instead of RADIUS/UDP packets."

>> I think bringing this up is likely to cause more confusion than
>> necessary. Tuples and authenticator usage are different, session
>> lifecycle is different and state logic is different (You would not
>> ignore anything while a response is pending)

>  Do you have suggested text for the draft?

Since the properties of the state tracking are discussed in detail in
section 5, I would suggest removing the RFC 5080 paragraph.

>> I think it might be helpful to note RFC5080 in the context of continuing
>> to support this mechanism and to continue to do it at the RADIUS packet
>> layer rather than DTLS or you're likely to end up on the wrong side of
>> the DTLS sequence window.

>  I'm not sure what that means.

Normally, with the RFC 5080 duplicate detection cache, an implementation
stores the response for a given request and simply resends the stored
response (the UDP message) on the wire.

With DTLS, an implementation must resend RADIUS packets through new DTLS
messages rather than storing a previous DTLS response and resending it.

The reason is that DTLS replay protection uses a sequence window: only
messages whose sequence numbers fall within a fixed-size window and have
not already been received are accepted.  With retry timers on the order of
seconds, retransmitted messages on a busy client are likely to be too
stale for the DTLS stack to accept unless the retransmission is performed
by sending the stored RADIUS response through DTLS again.


Suggest a short text:

When a duplicate detection strategy such as [RFC5080] Section 2.2.2
replays a previously transmitted RADIUS packet, the RADIUS packet MUST be
sent through DTLS, creating a new DTLS packet before transmission.
Previously transmitted DTLS packets MUST NOT be retransmitted.
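
To illustrate why a stored DTLS packet goes stale, here is a minimal
sketch of a sliding anti-replay window of the kind DTLS uses.  It is not a
DTLS stack: the class names, the window size of 64 and the traffic volume
are assumptions chosen only for the illustration.

```python
class ReplayWindow:
    """Sliding anti-replay window (illustrative only, size is an assumption)."""

    def __init__(self, size=64):
        self.size = size
        self.highest = -1   # highest sequence number accepted so far
        self.seen = set()   # accepted sequence numbers still inside the window

    def accept(self, seq):
        if seq <= self.highest - self.size:
            return False    # too old: left of the window
        if seq in self.seen:
            return False    # duplicate inside the window
        self.seen.add(seq)
        if seq > self.highest:
            self.highest = seq
            # Forget sequence numbers that have slid out of the window.
            self.seen = {s for s in self.seen if s > self.highest - self.size}
        return True


class Sender:
    """Hypothetical sender: every sealed packet gets a fresh sequence number."""

    def __init__(self):
        self.next_seq = 0

    def seal(self, radius_packet):
        record = (self.next_seq, radius_packet)
        self.next_seq += 1
        return record


def demo():
    window, sender = ReplayWindow(), Sender()
    radius_response = b"Access-Accept"
    stored = sender.seal(radius_response)          # first transmission
    window.accept(stored[0])
    for _ in range(100):                           # busy link before the retry
        window.accept(sender.seal(b"other")[0])
    resend_stored = window.accept(stored[0])       # old DTLS packet resent as-is
    resend_fresh = window.accept(sender.seal(radius_response)[0])  # re-sealed
    return resend_stored, resend_fresh
```

Under these assumptions the stored DTLS packet is rejected while the same
RADIUS response sealed into a new DTLS packet is accepted.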

>> 5.1 "Last Packet
>>      A variable containing a timestamp which indicates when the last
>>      valid packet was received for this connection.  Packets which are
>>      "silently discarded" MUST NOT update this variable."

>> As long as the packet was valid while being silently discarded it should
>> count for the purpose of last packet.

>  Why?  What benefit does that offer?

My concern is to minimize situations where the client's accounting of idle
timeout becomes unsynchronized with the server's view of the same.

>> 5.1.1 - I still think we can do better on the UDP / DTLS disambiguation
>> using the 4 byte header I described earlier or by explicitly requiring a
>> server knob to declare what protocol would be accepted from a given
>> source address.

>  The draft already defines a "DTLS Required" flag.  Servers use it to
> decide which protocol is accepted from a given client.

My suggestion is that the mechanism to automatically migrate clients to
DTLS after a successful DTLS handshake is not necessary.

We already need manual knobs to make this work.  Perhaps simply requiring
a knob in both client and server is the best approach.  This would remove
the RADIUS/UDP to RADIUS/DTLS migration and the RADIUS/DTLS
disambiguation.  Explicit configuration in client AND server would be
necessary to successfully speak RADIUS/DTLS.
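
As a rough sketch of what such a server-side knob could look like, the
following maps each configured source network to the single protocol
accepted from it.  The table, the function name and the addresses
(documentation ranges) are hypothetical and not taken from the draft.

```python
import ipaddress

# Hypothetical per-client configuration: source network -> accepted protocol.
CLIENTS = {
    "192.0.2.0/24": "dtls",
    "198.51.100.7/32": "udp",
}


def accepted_protocol(source_ip, clients=CLIENTS):
    """Return the one protocol configured for this source, or None if unknown."""
    addr = ipaddress.ip_address(source_ip)
    for network, protocol in clients.items():
        if addr in ipaddress.ip_network(network):
            return protocol
    return None  # unconfigured source: the packet would be discarded
```

With a table like this the server hands every packet from a source
directly to the configured parser and never has to guess whether it is
RADIUS/UDP or RADIUS/DTLS.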

>>   "A server may also use watchdog packets from the client to
>>    determine that the connection is still active."
>>
>>    "The
>>    timestamp SHOULD be updated on reception of a valid RADIUS/DTLS
>>    packet.  The timestamp MUST NOT be updated in other situations."
>>
>> The RADIUS packet layer does not see heartbeats. Should this cause a
>> change in Last Packet?

>  No.  That is for RADIUS packets, not DTLS heartbeats.

>>  Do you intend for sessions to idle out and
>> expire even with active DTLS layer keepalives?

>  Yes.  If there's no RADIUS traffic for a long time, there are few 
> reasons to keep the session up.

We see lots of NASes with only a few, if any, concurrent users.  It is not
uncommon for new RADIUS traffic to be seen only every tens of minutes to
hours.  Keeping a connection open hurts nobody and prevents delay,
possibly including additional failover delay, for users coming online.  It
also prevents state sync problems (see my comments below).

>>    "This session "idle timeout" SHOULD be exposed to the administrator as
>>    a configurable setting.  It SHOULD NOT be set to less than 60
>>    seconds, and SHOULD NOT be set to more than 600 seconds (10 minutes).
>>    The minimum value useful value for this timer is determined by the
>>    application-layer watchdog mechanism defined in the following
>>    section."

>> The recommended maximum idle timeout is too low in my view.  Resetting
>> connections should be as rare as possible as clients now have the added
>> burden of correctly guessing whether packets were dropped on wire or
>> dropped on DTLS stack in addition to possibility server may not be alive.

>  Did you read the text about RADIUS watchdog packets and DTLS
> heartbeats?  No "guessing" is required.

Heartbeats do not affect Last Packet, and the watchdog is for detecting
server failure rather than for communicating compatible session timeout
parameters.

The following example explains my concern:

RADIUS/DTLS server - 60 second idle timeout.
RADIUS/DTLS client - 90 second idle timeout.

(5.2 "RADIUS/DTLS clients SHOULD pro-actively close sessions when they have been idle for a period of time")

For the sake of this example, the RADIUS client is connected to a NAS that
sees only a few concurrent sessions and only sparse activity every few
minutes.  In our experience this is a typical AP in a low-traffic
environment.

After 75 seconds of idle time the client sends a RADIUS request to the
RADIUS/DTLS server.  The server ignores this request because the session
was torn down after exceeding the idle timeout.  The client runs through
all of its retries and timeouts, all IGNORED by the server, before it
either picks a different RADIUS server or tries to open a new RADIUS/DTLS
session to the same server.
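
The example above can be written down as a small sketch.  The function
name and the outcome strings are made up for illustration; the numbers are
the ones from the example.

```python
def first_request_outcome(server_idle, client_idle, request_at):
    """Classify a request sent `request_at` seconds after the last packet.

    server_idle / client_idle are the idle timeouts (seconds) each side
    applies independently; neither side learns the other's value.
    """
    if request_at >= client_idle:
        # The client closed the session itself and knows it must handshake.
        return "client closed session, new handshake"
    if request_at >= server_idle:
        # The client still believes the session exists; the server does not.
        return "server dropped session, request ignored"
    return "delivered on existing session"


mismatch = first_request_outcome(server_idle=60, client_idle=90, request_at=75)
separated = first_request_outcome(server_idle=601, client_idle=90, request_at=75)
```

Here `mismatch` is the ignored-request case, while `separated` (a server
timeout above any recommended client timeout) is delivered normally.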

None of the active probing mechanisms work at these timescales and they 
should not be necessary to prevent this sort of problem from occurring.

TLS over TCP provides reliable notification of shutdown; DTLS does not.

I suggest that the draft specify recommended client idle parameters that
do not overlap with the recommended server idle parameters.

For example, something like: clients may close the connection after 60-600
seconds.  Servers may close the connection after >600 seconds.

Another way to minimize this problem is to have the client enforce an idle
timeout algorithm based on usage while allowing the server idle timer to
be refreshed by DTLS heartbeats.

The server can take other actions to manage DTLS sessions if there is
pressure on resources, but normally it is important to do EVERYTHING
possible to make sure RADIUS/DTLS is as reliable as RADIUS/UDP, with no
delays while clients figure out and react to the rug being pulled out from
under them.

>> There are no timing guidelines provided for transition to "idle" state.

>  Do you have suggested text for the draft?

What does the idle state actually do?  What is the difference vs using 
"idle timeout" since last packet without an "idle" transition?

>> Mismatch of idle expectations between client and server could trigger
>> unnecessary delay which could be mitigated by separating client and
>> server expectations so there is no overlap in the recommended settings.
>> Clients severely outnumber servers.

>  Do you have suggested text for the draft?

5.2...

RADIUS/DTLS clients MAY proactively close sessions when they have been
idle for 60-86400 seconds if DTLS heartbeats or active watchdog probes are
used.  When neither is used, RADIUS/DTLS clients SHOULD close sessions
that have been idle for at least 60 and no longer than 600 seconds.

5.1.1...

This session "idle timeout" SHOULD be exposed to the administrator as a
configurable setting.  RADIUS/DTLS servers SHOULD use an idle timeout of
at least 600 seconds.

    As UDP does not guarantee delivery of messages, RADIUS/DTLS servers
    MUST also maintain a "Last Packet" timestamp per DTLS session.  The
    timestamp MUST be updated on reception of a valid RADIUS/DTLS
    packet or DTLS heartbeat.  The timestamp MUST NOT be updated in other
    situations. The server SHOULD delete idle DTLS sessions after
    an "idle timeout".
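
A minimal sketch of the server-side bookkeeping this text asks for.  The
class, the event names and the 600 second default are assumptions made for
the illustration.

```python
class ServerSession:
    """Tracks the "Last Packet" timestamp for one DTLS session (sketch)."""

    # Hypothetical event names; only these two refresh the timestamp.
    REFRESHING_EVENTS = {"valid_radius_packet", "dtls_heartbeat"}

    def __init__(self, now, idle_timeout=600):
        self.last_packet = now
        self.idle_timeout = idle_timeout

    def on_event(self, kind, now):
        # Anything else (e.g. a silently discarded packet) leaves it untouched.
        if kind in self.REFRESHING_EVENTS:
            self.last_packet = now

    def is_idle(self, now):
        return now - self.last_packet >= self.idle_timeout
```

With heartbeats arriving more often than every 600 seconds, `is_idle`
stays false even when no RADIUS traffic is seen, which is the behaviour
argued for above.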


10.2

While the total number of sessions tracked exceeds the configured limit,
servers SHOULD close idle sessions, starting with the highest idle time,
until either a sufficient number of sessions have been closed or a lower
idle timeout threshold of 60 seconds or more has been reached.
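
One possible reading of this rule, as a sketch.  The function name, the
dictionary input and the default floor are assumptions of the
illustration.

```python
def sessions_to_close(idle_times, limit, floor=60):
    """Pick sessions to close when more than `limit` sessions are tracked.

    idle_times maps a session id to its idle time in seconds.  Sessions are
    closed longest-idle first, and none idle for less than `floor` seconds
    is closed even if the limit is still exceeded.
    """
    excess = len(idle_times) - limit
    victims = []
    by_idle = sorted(idle_times.items(), key=lambda item: item[1], reverse=True)
    for session_id, idle in by_idle:
        if excess <= 0 or idle < floor:
            break
        victims.append(session_id)
        excess -= 1
    return victims


closing = sessions_to_close({"a": 700, "b": 30, "c": 120, "d": 90}, limit=2)
```

In this example `closing` lists sessions "a" and "c"; session "b" would
never be chosen because it has been idle for less than 60 seconds.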

regards,
Peter