Re: [dhcwg] WG Adoption call for draft-gandhewar-dhc-relay-initiated-release and draft-gandhewar-dhc-v6-relay-initiated-release (Expires Oct 27, 2015)

Dan Seibel <Dan.Seibel@TELUS.COM> Fri, 16 October 2015 22:41 UTC

From: Dan Seibel <Dan.Seibel@TELUS.COM>
To: Ted Lemon <ted.lemon@nominum.com>
Date: Fri, 16 Oct 2015 16:41:49 -0600
Archived-At: <http://mailarchive.ietf.org/arch/msg/dhcwg/4B0K3WZ0ua7PiNvDZvogR4XS9vw>
Cc: "dhcwg@ietf.org" <dhcwg@ietf.org>, "Bernie Volz (volz)" <volz@cisco.com>

The DHCP relay agent on an edge router (BNG) has two DHCP servers that it relays requests to (Server 1 and Server 2).  These servers “should” be maintaining state between them for a given address pool.

Client A comes online and gets IP address 10.1.1.1 from Server 1.
Client B comes online and gets IP address 10.1.1.1 from Server 2.  The BNG sees this duplication and drops the request.  Client B is not able to get an address, and as far as Server 2 is concerned, Client B just hasn’t responded to its offer, so it keeps offering this same address to Client B with each Discover that it gets.
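
As a rough illustration of the scenario above, here is a minimal sketch of the kind of duplicate check a BNG might run against its own binding table.  The class and method names are hypothetical and do not come from any vendor implementation or from the drafts; the MAC addresses are made up.

```python
from typing import Dict, Optional


class RelayBindingTable:
    """Hypothetical per-BNG table mapping a leased IP to the client MAC that holds it."""

    def __init__(self) -> None:
        self._by_ip: Dict[str, str] = {}

    def owner(self, ip: str) -> Optional[str]:
        """Return the MAC currently bound to ip, or None if the address is free."""
        return self._by_ip.get(ip)

    def accept_ack(self, ip: str, client_mac: str) -> bool:
        """Return True if the assignment may be relayed, False if it is dropped
        because the address is already bound to a different client."""
        current = self._by_ip.get(ip)
        if current is not None and current != client_mac:
            return False  # duplicate: same IP handed to a second client
        self._by_ip[ip] = client_mac
        return True


table = RelayBindingTable()
result_a = table.accept_ack("10.1.1.1", "aa:aa:aa:aa:aa:aa")  # Client A via Server 1
result_b = table.accept_ack("10.1.1.1", "bb:bb:bb:bb:bb:bb")  # Client B via Server 2
```

In this sketch Client B's assignment is rejected at the relay, and nothing tells Server 2 why, which is the out-of-sync condition described above.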

Now in theory the above issue shouldn’t happen if the DHCP servers are operating as they should.  However, these types of issues do come up more often than I would care to see, and if there is some other way to provide more resiliency in the whole process I would like to see that.

To paraphrase, "The DHCP server software I am using does not have a working failover implementation, and consequently I would like the DHCP protocol to be changed in order to address the failings of that server."   I’m sorry, I know that’s a bit cold, but seriously, this is a solved problem.   A protocol extension that makes DHCP less reliable in order to address a lack of reliability in a server seems like the wrong thing to do.

No, that isn’t cold; that would normally be my response to “solve my misconfigured / buggy device by you changing what you are doing”.  I was curious whether there is the potential to add another layer of resiliency without obviously making DHCP worse.

The reality is that BNGs using DHCP relay will usually maintain DHCP state for the active clients, as this is used for creating forwarding tables, providing anti-spoofing protection, etc.  If there is an enhancement to the DHCP protocol that will enable the DHCP relay agent to communicate with the server to help solve some out-of-sync issues, I think that is a good idea.

DHCP leasequery?

Yes, thanks for reminding me; this could potentially help in a number of situations.

“I do believe that some places are already doing this by generating Release messages (though perhaps the authors can confirm) and likely have not seen issues in that setting because it meets the criteria I mentioned above.”

I know of a couple of vendors that do the above already (usually they do a MAC ping / NS to the client, and if there is no response the relay sends a release).  One enhancement the drafts would add to this process is a way to see how many releases come from the actual client versus releases from the DHCP relay due to other reasons.

A ping test tells you that the client is unreachable at the time of the test.   It doesn’t tell you whether it is actually offline, nor whether it has forgotten its lease.

Sorry, “MAC ping” should have read ARP/NS.  That works fine to detect whether a client gateway has been disconnected or is unreachable, but yes, it can’t tell you whether the client has forgotten its lease.
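
To make the probe-then-release behaviour concrete, here is a minimal sketch under stated assumptions.  The function names, the retry count, and the "client" / "relay" labels are all hypothetical; the probe is passed in as a callable standing in for an ARP or NS exchange, and this is not how any particular vendor or the drafts define it.

```python
from collections import Counter
from typing import Callable

# Hypothetical tally of who originated each release: "client" or "relay".
release_origins: Counter = Counter()


def record_release(origin: str) -> None:
    """Count one release, labelled by where it came from."""
    release_origins[origin] += 1


def maybe_release(ip: str, probe: Callable[[str], bool], max_probes: int = 3) -> bool:
    """Record a relay-originated release only if every probe goes unanswered.

    probe(ip) stands in for an ARP/NS exchange and returns True on a reply.
    An unanswered probe only shows the client is unreachable right now; it
    does not show that the client has forgotten its lease.
    """
    for _ in range(max_probes):
        if probe(ip):
            return False  # client answered, keep the binding
    record_release("relay")
    return True


released_silent = maybe_release("10.1.1.1", lambda ip: False)  # never answers
released_alive = maybe_release("10.1.1.2", lambda ip: True)    # answers at once
record_release("client")  # an ordinary release sent by a client itself
```

Keeping the two counts separate is the visibility mentioned above: a rising "relay" count relative to "client" would point at reachability problems rather than normal client behaviour.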

"why this particular solution is a better solution than using shorter lease times”

Shorter lease times can help in some situations, but when operating networking equipment that has tens of thousands of leases to maintain, lowering lease times can only go so far before you start to hit resource / scalability issues on the BNG.

Scalability on the BNG?   This is the first I’ve heard of that.   Can you expand on this?   The BNG is just a relay agent, right?   Where’s the scalability issue there?


BNGs do more than just relay DHCP traffic.  They usually need to maintain state for all the DHCP sessions to create forwarding tables, etc.  This involves processing all DHCP traffic that passes through them.  They may also do authentication / accounting on every DHCP request.  Short lease times can also be a major issue if you have some sort of interruption between the relay and the clients that causes them to lose their leases.  When connectivity is restored you end up with a DHCP discovery storm of thousands of clients trying to obtain service all at the same time.  Short and long DHCP leases each have their pros and cons, and trying to find the balance seems to be a bit of an art.
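
A back-of-the-envelope sketch of that trade-off follows.  It assumes clients renew at the default T1 of half the lease time, that renewals are spread evenly, and that a full exchange after an outage is four messages per client (Discover, Offer, Request, Ack) answered by a single server.  The lease counts are illustrative, not measurements.

```python
def renewals_per_second(num_leases: int, lease_seconds: float,
                        t1_fraction: float = 0.5) -> float:
    """Steady-state renewal rate if each client renews once every T1 seconds."""
    return num_leases / (lease_seconds * t1_fraction)


def storm_messages(num_clients: int, messages_per_exchange: int = 4) -> int:
    """Messages the BNG must process if every client restarts from Discover."""
    return num_clients * messages_per_exchange


leases = 50_000  # illustrative figure only
rate_one_hour = renewals_per_second(leases, 3600)   # 1-hour leases
rate_one_day = renewals_per_second(leases, 86400)   # 24-hour leases
storm = storm_messages(leases)
```

Under these assumptions, cutting the lease from a day to an hour multiplies the steady renewal load by 24, and every one of those renewals is something the BNG has to process and possibly authenticate.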

Dan