Re: [dhcwg] WG Adoption call for draft-gandhewar-dhc-relay-initiated-release and draft-gandhewar-dhc-v6-relay-initiated-release (Expires Oct 27, 2015)

Ted Lemon <ted.lemon@nominum.com> Fri, 16 October 2015 22:08 UTC

From: Ted Lemon <ted.lemon@nominum.com>
In-Reply-To: <D246AFA6.78BC1%dan.seibel@telus.com>
Date: Fri, 16 Oct 2015 18:08:19 -0400
Message-ID: <C96E7F6D-5A7D-4033-ACFA-B65DFEFAC013@nominum.com>
References: <3ab954660ca847fc9d32d53c0cc7c959@XCH-ALN-003.cisco.com> <CAKD1Yr1jmr-+pk4ebHkSiTaHmYTg1ABm4sLov54Z-n+S2bqqtw@mail.gmail.com> <4ECE8D10-07B8-4886-8210-44BC94698C70@nominum.com> <fa532f9a205b406f95afedd3cab17233@XCH-ALN-003.cisco.com> <D246AFA6.78BC1%dan.seibel@telus.com>
To: Dan Seibel <Dan.Seibel@TELUS.COM>
Archived-At: <http://mailarchive.ietf.org/arch/msg/dhcwg/WsHSKKJ6bunYN0xrWTnuIUP7nd4>
Cc: "dhcwg@ietf.org" <dhcwg@ietf.org>, "Bernie Volz (volz)" <volz@cisco.com>
Subject: Re: [dhcwg] WG Adoption call for draft-gandhewar-dhc-relay-initiated-release and draft-gandhewar-dhc-v6-relay-initiated-release (Expires Oct 27, 2015)

On Oct 16, 2015, at 4:28 PM, Dan Seibel <Dan.Seibel@TELUS.COM> wrote:
> A DHCP relay agent on an edge router (BNG) has two DHCP servers that it relays requests to (Server 1, Server 2).  These servers “should” be maintaining state between them for a given address pool.
> 
> Client A comes online and gets IP address 10.1.1.1 from Server 1.
> Client B comes online and gets IP address 10.1.1.1 from Server 2.  The BNG sees this duplication and drops the request.  Client B is not able to get an address, and as far as Server 2 is concerned Client B just hasn’t responded to its offer, so it continues to offer this same address to Client B with each discover that it gets.
> 
> Now in theory the above issue shouldn’t happen if the DHCP servers are operating as they should.  However, these types of issues do come up more often than I would care to see, and if there is some other way to provide more resiliency in the whole process I would like to see that.

To paraphrase, "The DHCP server software I am using does not have a working failover implementation, and consequently I would like the DHCP protocol to be changed in order to address the failings of that server."   I’m sorry, I know that’s a bit cold, but seriously, this is a solved problem.   A protocol extension that makes DHCP less reliable in order to address a lack of reliability in a server seems like the wrong thing to do.

> The reality is that BNGs using DHCP relay will usually maintain DHCP state for the active clients, as this is used for creating forwarding tables, providing anti-spoofing protection, etc.  If there is an enhancement to the DHCP protocol that will enable the DHCP relay agent to communicate with the server to help solve some out-of-sync issues, I think that is a good idea.

DHCP leasequery?
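
For readers unfamiliar with it: DHCPv4 leasequery (RFC 4388) lets a relay agent ask a server what it knows about a lease, which is one way for a BNG to resynchronize its state. As a minimal sketch, the query-by-IP-address form might be built as below. The function name and the example addresses are hypothetical, the field choices follow my reading of RFC 4388 (message type 10, ciaddr carrying the address being queried, giaddr carrying the relay's own address, hardware fields zeroed), and the Parameter Request List option a real relay would normally include is omitted.

```python
import ipaddress
import struct

# Values from RFC 2131/2132 and RFC 4388.
DHCP_MAGIC_COOKIE = bytes([99, 130, 83, 99])
OPT_MESSAGE_TYPE = 53
OPT_END = 255
DHCPLEASEQUERY = 10


def build_leasequery_by_ip(relay_ip: str, client_ip: str, xid: int) -> bytes:
    """Build a bare-bones DHCPLEASEQUERY asking about client_ip.

    Hypothetical helper for illustration only; it does not send anything.
    """
    header = struct.pack(
        "!BBBBIHH4s4s4s4s16s64s128s",
        1,                                        # op: BOOTREQUEST
        0,                                        # htype: zero for query by IP
        0,                                        # hlen: zero for query by IP
        0,                                        # hops
        xid,                                      # transaction id
        0,                                        # secs
        0,                                        # flags
        ipaddress.IPv4Address(client_ip).packed,  # ciaddr: address being queried
        bytes(4),                                 # yiaddr
        bytes(4),                                 # siaddr
        ipaddress.IPv4Address(relay_ip).packed,   # giaddr: the querying relay
        bytes(16),                                # chaddr: zero for query by IP
        bytes(64),                                # sname
        bytes(128),                               # file
    )
    options = bytes([OPT_MESSAGE_TYPE, 1, DHCPLEASEQUERY, OPT_END])
    return header + DHCP_MAGIC_COOKIE + options


# Example addresses are made up; 10.1.1.1 is the address from the scenario above.
packet = build_leasequery_by_ip("10.1.0.1", "10.1.1.1", xid=0x12345678)
```

The server's reply (DHCPLEASEACTIVE, DHCPLEASEUNASSIGNED, or DHCPLEASEUNKNOWN) tells the relay whether its own view of the address still matches the server's, without the relay having to alter any lease.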

> “I do believe that some places are already doing this by generating Release messages (though perhaps the authors can confirm) and likely have not seen issues in that setting because it meets the criteria I mentioned above.”
> 
> I know of a couple of vendors that do the above already (usually they do a MAC ping / NS to the client, and if there is no response the relay sends a release).  One enhancement the drafts would add to this process is a way to see how many releases come from the actual client vs. how many come from the DHCP relay due to other reasons.

A ping test tells you that the client is unreachable at the time of the test.   It doesn’t tell you whether it is actually offline, nor whether it has forgotten its lease.

> “why this particular solution is a better solution than using shorter lease times”
> 
> Shorter lease times can help in some situations, but when operating networking equipment that has tens of thousands of leases to maintain and deal with, lowering lease times can only go so far before you start to hit resource / scalability issues on the BNG.

Scalability on the BNG?   This is the first I’ve heard of that.   Can you expand on this?   The BNG is just a relay agent, right?   Where’s the scalability issue there?
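
To put rough numbers on the lease-time trade-off being discussed, here is a back-of-the-envelope sketch. It assumes a steady state in which every client renews at T1, that T1 is the RFC 2131 default of half the lease time, and that renewals are spread evenly over time. The function name and the figure of 50,000 leases are illustrative assumptions, not numbers from this thread, and whether renewals actually pass through (or are snooped by) the BNG depends on the deployment.

```python
def renewals_per_second(active_leases: int,
                        lease_seconds: float,
                        renew_fraction: float = 0.5) -> float:
    """Estimate the steady-state renewal rate for a population of leases.

    Each client is assumed to renew once every lease_seconds * renew_fraction
    seconds, so the aggregate rate is the lease count divided by that interval.
    """
    return active_leases / (lease_seconds * renew_fraction)


# Illustrative lease count and lease times (assumptions, not measured data).
for lease_seconds in (86400, 3600, 300):
    rate = renewals_per_second(50_000, lease_seconds)
    print(f"lease={lease_seconds:>6}s  ~{rate:.1f} renewals/s")
```

Under these assumptions the rate scales inversely with the lease time: cutting the lease from one hour to five minutes multiplies the renewal traffic by twelve.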