Re: [dhcwg] Load Balancing for DHCPv6

Andre Kostur <akostur@incognito.com> Fri, 21 September 2012 17:12 UTC

Return-Path: <akostur@incognito.com>
X-Original-To: dhcwg@ietfa.amsl.com
Delivered-To: dhcwg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 8311C21F8751 for <dhcwg@ietfa.amsl.com>; Fri, 21 Sep 2012 10:12:23 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -4.948
X-Spam-Level:
X-Spam-Status: No, score=-4.948 tagged_above=-999 required=5 tests=[AWL=-0.771, BAYES_00=-2.599, FM_FORGED_GMAIL=0.622, J_CHICKENPOX_31=0.6, J_CHICKENPOX_33=0.6, J_CHICKENPOX_73=0.6, RCVD_IN_DNSWL_MED=-4]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 7u91-CHjzaPT for <dhcwg@ietfa.amsl.com>; Fri, 21 Sep 2012 10:12:22 -0700 (PDT)
Received: from na3sys010aog114.obsmtp.com (na3sys010aog114.obsmtp.com [74.125.245.96]) by ietfa.amsl.com (Postfix) with SMTP id 4D56721F8748 for <dhcwg@ietf.org>; Fri, 21 Sep 2012 10:12:21 -0700 (PDT)
Received: from mail-ie0-f172.google.com ([209.85.223.172]) (using TLSv1) by na3sys010aob114.postini.com ([74.125.244.12]) with SMTP ID DSNKUFyf9e4Om48TTwzop4T7Cs5rFibNjeuR@postini.com; Fri, 21 Sep 2012 10:12:22 PDT
Received: by iec9 with SMTP id 9so6858923iec.31 for <dhcwg@ietf.org>; Fri, 21 Sep 2012 10:12:21 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type:content-transfer-encoding:x-gm-message-state; bh=FLaqVTqY3Xlm07ugNlb7AtIwXupTrxnv6mAW8RGIDHo=; b=Qvz8JnM+tGfwVP6/WebGuyd9tLuRhkVpfM0jGim7ewLWHTYqreEJfO5pUrKC3yGBQP 0Ttuvt50zbNaJwR4tAVhJjTUELmHIpVz2kDsq7bKsQH5SD00NYFtPGoQ4FkagQM7EGNv 8HGtFw/ZLREMbroVvbzuSbM51iRRnaQb6W71juHpVUwRy1YBFhHjsoSnpHbfhDXW6Xgb L7Bz78M3GG1VFG06VNVocHjfribrnoKKFEPEZFmQDRyZ7vt0Mh6ctz7i/10EoQfo9wdz KN8L0nyHeOTPa0O5AkDKVLep5+EIjGU5wFQWO7wqtrt1EDUzfvsXB6rPk9AZxfzQisA0 br5A==
MIME-Version: 1.0
Received: by 10.50.47.227 with SMTP id g3mr2343896ign.5.1348247540932; Fri, 21 Sep 2012 10:12:20 -0700 (PDT)
Received: by 10.64.100.137 with HTTP; Fri, 21 Sep 2012 10:12:20 -0700 (PDT)
In-Reply-To: <489D13FBFA9B3E41812EA89F188F018E0F5033D6@xmb-rcd-x04.cisco.com>
References: <CAL10_Bqa4ftiVhyyf0ezwKR7mzAEOLNi_K3EJFPFUzPnz7QGPw@mail.gmail.com> <489D13FBFA9B3E41812EA89F188F018E0F4F3093@xmb-rcd-x04.cisco.com> <CAL10_Br=OcWZuar1fDOopevTy_W-3g9TsYqo61rOWm4tKkuYgg@mail.gmail.com> <2D09D61DDFA73D4C884805CC7865E61118003F@GAALPA1MSGUSR9N.ITServices.sbc.com> <CAL10_BpXdx03WfV1PeMKg1zYc1dAFKe1CDNdrcNf45+_EVCBPg@mail.gmail.com> <CDDB9016-BE11-489A-9361-0172D96A464C@nominum.com> <CAOpJ=k2CJS=FuUvFwOq=s2m871_qfo=xROsW=fx0E48w2wxAqQ@mail.gmail.com> <CAL10_BoLsdppYKNSfHheYrZg+SfaggoynQf2X11HEdy=ELFUiQ@mail.gmail.com> <5049C317.7090603@gmail.com> <94FA926F-2432-4AE7-BC20-AE7458AB40D9@cisco.com> <CAF4+nEHqRFHbz9qfQuOqpLCNeZqkT=+f53_eCboECfWX8QCt6A@mail.gmail.com> <71F17433-B2D9-4366-9B32-F0E4D294EDB5@nominum.com> <CAF4+nEE6pbmO_ss+3UEpRG1kh2YD20P2KD4CFQ7LwoRZ6gyxsg@mail.gmail.com> <A563FEA0-0A26-431D-BDAA-AD897F691754@nominum.com> <CAF4+nEFqLW6p1wE9BWE6c2CqGDwzUQTYom9K4icV1_NjM-nTXg@mail.gmail.com> <1BB5EDFF-BBA9-4335-BE55-058FACEE30EB@cisco.com> <CAL10_Bo_DE1Lq4BuXRaMmBs9Fah=4TxroBgUEQ8V_6VC2NnTzQ@mail.gmail.com> <489D13FBFA9B3E41812EA89F188F018E0F5033D6@xmb-rcd-x04.cisco.com>
Date: Fri, 21 Sep 2012 10:12:20 -0700
Message-ID: <CAL10_BqaP1A3a68LcP7v6692+F9y4pE3bixMkkNBKfWrPuGP=A@mail.gmail.com>
From: Andre Kostur <akostur@incognito.com>
To: "Bernie Volz (volz)" <volz@cisco.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
X-Gm-Message-State: ALoCoQm9j243QGawJhz/INa+BSQUkuYbGyyWL2Nde3P0iCC2P6vQsl0QMQpzPYW79C3lY4+pJ/bX
Cc: "dhcwg@ietf.org" <dhcwg@ietf.org>, Ted Lemon <Ted.Lemon@nominum.com>
Subject: Re: [dhcwg] Load Balancing for DHCPv6
X-BeenThere: dhcwg@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: <dhcwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dhcwg>, <mailto:dhcwg-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/dhcwg>
List-Post: <mailto:dhcwg@ietf.org>
List-Help: <mailto:dhcwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dhcwg>, <mailto:dhcwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Fri, 21 Sep 2012 17:12:23 -0000

Whups, yep.  Changed that to a 1, and no significant different in the
stddev.  Both still in that 95 - 105 range.

I'm also of the opinion that Pearson is the way to go.  Seems to give
a pretty good spread of values, it's already documented in RFC
(complete with implementation), and any servers or relays that may
have already been doing load balancing for v4 will already have this
algorithm implemented.


On Fri, Sep 21, 2012 at 10:04 AM, Bernie Volz (volz) <volz@cisco.com> wrote:
> Thanks Andre.
>
> While it would make no difference, it is always a mystery whether DUID will use a hardware type of 1 or 6. Per http://www.iana.org/assignments/arp-parameters/arp-parameters.xml, type 1 is for Ethernet whereas 6 is for IEEE 802.3.
>
> It may well be for a more limited data set (i.e. ASCII strings), FNV1a could be better. I think Pearson was optimized for binary data.
>
> My feeling is we should stick with Pearson and use the entire DUID (no length limit).
>
> - Bernie
>
> -----Original Message-----
> From: Andre Kostur [mailto:akostur@incognito.com]
> Sent: Friday, September 21, 2012 12:44 PM
> To: Bernie Volz (volz)
> Cc: Donald Eastlake; dhcwg@ietf.org; Ted Lemon
> Subject: Re: [dhcwg] Load Balancing for DHCPv6
>
> I've crafted up a test rig for Pearson vs. FNV1a   (Pearson code from
> the RFC, FNV1a from
> http://isthe.com/chongo/tech/comp/fnv/#FNV-reference-source, with the XOR-folding at http://www.isthe.com/chongo/tech/comp/fnv/index.html#xor-fold).
>
> I generated my DUIDs as a set of DUID-LLs (so the first four bytes are 00, 03, 00, 06).  The next three were taken randomly from IEEE's list of registered OUIs.  The remaining three are rand().
>
> Running those through both Pearson and FNV1a 5 times each (on a set of
> 2,560,000 DUID-LLs) resulted in:
>
> Pearson:
> Min: 9637 - 9750, Max: 10220 - 10318, StdDev: 92.91 - 102.28
>
> FNV1a:
> Min: 9691 - 9754, Max: 10221 - 10379, Stdev: 93.27 - 107.66
>
>
> This seems to suggest that Pearson is better than FNV1a over DUID-LLs.
>
>
> On Thu, Sep 20, 2012 at 5:05 AM, Bernie Volz (volz) <volz@cisco.com> wrote:
>> Bad test - This is a very limited character set.
>>
>> The data usually used for dhcp is binary data.
>>
>> Sent from my iPad
>>
>> On Sep 20, 2012, at 1:27 AM, "Donald Eastlake" <d3e3e3@gmail.com> wrote:
>>
>>> Well, I have some more statistics. One my co-authors on the draft
>>> whipped up some code that I then extended. The data hashed consisted
>>> of 2,560,000 strings calculated as follows:
>>>        #define RUN 10000
>>>        for(i = 0; i < 256*RUN; ++i)
>>> {
>>>                sprintf(str, "%08d^^%08d", i, i);
>>>                {do hashes, etc}
>>> }
>>>
>>> Each string was hashed with Pearson and with 32-bit FNV-1a. Three
>>> arrays of 256 counters were incremented based on the Pearson hash,
>>> based on the bottom byte of the 32-bit FNV after one 16-bit fold (
>>> hash2 = (hash&0xffff) ^ (hash >> 16) ), and based on the bottom byte
>>> of the 32-bit FNV after an additional 8-bit fold (hash3 =
>>> (hash2&0xff) ^ (hash2 >> 8); the equivalent of xor'ing all four bytes together).
>>> The expected number of hits on each bucket with perfect hashing would
>>> be 10,000. Here are the minimum and maximum number of hits and the
>>> standard deviation:
>>>
>>> Pearson: min=9678 max=19938 std-dev=630.72
>>> FNV-32 1-fold: min=9699 max=10331 std-dev=103.06
>>> FNV-32 2-folds: min=9662 max=10247 std-dev=90.78
>>>
>>> So you now have Brian Carpenter's results, showing FNV to be much
>>> better than a variety of other methods, but not Pearson, on fairly
>>> realistic data. And you have these results showing FNV to be much
>>> better than Pearson on somewhat unrealistic data. ...
>>>
>>> Thanks,
>>> Donald
>>> =============================
>>> Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
>>> 155 Beaver Street, Milford, MA 01757 USA d3e3e3@gmail.com
>>>
>>>
>>> On Fri, Sep 7, 2012 at 6:29 PM, Ted Lemon <Ted.Lemon@nominum.com> wrote:
>>>> On Sep 7, 2012, at 5:37 PM, Donald Eastlake <d3e3e3@gmail.com> wrote:
>>>>> FNV takes a little more effort than Pearson. It also has a smaller
>>>>> footprint in memory. And I believe it will produce statistically
>>>>> better results. What are the relative weights of these or other
>>>>> factors in this case?
>>>>
>>>> Unfortunately, as far as I can tell neither of the studies you cite compares FNV to Pearson, although the results from Brian Carpenter's study are certainly pretty impressive.    I think more work is worse, smaller footprint probably doesn't matter (this algorithm is being run in DHCP servers, which typically aren't seriously memory-constrained), and statistically better results only matter if it's a pretty big difference, so it's frustrating that we don't have a comparison to the existing algorithm.
>>>>
>>>> To me, the biggest win of using FNV would be if there were an RFC to refer to, but it looks like this is not quite there yet.   So my conclusion is that I certainly don't oppose using FNV instead of Pearson, but I don't see a strong reason to support it either.
>>>>
>
> --
> Andre Kostur

-- 
Andre Kostur