Re: [dhcwg] Load Balancing for DHCPv6

Donald Eastlake <d3e3e3@gmail.com> Thu, 20 September 2012 05:27 UTC

Return-Path: <d3e3e3@gmail.com>
X-Original-To: dhcwg@ietfa.amsl.com
Delivered-To: dhcwg@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 3F0C221F86B0 for <dhcwg@ietfa.amsl.com>; Wed, 19 Sep 2012 22:27:31 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -102.699
X-Spam-Level:
X-Spam-Status: No, score=-102.699 tagged_above=-999 required=5 tests=[AWL=-0.900, BAYES_00=-2.599, J_CHICKENPOX_31=0.6, J_CHICKENPOX_33=0.6, J_CHICKENPOX_73=0.6, RCVD_IN_DNSWL_LOW=-1, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id kW0GxiMnhrg5 for <dhcwg@ietfa.amsl.com>; Wed, 19 Sep 2012 22:27:30 -0700 (PDT)
Received: from mail-iy0-f172.google.com (mail-iy0-f172.google.com [209.85.210.172]) by ietfa.amsl.com (Postfix) with ESMTP id B455221F8678 for <dhcwg@ietf.org>; Wed, 19 Sep 2012 22:27:30 -0700 (PDT)
Received: by iabz21 with SMTP id z21so1554326iab.31 for <dhcwg@ietf.org>; Wed, 19 Sep 2012 22:27:30 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type:content-transfer-encoding; bh=WapE9RsoGh0hmjz4agI/UZ9Vd2m+rQ9zipFmHPpRpso=; b=EB4vtnxMulTmNlSPebI6DPY/+Ih/t7gmC1TdOxCr12rvtMVp6n70kTN8UeIy0RT4rI 2r9Tb3/o3368Z+5I55BXxyKc3ewoeOViXmUmujr8Zo91znUoMQ+FAVs/aqimWdo5+SYH xnx0oBqmD/7UQL59nZCqIUIARUw9Azzg9OvI3QOibs7l+QbdlGnbZAUZDDh4+nd2Bypi CB1XaxCGAl5C3cmjp+oUx1hg8CVOrYF8hVnj+yaIOXuyJGeFKtvU5g2HB8Lpb/igOqB4 AlPJ0a9bc3Q+S2BnjV9zzSjksTQ+jnzY/kTk2jacW4R5ZC51TR287uTC23KlU6Xiny0o 5EQw==
Received: by 10.50.184.129 with SMTP id eu1mr730690igc.0.1348118850097; Wed, 19 Sep 2012 22:27:30 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.64.86.113 with HTTP; Wed, 19 Sep 2012 22:27:09 -0700 (PDT)
In-Reply-To: <A563FEA0-0A26-431D-BDAA-AD897F691754@nominum.com>
References: <CAL10_Bqa4ftiVhyyf0ezwKR7mzAEOLNi_K3EJFPFUzPnz7QGPw@mail.gmail.com> <489D13FBFA9B3E41812EA89F188F018E0F4F3093@xmb-rcd-x04.cisco.com> <CAL10_Br=OcWZuar1fDOopevTy_W-3g9TsYqo61rOWm4tKkuYgg@mail.gmail.com> <2D09D61DDFA73D4C884805CC7865E61118003F@GAALPA1MSGUSR9N.ITServices.sbc.com> <CAL10_BpXdx03WfV1PeMKg1zYc1dAFKe1CDNdrcNf45+_EVCBPg@mail.gmail.com> <CDDB9016-BE11-489A-9361-0172D96A464C@nominum.com> <CAOpJ=k2CJS=FuUvFwOq=s2m871_qfo=xROsW=fx0E48w2wxAqQ@mail.gmail.com> <CAL10_BoLsdppYKNSfHheYrZg+SfaggoynQf2X11HEdy=ELFUiQ@mail.gmail.com> <5049C317.7090603@gmail.com> <94FA926F-2432-4AE7-BC20-AE7458AB40D9@cisco.com> <CAF4+nEHqRFHbz9qfQuOqpLCNeZqkT=+f53_eCboECfWX8QCt6A@mail.gmail.com> <71F17433-B2D9-4366-9B32-F0E4D294EDB5@nominum.com> <CAF4+nEE6pbmO_ss+3UEpRG1kh2YD20P2KD4CFQ7LwoRZ6gyxsg@mail.gmail.com> <A563FEA0-0A26-431D-BDAA-AD897F691754@nominum.com>
From: Donald Eastlake <d3e3e3@gmail.com>
Date: Thu, 20 Sep 2012 01:27:09 -0400
Message-ID: <CAF4+nEFqLW6p1wE9BWE6c2CqGDwzUQTYom9K4icV1_NjM-nTXg@mail.gmail.com>
To: Ted Lemon <Ted.Lemon@nominum.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
Cc: "dhcwg@ietf.org" <dhcwg@ietf.org>, "Bernie Volz (volz)" <volz@cisco.com>
Subject: Re: [dhcwg] Load Balancing for DHCPv6
X-BeenThere: dhcwg@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: <dhcwg.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/dhcwg>, <mailto:dhcwg-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/dhcwg>
List-Post: <mailto:dhcwg@ietf.org>
List-Help: <mailto:dhcwg-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/dhcwg>, <mailto:dhcwg-request@ietf.org?subject=subscribe>
X-List-Received-Date: Thu, 20 Sep 2012 05:27:31 -0000

Well, I have some more statistics. One my co-authors on the draft
whipped up some code that I then extended. The data hashed consisted
of 2,560,000 strings calculated as follows:
        #define RUN 10000
        for(i = 0; i < 256*RUN; ++i)
{
                sprintf(str, "%08d^^%08d", i, i);
                {do hashes, etc}
}

Each string was hashed with Pearson and with 32-bit FNV-1a. Three
arrays of 256 counters were incremented based on the Pearson hash,
based on the bottom byte of the 32-bit FNV after one 16-bit fold (
hash2 = (hash&0xffff) ^ (hash >> 16) ), and based on the bottom byte
of the 32-bit FNV after an additional 8-bit fold (hash3 = (hash2&0xff)
^ (hash2 >> 8); the equivalent of xor'ing all four bytes together).
The expected number of hits on each bucket with perfect hashing would
be 10,000. Here are the minimum and maximum number of hits and the
standard deviation:

Pearson: min=9678 max=19938 std-dev=630.72
FNV-32 1-fold: min=9699 max=10331 std-dev=103.06
FNV-32 2-folds: min=9662 max=10247 std-dev=90.78

So you now have Brian Carpenter's results, showing FNV to be much
better than a variety of other methods, but not Pearson, on fairly
realistic data. And you have these results showing FNV to be much
better than Pearson on somewhat unrealistic data. ...

Thanks,
Donald
=============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 d3e3e3@gmail.com


On Fri, Sep 7, 2012 at 6:29 PM, Ted Lemon <Ted.Lemon@nominum.com> wrote:
> On Sep 7, 2012, at 5:37 PM, Donald Eastlake <d3e3e3@gmail.com> wrote:
>> FNV takes a little more effort than Pearson. It also has a smaller
>> footprint in memory. And I believe it will produce statistically
>> better results. What are the relative weights of these or other
>> factors in this case?
>
> Unfortunately, as far as I can tell neither of the studies you cite compares FNV to Pearson, although the results from Brian Carpenter's study are certainly pretty impressive.    I think more work is worse, smaller footprint probably doesn't matter (this algorithm is being run in DHCP servers, which typically aren't seriously memory-constrained), and statistically better results only matter if it's a pretty big difference, so it's frustrating that we don't have a comparison to the existing algorithm.
>
> To me, the biggest win of using FNV would be if there were an RFC to refer to, but it looks like this is not quite there yet.   So my conclusion is that I certainly don't oppose using FNV instead of Pearson, but I don't see a strong reason to support it either.
>