Re: [rtcweb] Data on travel times

Eric Rescorla <ekr@rtfm.com> Mon, 09 April 2012 16:03 UTC

In-Reply-To: <CAJNg7VLfrn_SkTXHQYmR52NP5sxpO-03swiC4RBSDpwgOOt6cg@mail.gmail.com>
References: <CABcZeBPDpguge1zT5JyDk+tohMn1_av4jgdgDhNLnXMFKNzcbg@mail.gmail.com> <CAJNg7VLfrn_SkTXHQYmR52NP5sxpO-03swiC4RBSDpwgOOt6cg@mail.gmail.com>
From: Eric Rescorla <ekr@rtfm.com>
Date: Mon, 09 Apr 2012 09:02:41 -0700
Message-ID: <CABcZeBO85+MuNshMYfF2qxU3ws7EiuHSY9Gvh0mUE7i7ot8=FQ@mail.gmail.com>
To: Marshall Eubanks <marshall.eubanks@gmail.com>
Cc: rtcweb@ietf.org, public-webrtc@w3.org
Subject: Re: [rtcweb] Data on travel times

On Mon, Apr 9, 2012 at 8:35 AM, Marshall Eubanks
<marshall.eubanks@gmail.com> wrote:
> I really like this analysis. Some questions.
>
> 2012/4/9 Eric Rescorla <ekr@rtfm.com>:
>> Hi folks,
>>
>> Since it seems like we're going to be having a large number of
>> interims, I thought it might be instructive to try to analyze a bunch
>> of different locations to figure out the best strategy. My first cut
>> analysis is below.
>>
>> Note that I'm not trying to make any claims about what the best set of
>> venues is. It's obviously easy to figure out any statistic we want
>> about each proposed venue, but how you map that data to "best" is up
>> to you. In particular, there's some tradeoff between minimal total
>> travel time and a "fair" distribution of travel times (not that I
>> claim to know what that means).
>>
>>
>> METHODOLOGY
>> The data below is derived by treating both people and venues as
>> airport locations and using travel time as our primary instrument.
>>
>> 1. For each responder for the current Doodle poll, assign a home
>>   airport based on their draft publication history.  We're missing a
>>   few people but basically it should be pretty complete. Since
>>   these people responded before the venue was known, it's at
>>   least somewhat unbiased.
>>
>> 2. Compute the shortest advertised flight between each home airport
>>   and each venue by looking at Kayak flights around one of the
>>   proposed interim dates (6/10 - 6/13), ignoring price but excluding
>>   "Hacker fares". [Thanks to Martin Thomson for helping me gather these.]
>>
>
> 1.) Why are some fields doubled? I.e.,
>
> ARN SFO 14 13
>
> Are these counted twice ? That would, of course, give more weight to
> those records.

Laziness. When I started recording flight times, I used the total time,
and then later realized that what I wanted was to break them out into
outbound and return legs, but I was too lazy to go back and fix the earlier ones.
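A minimal sketch of how entries in this mixed format could be normalized, in Python since pairings.py is Python. It assumes (hypothetically; the real durations.txt layout may differ) that each line is `ORIG DEST hours [hours]` as in the `ARN SFO 14 13` example, that a lone number is already the round-trip total, and that two numbers are the outbound and return legs. The function names are illustrative and not taken from pairings.py; `AAA BBB` is a placeholder pair, not real data.

```python
def parse_duration_line(line: str) -> tuple[str, str, float]:
    """Parse one hypothetical durations.txt line into (origin, dest, round-trip hours).

    Assumed format: 'ORIG DEST total' or 'ORIG DEST out back'.
    """
    fields = line.split()
    if len(fields) not in (3, 4):
        raise ValueError(f"unexpected line: {line!r}")
    origin, dest = fields[0], fields[1]
    # One number is already a round trip; two numbers are the out and back
    # legs. Summing handles both cases and yields exactly one value per pair.
    hours = sum(float(f) for f in fields[2:])
    return origin, dest, hours


def load_durations(lines):
    """Map (origin, dest) -> round-trip hours, skipping blank lines."""
    table = {}
    for line in lines:
        if line.strip():
            origin, dest, hours = parse_duration_line(line)
            table[(origin, dest)] = hours
    return table
```

Under this reading, `load_durations(["ARN SFO 14 13", "AAA BBB 20"])` gives one round-trip figure per airport pair (27.0 and 20.0), so no pair is weighted twice.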


> 2.) At any rate, I couldn't quite match your numbers. For SFO, for
> example, I got
>
> # SFO
>
>  Records            29  |
>  Mean            12.52  |
>  RMS             15.34  |
>  Std Dev          8.55  |
>  Minimum          1.00  |
>  Maximum         34.00  |
>
> This assumes that each doubled entry counts as 2 separate entries. If
> the second entries are ignored, I get

I'm not sure what procedure you are following here, but if it's taking the
SD of the data in durations.txt, that's not what I did. That's just
the input data.

The summary data that I am showing is produced by weighting each home
airport by the number of participants based there. The script to generate that is
pairings.py and the results are found in doodle-out.txt. Of course,
it could still all be wrong.
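The difference from Marshall's numbers is that durations.txt has one row per airport pair, while the summary counts each Doodle respondent once. A minimal sketch of that per-participant weighting, assuming hypothetical inputs (an attendee-to-home-airport mapping and a round-trip table keyed by airport pair). This is not pairings.py itself; in particular, treating an attendee who lives at the venue as traveling 0 hours, and allowing a pair to be stored in either order, are my assumptions.

```python
import statistics


def travel_times(attendees, durations, venue):
    """Round-trip hours for each attendee to `venue`.

    attendees: dict of attendee name -> home airport (hypothetical input).
    durations: dict of (airport, airport) -> round-trip hours.
    Assumes a local attendee travels 0 hours and that a pair may be stored
    in either order; raises KeyError if the pair is missing entirely.
    """
    times = []
    for home in attendees.values():
        if home == venue:
            times.append(0.0)
        elif (home, venue) in durations:
            times.append(durations[(home, venue)])
        else:
            times.append(durations[(venue, home)])
    return times


def summarize(times):
    """Mean, median, and sample SD (n-1 denominator, like R's sd())."""
    return statistics.mean(times), statistics.median(times), statistics.stdev(times)
```

Because every attendee contributes one value, an airport with several participants counts several times, which is the weighting described above.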

FWIW, I'm using R's sd(), which uses n-1 in the denominator.
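A small sketch, using Python's standard statistics module, of why the n vs. n-1 convention can't explain a large gap: the two SDs differ only by a factor of sqrt(n/(n-1)). The helper names here are hypothetical.

```python
import math
import statistics


def sd_ratio(values):
    """Ratio of sample SD (n-1, like R's sd()) to population SD (n).

    Requires at least two values that are not all equal.
    """
    return statistics.stdev(values) / statistics.pstdev(values)


def expected_ratio(n):
    """The same ratio in closed form: sqrt(n / (n - 1))."""
    return math.sqrt(n / (n - 1))
```

For 29 records, `expected_ratio(29)` is about 1.018, i.e. under 2%, so the convention alone can't account for a difference like 8.55 vs. 12.2.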

-Ekr



> # SFO
>
>  Records            21  |
>  Mean            14.05  |
>  RMS             17.05  |
>  Std Dev          9.14  |
>  Minimum          1.00  |
>  Maximum         34.00  |
>
> If two entries are averaged together (when present)
>
> # SFO
>  Records            21  |
>  Mean            13.93  |
>  RMS             16.97  |
>  Std Dev          9.18  |
>  Minimum          1.00  |
>  Maximum         34.00  |
>
> None of these 3 options match your
>
> Venue         Mean         Median           SD
> ----------------------------------------------
> SFO           13.5             11         12.2
>
> In particular, your SD value seems high.
>
> (Note: I use the SD = sqrt(sum of squared deviations / (n-1)) convention,
> not the / n one, but that won't explain the difference.)
>
> Regards
> Marshall
>
>
>> This lets us compute statistics for any venue and/or combination
>> of venues, based on the candidate attendee list.
>>
>> The three proposed venues:
>>
>> - San Francisco (SFO)
>> - Boston (BOS)
>> - Stockholm (ARN)
>>
>> Three hubs not too distant from the proposed venues:
>>
>> - London (LHR)
>> - Frankfurt (FRA)
>> - New York (NYC) [0]
>>
>> Also, Calgary (YYC), since the other two chair locations (BOS and SFO)
>> were already proposed as venues, and I didn't want Cullen to feel
>> left out.
>>
>>
>> RESULTS
>> Here are the results for each of the above venues, measured in total
>> hours of travel (i.e., round trip).
>>
>> Venue         Mean         Median           SD
>> ----------------------------------------------
>> SFO           13.5             11         12.2
>> BOS           12.3             11          7.5
>> ARN           17.0             21         10.7
>> FRA           14.8             17          7.3
>> LHR           13.3             14          7.5
>> NYC           11.5             11          5.8
>> YYC           14.9             13         10.2
>> SFO/BOS/ARN   14.3             13          3.6
>> SFO/NYC/LHR   12.7             11.3        3.7
>>
>> XXX/YYY/ZZZ denotes a three-way rotation of XXX, YYY, and ZZZ. Obviously, mean
>> and median are intended to be some sort of aggregate measure of travel
>> time. I don't have any way to measure "fairness", but SD is intended
>> as some metric of the variation in travel time between attendees.
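The post doesn't spell out how the rotation rows are computed. One reading consistent with the table is that each attendee's time is averaged over the venues in the rotation and the statistics are then taken across attendees. Under that reading, the rotation mean equals the average of the venue means, e.g. (13.5 + 12.3 + 17.0) / 3 ≈ 14.3 up to rounding, and the SD falls well below any single venue's. A minimal sketch under that assumption; the function name and input layout are hypothetical.

```python
import statistics


def rotation_times(per_venue_times):
    """Average each attendee's round-trip hours across a rotation.

    per_venue_times: dict of venue -> list of per-attendee hours, with every
    list in the same attendee order (hypothetical input layout).
    """
    columns = list(per_venue_times.values())
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("every venue needs one entry per attendee")
    # zip(*columns) yields one tuple per attendee: their hours at each venue.
    return [statistics.mean(hours) for hours in zip(*columns)]
```

With two made-up attendees, `rotation_times({"X": [2.0, 20.0], "Y": [20.0, 2.0]})` gives `[11.0, 11.0]`: each venue alone has a large spread, but over the rotation both attendees travel the same amount, which is why the rotation SDs are so much lower.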
>>
>> The raw data and software are attached. The files are:
>>
>>  home-airports     -- the list of people's home airports
>>  durations.txt     -- the list of airport-airport durations
>>  doodle.txt        -- the attendees list
>>  pairings.py       -- the software to compute travel times
>>  doodle-out.txt    -- the computed travel times for each attendee
>>
>> Obviously, there could be an error in the raw data or the software.
>> Please feel free to send corrections, especially if you find
>> something material.
>>
>>
>> OBSERVATIONS
>> Obviously, it's hard to know what the optimal solution is without
>> some model for optimality, but we can still make some observations
>> based on this data:
>>
>> 1. If we're just concerned with minimizing total travel time, then we
>> would always meet in New York, since it has both the shortest mean travel
>> time and the shortest median travel time, but as I said above, this
>> arguably isn't fair to people who live either in Europe or California,
>> since they always have to travel.
>>
>> 2. Combining West Coast, East Coast, and European venues yields
>> mean/median values comparable to (or at least not much worse than)
>> NYC's, with much lower SDs. So, arguably that kind of mix is more fair.
>>
>> 3. There's a pretty substantial difference between hub and non-hub
>> venues. In particular, LHR has a median travel time 7 hours less than
>> ARN, and the SFO/NYC/LHR combination has a median/mean travel time
>> about 2 hours less than SFO/BOS/ARN (primarily accounted for by the
>> LHR/ARN difference). [Full disclosure, I've favored Star Alliance hubs
>> here, but you'd probably get similar results if, for instance, you
>> used AMS instead of LHR.]
>>
>>
>> Obviously, your mileage may vary based on your location and feelings
>> about what's fair, but based on this data, it looks to me like a
>> three-way rotation between West Coast, East Coast, and European hubs
>> offers a good compromise between minimum cost and a flat distribution
>> of travel times.
>>
>> Personally, whatever we decide to do, I'd ask that the WG settle now on
>> a pattern going forward so that we can predictably budget our travel
>> time and dollars.
>>
>>
>> [0] Treating all three NYC airports as a single location.
>>
>> _______________________________________________
>> rtcweb mailing list
>> rtcweb@ietf.org
>> https://www.ietf.org/mailman/listinfo/rtcweb
>>