Re: [rtcweb] Data on travel times

Marshall Eubanks <marshall.eubanks@gmail.com> Mon, 09 April 2012 15:36 UTC

In-Reply-To: <CABcZeBPDpguge1zT5JyDk+tohMn1_av4jgdgDhNLnXMFKNzcbg@mail.gmail.com>
References: <CABcZeBPDpguge1zT5JyDk+tohMn1_av4jgdgDhNLnXMFKNzcbg@mail.gmail.com>
Date: Mon, 09 Apr 2012 11:35:59 -0400
Message-ID: <CAJNg7VLfrn_SkTXHQYmR52NP5sxpO-03swiC4RBSDpwgOOt6cg@mail.gmail.com>
From: Marshall Eubanks <marshall.eubanks@gmail.com>
To: Eric Rescorla <ekr@rtfm.com>
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: quoted-printable
Cc: rtcweb@ietf.org, public-webrtc@w3.org
Subject: Re: [rtcweb] Data on travel times

I really like this analysis. Some questions.

2012/4/9 Eric Rescorla <ekr@rtfm.com>:
> Hi folks,
>
> Since it seems like we're going to be having a large number of
> interims, I thought it might be instructive to try to analyze a bunch
> of different locations to figure out the best strategy. My first cut
> analysis is below.
>
> Note that I'm not trying to make any claims about what the best set of
> venues is. It's obviously easy to figure out any statistic we want
> about each proposed venue, but how you map that data to "best" is up
> to you. In particular, there's some tradeoff between minimal total
> travel time and a "fair" distribution of travel times (not that I
> claim to know what that means).
>
>
> METHODOLOGY
> The data below is derived by treating both people and venues as
> airport locations and using travel time as our primary instrument.
>
> 1. For each responder to the current Doodle poll, assign a home
>   airport based on their draft publication history.  We're missing a
>   few people, but it should be pretty complete. Since
>   these people responded before the venue was known, it's at
>   least somewhat unbiased.
>
> 2. Compute the shortest advertised flight between each home airport
>   and the locations for each venue by looking at the shortest
>   advertised Kayak flights around one of the proposed interim
>   dates (6/10 - 6/13), ignoring price, but excluding "Hacker fares".
>   [Thanks to Martin Thomson for helping me gather these.]
>

1.) Why are some fields doubled? For example,

ARN SFO 14 13

Are these counted twice? That would, of course, give more weight to
those records.

2.) At any rate, I couldn't quite match your numbers. For SFO, for
example, I got

# SFO

 Records            29  |
 Mean            12.52  |
 RMS             15.34  |
 Std Dev          8.55  |
 Minimum          1.00  |
 Maximum         34.00  |

This assumes that each doubled entry counts as 2 separate entries. If
the second entries are ignored, I get

# SFO

 Records            21  |
 Mean            14.05  |
 RMS             17.05  |
 Std Dev          9.14  |
 Minimum          1.00  |
 Maximum         34.00  |

If the two entries are averaged together (when both are present), I get

# SFO
 Records            21  |
 Mean            13.93  |
 RMS             16.97  |
 Std Dev          9.18  |
 Minimum          1.00  |
 Maximum         34.00  |

None of these 3 options match your

Venue         Mean         Median           SD
----------------------------------------------
SFO           13.5             11         12.2

In particular, your SD value seems high.

(Note: for SD I divide by (n-1) rather than n, but that won't explain
the difference.)
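
For anyone who wants to check the three treatments above, here is a
minimal sketch. It assumes each record is a list of one or two durations
in hours for a single origin airport. The function names are
hypothetical, and the sample values are made up for illustration; they
are not the real durations.txt data.

```python
import math
import statistics


def expand(records, mode):
    """Flatten per-airport records into one list of durations.

    mode "both":    every entry counts as a separate record
    mode "first":   the second entry is ignored when present
    mode "average": the entries of a record are averaged
    """
    if mode == "both":
        return [d for rec in records for d in rec]
    if mode == "first":
        return [rec[0] for rec in records]
    if mode == "average":
        return [sum(rec) / len(rec) for rec in records]
    raise ValueError(f"unknown mode: {mode}")


def summarize(values):
    """Return the same summary fields as the tables above."""
    n = len(values)
    return {
        "records": n,
        "mean": statistics.fmean(values),
        "rms": math.sqrt(sum(v * v for v in values) / n),
        "sd": statistics.stdev(values),  # sample SD: divides by n-1
        "min": min(values),
        "max": max(values),
    }


# Illustrative records only (NOT the real data): one or two durations each.
sample = [[14, 13], [1], [6, 7], [34]]

for mode in ("both", "first", "average"):
    print(mode, summarize(expand(sample, mode)))
```

Swapping statistics.stdev for statistics.pstdev gives the divide-by-n
convention, which makes it easy to see how little the choice matters at
these record counts.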

Regards
Marshall


> This lets us compute statistics for any venue and/or combination
> of venues, based on the candidate attendee list.
>
> The three proposed venues:
>
> - San Francisco (SFO)
> - Boston (BOS)
> - Stockholm (ARN)
>
> Three hubs not too distant from the proposed venues:
>
> - London (LHR)
> - Frankfurt (FRA)
> - New York (NYC) [0]
>
> Also, Calgary (YYC), since the other two chair locations (BOS and SFO)
> were already proposed as venues, and I didn't want Cullen to feel
> left out.
>
>
> RESULTS
> Here are the results for each of the above venues, measured in total
> hours of travel (i.e., round trip).
>
> Venue         Mean         Median           SD
> ----------------------------------------------
> SFO           13.5             11         12.2
> BOS           12.3             11          7.5
> ARN           17.0             21         10.7
> FRA           14.8             17          7.3
> LHR           13.3             14          7.5
> NYC           11.5             11          5.8
> YYC           14.9             13         10.2
> SFO/BOS/ARN   14.3             13          3.6
> SFO/NYC/LHR   12.7             11.3        3.7
>
> XXX/YYY/ZZZ denotes a three-way rotation of XXX, YYY, and ZZZ. Obviously, mean
> and median are intended to be some sort of aggregate measure of travel
> time. I don't have any way to measure "fairness", but SD is intended
> as some metric of the variation in travel time between attendees.
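
One plausible reading of the rotation rows, sketched under assumptions:
each attendee's figure for a rotation is the average of their travel
times to the venues in it, and mean, median, and SD are then taken across
attendees. This is a guess at the approach, not taken from pairings.py;
the attendee names and hours are made up, and the choice of sample SD
(n-1) is also an assumption.

```python
import statistics

# Hypothetical per-attendee round-trip hours to each venue (made-up values).
travel = {
    "alice": {"SFO": 1, "NYC": 12, "LHR": 22},
    "bob": {"SFO": 13, "NYC": 2, "LHR": 15},
    "carol": {"SFO": 24, "NYC": 16, "LHR": 3},
}


def venue_stats(travel, venues):
    """Return (mean, median, SD) of per-attendee travel time.

    For a single venue, pass a one-element list. For a rotation, each
    attendee's value is their average over the venues in the rotation.
    """
    per_attendee = [
        statistics.fmean([times[v] for v in venues])
        for times in travel.values()
    ]
    return (
        statistics.fmean(per_attendee),
        statistics.median(per_attendee),
        statistics.stdev(per_attendee),  # sample SD (n-1), an assumption
    )


print("SFO only:   ", venue_stats(travel, ["SFO"]))
print("SFO/NYC/LHR:", venue_stats(travel, ["SFO", "NYC", "LHR"]))
```

With these made-up numbers the rotation has a much smaller SD than any
single venue, because averaging over three venues evens out who has the
long trip.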
>
> The raw data and software are attached. The files are:
>
>  home-airports     -- the list of people's home airports
>  durations.txt     -- the list of airport-airport durations
>  doodle.txt        -- the attendees list
>  pairings.py       -- the software to compute travel times
>  doodle-out.txt    -- the computed travel times for each attendee
>
> Obviously, there could be an error in the raw data or the software.
> Please feel free to send corrections, especially if you find
> something material.
>
>
> OBSERVATIONS
> Obviously, it's hard to know what the optimal solution is without
> some model for optimality, but we can still make some observations
> based on this data:
>
> 1. If we're just concerned with minimizing total travel time, then we
> would always meet in New York, since it has both the shortest mean travel
> time and the shortest median travel time, but as I said above, this
> arguably isn't fair to people who live either in Europe or California,
> since they always have to travel.
>
> 2. Combining West Coast, East Coast, and European venues gives
> mean/median values comparable to (or at least not too much worse than)
> NYC's, with much lower SDs. So, arguably that kind of mix is more fair.
>
> 3. There's a pretty substantial difference between hub and non-hub
> venues. In particular, LHR has a median travel time 7 hours less than
> ARN, and the SFO/NYC/LHR combination has a median/mean travel time
> about 2 hours less than SFO/BOS/ARN (primarily accounted for by the
> LHR/ARN difference). [Full disclosure, I've favored Star Alliance hubs
> here, but you'd probably get similar results if, for instance, you
> used AMS instead of LHR.]
>
>
> Obviously, your mileage may vary based on your location and feelings
> about what's fair, but based on this data, it looks to me like a
> three-way rotation between West Coast, East Coast, and European hubs
> offers a good compromise between minimum cost and a flat distribution
> of travel times.
>
> Personally, whatever we decide to do I'd ask that the WG settle now on
> a pattern going forward so that we can predictably budget our travel
> time and dollars.
>
>
> [0] Treating all three NYC airports as a single location.