Re: [Tzdist] Fwd: [tzdist] #32 (service): managing historical data

Paul Eggert <eggert@cs.ucla.edu> Sun, 14 December 2014 20:13 UTC

Message-ID: <548DEF42.8000300@cs.ucla.edu>
Date: Sun, 14 Dec 2014 12:12:50 -0800
From: Paul Eggert <eggert@cs.ucla.edu>
Organization: UCLA Computer Science Department
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0
MIME-Version: 1.0
To: Cyrus Daboo <cyrus@daboo.name>, tzdist@ietf.org
References: <059.5da79d7c9d394e20e3c22513cfe04c33@tools.ietf.org> <5488921B.8020900@lsces.co.uk> <5488973F.7050400@andrew.cmu.edu> <54889C39.1080103@lsces.co.uk> <D2BE5C3BFE11019ECF4BEF62@caldav.corp.apple.com> <5488A6E6.8050903@lsces.co.uk> <CADC+-gTiyJ4QHZT6m3je9M9-ifSELnSWgmgy7iXSWNS+p8pthg@mail.gmail.com> <5488C0EA.8090505@lsces.co.uk> <CADC+-gTgckSe1ca6Sai6RguQid=ReM7bH6K8+dVVFm-YfbpFbA@mail.gmail.com> <5488DA56.2090306@lsces.co.uk> <CADC+-gQN=Qb2y8M-bHnPzMcK8r=xUG-seQ7XzvZwwcWsHpHnBQ@mail.gmail.com> <54895986.6060806@lsces.co.uk> <5489CA90.1070307@gmail.com> <35BC5886C9A58F866E8A46A8@caldav.corp.apple.com> <5489D9F7.3080207@gmail.com> <D196D63077FEC1B090DF7C86@caldav.corp.apple.com> <5489F79E.4080909@gmail.com> <BC19CC6916DC0E59CA63737D@caldav.corp.apple.com> <548B929C.3010505@gmail.com> <548C04F8.30005@lsces.co.uk> <D0F712C2A7EF425A8887E231@cyrus.local> <548C8535.5060508@cs.ucla.edu> <5F5A838C3A2D2B962EB84B92@cyrus.local>
In-Reply-To: <5F5A838C3A2D2B962EB84B92@cyrus.local>
Content-Type: text/plain; charset="utf-8"; format="flowed"
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/tzdist/pIs7ROLp235DeneQ020Z5h2iBLc
Subject: Re: [Tzdist] Fwd: [tzdist] #32 (service): managing historical data

Cyrus Daboo wrote:
> In that kind of environment, yes, every byte does count, because the sum of all
> those extra bytes across the full set of users/devices is huge.

In that case I'm puzzled why VTIMEZONE was chosen as the data format, as
it's pretty fat.  For America/Los_Angeles, vzic generates VTIMEZONE data
containing 4004 bytes, compared to the 1058 bytes that Android uses for its tz
binary format.  If the problem is that VTIMEZONE is bloated, perhaps a better
solution would be to recommend a more-compact data format.

That being said, I'm still puzzled as to why shrinking the byte count is so
important that we should complicate the protocol to support it.  As a
back-of-the-envelope calculation, suppose a VTIMEZONE zone is 4000 bytes and we
make the same assumptions as before for everything else (32-bit native tz binary
data is 1100 bytes, the tzdist request is 200 bytes, the protocol overhead is
600 bytes each for request and response, and there are 10 updates per year).
Then we see the following per-client costs over time, in bits per second:

0.0137 b/s VTIMEZONE full copy each request
0.0063 b/s 32-bit tz binary full copy each request
0.0035 b/s tzdist version-style no-change handshake
0.0030 b/s? ETag-style no-change handshake? (not clear what's involved here)
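
The first three figures can be reproduced with a short calculation. This is
only a sketch: it assumes one request/response exchange per update, a year of
365.25 days, and an empty payload for the no-change handshake. None of those
details is spelled out above, and the names in the code are illustrative. The
ETag row is left out because its cost is not defined.

```python
# Sketch of the per-client bandwidth estimate, under the stated assumptions.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed year length

REQUEST_BYTES = 200      # tzdist request
OVERHEAD_BYTES = 600     # protocol overhead, counted once per direction
EXCHANGES_PER_YEAR = 10  # assumed: one exchange per update


def bits_per_second(payload_bytes):
    """Average per-client rate for a response carrying payload_bytes."""
    bytes_per_exchange = payload_bytes + REQUEST_BYTES + 2 * OVERHEAD_BYTES
    return bytes_per_exchange * 8 * EXCHANGES_PER_YEAR / SECONDS_PER_YEAR


rates = {
    "VTIMEZONE full copy each request": bits_per_second(4000),
    "32-bit tz binary full copy each request": bits_per_second(1100),
    # Assumption: a no-change reply carries no zone data at all.
    "tzdist version-style no-change handshake": bits_per_second(0),
}

for label, rate in rates.items():
    print(f"{rate:.4f} b/s {label}")
```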

To give us some perspective on this, if we assume a single reasonably-cheap
server can process 300 Mb/s sustained, then a single tzdist server could support
more than 21 billion tzdist clients each using the least-efficient protocol
mentioned above.  That's pretty scalable: one cheapish server should be able to
handle the entire planet and then some.  So why are we worrying so much about
making every byte count?  Shouldn't it be more important to keep things simple?
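
The capacity figure is just the server's sustained rate divided by the
per-client rate. A minimal sketch, taking the 300 Mb/s and 0.0137 b/s values
from the text and using an illustrative function name:

```python
def max_clients(server_bits_per_second, client_bits_per_second):
    """Number of clients one server can sustain at the given average rates."""
    return server_bits_per_second / client_bits_per_second


# 300 Mb/s server, VTIMEZONE full copy on every request (0.0137 b/s per client).
clients = max_clients(300e6, 0.0137)
print(f"{clients / 1e9:.1f} billion clients")
```

This ignores peak load, connection setup, and any clustering of requests
around a release, so it is an average-rate bound rather than a sizing guide.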