Re: [Tzdist] Fwd: [tzdist] #32 (service): managing historical data

Cyrus Daboo <cyrus@daboo.name> Mon, 15 December 2014 19:15 UTC

Date: Mon, 15 Dec 2014 14:15:35 -0500
From: Cyrus Daboo <cyrus@daboo.name>
To: Paul Eggert <eggert@cs.ucla.edu>, tzdist@ietf.org
Archived-At: http://mailarchive.ietf.org/arch/msg/tzdist/uc4gBTnhKI3Z-w2KEMwB1xT8yhw

Hi Paul,

--On December 14, 2014 at 12:12:50 PM -0800 Paul Eggert 
<eggert@cs.ucla.edu> wrote:

>> In that kind of environment, yes, every byte does count, because the sum
>> of all those extra bytes across the full set of users/devices is huge.
>
> In that case I'm puzzled why VTIMEZONE was chosen as the format for data,
> as it's pretty fat.  For America/Los_Angeles, vzic generates a VTIMEZONE
> format containing 4004 bytes, compared to the 1058 bytes that Android
> uses for tz binary format.  If the problem is that VTIMEZONE is bloated,
> perhaps a better solution would be to recommend a more-compact data
> format.

VTIMEZONE is used because the first consumers of this protocol are 
iCalendar-based products (and, as has been stated before, it was those 
products that started this whole thing off by implementing their own 
tzdist-like behavior in a proprietary manner, with the subsequent 
realization that an interoperable protocol would be better for all). Yes, 
it could be considered bloated compared to some binary representation; 
however, it is an interoperable standard format that exists today.

> That being said, I'm still puzzled as to why shrinking the byte count is
> so important that we should complicate the protocol to support it.

I think the protocol as it stands is far from complicated. It relies on 
ETags (a fundamental element of HTTP that nearly all devices support in 
their HTTP stacks) and on the dtstamp/changedsince behavior, which I am 
sure the other implementors of the current protocol will agree, as I do, 
is trivial to implement and provides a more efficient incremental update 
mechanism at very little added cost.
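To make concrete what the ETag-based no-change handshake involves, here is a 
minimal client-side sketch. The helper names and the toy server are 
hypothetical, purely for illustration; the real exchange is an ordinary HTTP 
conditional GET using If-None-Match, answered with 304 Not Modified when 
nothing has changed:

```python
def fetch_timezone(tzid, cached_etag, cached_data, server_get):
    """One conditional fetch. Sends If-None-Match with the cached ETag;
    a 304 means the cached copy is still current, a 200 carries new data.
    `server_get` stands in for the HTTP layer (hypothetical callable)."""
    headers = {}
    if cached_etag is not None:
        headers["If-None-Match"] = cached_etag
    status, etag, body = server_get(tzid, headers)
    if status == 304:                 # not modified: keep what we have
        return cached_etag, cached_data
    return etag, body                 # 200: store the new ETag and data

def make_server(current_etag, data):
    """Toy stand-in for a tzdist server, for illustration only."""
    def server_get(tzid, headers):
        if headers.get("If-None-Match") == current_etag:
            return 304, current_etag, None   # no body on the wire
        return 200, current_etag, data
    return server_get

srv = make_server('"v1"', "BEGIN:VTIMEZONE ... END:VTIMEZONE")
etag, data = fetch_timezone("America/New_York", None, None, srv)    # full fetch
etag2, data2 = fetch_timezone("America/New_York", etag, data, srv)  # 304
```

The second call transfers only headers in each direction, which is where the 
"very little added cost" of the no-change case comes from.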

The complexity is all being added by the apparent need for versioning, 
which I still contend is unnecessary for tzdist. Versioning is a problem 
for the consumers of the tzdist data, and tzdist itself provides 
sufficient metadata for them to deal with it.

> As a
> back-of-the-envelope calculation, suppose a VTIMEZONE zone is 4000 bytes
> and we make the same assumptions as before for everything else (32-bit
> native tz binary data is 1100 bytes, the tzdist request is 200 bytes, the
> protocol overhead is 600 bytes for both request and response, and there
> are 10 updates per year). Then we see the following per-client costs over
> time:
>
> 0.0137 b/s VTIMEZONE full copy each request
> 0.0063 b/s 32-bit tz binary full copy each request
> 0.0035 b/s tzdist version-style no-change handshake
> 0.0030 b/s? ETag-style no-change handshake? (not clear what's involved
> here)
>
> To give us some perspective on this, if we assume a single
> reasonably-cheap server can process 300 Mb/s sustained, then a single
> tzdist server could support 21 billion tzdist clients each using the
> least-efficient protocol mentioned above.  That's pretty scalable: one
> cheapish server should be able to handle the entire planet and then some.
> So why are we worrying so much about making every byte count?  Shouldn't
> it be more important to keep things simple?
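For reference, the two full-copy figures above can be reproduced if one 
assumes the 600-byte protocol overhead applies to each direction (1200 bytes 
per exchange) and a 365-day year; the two no-change handshake figures depend 
on further assumptions about the response body, so I leave them out:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000
UPDATES_PER_YEAR = 10

def bits_per_second(payload_bytes, request_bytes=200, overhead_bytes=600):
    """Sustained per-client rate in bits/second, one exchange per update,
    assuming overhead_bytes applies to BOTH the request and the response."""
    per_exchange = request_bytes + 2 * overhead_bytes + payload_bytes
    return per_exchange * 8 * UPDATES_PER_YEAR / SECONDS_PER_YEAR

print(round(bits_per_second(4000), 4))   # VTIMEZONE full copy -> 0.0137
print(round(bits_per_second(1100), 4))   # 32-bit tz binary    -> 0.0063
```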

Scalability involves a lot more than just the server by itself: there are 
many other issues related to intermediaries, to efficiency on the client 
side (where battery drain on devices can be a show-stopper), to security 
requirements, etc. The systems I have built have not had to scale much 
beyond hundreds of thousands of users, but I work with others who do build 
out to tens of millions or more. Simple back-of-the-envelope calculations 
don't suffice - there are far too many variables involved - but the 
efficiency of the base protocol itself is one major aspect we do have 
control over when we design it. Even in the systems I have built, protocol 
efficiency has been one of the biggest pieces to deal with in terms of 
scalability.

I think, as described earlier, what we have right now is a pretty lean 
protocol that provides a reasonable path to scalable implementations, and 
we have the extensibility to go further if what we have done to date turns 
out not to be sufficient for the very large providers.


-- 
Cyrus Daboo