Re: [Tzdist] Fwd: [tzdist] #32 (service): managing historical data

Lester Caine <lester@lsces.co.uk> Tue, 16 December 2014 22:30 UTC

Message-ID: <5490B289.6030602@lsces.co.uk>
Date: Tue, 16 Dec 2014 22:30:33 +0000
From: Lester Caine <lester@lsces.co.uk>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: tzdist@ietf.org
References: <059.5da79d7c9d394e20e3c22513cfe04c33@tools.ietf.org> <5488C0EA.8090505@lsces.co.uk> <CADC+-gTgckSe1ca6Sai6RguQid=ReM7bH6K8+dVVFm-YfbpFbA@mail.gmail.com> <5488DA56.2090306@lsces.co.uk> <CADC+-gQN=Qb2y8M-bHnPzMcK8r=xUG-seQ7XzvZwwcWsHpHnBQ@mail.gmail.com> <54895986.6060806@lsces.co.uk> <5489CA90.1070307@gmail.com> <35BC5886C9A58F866E8A46A8@caldav.corp.apple.com> <5489D9F7.3080207@gmail.com> <D196D63077FEC1B090DF7C86@caldav.corp.apple.com> <5489F79E.4080909@gmail.com> <BC19CC6916DC0E59CA63737D@caldav.corp.apple.com> <548B929C.3010505@gmail.com> <548C04F8.30005@lsces.co.uk> <D0F712C2A7EF425A8887E231@cyrus.local> <548DD49E.2050300@gmail.com> <4316F5E10254D07BBC24E4E1@caldav.corp.apple.com> <548F5B6B.4090702@gmail.com> <CADZyTknKw3zYZgF4Udiu4Ythz3OD2-AXc-96VBrq1GXYtYfu8Q@mail.gmail.com> <9A4B38EEDB927575ED1A77F1@caldav.corp.apple.com> <549093FF.3090708@gmail.com> <59FC7AD7C4EC6D33043E9E3F@caldav.corp.apple.com>
In-Reply-To: <59FC7AD7C4EC6D33043E9E3F@caldav.corp.apple.com>
Content-Type: text/plain; charset="windows-1252"
Content-Transfer-Encoding: 7bit
Archived-At: http://mailarchive.ietf.org/arch/msg/tzdist/1So_9EoGeq4qwkz4xT0MITa6I3o
Subject: Re: [Tzdist] Fwd: [tzdist] #32 (service): managing historical data

On 16/12/14 21:33, Cyrus Daboo wrote:
> Hi Doug,
> 
> --On December 16, 2014 at 1:20:15 PM -0700 Doug Royer
> <douglasroyer@gmail.com> wrote:
> 
>>> As I proposed before I want to see the version number be part of the
>>> meta-data associated with the time zone data - i.e., part of the list
>>> action response. I don't want to see it in the time zone data due to
>>> the fact that, with the current manner in which IANA tz data is
>>> "tagged" with a version, the entire set of time zones would be marked
>>> as changed.
>>
>> That point is valid, however as soon as the data is saved on the
>> tzdist-server, and the first fetch is performed, is it not just as likely
>> that it would trigger a everything is out of date when they compare it to
>> 'nothing authoritative known about this TZID yet - so get it' ?
> 
> OK, so when a list action changedsince is done we need to make sure of
> the following:
> 
> 1) The server only reports a time zone as changed if the underlying data
> or meta-data (other than version id) has changed.
The version is not part of the data. And if one is serving a 'modern'
truncated data set, then any changes to TZ data outside the scope of that
period are ignored as well.

> 2) The server always reports the latest versions now in use for each
> publisher.
This is the bit that bothers me.
And it relates to your incremental changes, so there may be several
'versions' uncovered by a list/changedsince. So the server has to
produce a concatenated list based on all the versions between the old
and new version. Not a particular problem with the right caching store,
but another race hazard.
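
To illustrate the concatenation, here is a minimal sketch in Python. It
assumes integer version numbers and a per-version change log held in a
plain dict; the name changed_since and the sample data are hypothetical
and not taken from the tzdist draft.

```python
def changed_since(changes_by_version, since):
    """Merge every change set newer than `since` into one answer.

    changes_by_version maps a version number to the set of TZids whose
    data changed in that version (an assumed layout for this sketch).
    Returns (latest_version, sorted list of changed TZids).
    """
    newer = sorted(v for v in changes_by_version if v > since)
    changed = set()
    for version in newer:
        # A TZid touched in several versions is listed only once.
        changed.update(changes_by_version[version])
    latest = newer[-1] if newer else since
    return latest, sorted(changed)


# Hypothetical change log, for illustration only.
log = {
    1: {"Europe/Moscow"},
    2: {"Pacific/Fiji", "Europe/Moscow"},
    3: {"Pacific/Fiji"},
}

print(changed_since(log, 1))
```

The race hazard is still there in the sketch: if a new version is added
while the list is being built, the reported latest version and the merged
list can disagree unless the server works from a consistent snapshot.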

> So #1 allows clients to fetch the updated tzdata and update any
> meta-data associated with a "changed" time zone.
As indicated previously ... one does not need to provide the whole TZid
period. Only the diff needs to be transmitted, and this simplifies the
checking of material which may be using the updated period. Transmitting
the whole data set requires the client to establish what the
change was and then assess the knock-on effects. Discussion of package
size may be moot if processing a change set becomes the bottleneck?

> #2 allows the client to update the version id meta-data for any other
> time zone it has cached, but was not reported in #1.
While the implementation is a matter for the server software, I envisage
a system with a database of TZid records in which each updated TZid
record is added keyed by its version number.
When one asks for a 'changedsince', the database returns all of the
versions up to the latest. At any time one can read a version as a
complete set of TZids or as the diffs for a version number. I envisage
the same server working as the client cache so all that ever needs to be
transferred is the update diff. The problem arises where there are two
changes to a TZid in the window. Operationally one may only need the
second, but later the first one may be important for archival reasons.
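
A rough sketch of such a store, in Python, may make this clearer. It is
only an illustration under assumptions: integer version numbers, an
in-memory dict standing in for the database, and a hypothetical class
TzStore with made-up method names and sample data.

```python
class TzStore:
    """Version-keyed store of TZid records (illustrative only)."""

    def __init__(self):
        # (tzid, version) -> data; a record exists only for versions
        # in which that TZid actually changed.
        self._records = {}

    def add(self, tzid, version, data):
        self._records[(tzid, version)] = data

    def diff(self, version):
        """Records that changed in exactly this version."""
        return {t: d for (t, v), d in self._records.items() if v == version}

    def snapshot(self, version):
        """Complete set of TZids as they stood at this version."""
        newest = {}  # tzid -> (version, data)
        for (tzid, v), data in self._records.items():
            if v <= version and (tzid not in newest or v > newest[tzid][0]):
                newest[tzid] = (v, data)
        return {tzid: data for tzid, (v, data) in newest.items()}

    def changed_since(self, version):
        """Every (version, tzid) change after `version`, oldest first."""
        return sorted((v, t) for (t, v) in self._records if v > version)


store = TzStore()
store.add("Europe/London", 1, "rules-A")
store.add("Pacific/Fiji", 1, "rules-B")
store.add("Pacific/Fiji", 2, "rules-C")
store.add("Pacific/Fiji", 3, "rules-D")

print(store.changed_since(1))
print(store.snapshot(2))
print(store.diff(3))
```

Because every change is kept under its own version key, both updates to
Pacific/Fiji in the window stay available: a client that only cares about
current data takes the last one, while an archival user can still read
version 2.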

> Note that, if a publisher chooses to include a version number in time
> zones they publish, and they ensure those only change when the data
> changes, then having the version embedded in the time zone data works
> fine as part of #1.
I think the problem people are having is that some caching processes do
treat the version as a variable. Code management systems where the
version number is in every file are a right pain when only the odd file
actually changed. As long as the version is only used as a tag to
identify a particular view then the caching system need only transfer
those 'files' which actually have changes. ETag can safely contain a
version since the list/changedsince mechanism will only list TZids that
have changed. Users can still access any TZid at any version,
bypassing the list/changedsince mechanism, but if the data is unchanged
from the currently stored version then it can be thrown away.
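
As a sketch of that last point, a client cache could compare a digest of
the fetched data against what it already holds and keep the new copy only
when the content differs. This is an assumption about how a client might
do it, not something the draft specifies; store_if_changed and the cache
layout are hypothetical, and moving the version tag forward on unchanged
data is my own choice here.

```python
import hashlib


def digest(data: bytes) -> str:
    """Content digest of the time zone data, excluding any version tag."""
    return hashlib.sha256(data).hexdigest()


def store_if_changed(cache, tzid, version, data):
    """Keep `data` only if it differs from the cached copy.

    cache maps tzid -> {"version", "digest", "data"} (assumed layout).
    Returns True if the cached data was replaced, False if the fetched
    copy was identical and has been thrown away.
    """
    entry = cache.get(tzid)
    new_digest = digest(data)
    if entry is not None and entry["digest"] == new_digest:
        # Same content: only the version tag moves forward.
        entry["version"] = version
        return False
    cache[tzid] = {"version": version, "digest": new_digest, "data": data}
    return True


cache = {}
print(store_if_changed(cache, "Europe/London", 1, b"rules-A"))
print(store_if_changed(cache, "Europe/London", 2, b"rules-A"))
print(store_if_changed(cache, "Europe/London", 3, b"rules-B"))
```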

I think it is worth pointing out that probably 90% of the
initial data can be installed, as currently happens, as part of the OS
installation. Then only updates from that base version need to be
supplied by tzdist. The installed version provides a base version ID and
tzdist simply starts up with a list/changedsince from that base. But in
future that base install would also include the historic versions from
the point at which tzdist becomes active. The extra historic versions
are unlikely to be more than a small percentage of the total data.

-- 
Lester Caine - G8HFL
-----------------------------
Contact - http://lsces.co.uk/wiki/?page=contact
L.S.Caine Electronic Services - http://lsces.co.uk
EnquirySolve - http://enquirysolve.com/
Model Engineers Digital Workshop - http://medw.co.uk
Rainbow Digital Media - http://rainbowdigitalmedia.co.uk