Outage analysis and report
Glen <glen@amsl.com> Tue, 28 January 2020 23:56 UTC
MIME-Version: 1.0
From: Glen <glen@amsl.com>
Date: Tue, 28 Jan 2020 15:56:40 -0800
X-Gmail-Original-Message-ID: <CABL0ig5G0K+ULxAHjLcXw6LicBHutdeOckJ==QMy6=kLZOOhoA@mail.gmail.com>
Message-ID: <CABL0ig5G0K+ULxAHjLcXw6LicBHutdeOckJ==QMy6=kLZOOhoA@mail.gmail.com>
Subject: Outage analysis and report
To: ietf-announce@ietf.org
Content-Type: text/plain; charset="UTF-8"
Archived-At: <https://mailarchive.ietf.org/arch/msg/ietf-announce/Zo3YbZJVpM74fioJ1P6fVKs2NTc>
Dear IETF Community,

As you know, about 30 hours ago, we moved the IETF to a new server, containing an upgraded OS, upgraded software, and a new Python version. As a part of that process, the Tools Team moved the Datatracker and their other software to Python 3.

14 hours ago, that new server suffered a significant data loss. Henrik was online shortly after the data loss occurred, and called me immediately.

Investigation determined that the loss was caused by a command in the daily Datatracker cron script. One rsync command in that script is designed to make the IANA yang-parameters available to the Datatracker. After the upgrade to Python 3, that script generated a bad command-line argument, resulting in the rsync command running with an incorrect target and incorrectly deleting server data. The missing data then caused the Datatracker, the Mail Archive, and other tools to malfunction or fail.

We of course operate a number of hot backup servers, all of which dutifully picked up the data changes immediately, deletions included. Fortunately, just prior to that script's execution, one of AMS' offsite backup systems had grabbed a complete copy of the data on the new server. So, with the exception of approximately 2 hours of traffic during which the operating servers were impaired (roughly 0815-1015 GMT Tuesday morning), no data was lost.

However, given the estimated time to restore that data (3-4 hours over the Internet), and given that there could be other unknowns in the software we hadn't yet identified, the optimal course was clearly to bring the old server group back online. I did so, restoring service on the old server approximately 3 hours after the problem started.

At this time, AMS programmers are working on merging yesterday's Mail Archive data into the live archive, while the Tools Team members are working on merging drafts, Datatracker data, and other information they manage into the live system.
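
As an illustration of the failure mode described above (a generated command-line argument sending a deleting rsync to the wrong target), here is a minimal Python sketch of a guard that checks the arguments before the command is built. It is not the Datatracker's actual script. The allowed root, the example source, and the function name are all hypothetical, and the deletion cap of 100 is an arbitrary assumption. Only the standard rsync options -a, --delete, and --max-delete are used.

```python
from pathlib import PurePosixPath

# Hypothetical root under which the cron job is allowed to write.
ALLOWED_ROOT = PurePosixPath("/srv/example/mirror")


def build_rsync_command(source, dest, allowed_root=ALLOWED_ROOT, max_delete=100):
    """Return an rsync argument list, or raise if an argument looks unsafe."""
    for name, value in (("source", source), ("dest", dest)):
        # Reject bytes, None, and other non-str values before they can be
        # formatted into an unexpected command-line argument.
        if not isinstance(value, str):
            raise TypeError(f"{name} must be str, got {type(value).__name__}")
        if not value.strip():
            raise ValueError(f"{name} must not be empty")

    dest_path = PurePosixPath(dest)
    if not dest_path.is_absolute():
        raise ValueError(f"dest must be an absolute path: {dest!r}")
    if ".." in dest_path.parts:
        raise ValueError(f"dest must not contain '..': {dest!r}")
    # The target must be strictly below the allowed root, never the root itself.
    if allowed_root not in dest_path.parents:
        raise ValueError(f"dest is outside {allowed_root}: {dest!r}")

    return [
        "rsync",
        "-a",
        "--delete",
        f"--max-delete={max_delete}",  # cap the damage if something is still wrong
        source,
        str(dest_path) + "/",
    ]
```

A caller would pass the returned list to subprocess.run(cmd, check=True), so that no shell ever reinterprets the arguments. Running the same command with --dry-run first is a further inexpensive check.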
We will send an update when this process is completed, and we will then consult with IETF leadership and schedule a new cutover event in the near future.

Thank you for your patience.

Glen
--
Glen Barney
IT Director
AMS (IETF Secretariat)