Re: IETF Service Outage

Glen <glen@amsl.com> Wed, 23 March 2016 13:27 UTC

Return-Path: <glen@amsl.com>
X-Original-To: ietf@ietfa.amsl.com
Delivered-To: ietf@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 07CD912D635 for <ietf@ietfa.amsl.com>; Wed, 23 Mar 2016 06:27:56 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -104.231
X-Spam-Level:
X-Spam-Status: No, score=-104.231 tagged_above=-999 required=5 tests=[BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H4=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_WHITELIST=-100] autolearn=ham autolearn_force=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id e52mvVuC4a8Q for <ietf@ietfa.amsl.com>; Wed, 23 Mar 2016 06:27:53 -0700 (PDT)
Received: from mail.amsl.com (c8a.amsl.com [4.31.198.40]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 732D212D6B4 for <ietf@ietf.org>; Wed, 23 Mar 2016 06:13:45 -0700 (PDT)
Received: from mail.amsl.com (localhost [127.0.0.1]) by c8a.amsl.com (Postfix) with ESMTPS id 042CB1E5D58 for <ietf@ietf.org>; Wed, 23 Mar 2016 06:13:22 -0700 (PDT)
Received: from mail-ob0-f172.google.com (mail-ob0-f172.google.com [209.85.214.172]) by c8a.amsl.com (Postfix) with ESMTPSA id D05BC1E5D4F for <ietf@ietf.org>; Wed, 23 Mar 2016 06:13:21 -0700 (PDT)
Received: by mail-ob0-f172.google.com with SMTP id xj3so11721176obb.0 for <ietf@ietf.org>; Wed, 23 Mar 2016 06:13:45 -0700 (PDT)
X-Gm-Message-State: AD7BkJJhequu9KNUcHliEI/FLGC/GuIOyUlJLNVtRyBI4EhhAnsPCW2KlGDJLDepVJhvoxXdFr6evcMrA7ATjw==
X-Received: by 10.60.227.231 with SMTP id sd7mr1322491oec.77.1458738812167; Wed, 23 Mar 2016 06:13:32 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.202.206.5 with HTTP; Wed, 23 Mar 2016 06:13:12 -0700 (PDT)
In-Reply-To: <56F29132.4010109@cisco.com>
References: <CABL0ig4s9PBVS5icz4y=M2nJXtcPH9iVFEptooLsRUCnUjprMg@mail.gmail.com> <56F29132.4010109@cisco.com>
From: Glen <glen@amsl.com>
Date: Wed, 23 Mar 2016 06:13:12 -0700
X-Gmail-Original-Message-ID: <CABL0ig468ZDB3tAA0C0NPyP553CH8Knq4bzobD_Pt2p3rE35uQ@mail.gmail.com>
Message-ID: <CABL0ig468ZDB3tAA0C0NPyP553CH8Knq4bzobD_Pt2p3rE35uQ@mail.gmail.com>
Subject: Re: IETF Service Outage
To: Eliot Lear <lear@cisco.com>
Content-Type: text/plain; charset=UTF-8
Archived-At: <http://mailarchive.ietf.org/arch/msg/ietf/gQmQvYh8_Atwz_2NOP1m6MJRSVE>
Cc: ietf <ietf@ietf.org>
X-BeenThere: ietf@ietf.org
X-Mailman-Version: 2.1.17
Precedence: list
Reply-To: glen@amsl.com
List-Id: IETF-Discussion <ietf.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/ietf>, <mailto:ietf-request@ietf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/ietf/>
List-Post: <mailto:ietf@ietf.org>
List-Help: <mailto:ietf-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/ietf>, <mailto:ietf-request@ietf.org?subject=subscribe>
X-List-Received-Date: Wed, 23 Mar 2016 13:27:56 -0000

On Wed, Mar 23, 2016 at 5:50 AM, Eliot Lear <lear@cisco.com> wrote:
> I'm curious- do you have sufficient data to perform an analysis to
> determine the source of the service degradation?

Hi Eliot -

We're still looking, but, at the moment, no, it does not appear that way.

Our servers run Linux, our switches are Cisco, and log levels on
everything are quite high.  Our logs are full of noise about all kinds
of other, unrelated server activities; but, as often seems to be the
case, they are silent about anything related to this issue, to my
great frustration.

Our engineers are always looking at additional ways of monitoring
things, but whether this was some kind of denial-of-service attack
against a physical host, or an OS failure of some kind, or maybe just
a bad network cable, we can't yet tell.

When things like this happen unexpectedly, we want to spend only a
minimal amount of time on testing mid-event.  Having done a number of
tests, checks, and localized reset procedures, we were just about to
reboot the physical host when the network came back to normal, all by
itself.  So, this time, so far, I am unable to determine a source,
which, of course, is probably the most frustrating outcome possible.

Glen
Glen Barney
IT Director
AMS (IETF Secretariat)