[109all] NOC update #2

Sean Croghan <sean@xagsolutions.com> Thu, 19 November 2020 01:56 UTC


As previously reported, we traced the interruption of the iabopen session to an unexpected Azure network interface removal event on network interfaces provisioned with SR-IOV. To prevent this from happening again, we intended to remove SR-IOV networking entirely. Unfortunately, it now transpires that this change was not applied to 2 of the 16 VMs, including the application VM for the Plenary. So, to add to the list of reasons to want 2020 to be over, towards the end of the Plenary the same network interface removal event occurred and triggered an outage long enough to affect everyone.

I can confirm that the SR-IOV provisioning has now been removed from all VMs, which we believe eliminates the risk of the same thing happening again.  We continue to work with Azure Direct Support to determine the underlying cause of the removal events.
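For anyone wanting to run a similar check on their own fleet, here is a minimal sketch of an audit that lists any network interfaces still provisioned with SR-IOV. It is not the procedure the NOC used. It assumes SR-IOV is enabled through Azure's Accelerated Networking setting and that the NIC inventory has been exported as JSON (for example with `az network nic list -o json`), where that setting appears as the `enableAcceleratedNetworking` field. The NIC names in the sample are hypothetical.

```python
import json


def nics_with_sriov(nic_list_json: str) -> list[str]:
    """Return the names of NICs that still have SR-IOV enabled.

    Assumption: the input is a JSON array of NIC objects, each with a
    "name" and an "enableAcceleratedNetworking" boolean.
    """
    nics = json.loads(nic_list_json)
    return sorted(
        nic["name"] for nic in nics if nic.get("enableAcceleratedNetworking")
    )


# Hypothetical two-NIC inventory, for illustration only.
sample = json.dumps(
    [
        {"name": "vm-a-nic", "enableAcceleratedNetworking": False},
        {"name": "vm-b-nic", "enableAcceleratedNetworking": True},
    ]
)

remaining = nics_with_sriov(sample)
print(remaining)
```

An empty list would indicate that the change reached every interface in the export; any name in the list is a NIC that was missed.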

Please let me know if you have any questions.

Sean



On Nov 17, 2020, at 4:56 PM, Sean Croghan wrote:



I have an update for those of you affected by the outage in yesterday's IABOPEN session. We have isolated this to an interruption of the virtual machine's network interface. We have engaged the hardware and network team at Azure to determine the cause of this event, but we do not have an explanation at this time.

I will provide an update when we have received more information.


For those interested in details:

At 07:56:36 UTC the network interface (eth0) went link down and the interface was removed from the VM
At 08:00:28 UTC a new interface was added to the VM
At 08:00:29 UTC the new interface (eth1) went link up
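
As a quick check on the timeline above, this short sketch computes how long the VM was without a working network interface. The timestamps are the ones reported here; the names `EVENTS` and `seconds_between` are made up for illustration.

```python
from datetime import datetime

# Timestamps as reported above (all UTC, same day).
EVENTS = [
    ("07:56:36", "eth0 link down, interface removed from the VM"),
    ("08:00:28", "new interface added to the VM"),
    ("08:00:29", "eth1 link up"),
]


def seconds_between(start: str, end: str) -> int:
    """Elapsed whole seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# From the first event (link down) to the last (link up).
outage_seconds = seconds_between(EVENTS[0][0], EVENTS[-1][0])
print(f"{outage_seconds // 60} min {outage_seconds % 60} s without a network interface")
```

That works out to 3 minutes 53 seconds between eth0 going down and eth1 coming up.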

Yes, the VM added a new interface. The servers were provisioned with SR-IOV, and we suspect that a migration event moved the VM to different hardware, causing the NIC driver to be reloaded. We have found some evidence supporting our theory that a migration or unscheduled maintenance event occurred, and we are working to verify whether that is what happened during this event. We have removed SR-IOV from the network interfaces on all servers.

I hope you are having a good and productive week.


- The IETF NOC Team
