[Tools-implementation] Revisiting whether we should continue using Docker as we currently do.

Robert Sparks <rjsparks@nostrum.com> Wed, 09 September 2020 16:21 UTC

To: "tools-implementation@ietf.org" <tools-implementation@ietf.org>
From: Robert Sparks <rjsparks@nostrum.com>
Message-ID: <3ae28788-898a-de72-22b6-b0f036d1b23a@nostrum.com>
Date: Wed, 09 Sep 2020 11:21:03 -0500
Archived-At: <https://mailarchive.ietf.org/arch/msg/tools-implementation/0v1Eb4Yl1pArRVtgY3Ci2O_QOxU>

A few weeks ago, shortly after the yc.o team's work managed to crash the 
host they were working on by changing settings in a container, I 
proposed that we unroll the things we currently have in containers on 
production. At the time, Glen suggested we not do that. I'd like to ask 
the question again - I think we may have more to consider.

1. I remain uneasy about Docker's implementation on openSUSE. The 
container crash above and the issues we've run into with containers 
locking up (and sometimes causing processes talking to them, like 
apache, to hang) are suspicious. That we've not been able to pin down 
what's really going on suggests to me the issue is in a place we can't 
really look, perhaps inside Docker's interstitial networking or 
filesystem abstraction code.

2. Many of the containers we have (and in particular the one for the 
website) really need to be designed differently if they are going to 
remain deployed as containers. The amount of file-system mapping they do 
is not what the Docker architects expect as a normal use case. Mapping 
sockets the way we do is also likely not something they focus on testing.

3. Docker is making Glen uncomfortable, and the operational benefit to 
him of the containerization does not justify the extra problems it is 
bringing.

So I again suggest that we unroll for the production deploys, at least 
for now. I think we can unroll everything at this point, but there might 
still be a hitch in unrolling the trac instances. Henrik - could you 
remind me what our thinking was with respect to those?

I do plan to keep up the pressure to have containerized versions of 
these services - this isn't a call to abandon Docker - but I suggest we 
need to change how we're currently using it.

RjS