Re: Subscriber List Damage

Glen <glen@amsl.com> Tue, 01 July 2008 02:29 UTC

Message-ID: <486996AC.2050405@amsl.com>
Date: Mon, 30 Jun 2008 19:30:04 -0700
From: Glen <glen@amsl.com>
User-Agent: Thunderbird 2.0.0.14 (Windows/20080421)
MIME-Version: 1.0
To: Eric Rescorla <ekr@networkresonance.com>
Subject: Re: Subscriber List Damage
References: <48671722.1020502@amsl.com> <486962AA.3000800@cisco.com> <20080701014218.93BD5509A9@romeo.rtfm.com>
In-Reply-To: <20080701014218.93BD5509A9@romeo.rtfm.com>
X-Enigmail-Version: 0.95.6
Cc: sip@ietf.org, IAOC <iaoc@ietf.org>, Michael Thomas <mat@cisco.com>, ietf@ietf.org

Unfortunately, at this point I have only assumptions about what
happened.

An increase in TMDA activity caused the system to run out of resources.
We saw an extremely high load average and a sudden decrease in kernel
buffer memory.  Processes started to fail to fork.  Our engineers
connected in and rebooted the server.

Upon reboot, we quickly discovered that the IETF list, which we 
moderate, was no longer letting us in with the normal list password.  A 
quick check showed that the database file config.pck was now only about 
50% of its original size, and that many subscribers had been removed - 
and others added.  Passwords and other settings had been "reverted" to 
pre-cutover values.  A comparison with our recent backup of the file 
showed massive differences - not just removals, but additions.  We 
hypothesize that mailman fell back to some type of cached copy of the 
database from 2005, which was also in the directory, and recreated the 
data from that.
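
For anyone who wants to run the same kind of comparison between a backup
and a live list database, here is a minimal sketch. It is not the
procedure we used. It assumes each file unpickles to a plain dict with
"members" and "digest_members" mappings keyed by address; the function
names are hypothetical, and a real config.pck may need Mailman's own
modules importable before it will load.

```python
import pickle


def load_members(path):
    """Return the set of lower-cased addresses in a pickled list config.

    Assumption: the file unpickles to a plain dict whose "members" and
    "digest_members" entries are mappings keyed by subscriber address.
    """
    with open(path, "rb") as fp:
        config = pickle.load(fp)  # only load files you trust
    addresses = set()
    for key in ("members", "digest_members"):
        addresses.update(addr.lower() for addr in config.get(key, {}))
    return addresses


def diff_members(backup, current):
    """Return (removed, added) as sorted lists of addresses."""
    removed = sorted(backup - current)  # in the backup, missing now
    added = sorted(current - backup)    # present now, not in the backup
    return removed, added
```

Calling diff_members(load_members(backup_path), load_members(live_path))
would give the two lists: addresses that disappeared, and addresses that
reappeared from older data.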

Unfortunately, we have no way to verify what happened, and certainly 
don't want to try to cause it again.

I believe our solution is going to be to remove TMDA completely, which
I expect will eliminate side effects such as this one.

I'm happy to discuss this further; however, I do not want to pollute the 
various lists with this.  I'm cc'ing the lists in this case so everyone 
understands we are responding to emails, but I'm going to take any 
further threads off-list to keep the lists focused on their primary 
purposes.

Glen

Eric Rescorla wrote:
> At Mon, 30 Jun 2008 15:48:10 -0700,
> Michael Thomas wrote:
>> 1) Have you brought this up with the mailman folks? I've interacted with
>>     them and they seem like a responsive set of folks. I'm sure that this
>>     sort of thing would horrify them.
> 
> I agree that this is horrifying.
> 
> More importantly, doesn't this mean that this is a problem we actually
> need a solution for pronto? As I understand Glen's message, he's
> saying that this is a bug in mailman triggered by some problem in
> TMDA. I realize that TMDA is being replaced, but presumably Henrik's
> code isn't perfect, so don't we have to worry about it triggering the
> same behavior?
> 
> Glen, I'm sure there are some people on this list who understand
> mailman well. I realize you may not have complete info, but if you can
> provide us some more information--e.g., what file(s) got stomped and
> which code you think stomped it--about what you think happened, maybe
> they can help track it down?
> 
> -Ekr
> 
> 
>> 2) 3 years since the last backup? Oi.
>>
>>        Mike
>>
>> Glen wrote:
>>> All -
>>>
>>> I was asked by the IAOC to post a message to the IETF and SIP lists, 
>>> to ensure that people were aware that the subscriber lists for the 
>>> IETF and SIP lists were damaged as a result of an anomaly in TMDA and 
>>> Mailman that occurred Thursday night.
>>>
>>> Basically, TMDA misbehaved, and, in the process, caused Mailman to 
>>> encounter a transient failure in the reading of its databases for 
>>> these two lists.  As a result, rather than simply holding the mail and 
>>> retrying it, Mailman decided to discard the current list databases and 
>>> re-create them from 3-year-old data, for both the IETF and the SIP lists.
>>>
>>> *sigh*
>>>
>>> No email was lost to the system or the archives; however, some people 
>>> may have missed some messages, or may still not be resubscribed to the 
>>> list.
>>>
>>> Of course we restored the files from backups; however, we want to make 
>>> sure that everyone gets the mail they missed, and that everyone is 
>>> subscribed to these lists who wishes to be subscribed.
>>>
>>> So...
>>>
>>> If you're reading this message in your email box, you're subscribed to 
>>> the list identified in the subject line, and all should be okay.
>>>
>>> If you're reading this message in the archives, wondering why you're 
>>> not getting list mail, please take a moment to resubscribe yourself to 
>>> the list, which should resolve your problem.
>>>
>>> And regardless, if you feel you missed any mail, we do have the 
>>> archives available for your reference.
>>>
>>> IETF List Subscription Link:  https://www.ietf.org/mailman/listinfo/ietf
>>> IETF List Archive Link:  http://www.ietf.org/mail-archive/web/ietf/
>>>
>>> SIP List Subscription Link:  https://www.ietf.org/mailman/listinfo/sip
>>> SIP List Archive Link:  http://www.ietf.org/mail-archive/web/sip/
>>>
>>> We are in the home stretch of getting TMDA removed and replaced on the 
>>> servers, and I apologize for any inconvenience caused by this issue. 
>>> Because server problems apparently happen only in the dead of night, 
>>> you can be sure that we feel any and all pain anyone may be experiencing.
>>>
>>> If you need any assistance, please contact the IETF Secretariat, using 
>>> the links at:  http://www.ietf.org/secretariat/
>>>
>>> Thank you,
>>> Glen Barney
>>> IT Director
>>> AMS (IETF Secretariat)
>>> _______________________________________________
>>> Ietf mailing list
>>> Ietf@ietf.org
>>> https://www.ietf.org/mailman/listinfo/ietf