[TOOLS-DEVELOPMENT] Data on slowdowns

Glen <glen@amsl.com> Mon, 27 June 2011 14:37 UTC

Return-Path: <glen@amsl.com>
X-Original-To: tools-development@ietfa.amsl.com
Delivered-To: tools-development@ietfa.amsl.com
Received: from localhost (localhost [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id 1FC2D21F8656 for <tools-development@ietfa.amsl.com>; Mon, 27 Jun 2011 07:37:24 -0700 (PDT)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: -106.199
X-Spam-Level:
X-Spam-Status: No, score=-106.199 tagged_above=-999 required=5 tests=[AWL=0.400, BAYES_00=-2.599, RCVD_IN_DNSWL_MED=-4, USER_IN_WHITELIST=-100]
Received: from mail.ietf.org ([64.170.98.30]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id jfZB+7NKoBTv for <tools-development@ietfa.amsl.com>; Mon, 27 Jun 2011 07:37:23 -0700 (PDT)
Received: from mail.amsl.com (mail.amsl.com [64.170.98.20]) by ietfa.amsl.com (Postfix) with ESMTP id 6043621F8655 for <tools-development@ietf.org>; Mon, 27 Jun 2011 07:37:23 -0700 (PDT)
Received: by c1a.amsl.com (Postfix, from userid 1000) id 59CA21C3852B; Mon, 27 Jun 2011 07:37:23 -0700 (PDT)
Date: Mon, 27 Jun 2011 07:37:23 -0700
From: Glen <glen@amsl.com>
To: tools-development@ietf.org
Message-ID: <20110627143723.GD21782@amsl.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
User-Agent: Mutt/1.5.17 (2007-11-01)
Subject: [TOOLS-DEVELOPMENT] Data on slowdowns
X-BeenThere: tools-development@ietf.org
X-Mailman-Version: 2.1.12
Precedence: list
List-Id: Tools Development list server <tools-development.ietf.org>
List-Unsubscribe: <https://www.ietf.org/mailman/options/tools-development>, <mailto:tools-development-request@ietf.org?subject=unsubscribe>
List-Archive: <http://www.ietf.org/mail-archive/web/tools-development>
List-Post: <mailto:tools-development@ietf.org>
List-Help: <mailto:tools-development-request@ietf.org?subject=help>
List-Subscribe: <https://www.ietf.org/mailman/listinfo/tools-development>, <mailto:tools-development-request@ietf.org?subject=subscribe>
X-List-Received-Date: Mon, 27 Jun 2011 14:37:24 -0000

Tools team -

You've already read my report to the I*** in which I explained my findings
thus far. 

My data at this point is as follows:

At 0315, the server appeared to be running normally:

----------------------------------------
Mon Jun 27 03:15:01 PDT 2011
 
w:
 03:15:01 up  9:20,  2 users,  load average: 1.04, 2.30, 2.12
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     xvc0      21:29    4:37m  0.22s  0.22s bash
henrik   pts/2     18:09    9:05m  0.04s  0.04s bash -i
 
vmstat:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 3  1     76 1922680 2046980 10300904    0    0   129   172  144    8  6  1 90  
2
 
free -lmt:
             total       used       free     shared    buffers     cached
Mem:         16394      14518       1876          0       1999      10059
Low:         16394      14518       1876
High:            0          0          0
-/+ buffers/cache:       2459      13935
Swap:         8189          0       8189
Total:       24584      14518      10066

None of the apache processes appeared to have been impacted yet:

11787 root      20   0  475m  69m  10m S    0  0.4   1:20.26 httpd2-prefork     
19352 wwwrun    20   0  569m  97m 6564 S    0  0.6   0:01.50 httpd2-prefork     
19847 wwwrun    20   0  555m  89m 6356 S    0  0.5   0:00.56 httpd2-prefork     
19869 wwwrun    20   0  565m  93m 6380 S    0  0.6   0:00.76 httpd2-prefork     
19877 wwwrun    20   0  476m  63m 3484 S    0  0.4   0:00.14 httpd2-prefork     
19986 wwwrun    20   0  476m  62m 3260 S    4  0.4   0:00.08 httpd2-prefork     
20005 wwwrun    20   0  476m  62m 3260 S    0  0.4   0:00.10 httpd2-prefork     
20012 wwwrun    20   0  555m  89m 6356 S    0  0.5   0:00.52 httpd2-prefork     
20028 wwwrun    20   0  476m  61m 2092 S    4  0.4   0:00.08 httpd2-prefork     
20029 wwwrun    20   0  556m  90m 6384 R    8  0.5   0:00.68 httpd2-prefork     
20030 wwwrun    20   0  476m  62m 3236 S    0  0.4   0:00.08 httpd2-prefork     
20031 wwwrun    20   0  568m  94m 6412 S    0  0.6   0:00.72 httpd2-prefork     
20033 wwwrun    20   0  476m  62m 3188 S    0  0.4   0:00.02 httpd2-prefork     
20059 wwwrun    20   0  568m  93m 5472 S    0  0.6   0:00.74 httpd2-prefork     
20081 wwwrun    20   0  558m  87m 5436 R   82  0.5   0:00.68 httpd2-prefork     
20097 wwwrun    20   0  476m  61m 2020 S    0  0.4   0:00.02 httpd2-prefork     
20106 wwwrun    20   0  476m  62m 3204 S    4  0.4   0:00.06 httpd2-prefork     
20107 wwwrun    20   0  479m  65m 3248 S    0  0.4   0:00.36 httpd2-prefork     
20119 wwwrun    20   0  476m  61m 2032 S    0  0.4   0:00.06 httpd2-prefork     
20120 wwwrun    20   0  476m  61m 2012 S    0  0.4   0:00.02 httpd2-prefork     
20141 wwwrun    20   0  556m  86m 5088 D   78  0.5   0:00.44 httpd2-prefork     
20252 wwwrun    20   0  475m  59m  852 S    0  0.4   0:00.00 httpd2-prefork     


By 0320, things were going downhill:

----------------------------------------
Mon Jun 27 03:20:01 PDT 2011
 
Load average still looked okay (w):
 03:20:01 up  9:25,  2 users,  load average: 3.16, 3.43, 2.76
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     xvc0      21:29    4:42m  0.22s  0.22s bash
henrik   pts/2     18:09    9:10m  0.04s  0.04s bash -i
 
But check out the free memory (from free -lmt, down 10GB from 5 minutes ago):
             total       used       free     shared    buffers     cached
Mem:         16394      15354       1040          0          8       2326
Low:         16394      15354       1040
High:            0          0          0
-/+ buffers/cache:      13020       3374
Swap:         8189          0       8189
Total:       24584      15354       9229

And note that two of the apache processes were now using that 10GB:

21095 wwwrun    20   0     0    0    0 Z    0  0.0   0:04.72 httpd2-pr <defunct>
11787 root      20   0  475m  69m  10m S    0  0.4   1:26.33 httpd2-prefork     
20804 wwwrun    20   0 8999m 8.2g 5324 R   71 51.2   3:38.15 httpd2-prefork     
20988 wwwrun    20   0 3386m 2.8g 6300 R   58 17.5   1:28.75 httpd2-prefork     
20991 wwwrun    20   0  568m  95m 6564 D    0  0.6   0:01.72 httpd2-prefork     
20994 wwwrun    20   0  560m  89m 6144 D    0  0.5   0:10.74 httpd2-prefork     
21019 wwwrun    20   0  567m  95m 6488 D    0  0.6   0:04.12 httpd2-prefork     


By 0325, we were thrashing:

----------------------------------------
Mon Jun 27 03:25:10 PDT 2011
 
 03:25:19 up  9:30,  2 users,  load average: 63.64, 39.90, 18.77
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
root     xvc0      21:29    4:47m  0.22s  0.22s bash
henrik   pts/2     18:09    9:16m  0.04s  0.04s bash -i
 
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
38  5 1040428 844356   2928 662264    1    4   134   174  101   14  6  2 89  3
 
             total       used       free     shared    buffers     cached
Mem:         16394      15560        834          0          2        648
Low:         16394      15560        834
High:            0          0          0
-/+ buffers/cache:      14909       1485
Swap:         8189       1026       7162
Total:       24584      16587       7997

11787 root      20   0  475m  33m 9364 S    0  0.2   1:28.48 httpd2-prefork     
20804 wwwrun    20   0 11.5g 6.5g 5236 R    7 40.3  13:32.55 httpd2-prefork     
20988 wwwrun    20   0 6150m 3.8g 6196 R    6 23.8  11:25.67 httpd2-prefork     
20991 wwwrun    20   0  571m  58m 6520 S    0  0.4   1:17.19 httpd2-prefork     
20994 wwwrun    20   0  577m  69m 6608 D    0  0.4   1:44.54 httpd2-prefork     
21019 wwwrun    20   0  577m  69m 6592 S    0  0.4   1:43.82 httpd2-prefork     


Note that two apache processes were impacted.  When I do a grep on POST
requests from the datatracker log around this time frame, I see nothing 
but POST /doc/search/ requests, except for these two identical requests,
at just the right time:

134.226.56.7 - - [27/Jun/2011:03:16:57 -0700] "POST /doc/draft-ietf-payload-rfc3016bis/edit/position/ HTTP/1.1" 302 - "https://datatracker.ietf.org/doc/draft-ietf-payload-rfc3016bis/edit/position/" "Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0"
134.226.56.7 - - [27/Jun/2011:03:19:07 -0700] "POST /doc/draft-ietf-payload-rfc3016bis/edit/position/ HTTP/1.1" 302 - "https://datatracker.ietf.org/doc/draft-ietf-payload-rfc3016bis/edit/position/" "Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0"

The next slowdown occurred starting between 0510 and 0515.  And here again I
see similar requests:

134.226.56.7 - - [27/Jun/2011:05:11:38 -0700] "POST /doc/draft-ietf-payload-rfc3016bis/edit/position/ HTTP/1.1" 302 - "https://datatracker.ietf.org/doc/draft-ietf-payload-rfc3016bis/edit/position/" "Mozilla/5.0 (X11; Linux i686; rv:5.0) Gecko/20100101 Firefox/5.0"
213.145.205.249 - - [27/Jun/2011:05:14:55 -0700] "POST /submit/status/33455/f52ef05298cd7945af3596bafd20aa63/ HTTP/1.1" 200 33982 "https://datatracker.ietf.org/submit/status/33455/f52ef05298cd7945af3596bafd20aa63/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-GB; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"
213.145.205.249 - - [27/Jun/2011:05:15:05 -0700] "POST /submit/status/33455/confirm/aeb3677f2a2be33d87708522a2da1fae935688fa/ HTTP/1.1" 200 33245 "http://datatracker.ietf.org/submit/status/33455/confirm/aeb3677f2a2be33d87708522a2da1fae935688fa/" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-GB; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3"

I am thinking that something might be wrong with either the code, or the
database entry, relating to ballot positions.

Thoughts would be welcome!

Glen