Re: [dtn-users] DTN2 - Frequent Spinlock errors
Nik Ansell <nikansell00@gmail.com> Mon, 25 January 2016 07:19 UTC
Return-Path: <nikansell00@gmail.com>
X-Original-To: dtn-users@ietfa.amsl.com
Delivered-To: dtn-users@ietfa.amsl.com
Received: from localhost (ietfa.amsl.com [127.0.0.1]) by ietfa.amsl.com (Postfix) with ESMTP id A6D1C1ACEC1 for <dtn-users@ietfa.amsl.com>; Sun, 24 Jan 2016 23:19:23 -0800 (PST)
X-Virus-Scanned: amavisd-new at amsl.com
X-Spam-Flag: NO
X-Spam-Score: 1.551
X-Spam-Level: *
X-Spam-Status: No, score=1.551 tagged_above=-999 required=5 tests=[BAYES_50=0.8, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, FREEMAIL_ENVFROM_END_DIGIT=0.25, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, J_CHICKENPOX_39=0.6, SPF_PASS=-0.001] autolearn=no
Received: from mail.ietf.org ([4.31.198.44]) by localhost (ietfa.amsl.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id RJ0UByHFqAA3 for <dtn-users@ietfa.amsl.com>; Sun, 24 Jan 2016 23:19:21 -0800 (PST)
Received: from mail-ig0-x235.google.com (mail-ig0-x235.google.com [IPv6:2607:f8b0:4001:c05::235]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by ietfa.amsl.com (Postfix) with ESMTPS id 7BB601ACEBD for <dtn-users@irtf.org>; Sun, 24 Jan 2016 23:19:21 -0800 (PST)
Received: by mail-ig0-x235.google.com with SMTP id h5so25172629igh.0 for <dtn-users@irtf.org>; Sun, 24 Jan 2016 23:19:21 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:in-reply-to:references:date:message-id :subject:from:to:content-type; bh=Z5LGOlv+TsJRmZyWjyZZXiPYY6V+5k2C8cJmGI53EEk=; b=LLpCXNRg6iNQYoxzcMIgCzADeKnjPR2KDdb5iaYeDyff1/7MS2zKvFQicqgkHiXa77 NExdQkeYgvu7KeUU+w6xyLc9xkkblyXcWh6zhMuTKzUGzkFB3PuuoH6ZlSYCUDNnoNIp lS2Jy6ndDONbOIBtd+ToLq8r+vI+7OKaPqYZuPowuRgg1fx2kJsxjhIVNQBDU1YHzbpJ wy4P9ogjgFJQuaZoLIuY2KGE40hb1MJ/61IcFXhpVspjGtpmxBV32WNy86lCffIYHihc r+hAS/H/P/+xWczvPEk0L5fEOGBQ9YUssb7F2GLNWfkDIL9EJDUoGSR58xzi5ZlucxZl 32AA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:reply-to:in-reply-to:references :date:message-id:subject:from:to:content-type; bh=Z5LGOlv+TsJRmZyWjyZZXiPYY6V+5k2C8cJmGI53EEk=; b=J/S4FyNmPXCgrOrwWMtYNrZwftRe3ZntFLQOwW5YKLlSxSHAo2U3+GIovPgcuBynZw FwW5/q3TKOf7ZaY3BcZ/ZNxooX3t3+KVdVf8zD2CjuT/aPiCXX/NKIlI2OXBTFls2fna 3c7H6zvbevGzJiGEYhDD+gQmjPrmRHuw1jWJRgtv7jdKBqT4MuyQDHDvEX7+JnF6BoiN VV7E7JUdJCgRFJCpd3aZBcE+EzCM3AXYx6rsWrFsm6OhTxFTlnC7dEEGHtp5dwohEQPI MLJ/1kLaRyGjomWGBa0nN/n5ipZBqDodgbAeDY6SZ8miCHycOzqDBPvtpsaHAAX5EAu7 uREA==
X-Gm-Message-State: AG10YOSFt/BemKfe9M4sBYA0a/qInSgiw7sw2Bg7ZrC1F03pwiB4fng2WoJ6Ejvf6UAqHXNRC+xLqjNO3YUAow==
MIME-Version: 1.0
X-Received: by 10.50.147.34 with SMTP id th2mr11304203igb.27.1453706360605; Sun, 24 Jan 2016 23:19:20 -0800 (PST)
Received: by 10.79.24.2 with HTTP; Sun, 24 Jan 2016 23:19:20 -0800 (PST)
In-Reply-To: <CAKLzrV_0Khan5YbJqyv+--ZHy78DnjZnrscXVMpFjZafq772tA@mail.gmail.com>
References: <CAKLzrV_0Khan5YbJqyv+--ZHy78DnjZnrscXVMpFjZafq772tA@mail.gmail.com>
Date: Mon, 25 Jan 2016 11:19:20 +0400
Message-ID: <CAKLzrV_WC5i-rGjv0MDFDG4YhBKu3Si4qjXa=UztjDkQ3pLtYQ@mail.gmail.com>
From: Nik Ansell <nikansell00@gmail.com>
To: dtn-users@irtf.org
Content-Type: multipart/alternative; boundary="089e014952383b59c5052a2363f1"
Archived-At: <http://mailarchive.ietf.org/arch/msg/dtn-users/fldvyNq6aqF6kp173fZ_yV085k0>
Subject: Re: [dtn-users] DTN2 - Frequent Spinlock errors
X-BeenThere: dtn-users@irtf.org
X-Mailman-Version: 2.1.15
Precedence: list
Reply-To: nikansell00@gmail.com
List-Id: "The Delay-Tolerant Networking Research Group \(DTNRG\) - Users." <dtn-users.irtf.org>
List-Unsubscribe: <https://www.irtf.org/mailman/options/dtn-users>, <mailto:dtn-users-request@irtf.org?subject=unsubscribe>
List-Archive: <https://mailarchive.ietf.org/arch/browse/dtn-users/>
List-Post: <mailto:dtn-users@irtf.org>
List-Help: <mailto:dtn-users-request@irtf.org?subject=help>
List-Subscribe: <https://www.irtf.org/mailman/listinfo/dtn-users>, <mailto:dtn-users-request@irtf.org?subject=subscribe>
X-List-Received-Date: Mon, 25 Jan 2016 07:19:23 -0000
I've done some more digging and have a few observations (below), I will continue down this path but think I may be out of my depth already! :) 1. SpinLock.h: SpinLock::lock(): a) The error message appearing in the log (below) has a null lock_holder_name_ - Does this mean the lock is invalid (i.e perhaps the calling thread terminated due to an exception) and can be safely released? fprintf(stderr, "warning: %s is waiting for spin lock held by %s, which has reached spin limit\n", lock_user, lock_holder_name_) b) The call to (Thread.h) Thread::spin_yield() (which calls either thread_id_yield() or sched_yield() ). Is possibly failing in some cases - not sure how to debug this one....Has anyone any idea if I can somehow force the release of the lock if (as in above) lock_holder_name_ == NULL? c) The function atomic_cmpxchg32(&lock_count_,0,1) !=0 called is contained within several Atomic-<CPUARCH>.h files e.g. Atomic-arm.h or Atomic-x86.h - I am not sure how to confirm the correct header file was used during the build, but will try to build oasys and dtn2 again using the --host=arm-linux argument to configure to see if this makes a difference. 2. I have noticed there are some locking debugging functions available (Thread::lock_debugger->add_lock / remove_lock) if the --enable_debug_locking argument is passed during the build of oasys and dtn2. I am considering enabling this to get some more info on the problem. Kind Regards, Nik On Sun, Jan 24, 2016 at 12:05 PM, Nik Ansell <nikansell00@gmail.com> wrote: > Hello All, > > I am running some experiments to identify bundle delivery characteristics > of DTN2 (+LTPlib), ION and IBR-DTN on the Raspberry Pi B2. Currently I am > trying to identify reliable bundle delivery scenarios for all 3 DTN > implementations, so I can then apply certain network simulation scenarios > to observe how each DTN implementation and convergence layer behaves. > > The scope of my experimentation is below for info: > > ION: UDPCL, TCPCL, LTPCL > DTN2: UDPCL, TCLCP, LTPCL - using LTPlib > IBR-DTN: UDPCL, TCPCL > > I have determined reliable, repeatable scenarios for both ION and IBR-DTN, > however DTN2 is producing produce a number of errors when using UDP, TCP or > LTP. The errors also seem to appear randomly after any number of for loop > iterations, e.g. (5,13,50,48,95,413, 567, 988, etc etc). > > Errors that cause dtnd to use 200% CPU and stop delivering bundles: > warning: deliver_front is waiting for spin lock held by (null), which has > reached spin limit > warning: Bundle::del_ref is waiting for spin lock held by (null), which > has reached spin limit > warning: Bundle::is_queued_on is waiting for spin lock held by (null), > which has reached spin limit > warning: LinkBlockSet::find_blocks is waiting for spin lock held by > (null), which has reached spin limit > warning: BundleList::erase is waiting for spin lock held by (null), which > has reached spin limit > warning: Bundle::add_ref is waiting for spin lock held by (null), which > has reached spin limit > warning: ForwardingLog::get_count is waiting for spin lock held by (null), > which has reached spin limit > > Errors: that cause dtnd to quit: > LTP:ASSERTION FAILED (is_locked_by_me()) at thread/SpinLock.cc:91 > > Errors that seem to loop forever but not cause dtnd to quit or use 200% > CPU (This error seems to happen consistently after sending 99 LTP bundles): > /dtn/cl/ltp/sender error] Unable to create Sender LTP Socket in 5 seconds > - retrying > > To send files I am calling dtnsource in a for loop as below, which allows > me to adjust the time between each bundle transmission, or each block of 10 > bundle transmissions. For UDP and LTP I am using bundles of 63K, for TCP I > am using 1M bundles. I have tried several combinations for $WAIT (0-5 > seconds) and $BULKWAIT (0-20 seconds), but cannot seem to find a working > combination. > > for ((i=1;i<=END;i++)); do > echo "($COUNT) dtnsource -s dtn://tx.dtn/a -d dtn://rx.dtn/g -b $SIZE" > dtnsource -s dtn://tx.dtn/a -d dtn://rx.dtn/g -b $SIZE > sleep $WAIT > let "COUNT++" > > # Sleep for x secs after each 10 > if (( $COUNT % 10 == 0 )) > then > echo "Sleeping for $BULKWAIT seconds" > sleep $BULKWAIT > fi > > done > > The daemon is stopped and restarted after each test, I have also tried > deleting and recreating the berkley DB each time, but this does not seem to > have much effect. I have also managed to recreate some of the errors on a > virtual Ubuntu test-bed, specifically Bundle::del_ref. > > After reading through the oasys SpinLock::lock code, it looks like this is > possibly an infinite loop that tries up to 1,000,000 times to get a lock, > then resets the counter to zero and puts out a warning message. > > Has anyone had a similar experience or can suggest any troubleshooting > tips? > Any suggestions to workaround or fix the problem will be very gratefully > received! > > Kind Regards, > Nik >
- [dtn-users] DTN2 - Frequent Spinlock errors Nik Ansell
- Re: [dtn-users] DTN2 - Frequent Spinlock errors Nik Ansell
- Re: [dtn-users] DTN2 - Frequent Spinlock errors Nik Ansell