[dtn-users] DTN2 - Frequent Spinlock errors

Nik Ansell <nikansell00@gmail.com> Sun, 24 January 2016 08:05 UTC

Date: Sun, 24 Jan 2016 12:05:17 +0400
Message-ID: <CAKLzrV_0Khan5YbJqyv+--ZHy78DnjZnrscXVMpFjZafq772tA@mail.gmail.com>
From: Nik Ansell <nikansell00@gmail.com>
To: dtn-users@irtf.org
Archived-At: <http://mailarchive.ietf.org/arch/msg/dtn-users/8PmVikv8YZyox63P31URUBRdsm0>
Subject: [dtn-users] DTN2 - Frequent Spinlock errors

Hello All,

I am running some experiments to identify bundle delivery characteristics
of DTN2 (+LTPlib), ION and IBR-DTN on the Raspberry Pi B2. Currently I am
trying to identify reliable bundle delivery scenarios for all 3 DTN
implementations, so I can then apply certain network simulation scenarios
to observe how each DTN implementation and convergence layer behaves.

The scope of my experimentation is below for info:

ION: UDPCL, TCPCL, LTPCL
DTN2: UDPCL, TCPCL, LTPCL - using LTPlib
IBR-DTN: UDPCL, TCPCL

I have determined reliable, repeatable scenarios for both ION and IBR-DTN,
however DTN2 is producing a number of errors when using UDP, TCP or
LTP. The errors also seem to appear randomly after any number of for loop
iterations (e.g. 5, 13, 50, 48, 95, 413, 567, 988).

Errors that cause dtnd to use 200% CPU and stop delivering bundles:
warning: deliver_front is waiting for spin lock held by (null), which has reached spin limit
warning: Bundle::del_ref is waiting for spin lock held by (null), which has reached spin limit
warning: Bundle::is_queued_on is waiting for spin lock held by (null), which has reached spin limit
warning: LinkBlockSet::find_blocks is waiting for spin lock held by (null), which has reached spin limit
warning: BundleList::erase is waiting for spin lock held by (null), which has reached spin limit
warning: Bundle::add_ref is waiting for spin lock held by (null), which has reached spin limit
warning: ForwardingLog::get_count is waiting for spin lock held by (null), which has reached spin limit

Errors that cause dtnd to quit:
LTP:ASSERTION FAILED (is_locked_by_me()) at thread/SpinLock.cc:91

Errors that seem to loop forever but do not cause dtnd to quit or use 200%
CPU (this error seems to happen consistently after sending 99 LTP bundles):
/dtn/cl/ltp/sender error] Unable to create Sender LTP Socket in 5 seconds - retrying

To send files I am calling dtnsource in a for loop as below, which allows
me to adjust the time between each bundle transmission, or each block of 10
bundle transmissions. For UDP and LTP I am using bundles of 63K; for TCP I
am using 1M bundles. I have tried several combinations of $WAIT (0-5
seconds) and $BULKWAIT (0-20 seconds), but cannot seem to find a working
combination.

# Assumes END, SIZE, WAIT and BULKWAIT are set earlier, and COUNT starts at 0.
for ((i=1; i<=END; i++)); do
    echo "($COUNT) dtnsource -s dtn://tx.dtn/a -d dtn://rx.dtn/g -b $SIZE"
    dtnsource -s dtn://tx.dtn/a -d dtn://rx.dtn/g -b "$SIZE"
    sleep "$WAIT"
    COUNT=$((COUNT + 1))

    # Sleep for $BULKWAIT seconds after every 10 bundles
    if (( COUNT % 10 == 0 )); then
        echo "Sleeping for $BULKWAIT seconds"
        sleep "$BULKWAIT"
    fi
done
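
Since the failures show up after a different number of iterations on each
run, it may help to record the iteration at which a run first fails. The
sketch below is one possible way to do that. first_failure is a hypothetical
helper, not part of DTN2, and it assumes the sending command exits non-zero
when a send fails, which I have not confirmed for dtnsource.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of DTN2: run a command up to MAX times and
# report the first iteration whose exit status is non-zero.
# Assumption: the sending command exits non-zero when a send fails.
first_failure() {
    local max=$1 wait=$2
    shift 2
    local i
    for ((i = 1; i <= max; i++)); do
        if ! "$@"; then
            echo "$i"        # iteration number of the first failure
            return 1
        fi
        sleep "$wait"        # pause between sends, like $WAIT in the loop
    done
    echo 0                   # 0 means every iteration succeeded
    return 0
}
```

It could be called as: first_failure "$END" "$WAIT" dtnsource -s dtn://tx.dtn/a
-d dtn://rx.dtn/g -b "$SIZE"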

The daemon is stopped and restarted after each test. I have also tried
deleting and recreating the Berkeley DB each time, but this does not seem to
have much effect. I have also managed to reproduce some of the errors on a
virtual Ubuntu test-bed, specifically Bundle::del_ref.

After reading through the oasys SpinLock::lock code, it looks like this is
possibly an infinite loop that tries up to 1,000,000 times to get a lock,
then resets the counter to zero and puts out a warning message.
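
To make the behaviour I am describing concrete, here is a rough shell sketch
of that pattern. It is NOT the oasys code: SPIN_LIMIT, try_lock, spin_lock and
spin_unlock are hypothetical names, the lock is a directory created with
mkdir, and the max_warnings argument exists only so a demonstration can give
up instead of spinning forever.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: this is NOT the oasys SpinLock code.
# SPIN_LIMIT, try_lock, spin_lock and spin_unlock are hypothetical names.

SPIN_LIMIT=1000000   # failed attempts before a warning, per the description

# Try once to take the lock; mkdir is atomic, so only one caller succeeds.
try_lock() {
    mkdir "$1" 2>/dev/null
}

# Spin until the lock is taken. Total attempts are unbounded: after every
# `limit` failures a warning is printed and the counter restarts at zero.
# max_warnings (hypothetical, demo only): give up after that many warnings;
# 0 means never give up.
spin_lock() {
    local lockdir=$1 limit=${2:-$SPIN_LIMIT} max_warnings=${3:-0}
    local spins=0 warnings=0
    until try_lock "$lockdir"; do
        spins=$((spins + 1))
        if (( spins >= limit )); then
            echo "warning: waiting for spin lock $lockdir, which has reached spin limit" >&2
            spins=0
            warnings=$((warnings + 1))
            if (( max_warnings > 0 && warnings >= max_warnings )); then
                return 1
            fi
        fi
    done
    return 0
}

# Release the lock by removing the directory.
spin_unlock() {
    rmdir "$1"
}
```

With max_warnings left at 0, spin_lock never returns while the lock stays
held, and it keeps the CPU busy the whole time. If the real code behaves the
same way, that would be consistent with the repeated warnings and the 200%
CPU usage I am seeing.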

Has anyone had a similar experience, or can anyone suggest any
troubleshooting tips? Any suggestions to work around or fix the problem will
be very gratefully received!

Kind Regards,
Nik