[storm] iSCSI MIB: Heartbeat counters for NOP-In and NOP-Out Usage

Mark Bakke <Mark_Bakke@DELL.com> Thu, 12 July 2012 13:29 UTC

From: Mark Bakke <Mark_Bakke@DELL.com>
To: "Storm (storm@ietf.org)" <storm@ietf.org>
Date: Thu, 12 Jul 2012 08:28:45 -0500
Cc: "mrm@vmware.com" <mrm@vmware.com>, "prakashvn@hcl.com" <prakashvn@hcl.com>
Subject: [storm] iSCSI MIB: Heartbeat counters for NOP-In and NOP-Out Usage

One of the things pointed out during the MIB doctor review is that, although most implementations use NOP-In and NOP-Out as a heartbeat / keepalive mechanism, the MIB module provides no counters for them or for timeouts caused by missing heartbeats.

Since this requires adding a few new objects to the MIB, I wanted to run it by the list with two questions:

1.      Will it be useful to provide these counters?

2.      If so, which counters?

For example, to count the NOPs themselves, we can add objects to iscsiSessionStatsTable, or to an iscsiConnectionStatsTable if we want to keep track of NOP-Ins and NOP-Outs per connection:

    iscsiSsnNopIns                  Counter32,
    iscsiSsnNopOuts                 Counter32,

If we do this, do we want the counters per session or per connection?  Per-connection counters would require an additional table.

Is Counter32 sufficient?  I believe so, since it's what we use for commands and responses, and the number of NOPs certainly won't exceed these.
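
To put rough numbers on the Counter32 question, here is a back-of-the-envelope sketch. The rates are assumptions chosen only for illustration (one NOP per second as a heartbeat, 100,000 commands per second as a busy session); they are not measurements from any implementation.

```python
# Rough estimate of how long a Counter32 takes to wrap at a given event rate.
# All rates below are illustrative assumptions, not measured values.

COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps to 0 after 2^32 - 1


def seconds_until_wrap(events_per_second: float) -> float:
    """Seconds for a Counter32 starting at 0 to wrap at a steady rate."""
    return COUNTER32_MODULUS / events_per_second


def counter32_delta(old: int, new: int) -> int:
    """Events between two polls, assuming at most one wrap in between."""
    return (new - old) % COUNTER32_MODULUS


SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Assumed heartbeat rate: one NOP per second.
nop_wrap_years = seconds_until_wrap(1.0) / SECONDS_PER_YEAR

# Assumed busy session: 100,000 commands per second.
cmd_wrap_hours = seconds_until_wrap(100_000.0) / 3600

print(f"NOPs at 1/s wrap after about {nop_wrap_years:.0f} years")
print(f"Commands at 100k/s wrap after about {cmd_wrap_hours:.1f} hours")
```

Under these assumptions the NOP counters wrap far more slowly than the command counters, which is the point of the argument above: if Counter32 is acceptable for commands and responses, it should be for NOPs as well.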

The second part is to provide counters that indicate timeouts based on non-receipt of NOP responses.  This would likely be added to the session's error stats table.

Mike MacFaden asked us to consider separate counters for heartbeat / NOP timeouts vs. data timeouts.

  |  |  +--iscsiInstanceSsnErrorStatsTable(2)
  |  |     |
  |  |     +--iscsiInstanceSsnErrorStatsEntry(1) [iscsiInstIndex]
  |  |        |
  |  |        +-- r-n Counter32 iscsiInstSsnDigestErrors(1)
  |  |        +-- r-n Counter32 iscsiInstSsnCxnTimeoutErrors(2)
  |  |        +-- r-n Counter32 iscsiInstSsnFormatErrors(3)

Adding something like this might do the trick:

        iscsiInstSsnNopTimeoutErrors    Counter32
        iscsiInstSsnDataTimeoutErrors   Counter32
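
As a way to think about what these counters would record, here is a minimal sketch from the initiator's side: it counts NOP-Outs sent and NOP-Ins received, and bumps a NOP timeout counter when a ping goes unanswered for too long. The class name, method names, and the timeout parameter are all hypothetical; a real implementation would hook into its own PDU and timer paths, and data timeouts are not modeled here.

```python
from typing import Optional

COUNTER32_MODULUS = 2 ** 32  # Counter32 values wrap modulo 2^32


class SessionHeartbeatStats:
    """Hypothetical per-session heartbeat bookkeeping (illustration only)."""

    def __init__(self, nop_timeout_seconds: float) -> None:
        self.nop_timeout_seconds = nop_timeout_seconds
        self.nop_outs = 0            # would feed iscsiSsnNopOuts
        self.nop_ins = 0             # would feed iscsiSsnNopIns
        self.nop_timeout_errors = 0  # would feed iscsiInstSsnNopTimeoutErrors
        self._pending_since: Optional[float] = None  # oldest unanswered ping

    def on_nop_out_sent(self, now: float) -> None:
        self.nop_outs = (self.nop_outs + 1) % COUNTER32_MODULUS
        if self._pending_since is None:
            self._pending_since = now

    def on_nop_in_received(self, now: float) -> None:
        self.nop_ins = (self.nop_ins + 1) % COUNTER32_MODULUS
        self._pending_since = None  # the peer is alive

    def check_timeout(self, now: float) -> bool:
        """Count one timeout if the oldest ping has waited too long."""
        if self._pending_since is None:
            return False
        if now - self._pending_since < self.nop_timeout_seconds:
            return False
        self.nop_timeout_errors = (
            (self.nop_timeout_errors + 1) % COUNTER32_MODULUS
        )
        self._pending_since = None  # avoid counting the same ping twice
        return True


stats = SessionHeartbeatStats(nop_timeout_seconds=5.0)
stats.on_nop_out_sent(now=0.0)
stats.on_nop_in_received(now=0.2)      # answered in time
stats.on_nop_out_sent(now=10.0)
timed_out = stats.check_timeout(now=15.0)  # unanswered for 5 seconds
```

Since the error stats table is indexed per instance, an implementation along these lines would presumably sum the per-session timeout counts into the instance-level object; that aggregation is left out of the sketch.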

Any thoughts on what would be useful here?  Does anyone already count these in their implementations for other purposes?