Re: [storm] iSCSI MIB: Heartbeat counters for NOP-In and NOP-Out Usage

Mark Bakke <Mark_Bakke@DELL.com> Thu, 19 July 2012 22:39 UTC

From: Mark Bakke <Mark_Bakke@DELL.com>
To: "david.black@emc.com" <david.black@emc.com>, "storm@ietf.org" <storm@ietf.org>
Date: Thu, 19 Jul 2012 17:40:31 -0500
Cc: "mrm@vmware.com" <mrm@vmware.com>, "prakashvn@hcl.com" <prakashvn@hcl.com>
Subject: Re: [storm] iSCSI MIB: Heartbeat counters for NOP-In and NOP-Out Usage

I agree with David that new functionality should be handled separately from the MIB Doctor review. That said, we'll need to figure out what counts as significant work and what doesn't.

So, on the particular question of heartbeats, is there support for:

* Adding NOP counters at session scope (simple enough to add in the next version)
* Adding NOP counters at connection scope (requires a new table, so it would be a more significant change)
* Adding nothing

I also agree with David that the counters would be useful (at either session or connection scope) and that they would need to be optional for compliance.

Mark

From: david.black@emc.com [mailto:david.black@emc.com]
Sent: Friday, July 13, 2012 8:24 AM
To: Mark Bakke; storm@ietf.org
Cc: mrm@vmware.com; prakashvn@hcl.com; ttalpey@microsoft.com; david.black@emc.com
Subject: RE: iSCSI MIB: Heartbeat counters for NOP-In and NOP-Out Usage

Hi Mark,

<WG chair hat off>

The MIB Doctor review contained a number of suggestions for functional enhancements.

My view is that significant enhancements need to be worked on in a new draft in the WG, unless their absence is a significant problem for MIB usability.  Obviously, whether an enhancement is "significant" or its absence causes a "significant problem" is a judgment call, so YMMV.

I view addition of a new table as being significant, so my suggestions would be:
- Add the NOP counters at session scope to avoid a new table.
- Add the NOP timeout counters at connection scope.
This is based on the current MIB structure that tracks activity at a session level (sufficient if things are working) and tracks errors by connection (useful if something has gone wrong, as the problem may be connection-specific).
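
As a rough illustration of that split, the following Python sketch keeps NOP activity counters on the session and NOP timeout counters on each connection. The class and field names are hypothetical and are not taken from the MIB module; this only models the suggested scoping from an initiator's point of view and is not a proposed implementation.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ConnectionErrorStats:
    """Per-connection error counters (hypothetical names)."""
    nop_timeouts: int = 0


@dataclass
class SessionStats:
    """Session-scope activity counters plus per-connection error stats."""
    nop_ins: int = 0
    nop_outs: int = 0
    # Keyed by connection ID; an entry is created the first time it is needed.
    connections: Dict[int, ConnectionErrorStats] = field(default_factory=dict)

    def record_nop_in(self) -> None:
        self.nop_ins += 1

    def record_nop_out(self) -> None:
        self.nop_outs += 1

    def record_nop_timeout(self, cid: int) -> None:
        # Errors are attributed to the connection on which they occurred.
        self.connections.setdefault(cid, ConnectionErrorStats()).nop_timeouts += 1


session = SessionStats()
session.record_nop_out()           # heartbeat sent
session.record_nop_in()            # reply received
session.record_nop_out()           # next heartbeat sent on connection 2
session.record_nop_timeout(cid=2)  # no reply arrived in time
print(session.nop_outs, session.nop_ins, session.connections[2].nop_timeouts)
```

With this layout, a healthy session needs only the two session-level counters, while a timeout can still be traced to a specific connection.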

Counter32 values ought to suffice, as they're used for commands and responses (as you note). Also, NOPs, whether used for heartbeats or otherwise, should not be sent at any significant fraction of the line rate.
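
To put rough numbers on that, the short Python calculation below estimates how long a Counter32 takes to wrap at a constant event rate. The NOP rates are made-up values chosen only for illustration; nothing in this thread specifies actual heartbeat intervals.

```python
COUNTER32_MODULUS = 2**32  # a Counter32 wraps to 0 after reaching 2**32 - 1
SECONDS_PER_DAY = 86400


def seconds_until_wrap(events_per_second: float) -> float:
    """Time for a Counter32 starting at 0 to wrap at a constant event rate."""
    if events_per_second <= 0:
        raise ValueError("rate must be positive")
    return COUNTER32_MODULUS / events_per_second


# Hypothetical NOP rates, for illustration only.
for rate in (1, 10, 1000):
    days = seconds_until_wrap(rate) / SECONDS_PER_DAY
    print(f"{rate:>5} NOPs/s -> wraps after about {days:,.1f} days")
```

At one NOP per second the counter would take roughly 136 years to wrap, and even at a thousand per second it would last about seven weeks between wraps.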

It probably doesn't hurt to add the data timeout counter, but that seems less important (IMHO) than the NOP timeout counter, as a data timeout will manifest as a failed SCSI command, which will be noticed higher in the SCSI stack. In contrast, iSCSI NOP timeouts aren't SCSI errors.

I strongly suggest that these new counters not be required for MODULE-COMPLIANCE.

Thanks,
--David

From: Mark Bakke [mailto:Mark_Bakke@DELL.com]
Sent: Thursday, July 12, 2012 9:29 AM
To: Storm (storm@ietf.org)
Cc: MacFaden, Michael (VMware); prakashvn@hcl.com; Black, David; ttalpey@microsoft.com
Subject: iSCSI MIB: Heartbeat counters for NOP-In and NOP-Out Usage

Everyone,

One of the things pointed out during the MIB Doctor review is that, although most implementations use NOP-In and NOP-Out as a heartbeat / keepalive mechanism, we don't provide counters for them, or for timeouts based on missing heartbeats, in the MIB module.

Since this requires a few new objects to be added to the MIB, I wanted to run this by the list with two questions:


1. Will it be useful to provide these counters?
2. If so, which counters?

For example, to count the NOPs themselves, we can add objects to either iscsiSessionStatsTable or, if we want to keep track of NOP-Ins and NOP-Outs per connection, iscsiConnectionStatsTable. This should go to the list. Is Counter32 sufficient for NOPs, since it is OK for commands and responses? I think so:

    iscsiSsnNopIns                  Counter32,
    iscsiSsnNopOuts                 Counter32,

If we do this, do we want per-session or per-connection?  If it's per-connection it would require an additional table.

Is Counter32 sufficient?  I believe so, since it's what we use for commands and responses, and the number of NOPs certainly won't exceed these.
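
Whichever scope is chosen, a management application reading these counters would normally work with differences between polls rather than absolute values, so a single wrap between samples is harmless. The Python sketch below shows that arithmetic; the function name and the sample values are illustrative assumptions, and it presumes at most one wrap and no counter discontinuity between the two polls.

```python
COUNTER32_MODULUS = 2**32


def counter32_delta(previous: int, current: int) -> int:
    """Events counted between two polls of a Counter32.

    Assumes the counter wrapped at most once between the polls and was
    not reset (no discontinuity) in the meantime.
    """
    for value in (previous, current):
        if not 0 <= value < COUNTER32_MODULUS:
            raise ValueError("Counter32 samples must be in [0, 2**32 - 1]")
    return (current - previous) % COUNTER32_MODULUS


# Hypothetical samples: the second pair straddles a wrap.
print(counter32_delta(1_000, 1_250))
print(counter32_delta(4_294_967_290, 5))
```

The modulo handles the wrapped case without any special branch, which is why the poll interval only needs to be short enough that the counter cannot wrap twice.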

The second part is to provide counters that indicate timeouts based on non-receipt of NOP responses.  This would likely be added to the session's error stats table.

Mike MacFaden asked us to consider separate counters for heartbeat / NOP vs. data timeouts.

  |  |  +--iscsiInstanceSsnErrorStatsTable(2)
  |  |     |
  |  |     +--iscsiInstanceSsnErrorStatsEntry(1) [iscsiInstIndex]
  |  |        |
  |  |        +-- r-n Counter32 iscsiInstSsnDigestErrors(1)
  |  |        +-- r-n Counter32 iscsiInstSsnCxnTimeoutErrors(2)
  |  |        +-- r-n Counter32 iscsiInstSsnFormatErrors(3)

Adding something like this might do the trick:

        iscsiInstSsnNopTimeoutErrors    Counter32
        iscsiInstSsnDataTimeoutErrors   Counter32
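
One way an implementation might feed two such counters is to classify each timeout at the point where it is detected. The following is a minimal Python model under that assumption; the `TimeoutKind` enum, the class, the helper function, and the timing values are all hypothetical, and the counters wrap modulo 2^32 as Counter32 values do.

```python
from enum import Enum

COUNTER32_MODULUS = 2**32


class TimeoutKind(Enum):
    NOP = "nop"    # no reply to a NOP heartbeat within the allowed time
    DATA = "data"  # an expected data or response PDU did not arrive in time


class SessionErrorCounters:
    """Separate timeout counters, kept apart by cause (hypothetical names)."""

    def __init__(self) -> None:
        self.nop_timeout_errors = 0
        self.data_timeout_errors = 0

    def record_timeout(self, kind: TimeoutKind) -> None:
        if kind is TimeoutKind.NOP:
            self.nop_timeout_errors = (self.nop_timeout_errors + 1) % COUNTER32_MODULUS
        elif kind is TimeoutKind.DATA:
            self.data_timeout_errors = (self.data_timeout_errors + 1) % COUNTER32_MODULUS
        else:
            raise ValueError(f"unknown timeout kind: {kind!r}")


def nop_timed_out(sent_at: float, now: float, timeout_s: float) -> bool:
    """True if a NOP sent at `sent_at` is still unanswered after `timeout_s`."""
    return (now - sent_at) > timeout_s


counters = SessionErrorCounters()
# Illustrative timestamps in seconds; the 30-second limit is an assumption.
if nop_timed_out(sent_at=100.0, now=131.0, timeout_s=30.0):
    counters.record_timeout(TimeoutKind.NOP)
counters.record_timeout(TimeoutKind.DATA)
print(counters.nop_timeout_errors, counters.data_timeout_errors)
```

Keeping the two causes in separate counters lets an operator tell a quiet but dead connection (NOP timeouts) apart from stalled I/O (data timeouts).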

Any thoughts on what would be useful for that? Does anyone already count these in their implementations for other purposes?

Thanks,

Mark