Re: [storm] iSCSI MIB: Heartbeat counters for NOP-In and NOP-Out Usage

<david.black@emc.com> Fri, 13 July 2012 13:25 UTC

From: <david.black@emc.com>
To: <Mark_Bakke@DELL.com>, <storm@ietf.org>
Date: Fri, 13 Jul 2012 09:24:09 -0400
Message-ID: <8D3D17ACE214DC429325B2B98F3AE71208D3B54C@MX15A.corp.emc.com>
In-Reply-To: <975552A94CBC0F4DA60ED7B36C949CBA03F1834043@shandy.Beer.Town>
Cc: mrm@vmware.com, prakashvn@hcl.com
Subject: Re: [storm] iSCSI MIB: Heartbeat counters for NOP-In and NOP-Out Usage
List-Id: Storage Maintenance WG <storm.ietf.org>

Hi Mark,

<WG chair hat off>

The MIB Doctor review contained a number of suggestions for functional enhancements.

My view is that significant enhancements need to be worked on in a new draft in the WG, unless their absence is a significant problem for MIB usability.  Obviously, whether an enhancement is "significant" or its absence causes a "significant problem" is a judgment call, so YMMV.

I view addition of a new table as being significant, so my suggestions would be:
- Add the NOP counters at session scope to avoid a new table.
- Add the NOP timeout counters at connection scope.
This is based on the current MIB structure that tracks activity at a session level (sufficient if things are working) and tracks errors by connection (useful if something has gone wrong, as the problem may be connection-specific).
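To make the proposed split concrete, here is a minimal Python sketch. The object names (session-scope `nop_ins`/`nop_outs`, connection-scope `nop_timeout_errors`) are hypothetical stand-ins, not objects in the current MIB module. Activity is rolled up once per session, while timeouts stay with the connection that saw them, so a single failing connection remains visible.

```python
from dataclasses import dataclass, field

COUNTER32_MOD = 2 ** 32  # Counter32 wraps modulo 2^32


@dataclass
class ConnectionErrorStats:
    # Hypothetical per-connection error counter (not in the current MIB).
    cid: int
    nop_timeout_errors: int = 0


@dataclass
class SessionStats:
    # Hypothetical session-scope activity counters (not in the current MIB).
    nop_ins: int = 0
    nop_outs: int = 0
    connections: dict = field(default_factory=dict)  # CID -> ConnectionErrorStats

    def add_connection(self, cid: int) -> None:
        self.connections[cid] = ConnectionErrorStats(cid)

    def nop_out_sent(self) -> None:
        # Activity is counted once per session, regardless of connection.
        self.nop_outs = (self.nop_outs + 1) % COUNTER32_MOD

    def nop_in_received(self) -> None:
        self.nop_ins = (self.nop_ins + 1) % COUNTER32_MOD

    def nop_timeout(self, cid: int) -> None:
        # Errors are kept per connection, since the fault may be connection-specific.
        conn = self.connections[cid]
        conn.nop_timeout_errors = (conn.nop_timeout_errors + 1) % COUNTER32_MOD


# Example: two connections, heartbeats on both, one connection timing out.
ssn = SessionStats()
ssn.add_connection(1)
ssn.add_connection(2)
for _ in range(3):
    ssn.nop_out_sent()
    ssn.nop_out_sent()
    ssn.nop_in_received()  # only connection 1 answers
    ssn.nop_timeout(2)     # connection 2 misses its reply

print(ssn.nop_outs, ssn.nop_ins)  # 6 3
print({c.cid: c.nop_timeout_errors for c in ssn.connections.values()})  # {1: 0, 2: 3}
```

With session-scope totals alone, a manager would see 6 NOP-Outs and 3 NOP-Ins but could not tell which connection was failing; the per-connection timeout counter answers that.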

Counter32 values ought to suffice, as they're used for commands and responses (as you note). Also, NOPs, whether used for heartbeats or otherwise, should not be sent at any significant fraction of the line rate.
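A quick check of the Counter32 argument, assuming an illustrative heartbeat rate of one NOP per second per session and, for contrast, a hypothetical 10,000 NOPs per second; neither rate comes from any specification. Counter32 wraps at 2^32, and a manager computes deltas between polls modulo 2^32, so an occasional wrap is harmless as long as the poll interval is far shorter than the wrap time.

```python
COUNTER32_MOD = 2 ** 32
SECONDS_PER_DAY = 86_400


def days_to_wrap(events_per_second: float) -> float:
    """Days until a Counter32 incremented at this rate wraps back to zero."""
    return COUNTER32_MOD / events_per_second / SECONDS_PER_DAY


def counter32_delta(previous: int, current: int) -> int:
    """Events between two polls, tolerating at most one wrap (modulo 2^32)."""
    return (current - previous) % COUNTER32_MOD


# One heartbeat per second: about 49,710 days (roughly 136 years) to wrap.
print(round(days_to_wrap(1)))  # 49710
# Even 10,000 NOPs per second takes about 5 days to wrap.
print(round(days_to_wrap(10_000), 1))  # 5.0
# A poll interval that straddles a wrap still yields the right delta.
print(counter32_delta(COUNTER32_MOD - 5, 10))  # 15
```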

It probably doesn't hurt to add the data timeout counter, but that seems less important (IMHO) than the NOP timeout counter, as a data timeout will manifest as a failed SCSI command, which will be noticed higher in the SCSI stack. In contrast, iSCSI NOP timeouts aren't SCSI errors.

I strongly suggest that these new counters not be required for MODULE-COMPLIANCE.

Thanks,
--David

From: Mark Bakke [mailto:Mark_Bakke@DELL.com]
Sent: Thursday, July 12, 2012 9:29 AM
To: Storm (storm@ietf.org)
Cc: MacFaden, Michael (VMware); prakashvn@hcl.com; Black, David; ttalpey@microsoft.com
Subject: iSCSI MIB: Heartbeat counters for NOP-In and NOP-Out Usage

Everyone,

One of the things pointed out during the MIB Doctor review is that, although most implementations use NOP-In and NOP-Out as a heartbeat/keepalive mechanism, the MIB module provides no counters for them or for timeouts based on missing heartbeats.
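For reference, a minimal Python sketch of the heartbeat bookkeeping such counters would reflect: the initiator sends a NOP-Out ping with a valid Initiator Task Tag (ITT), the target echoes a NOP-In carrying the same tag, and a ping with no echo before its deadline counts as a timeout. The class name, the timeout value, and the explicit `now` arguments (used instead of a real clock) are assumptions for illustration; target-originated pings are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatTracker:
    # Hypothetical tracker; models only initiator-originated pings
    # (NOP-Out answered by NOP-In).
    timeout: float                               # heartbeat timeout in seconds (assumed)
    nop_outs: int = 0
    nop_ins: int = 0
    nop_timeouts: int = 0
    pending: dict = field(default_factory=dict)  # ITT -> deadline

    def send_ping(self, itt: int, now: float) -> None:
        # A NOP-Out with a valid ITT asks the target to echo a NOP-In.
        self.nop_outs += 1
        self.pending[itt] = now + self.timeout

    def receive_nop_in(self, itt: int) -> None:
        self.nop_ins += 1
        # A late or unexpected reply simply has no pending entry.
        self.pending.pop(itt, None)

    def poll(self, now: float) -> int:
        """Count and forget every ping whose deadline has passed."""
        expired = [itt for itt, deadline in self.pending.items() if now >= deadline]
        for itt in expired:
            del self.pending[itt]
        self.nop_timeouts += len(expired)
        return len(expired)


hb = HeartbeatTracker(timeout=5.0)
hb.send_ping(itt=1, now=0.0)
hb.receive_nop_in(itt=1)       # answered in time
hb.send_ping(itt=2, now=10.0)  # never answered
hb.poll(now=16.0)              # deadline 15.0 has passed
print(hb.nop_outs, hb.nop_ins, hb.nop_timeouts)  # 2 1 1
```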

Since this requires adding a few new objects to the MIB, I wanted to vet it with the list, with two questions:

1. Will it be useful to provide these counters?
2. If so, which counters?

For example, to count the NOPs themselves, we can add objects to iscsiSessionStatsTable, or to an iscsiConnectionStatsTable if we want to keep track of NOP-Ins and NOP-Outs per connection. Is Counter32 sufficient for NOPs, since it is OK for commands and responses? I think so:

    iscsiSsnNopIns                  Counter32,
    iscsiSsnNopOuts                 Counter32,

If we do this, do we want the counters per session or per connection? Per-connection counters would require an additional table.

Is Counter32 sufficient?  I believe so, since it's what we use for commands and responses, and the number of NOPs certainly won't exceed these.

The second part is to provide counters that indicate timeouts based on non-receipt of NOP responses.  This would likely be added to the session's error stats table.

Mike MacFaden asked us to consider separate counters for heartbeat (NOP) timeouts vs. data timeouts.

  |  |  +--iscsiInstanceSsnErrorStatsTable(2)
  |  |     |
  |  |     +--iscsiInstanceSsnErrorStatsEntry(1) [iscsiInstIndex]
  |  |        |
  |  |        +-- r-n Counter32 iscsiInstSsnDigestErrors(1)
  |  |        +-- r-n Counter32 iscsiInstSsnCxnTimeoutErrors(2)
  |  |        +-- r-n Counter32 iscsiInstSsnFormatErrors(3)

Adding something like this might do the trick:

        iscsiInstSsnNopTimeoutErrors    Counter32
        iscsiInstSsnDataTimeoutErrors   Counter32
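One way an implementation might keep the two causes apart, sketched in Python under stated assumptions: the field names mirror the existing iscsiInstanceSsnErrorStatsEntry objects plus the two proposed ones, the class and enum are hypothetical, and what exactly qualifies as a data timeout is left open here, as it is in this thread.

```python
from dataclasses import dataclass
from enum import Enum


class TimeoutKind(Enum):
    NOP = "nop"    # no NOP-In echo arrived for a NOP-Out heartbeat
    DATA = "data"  # expected data not received in time (definition not settled)


@dataclass
class InstSsnErrorStats:
    # Existing objects in iscsiInstanceSsnErrorStatsEntry.
    digest_errors: int = 0        # iscsiInstSsnDigestErrors
    cxn_timeout_errors: int = 0   # iscsiInstSsnCxnTimeoutErrors
    format_errors: int = 0        # iscsiInstSsnFormatErrors
    # Proposed objects (names as floated on the list, not yet in the MIB).
    nop_timeout_errors: int = 0   # iscsiInstSsnNopTimeoutErrors
    data_timeout_errors: int = 0  # iscsiInstSsnDataTimeoutErrors

    def record_timeout(self, kind: TimeoutKind) -> None:
        # Each timeout increments exactly one of the two proposed counters.
        if kind is TimeoutKind.NOP:
            self.nop_timeout_errors += 1
        else:
            self.data_timeout_errors += 1


stats = InstSsnErrorStats()
stats.record_timeout(TimeoutKind.NOP)
stats.record_timeout(TimeoutKind.NOP)
stats.record_timeout(TimeoutKind.DATA)
print(stats.nop_timeout_errors, stats.data_timeout_errors)  # 2 1
```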

Any thoughts on what would be useful here? Do any of you already count these in your implementations for other purposes?

Thanks,

Mark