Re: [storm] iSER - problem with unsolicited NOP-IN right after final Login Response

Mallikarjun Chadalapaka <cbm@chadalapaka.com> Sat, 26 May 2012 01:02 UTC

From: Mallikarjun Chadalapaka <cbm@chadalapaka.com>
To: "david.black@emc.com" <david.black@emc.com>, "mkosjc@gmail.com" <mkosjc@gmail.com>, "nezhinsky@gmail.com" <nezhinsky@gmail.com>
Date: Sat, 26 May 2012 01:01:50 +0000
Cc: "ogerlitz@mellanox.com" <ogerlitz@mellanox.com>, "michaelc@cs.wisc.edu" <michaelc@cs.wisc.edu>, "storm@ietf.org" <storm@ietf.org>
Subject: Re: [storm] iSER - problem with unsolicited NOP-IN right after final Login Response

I am curious to understand a bit more on why this is a protocol issue per se.

It seems that one way to address this problem is an implementation approach: the initiator posts the negotiated number of unsolicited PDU buffers in advance, at the same time it makes the (final) negotiation offer, because the inbound unsolicited PDUs can technically arrive any time after the offer, given the standard network latency mechanics that Alexander summarized. Has that approach been considered?

Mallikarjun




From: storm-bounces@ietf.org [mailto:storm-bounces@ietf.org] On Behalf Of david.black@emc.com
Sent: Thursday, May 24, 2012 12:03 AM
To: mkosjc@gmail.com; nezhinsky@gmail.com
Cc: ogerlitz@mellanox.com; michaelc@cs.wisc.edu; storm@ietf.org
Subject: Re: [storm] iSER - problem with unsolicited NOP-IN right after final Login Response
Importance: High

Mike (Ko) and Alexander,

Mike is of course correct that iSER Hello usage can be forced by negotiating iSERHelloRequired to "Yes".  However, existing implementations are likely to reply with iSERHelloRequired=NotUnderstood, so we do need to specify what should be done in order to interoperate with an implementation that refuses to deal with the iSER Hello exchange.

I think the situation that Alexander described should be documented in a new section 5.1.4 of the iSER draft.  My general rule of thumb on this sort of surprise found by implementers in the "running code" is that it indicates that something is missing in the spec.  I believe that Alexander has described the solution - more below.

The new section 5.1.4 (suggested section title: Omission of the iSER Hello Exchange) should describe default omission of the exchange, use of the iSERHelloRequired key to omit the iSER Hello exchange, and the consequences of target use of unsolicited PDUs after login when the exchange is omitted, including IB's use of NOP-IN (as a keep-alive measure, right?).

The crucial requirements points that I take away from Alexander's description are that if the iSER Hello exchange is omitted, then:

1) The target MAY send *one* unsolicited PDU immediately after sending the Login Response.

2) The target MUST wait at least 200ms (use some other number if 200ms isn't a good choice)
   or until it receives a full feature mode PDU from the initiator before sending a
   second unsolicited PDU in order to ensure that the initiator has sufficient
   time to allocate the full feature buffer resources for the connection.

3) The initiator SHOULD allocate at least one additional buffer for use during login (so that
   at least two buffers are in use during login) in order to receive an unsolicited PDU
   that may follow login completion.  Failure to allocate this second buffer may
   cause connection termination if no buffer is available when an unsolicited PDU arrives.
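
A minimal sketch of how a target might apply points 1 and 2, for illustration only. The class and method names are hypothetical and not taken from any implementation or from the draft; the 200 ms value is the placeholder suggested above, and the negotiated limit on outstanding unsolicited PDUs is deliberately not modeled.

```python
from dataclasses import dataclass

HOLD_OFF_SECONDS = 0.200  # placeholder value from point 2; not a settled number


@dataclass
class UnsolicitedPduGate:
    """Hypothetical target-side gate used when the iSER Hello exchange is omitted."""

    login_response_sent_at: float     # time the final Login Response was sent
    sent_count: int = 0               # unsolicited PDUs sent so far
    initiator_pdu_seen: bool = False  # a full feature PDU has arrived from the initiator

    def on_initiator_pdu(self) -> None:
        """Record that the initiator has sent a full feature PDU."""
        self.initiator_pdu_seen = True

    def may_send(self, now: float) -> bool:
        """Point 1: the first PDU is always allowed. Point 2: later ones must wait."""
        if self.sent_count == 0:
            return True
        waited = (now - self.login_response_sent_at) >= HOLD_OFF_SECONDS
        return waited or self.initiator_pdu_seen

    def try_send(self, now: float) -> bool:
        """Count the PDU as sent if the gate allows it; return whether it was sent."""
        if not self.may_send(now):
            return False
        self.sent_count += 1
        return True
```

A target loop would call `try_send()` with a monotonic clock reading before each NOP-IN and call `on_initiator_pdu()` from its receive path.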

Both Mike and I are on vacation, so it may be a few weeks until we can agree on the new text and get a -12 version of the draft with that new text submitted.  In the interim, I've asked our AD (Martin Stiemerling) to hold off on further processing of the iSER draft until a -12 version with this new text is submitted.  I'd prefer to work this text out now rather than deal with it as an IETF Last Call comment - as the problem turned up in actual implementations, I think it's worth the extra month that it's likely to take to get correct text on how to avoid the problem into the draft.

I'd suggest that Mike Ko post an initial draft of the text for the new section 5.1.4 to the list when he resurfaces ...

Thanks,
--David

From: Michael Ko [mailto:mkosjc@gmail.com]
Sent: Monday, May 21, 2012 10:23 AM
To: Alexander Nezhinsky
Cc: storm@ietf.org; Black, David; Or Gerlitz; Mike Christie
Subject: Re: iSER - problem with unsolicited NOP-IN right after final Login Response

Alex,

The iSER Hello support was never removed from the latest spec; only its use was made optional.  So during login negotiation, just negotiate iSERHelloRequired to Yes.

Mike

On Mon, May 21, 2012 at 6:15 AM, Alexander Nezhinsky <nezhinsky@gmail.com> wrote:
Hi

I understand that this is bad timing for sending this kind of mail, now that the iSER draft has been submitted,
but we actually still have a small problem.
It is related to the final Login Response handling and the transition to Full Feature Phase on the initiator side in
InfiniBand setups.

When the target receives the final Login Request it sends the final Login Response, and from its perspective
the connection is now in Full Feature Phase (assuming that it agreed to the transition in the Login Response being sent).

This means that the target is ready to accept SCSI commands, Text Requests etc. sent by the initiator.
It also means that the target is eligible to send some unsolicited PDUs, notably unsolicited NOP-INs.

With IB, sending NOP-IN periodically is the easiest (and almost the only feasible) way to detect closed connections
reliably, because this kind of error is delivered to the user only in response to a previously initiated TX operation.

This leaves the initiator in a precarious position. It posts its RX buffers for that connection only when the final
Login Response arrives. But during that window (after the target has sent the final Login Response but before
the Full Feature Phase RX buffers are posted on the initiator side) the target may send
the first NOP-IN, as it already considers the connection to be in Full Feature Phase and the NumOfUnsolicited PDUs
accounting for NOP-INs has been negotiated to a non-zero value.

If the initiator works with a single RX-buffer posted during the entire login phase (which is a logical thing to do
judging by the login exchange protocol) then an error occurs, as no buffers are posted when the NOP-IN arrives
and the connection is shut down.

Posting a single extra buffer before sending the last Login Request only alleviates the problem. Although this
often solves it in practical terms (as the target most probably sends the next NOP-IN only after some timeout
period measured in seconds or hundreds of milliseconds), it does not solve it in terms of protocol completeness,
as the target MAY theoretically send more than one NOP-IN before the FF buffers are posted.
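
The buffer accounting behind this race can be sketched as a toy model. Everything here is hypothetical (the class, the function, and the simplification that login-phase buffers are not reposted); it only counts posted RX buffers against arriving PDUs and is not taken from the Linux initiator or STGT.

```python
class ReceiveQueue:
    """Counts RX buffers posted by the initiator (hypothetical model)."""

    def __init__(self) -> None:
        self.posted = 0

    def post(self, count: int) -> None:
        self.posted += count

    def deliver(self) -> bool:
        """Consume one buffer for an arriving PDU; False means none was posted."""
        if self.posted == 0:
            return False
        self.posted -= 1
        return True


def connection_survives(login_buffers: int, early_nop_ins: int) -> bool:
    """Return True if every PDU arriving before the FF buffers are posted finds a buffer.

    login_buffers: buffers posted when the final Login Request is sent.
    early_nop_ins: NOP-INs the target sends before the initiator posts FF buffers.
    """
    rq = ReceiveQueue()
    rq.post(login_buffers)
    # The final Login Response consumes one login-phase buffer.
    if not rq.deliver():
        return False
    # NOP-INs that arrive before the full feature buffers are posted.
    for _ in range(early_nop_ins):
        if not rq.deliver():
            return False  # no buffer posted: the connection is shut down
    return True
```

With one login buffer a single early NOP-IN is fatal; with one extra buffer the first early NOP-IN is absorbed but a second one is not, which is the gap described above.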

This issue was encountered recently in the Linux iSCSI/iSER initiator, and the above solution has been applied to solve
it against the existing target implementation (STGT), but the initiator remains exposed to this kind of error.

The solution is actually quite simple (theoretically): if we bring back the requirement for the iSER Hello exchange,
then the iSER-assisted Full Feature Phase does not commence until the Hello PDU arrives at the target and the target
answers with HelloReply, and the initiator has a definitive point in time when it can safely post its RX buffers: after
the final Login Response returns but before it sends the iSER Hello PDU.
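
The ordering guarantee can be illustrated with a small state model of the target side. This is a sketch under assumptions: the state names and functions are hypothetical, and it assumes the initiator sends Hello and the target answers with HelloReply.

```python
from enum import Enum, auto


class TargetState(Enum):
    LOGIN = auto()
    AWAIT_HELLO = auto()   # final Login Response sent, Hello not yet received
    FULL_FEATURE = auto()


def after_final_login_response(hello_required: bool) -> TargetState:
    """State the target enters once it has sent the final Login Response."""
    return TargetState.AWAIT_HELLO if hello_required else TargetState.FULL_FEATURE


def after_hello_received(state: TargetState) -> TargetState:
    """The target answers Hello with HelloReply and only then enters full feature phase."""
    if state is not TargetState.AWAIT_HELLO:
        raise ValueError("iSER Hello is only expected right after login")
    return TargetState.FULL_FEATURE


def may_send_unsolicited(state: TargetState) -> bool:
    """Unsolicited PDUs such as NOP-IN are only allowed in full feature phase."""
    return state is TargetState.FULL_FEATURE
```

Because the initiator posts its RX buffers before sending Hello, the target cannot reach `FULL_FEATURE`, and so cannot send a NOP-IN, until those buffers are in place. Without the exchange the target is in `FULL_FEATURE` as soon as the final Login Response leaves.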

In practical terms it means that the iSER Hello support requirement should be brought back to the spec, which is a hassle.

Should we decide on this now?

Alexander

P.S.: Thanks to Mike Christie and Or Gerlitz, the maintainers of the Linux iSCSI and iSER initiator, for raising the issue.