RE: [Ips] detection of failed sessions to allow re-login

Hi Paul,

That's a target problem, most likely a bug. The second login with TSIH=0
tells the target to perform session reinstatement. That's jargon for
"silently nuke the first session".

The target may be failing the login because its internal cleanup
requires more time (I/O requests from the previous session are jammed,
for instance). Your scenario suggests this is not likely.
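
To make the expected behaviour concrete, here is a rough sketch of how a
target's login path could handle session reinstatement. It is only an
illustration, not any real target's code: the Target and Session classes,
the login() signature, and the "closed" flag are all made up for this
example. The session is keyed on the same four values Paul lists
(InitiatorName, ISID, TargetName, TargetPortalGroupTag), and the real
cleanup work (aborting tasks, dropping connections) is reduced to
setting a flag.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical session key: (InitiatorName, ISID, TargetName, TargetPortalGroupTag)
SessionKey = Tuple[str, str, str, int]


@dataclass
class Session:
    tsih: int
    key: SessionKey
    closed: bool = False  # stands in for real teardown (tasks, connections)


class Target:
    """Toy model of a target's session table (illustrative only)."""

    def __init__(self) -> None:
        self.sessions: Dict[SessionKey, Session] = {}
        self._next_tsih = 1  # TSIH 0 is reserved to mean "new session"

    def login(self, initiator_name: str, isid: str,
              target_name: str, tpgt: int, tsih: int) -> Session:
        key = (initiator_name, isid, target_name, tpgt)
        old = self.sessions.get(key)

        if tsih == 0:
            # Same key with TSIH=0: session reinstatement. The old session
            # is closed whether or not the target had noticed it was dead.
            if old is not None:
                old.closed = True
            new = Session(tsih=self._next_tsih, key=key)
            self._next_tsih += 1
            self.sessions[key] = new
            return new

        # Non-zero TSIH: the initiator wants to add a connection to an
        # existing session, so that session must exist and still be open.
        if old is None or old.closed or old.tsih != tsih:
            raise LookupError("session does not exist")
        return old


if __name__ == "__main__":
    target = Target()
    args = ("iqn.2007-04.example:host", "0x00023d000001",
            "iqn.2007-04.example:boot", 1)
    bios_session = target.login(*args, tsih=0)    # HBA BIOS login
    driver_session = target.login(*args, tsih=0)  # HBA driver login after reset
    print(bios_session.closed, driver_session.closed)
```

The point of the sketch is that the second login does not depend on the
target having already detected the failure of the first session; the
TSIH=0 login itself is the trigger for replacing it.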

HTH,
Ken

________________________________

From: Paul Hughes [mailto:phughes@pillardata.com] 
Sent: Friday, 20 April 2007 09:14
To: ips@ietf.org
Subject: [Ips] detection of failed sessions to allow re-login

I have a question about how a target can quickly detect session failures
so that a re-login can succeed.

Here's my scenario:

1) an initiator is booting from an iSCSI target
2) the initiator is using an iSCSI HBA to communicate with the iSCSI
target
3) the HBA BIOS creates the first session, discovers the boot LUN, and
reads the boot loader
4) the boot loader reads the kernel from the boot LUN
5) the kernel resets the iSCSI HBA while loading an HBA driver
6) the HBA driver attempts to create a new session

The problem I'm seeing is that the target is failing the login for the
new session because the target thinks the first session created by the
HBA BIOS is still valid (not in failed state).  The HBA reset was not
detected by the target soon enough for the target to know that the first
session is now in the failed state when the initiator attempts to log in
and create the second session using the same InitiatorName, ISID,
TargetName, and TargetPortalGroupTag as the first session (with TSIH=0).
The target does not see a link down event because a switch is connected
between the HBA and the target port.  The target eventually detects that
the first session has failed when it sends a NOP-In PDU and receives a
transport failure.  Unfortunately, this occurs too late and the boot
fails.

In my case the target is sending NOP-In PDUs every 60 seconds.  I can
change that to 5 seconds, but I don't think that will fix every case.
Is there a better way for the target to determine that the first session
has failed so that a re-login will succeed on the first try?

Thanks,
Paul