Wednesday, May 4, 2011

EMC Symmetrix devices and SCSI reservations

The Symmetrix and Solutions Enabler documentation mentions three types of SCSI reservations: "Exclusive", "Group", and "Persistent Group Reservations" (or "PGR"). Of these, the first two are always possible; i.e. a host is always allowed to place an Exclusive or Group reservation on a device.

By default, a host is not allowed to place a PGR on a device. You need to explicitly set a flag on the device before it will accept PGRs. That flag is the "SCSI3_persist_reserv" flag, also known as the "PER flag" or the "PGR flag". Whether the flag is enabled is visible in the output of "symdev show" for that device: look for the line containing "SCSI-3 Persistent Reserve". Without this flag, it's not possible to place a SCSI persistent reservation on the device.
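
For example (a sketch: the SID "1234" and device "0ABC" are placeholders, and the exact symconfigure syntax may vary per Solutions Enabler version):

    # Check whether the PER flag is set (1234/0ABC are placeholder SID/device values):
    symdev -sid 1234 show 0ABC | grep "SCSI-3 Persistent Reserve"

    # Enable the flag:
    symconfigure -sid 1234 -cmd "set dev 0ABC attribute=SCSI3_persist_reserv;" commit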

The SCSI standards themselves, however, define two types of SCSI reservations, not three:
  • Reservations managed by the RESERVE and RELEASE SCSI commands. These commands are deprecated in the newest SCSI standards, but they are still very widely used. I'll call these "old style" reservations.

  • Reservations managed by the PERSISTENT RESERVE IN and PERSISTENT RESERVE OUT SCSI commands. These commands allow for more control over reservations and offer more possibilities. I'll call these "new style" reservations. They are intended to eventually replace old style reservations, but at the moment they are not that widely used. A Symmetrix device will only accept these commands when the "SCSI3_persist_reserv" flag is enabled for that particular device. (A sketch exercising both command families follows below.)
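
To make the distinction concrete: on a Linux host with the sg3_utils package installed, both command families can be exercised directly. This is only a sketch; the device path and reservation key are placeholders, and against a Symmetrix the persistent commands will only succeed when the PER flag described above is set:

    # Old style: RESERVE / RELEASE (/dev/sdc is a placeholder device path)
    sg_reserve /dev/sdc
    sg_release /dev/sdc

    # New style: PERSISTENT RESERVE OUT (register a placeholder key, then reserve) ...
    sg_persist --out --register --param-sark=0x1234 /dev/sdc
    sg_persist --out --reserve --param-rk=0x1234 --prout-type=1 /dev/sdc

    # ... and PERSISTENT RESERVE IN (read back the current reservation)
    sg_persist --in --read-reservation /dev/sdc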


What Solutions Enabler calls "Exclusive" reservations are old style reservations; these are visible with the command "symdev -sid xxx list -resv". "PGR" reservations are new style reservations; these are visible with the command "symdev -sid xxx list -pgr".(*)
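
For example, both views side by side (the SID is a placeholder):

    # Old style ("Exclusive") reservations (1234 is a placeholder SID):
    symdev -sid 1234 list -resv

    # New style ("PGR") reservations:
    symdev -sid 1234 list -pgr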

It turns out there's some trickery involved regarding "Group" reservations. Old style reservations only allow a single initiator to reserve access to a device, which, as you can imagine, is problematic in a multipath environment with multiple initiators. To solve this, when a host places an old style reservation on a device that is multipathed by PowerPath, PowerPath translates the RESERVE/RELEASE commands to PERSISTENT RESERVE IN/PERSISTENT RESERVE OUT commands on the fly. These translated commands result in "Group" reservations, and they work irrespective of whether the SCSI3_persist_reserv flag is enabled for the device.

Apparently, PowerPath has some sort of "special handshake" with the Symmetrix, so it can use new style reservations even when these aren't enabled. (My guess would be that PowerPath uses a well known reservation key.) The Symmetrix recognizes when a new style reservation is placed by PowerPath and will always allow it; the resulting reservation is called a "Group" reservation. Any other (non-PowerPath) use of the new style reservation commands results in a normal PGR, but only when the PGR flag is enabled.
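
One way to test that guess from a Linux host with sg3_utils would be to dump the keys registered on a PowerPath-managed device and look for a fixed, recognizable key. This is just a sketch: the device path is a placeholder, and I haven't verified whether the array honors a host-issued PERSISTENT RESERVE IN when the PER flag is off:

    # List all reservation keys currently registered (/dev/sdc is a placeholder):
    sg_persist --in --read-keys /dev/sdc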



(*) Depending on your microcode and software versions, displaying reservations can sometimes be a bit erratic. If you encounter unexpected results, be sure to consult the EMC knowledgebase on Powerlink.

Friday, November 12, 2010

Windows 2008: VSS - Deleting shadow copies results in offline disks

When using a VSS Hardware Provider on Windows 2008, I noticed that under certain circumstances, deleting shadow copies resulted in offline disks in Disk Manager and diskpart.

A "DELETE SHADOWS <xxx>" command in diskshadow.exe would always work, but a subsequent "LIST DISK" in diskpart.exe would show a disk as "Offline". Also, the VDS service would often freak out when this happened: it could just outright crash with a segmentation violation, or it would just complain in the event log.

Manually performing a rescan (using "Rescan Disks" in Disk Manager, or issuing the "rescan" command in diskpart) would always clear out the offline disk.
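
A minimal sketch of the failing sequence (the shadow copy GUID is a placeholder):

    DISKSHADOW> delete shadows id {00000000-0000-0000-0000-000000000000}

    DISKPART> list disk      <-- the underlying disk now shows as "Offline"
    DISKPART> rescan         <-- a manual rescan clears it out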

After an interesting support case with both the vendor of the VSS Hardware Provider as well as Microsoft, the explanation turned out to be quite simple.

When deleting shadow copies, it's the job of the hardware provider to instruct the storage array to mask away the LUNs containing the to-be-deleted shadow copies. After the hardware provider is done, Windows automatically performs a disk rescan to get rid of the LUNs that are no longer visible.

When the storage driver follows the Storport model, STORPORT.SYS is involved in the disk rescan. It turned out that storport.sys has an undocumented cooldown on these disk rescans: after performing one, storport.sys will ignore any subsequent rescan requests for a period of roughly 30 seconds, in some cases possibly even up to 5 minutes!

So, what happens is (simplified):

  • A shadow copy is deleted, which results in a disk rescan. Everything works fine and the deletion is processed normally.
  • Some seconds later, another shadow copy is deleted. This also results in a disk rescan, but since storport.sys is still in its cooldown period it'll silently ignore this second rescan.
  • The LUN of this second shadow copy is now no longer visible to the system, but since storport.sys ignored the rescan, Windows still thinks the LUN is there. Since Windows did unmount the volume successfully, the LUN is marked offline. This confuses some components, for example VDS.
  • After the storport.sys cooldown expires, any disk rescan will clear out the offline LUN.

Note that deleting several shadow copies in one go works just fine, e.g. using the "delete shadows set <xxx>" or "delete shadows all" commands in diskshadow.exe. This does not trigger the problem: VSS processes the entire list of shadow copies, and only then performs a single disk rescan to clear out all LUNs underlying the entire set of deleted shadow copies.
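
As an illustration, the batch variant can be scripted and run non-interactively; this is a sketch, and the script file name is made up:

    REM Create a diskshadow script (delete_all.dsh is a hypothetical name) that
    REM deletes all shadow copies in one go, so only a single rescan happens:
    echo delete shadows all > delete_all.dsh
    diskshadow /s delete_all.dsh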



The workaround is simple: make sure there are at least five minutes between consecutive delete operations.
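
As a sketch, in a batch script the pacing could look like this; the shadow copy IDs are placeholders:

    REM Delete the first shadow copy (placeholder GUID):
    echo delete shadows id {00000000-0000-0000-0000-000000000001} > del1.dsh
    diskshadow /s del1.dsh

    REM Wait out the storport.sys rescan cooldown (5 minutes to be safe):
    timeout /t 300 /nobreak

    REM Only then delete the next shadow copy (placeholder GUID):
    echo delete shadows id {00000000-0000-0000-0000-000000000002} > del2.dsh
    diskshadow /s del2.dsh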