A "DELETE SHADOWS <xxx>" command in diskshadow.exe would always succeed, but a subsequent "LIST DISK" in diskpart.exe would show a disk as "Offline". The VDS service would also often freak out when this happened: it would either crash outright with an access violation, or just complain in the event log.
Manually performing a rescan (using "Rescan Disks" in Disk Management, or issuing the "rescan" command in diskpart) would always clear out the offline disk.
After an interesting support case involving both the vendor of the VSS Hardware Provider and Microsoft, the explanation turned out to be quite simple.
When deleting shadow copies, it's the job of the hardware provider to instruct the storage array to mask the LUNs containing the to-be-deleted shadow copy. Once the hardware provider is done, Windows automatically performs a disk rescan to get rid of the LUNs that are no longer visible.
When using a Storport-model storage driver, storport.sys is involved in the disk rescan. It turned out that storport.sys has an undocumented cooldown on disk rescans: after performing one, it ignores any subsequent rescan for roughly 30 seconds, and in some cases possibly even up to 5 minutes!
So, what happens is (simplified):
- A shadow copy is deleted, which results in a disk rescan. Everything works fine and the deletion is processed normally.
- A few seconds later, another shadow copy is deleted. This also results in a disk rescan, but since storport.sys is still in its cooldown period, it silently ignores this second rescan.
- The LUN of this second shadow copy is no longer visible to the system, but because storport.sys ignored the rescan, Windows still thinks the LUN is there. Since Windows did successfully unmount the volume, the LUN is marked offline. This confuses some components, such as VDS.
- Once the storport.sys cooldown has expired, any disk rescan will clear out the offline LUN.
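The simplified sequence above can be captured in a small toy model. This is purely an illustration, not how storport.sys is actually implemented (the cooldown is undocumented): it assumes a fixed 30-second window measured from the last *honored* rescan, and that ignored rescans do not extend that window. The class and method names are hypothetical.

```python
class RescanModel:
    """Hypothetical toy model of the storport.sys rescan cooldown.

    Assumption: rescans arriving within `cooldown` seconds of the last
    honored rescan are silently ignored and do not reset the window.
    """

    def __init__(self, luns, cooldown=30.0):
        self.array_luns = set(luns)    # LUNs the storage array still presents
        self.windows_luns = set(luns)  # LUNs Windows believes exist
        self.offline = set()           # stale LUNs Windows shows as "Offline"
        self.cooldown = cooldown
        self.last_rescan = None

    def rescan(self, now):
        """Return True if the rescan was honored, False if it was ignored."""
        if self.last_rescan is not None and now - self.last_rescan < self.cooldown:
            return False  # still in cooldown: silently ignored
        self.last_rescan = now
        # A honored rescan syncs Windows' view with what the array presents,
        # which also clears any LUN that was stuck offline.
        self.windows_luns = set(self.array_luns)
        self.offline &= self.windows_luns
        return True

    def delete_shadow(self, lun, now):
        self.array_luns.discard(lun)  # hardware provider masks the LUN
        if lun in self.windows_luns:
            self.offline.add(lun)     # volume unmounted, LUN now stale
        self.rescan(now)              # automatic rescan after deletion


m = RescanModel({"LUN1", "LUN2"})
m.delete_shadow("LUN1", now=0)   # rescan honored, LUN1 disappears cleanly
m.delete_shadow("LUN2", now=10)  # rescan ignored, LUN2 is left offline
print(m.offline)                 # {'LUN2'}
m.rescan(now=45)                 # cooldown expired, rescan clears it
print(m.offline)                 # set()
```

Deleting the second shadow copy at `now=40` instead of `now=10` would leave nothing offline in this model, which is exactly why spacing out the deletions helps.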
The workaround is simple: make sure that there are at least five minutes between consecutive delete operations.
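If deletions are scripted, the pacing can be enforced in the script itself. The sketch below assumes each shadow copy is deleted with diskshadow's `DELETE SHADOWS ID {shadow-id}` command, run non-interactively through `diskshadow /s <scriptfile>`; the shadow IDs are placeholders, and the helper names are hypothetical. The runner and sleep function are injectable so the pacing logic can be checked without touching a real system.

```python
import os
import subprocess
import tempfile
import time

MIN_INTERVAL_S = 5 * 60  # five minutes between deletions, per the workaround


def run_diskshadow(script_text):
    """Write a one-off diskshadow script and run it with `diskshadow /s`."""
    fd, path = tempfile.mkstemp(suffix=".dsh", text=True)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(script_text)
        subprocess.run(["diskshadow", "/s", path], check=True)
    finally:
        os.remove(path)


def delete_shadows_paced(shadow_ids, run=run_diskshadow, sleep=time.sleep,
                         interval=MIN_INTERVAL_S):
    """Delete shadow copies one at a time, waiting `interval` seconds between
    deletions so each automatic rescan falls outside the storport.sys cooldown."""
    for i, shadow_id in enumerate(shadow_ids):
        if i > 0:
            sleep(interval)
        run(f"DELETE SHADOWS ID {shadow_id}\n")


# Example (placeholder IDs, in diskshadow's {GUID} form):
# delete_shadows_paced(["{shadow-id-1}", "{shadow-id-2}"])
```

Running one diskshadow invocation per shadow copy keeps each deletion, and the rescan that follows it, separate; the wait happens only between deletions, not after the last one.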