Well THAT was annoying. Last night I tried to replace a degraded drive in my array, and it was not as simple as it had been on previous occasions. Here's what the array looked like before replacing the drive:
> /c6 show
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-6    DEGRADED       -       -       64K     2793.94   ON     ON     
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   931.51 GB SATA  0   -            ST31000528AS        
p4    OK             u0   931.51 GB SATA  4   -            WDC WD10EACS-00D6B0 
p5    OK             u0   931.51 GB SATA  5   -            WDC WD10EADS-00L5B1 
p6    OK             u0   931.51 GB SATA  6   -            WDC WD10EADS-00M2B0 
p7    DEGRADED       u0   931.51 GB SATA  7   -            WDC WD10EADS-00M2B0 
Note that the drive hasn't actually failed; the controller only marks it as DEGRADED.
Next I pulled the drive, and loaded the replacement.  Here's what tw_cli showed now:
# tw_cli /c6 show
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-6    DEGRADED       -       -       64K     2793.94   ON     ON     
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   931.51 GB SATA  0   -            ST31000528AS        
p4    OK             u0   931.51 GB SATA  4   -            WDC WD10EACS-00D6B0 
p5    OK             u0   931.51 GB SATA  5   -            WDC WD10EADS-00L5B1 
p6    OK             u0   931.51 GB SATA  6   -            WDC WD10EADS-00M2B0 
p7    OK             u?   931.51 GB SATA  7   -            WDC WD10EVDS-63U8B0 
What does "u?" mean in the Unit column?  It means that the controller is going to refuse to use it...
# tw_cli maint rebuild c6 u0 p7
The following drive(s) cannot be used [7].
Error: (CLI:144) Invalid drive(s) specified.
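If you want to spot this condition before attempting a rebuild, a small filter over the `show` output can list any port whose Unit column reads "u?". This is only a sketch: `find_orphan_ports` is a made-up helper name (not part of tw_cli), and it assumes the port rows look like the ones above, with the port in the first whitespace-separated column and the unit in the third.

```shell
# Hypothetical helper, not part of tw_cli.
# Reads `tw_cli /cX show` output on stdin and prints every port
# (p0, p7, ...) whose Unit column is "u?".
# Assumes the Status column is a single word, as in the output above.
find_orphan_ports() {
    awk '$1 ~ /^p[0-9]+$/ && $3 == "u?" { print $1 }'
}
```

With the controller from this post it would be used as `tw_cli /c6 show | find_orphan_ports`, which for the listing above should print `p7`.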
After hours of surfing the web I never found a solution, but I did figure it out. Since the original drive hadn't failed outright, I decided to swap it BACK IN, forcibly remove it from the array, and then replace it.
#  tw_cli /c6/p7 remove
Removing /c6/p7 will take the disk offline.
Do you want to continue ? Y|N [N]: y
Removing port /c6/p7 ... Done.
# tw_cli /c6 show
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-6    DEGRADED       -       -       64K     2793.94   ON     ON     
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   931.51 GB SATA  0   -            ST31000528AS        
p4    OK             u0   931.51 GB SATA  4   -            WDC WD10EACS-00D6B0 
p5    OK             u0   931.51 GB SATA  5   -            WDC WD10EADS-00L5B1 
p6    OK             u0   931.51 GB SATA  6   -            WDC WD10EADS-00M2B0 
Then I replaced the old bad drive with the new drive.
# tw_cli /c6 show
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-6    DEGRADED       -       -       64K     2793.94   ON     ON     
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   931.51 GB SATA  0   -            ST31000528AS        
p4    OK             u0   931.51 GB SATA  4   -            WDC WD10EACS-00D6B0 
p5    OK             u0   931.51 GB SATA  5   -            WDC WD10EADS-00L5B1 
p6    OK             u0   931.51 GB SATA  6   -            WDC WD10EADS-00M2B0 
p7    OK             -    931.51 GB SATA  7   -            WDC WD10EVDS-63U8B0 
Ah, now that's better, it's no longer showing that "u?" in the Unit column.
Next I forced the rebuild using the new drive:
# tw_cli maint rebuild c6 u0 p7
Sending rebuild start request to /c6/u0 on 1 disk(s) [7] ... Done.
# tw_cli /c6 show
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-6    REBUILDING     0%(A)   -       64K     2793.94   ON     ON     
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    OK             u0   931.51 GB SATA  0   -            ST31000528AS        
p4    OK             u0   931.51 GB SATA  4   -            WDC WD10EACS-00D6B0 
p5    OK             u0   931.51 GB SATA  5   -            WDC WD10EADS-00L5B1 
p6    OK             u0   931.51 GB SATA  6   -            WDC WD10EADS-00M2B0 
p7    DEGRADED       u0   931.51 GB SATA  7   -            WDC WD10EVDS-63U8B0 
Hooray.  Rebuilding.  A few hours later the rebuild completed.
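To keep an eye on a rebuild without reading the whole table each time, the unit row can be reduced to its status and %RCmpl value. Again this is just a sketch under assumptions: `unit_status` is a hypothetical helper name, and it relies on the unit rows having the layout shown above (unit name first, Status third, %RCmpl fourth).

```shell
# Hypothetical helper, not part of tw_cli.
# Usage: tw_cli /c6 show | unit_status u0
# Prints the Status and %RCmpl columns for the named unit.
unit_status() {
    awk -v unit="$1" '$1 == unit { print $3, $4 }'
}
```

Run against the listing just after the rebuild started, `tw_cli /c6 show | unit_status u0` would print `REBUILDING 0%(A)`.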
I also got the u? last night. Most annoying, as it was for a drive I didn't touch. I had planned to pull the DEGRADED p5 drive and add a new p0, as the RAID 6 was only using 7 drives, but on inserting the p0 drive, p1 went to u? and the auto-rebuild kicked in because two drives had failed. So annoying, as it looks as though it will take 5 days to rebuild!
/c2 show
Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-6    REBUILDING     12%(A)  -       256K    9313.17   RiW    ON
VPort Status         Unit Size      Type  Phy Encl-Slot    Model
------------------------------------------------------------------------------
p0    DEGRADED       u0   1.82 TB   SATA  0   -            SAMSUNG HD204UI
p1    OK             u?   1.82 TB   SATA  1   -            WDC WD20EARS-00MVWB0
p2    OK             u0   1.82 TB   SATA  2   -            WDC WD20EARS-00MVWB0
p3    OK             u0   1.82 TB   SATA  3   -            WDC WD20EARS-00MVWB0
p4    OK             u0   1.82 TB   SATA  4   -            SAMSUNG HD204UI
p5    DEGRADED       u0   1.82 TB   SATA  5   -            WDC WD20EARS-00MVWB0
p6    OK             u0   1.82 TB   SATA  6   -            WDC WD20EARS-00MVWB0
p7    OK             u0   1.82 TB   SATA  7   -            WDC WD20EARS-00MVWB0
OK I figured it out guys! For future reference, you do this:
p18 OK u? yada...
tw_cli /c0/p18 remove
tw_cli /c0 rescan
Now the drive should show back up but be part of a non-existent volume:
p18 OK u4 yada...
In my case I had u0 through u3 but u4 was "new".
tw_cli /c0/u4 del
And now I can use the drive.
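Before deleting a unit as in the steps above, it seems worth double-checking which unit the port actually landed in after the rescan, so the `del` is aimed at the leftover unit and not at a real array. A minimal sketch of that check follows; `port_unit` is a hypothetical helper name, and it assumes the Unit value is the third whitespace-separated column of the port row, as in the listings in this post.

```shell
# Hypothetical helper, not part of tw_cli.
# Usage: tw_cli /c0 show | port_unit p18
# Prints the Unit column (for example u4, u?, or -) for the given port.
port_unit() {
    awk -v port="$1" '$1 == port { print $3 }'
}
```

In the situation described above, `tw_cli /c0 show | port_unit p18` would print `u4` after the rescan, and that is the name to compare against the units you know are real (u0 through u3 in that case) before running the `del`.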