Well, THAT was annoying. Last night I tried to replace a degraded drive in my array, and it was not as simple as it had been on previous occasions. Here's what the array looked like before replacing the drive:
# tw_cli /c6 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-6 DEGRADED - - 64K 2793.94 ON ON
VPort Status Unit Size Type Phy Encl-Slot Model
------------------------------------------------------------------------------
p0 OK u0 931.51 GB SATA 0 - ST31000528AS
p4 OK u0 931.51 GB SATA 4 - WDC WD10EACS-00D6B0
p5 OK u0 931.51 GB SATA 5 - WDC WD10EADS-00L5B1
p6 OK u0 931.51 GB SATA 6 - WDC WD10EADS-00M2B0
p7 DEGRADED u0 931.51 GB SATA 7 - WDC WD10EADS-00M2B0
Note that the drive hasn't actually failed; it's only marked DEGRADED.
Next I pulled the drive and loaded the replacement. Here's what tw_cli showed:
# tw_cli /c6 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-6 DEGRADED - - 64K 2793.94 ON ON
VPort Status Unit Size Type Phy Encl-Slot Model
------------------------------------------------------------------------------
p0 OK u0 931.51 GB SATA 0 - ST31000528AS
p4 OK u0 931.51 GB SATA 4 - WDC WD10EACS-00D6B0
p5 OK u0 931.51 GB SATA 5 - WDC WD10EADS-00L5B1
p6 OK u0 931.51 GB SATA 6 - WDC WD10EADS-00M2B0
p7 OK u? 931.51 GB SATA 7 - WDC WD10EVDS-63U8B0
What does "u?" mean in the Unit column? It means the controller is going to refuse to use the drive:
# tw_cli maint rebuild c6 u0 p7
The following drive(s) cannot be used [7].
Error: (CLI:144) Invalid drive(s) specified.
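If you want to spot ports stuck in "u?" without eyeballing the table, a small awk filter over the `show` output does it. This is only a sketch: the sample lines below are a captured excerpt so it runs without a controller; against real hardware you'd pipe `tw_cli /c6 show` into the filter instead.

```shell
# List any VPort whose Unit column shows "u?" in `tw_cli /cX show` output.
find_unassigned() {
    awk '$1 ~ /^p[0-9]+$/ && $3 == "u?" { print $1 }'
}

# Captured excerpt standing in for live `tw_cli /c6 show` output:
tw_cli_sample() {
    cat <<'EOF'
p0    OK    u0    931.51 GB   SATA  0   -    ST31000528AS
p7    OK    u?    931.51 GB   SATA  7   -    WDC WD10EVDS-63U8B0
EOF
}

tw_cli_sample | find_unassigned
# prints: p7
```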
After hours of surfing the web I never found a solution, so I had to figure it out myself. Since the original drive hadn't outright failed, I decided to swap it BACK IN, forcefully remove it from the array, and then replace it.
# tw_cli /c6/p7 remove
Removing /c6/p7 will take the disk offline.
Do you want to continue ? Y|N [N]: y
Removing port /c6/p7 ... Done.
# tw_cli /c6 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-6 DEGRADED - - 64K 2793.94 ON ON
VPort Status Unit Size Type Phy Encl-Slot Model
------------------------------------------------------------------------------
p0 OK u0 931.51 GB SATA 0 - ST31000528AS
p4 OK u0 931.51 GB SATA 4 - WDC WD10EACS-00D6B0
p5 OK u0 931.51 GB SATA 5 - WDC WD10EADS-00L5B1
p6 OK u0 931.51 GB SATA 6 - WDC WD10EADS-00M2B0
Now I replaced the old degraded drive with the new one.
# tw_cli /c6 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-6 DEGRADED - - 64K 2793.94 ON ON
VPort Status Unit Size Type Phy Encl-Slot Model
------------------------------------------------------------------------------
p0 OK u0 931.51 GB SATA 0 - ST31000528AS
p4 OK u0 931.51 GB SATA 4 - WDC WD10EACS-00D6B0
p5 OK u0 931.51 GB SATA 5 - WDC WD10EADS-00L5B1
p6 OK u0 931.51 GB SATA 6 - WDC WD10EADS-00M2B0
p7 OK - 931.51 GB SATA 7 - WDC WD10EVDS-63U8B0
Ah, now that's better: it's no longer showing that "u?" in the Unit column.
Next I forced the rebuild using the new drive:
# tw_cli maint rebuild c6 u0 p7
Sending rebuild start request to /c6/u0 on 1 disk(s) [7] ... Done.
# tw_cli /c6 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-6 REBUILDING 0%(A) - 64K 2793.94 ON ON
VPort Status Unit Size Type Phy Encl-Slot Model
------------------------------------------------------------------------------
p0 OK u0 931.51 GB SATA 0 - ST31000528AS
p4 OK u0 931.51 GB SATA 4 - WDC WD10EACS-00D6B0
p5 OK u0 931.51 GB SATA 5 - WDC WD10EADS-00L5B1
p6 OK u0 931.51 GB SATA 6 - WDC WD10EADS-00M2B0
p7 DEGRADED u0 931.51 GB SATA 7 - WDC WD10EVDS-63U8B0
Hooray. Rebuilding. A few hours later the rebuild completed.
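If you don't want to keep rerunning `show` by hand while it rebuilds, the status and %RCmpl columns are easy to pull out of the u0 line. A minimal sketch (the `parse_unit` helper name is mine, and the polling loop in the comment assumes real hardware, so here it runs against the captured line from above):

```shell
# Extract "STATUS PCT" (e.g. "REBUILDING 0%(A)") from the u0 line
# of `tw_cli /cX show` output.
parse_unit() {
    awk '$1 == "u0" { print $3, $4 }'
}

# On a real controller you'd poll it, something like:
#   while :; do
#       set -- $(tw_cli /c6 show | parse_unit)
#       echo "$(date '+%H:%M') u0 $1 $2"
#       [ "$1" = "OK" ] && break
#       sleep 300
#   done

# Against the captured line from the transcript above:
printf 'u0    RAID-6    REBUILDING    0%%(A)    -    64K    2793.94    ON    ON\n' | parse_unit
# prints: REBUILDING 0%(A)
```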
2 comments:
I also got the u? last night. Most annoying, as it was for a drive I didn't touch. I had planned to pull the DEGRADED p5 drive and add a new p0, since the RAID-6 was only using 7 drives, but on inserting the p0 drive, p1 went to u? and the auto-rebuild kicked in because two drives had failed. So annoying, as it looks as though it will take 5 days to rebuild!
/c2 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-6 REBUILDING 12%(A) - 256K 9313.17 RiW ON
VPort Status Unit Size Type Phy Encl-Slot Model
------------------------------------------------------------------------------
p0 DEGRADED u0 1.82 TB SATA 0 - SAMSUNG HD204UI
p1 OK u? 1.82 TB SATA 1 - WDC WD20EARS-00MVWB0
p2 OK u0 1.82 TB SATA 2 - WDC WD20EARS-00MVWB0
p3 OK u0 1.82 TB SATA 3 - WDC WD20EARS-00MVWB0
p4 OK u0 1.82 TB SATA 4 - SAMSUNG HD204UI
p5 DEGRADED u0 1.82 TB SATA 5 - WDC WD20EARS-00MVWB0
p6 OK u0 1.82 TB SATA 6 - WDC WD20EARS-00MVWB0
p7 OK u0 1.82 TB SATA 7 - WDC WD20EARS-00MVWB0
OK, I figured it out, guys! For future reference, you do this:
p18 OK u? yada...
tw_cli /c0/p18 remove
tw_cli /c0 rescan
Now the drive should show back up, but as a member of a phantom unit that didn't exist before:
p18 OK u4 yada...
In my case I had u0 through u3 but u4 was "new".
tw_cli /c0/u4 del
And now I can use the drive.