RAID Rebuild Failure

Where to start with this one? We see it all the time.

Your redundant RAID fails. (I specifically mention redundant, because there’s no such thing as a RAID rebuild with RAID 0 or a striped or spanned array.)

The RAID controller was taught a few things when it was growing up, and it informs you that one of the drives is failing. You replace the drive, the controller acknowledges this, and promptly starts rebuilding the array.

Sure, it takes time – especially if you have 20TB-worth of storage. A couple of days, even. And it’s all being handled by the computer, so nothing could possibly go wrong. Or could it?
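To put a rough number on "a couple of days", here is a minimal back-of-the-envelope sketch. It assumes the controller has to work through the full 20TB at a steady rate; the function name and the sustained rates (50, 100 and 200 MB/s) are illustrative assumptions, not measurements from any particular controller, and real rebuilds slow down further when the array is still serving users.

```python
def rebuild_hours(capacity_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to process capacity_tb terabytes at a steady rate.

    Hypothetical helper for illustration only. Uses decimal units
    (1 TB = 1,000,000 MB), as drive vendors do.
    """
    if capacity_tb < 0 or rate_mb_per_s <= 0:
        raise ValueError("capacity must be >= 0 and rate must be > 0")
    total_mb = capacity_tb * 1_000_000
    seconds = total_mb / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Assumed sustained rebuild rates in MB/s; adjust to your hardware.
    for rate in (50, 100, 200):
        hours = rebuild_hours(20, rate)
        print(f"{rate} MB/s: {hours:.1f} hours ({hours / 24:.1f} days)")
```

At an assumed 100 MB/s, 20TB works out to roughly 56 hours, a little over two days. Halve the rate and the rebuild window doubles, and the drives spend all of that time under sustained load.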

RAID rebuild is a lengthy, intensive process that stresses hard drives like they’ve never been stressed before. During a rebuild, a second drive can fail – and often does. Or there’s a power cut and the process is halted. (You really should never run a server without a decent UPS; we’re talking APC, not Belkin or other low-end products.) At this point, do not try to restart the rebuild. Your only option for recovering your data is a very experienced data recovery company that specialises in RAID recovery, and even then this will be a tough one.

Unfortunately, almost all users will try to restart a RAID rebuild and this is where we encounter arrays that are totally unrecoverable. Terminally so. As we so often mention, users are the main cause of total data loss – and RAID rebuilds are right up there near the top of the list.
