LaCie failed drive alert

22 September 2010

In recent weeks we have been receiving extremely high volumes of failed LaCie NAS devices with the following hard drives:

Seagate 1.5TB SATA drive
Model number ST31500341AS

In many cases, all the hard drives appear to be developing bad sectors. We recommend you monitor your LaCie carefully. If you know you have these drives installed, please ensure your data is backed up onto another device. Do not treat a LaCie or any other NAS device as a backup.

Posted in Glossary, News | Leave a comment

Data recovery RAID 0

RAID 0 is pure block-level striping. The data is broken into “blocks”, whose size (the “stripe size”) can be defined when the array is first built. Stripe size is normally chosen to suit the type of data stored on the array: a large stripe size for large files such as video, and a smaller one for general data and databases.
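As a rough illustration of the striping described above, the sketch below maps a byte offset within the array to a drive and an offset on that drive. It assumes a simple round-robin layout with no controller metadata or start offset; real controllers vary, and the function name and parameters are hypothetical.

```python
from __future__ import annotations


def locate_block(logical_offset: int, stripe_size: int, num_drives: int) -> tuple[int, int]:
    """Map a byte offset in a RAID 0 array to (drive index, byte offset on that drive).

    Assumes plain round-robin striping: stripe unit 0 on drive 0,
    unit 1 on drive 1, and so on, wrapping back to drive 0.
    """
    stripe_unit = logical_offset // stripe_size   # which stripe unit overall
    within_unit = logical_offset % stripe_size    # position inside that unit
    drive = stripe_unit % num_drives              # round-robin across drives
    row = stripe_unit // num_drives               # how many units precede it on that drive
    return drive, row * stripe_size + within_unit


# Two drives with a 64 KiB stripe size: consecutive units alternate between drives.
print(locate_block(0, 65536, 2))
print(locate_block(65536, 65536, 2))
print(locate_block(131072 + 10, 65536, 2))
```

Because consecutive stripe units sit on different drives, any file larger than one stripe unit is spread across every member, which is why the loss of a single drive affects practically all the data.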

There is no redundancy in RAID 0; if only one of the drives fails, all access to the data will be lost. The only chance of recovery is to repair and recover the failed drive, after which RAID rebuild procedures will need to take place.

Retrodata’s engineers have industry-leading expertise in recovering RAID 0 failures. Not only are we probably the quickest option available for recovering RAID 0 arrays, but we have a flawless track record dating back to June 2009. Since then we have not failed to recover a single RAID 0 array, nor an array at any other RAID level.

RAID Recovery RAID 30

RAID 30 utilises the redundancy and high performance of RAID 3, and the performance of RAID 0. It consists of two servers, each configured with RAID 3, and then striped with RAID 0.

Each of the two servers can continue functioning after the loss of a single drive, although the RAID will run in degraded mode pending replacement of the faulty drive and a RAID rebuild. Most servers will “drop” a drive even if it develops a single bad sector. Once the drive is replaced, a RAID rebuild can take a long time – even days. If another drive develops bad sectors or fails during that period, the result is a critical failure from which recovery will only be possible by RAID recovery specialists.
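The single-drive tolerance of each RAID 3 set comes from its dedicated parity drive. The sketch below is a simplified model, assuming parity is the byte-wise XOR of the data drives; the helper names are hypothetical and no particular controller's implementation is implied.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_members):
    """Recompute the one missing member from all surviving members (data plus parity)."""
    return xor_blocks(surviving_members)


# Three data drives and one parity drive.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data)

# Drive 1 is lost; the other data drives plus parity are enough to rebuild it.
rebuilt = rebuild_missing([data[0], data[2], parity])
assert rebuilt == data[1]
```

With two members missing from the same set there is only one parity equation for two unknowns, so the rebuild cannot be completed by the controller. This is the critical failure described above.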

Windows Server recovery

More information on Windows Server Recovery.

Retrodata offer a full Windows Server recovery and Server Rebuild service. We work with all versions of Microsoft Server, including the following:

  • Windows NT Advanced Server 3.1
  • Windows NT Server 3.51
  • Windows NT Server 4.0
  • Windows 2000 Server
  • Windows Server 2003
  • Windows Server 2008

If your server has failed and is already powered off, we recommend you leave it switched off and contact us to discuss the best course of action to reduce the chance of further data loss. If a RAID rebuild is in progress and appears to be failing, contact us as soon as possible, as the rebuild may be overwriting your data.

Typical Windows Server failure scenarios we recover from

  • RAID array failure
  • RAID controller failure
  • RAID Rebuild failure
  • Firmware update failure
  • Hard drive failure
  • Boot failure
  • User error
  • RAID reinitialised
  • Power surge
  • Data corruption
  • File system corruption

Retrodata are not only data recovery engineers; we have highly skilled software engineers and for businesses running powerful servers with complex configurations, we offer a Full Server Rebuild service for many operating systems.

Full Server Rebuild

We will rebuild your server following the recovery so that it is returned to you in the same state as before the failure. All your applications, data, account and active directory settings (where applicable) will be intact. This can save days of critical downtime.

The rebuild is charged as an extra, and we will be able to quote for it only after the recovery has taken place. You will not be under any obligation to proceed with the rebuild option.

Hard Drive Failure

Hard drives fail!

Almost every hard disk that is not replaced or disposed of before the end of its natural life will probably fail unexpectedly at some point, for any number of reasons: poor quality manufacturing or a manufacturing defect, overheating, or other physical damage. Owing to the precision involved in their manufacture, modern hard drives are extremely delicate instruments and require careful handling and treatment.

Overheating

Overheating is responsible for many hard drive failures we encounter as a data recovery company. However, as the availability and popularity of portable devices increases and practically everything from PDAs to iPads and laptops contains a hard drive, we are now seeing a shift away from overheating-related failures towards dropped, shocked or otherwise-neglected storage devices.


Server-grade computers make an astonishing noise when their fans are running at full tilt. This is not designed to annoy the System Administrator – the fans are simply doing their job and keeping the hard drives and internal components at a safe operating temperature.

Consumers are increasingly demanding totally silent computer storage – yet are blind to the effect this “passive cooling” has on hard drives. NAS storage devices are being located in hot office cupboards. Home users’ external hard drives are sandwiched between two tomes on the bookshelf, fully insulated against any ventilation. Most laptop users will quite happily leave their computers on the sofa, or on their bed – even for periods of hours, powered on and charging, blissfully unaware of the potential damage (and even fire hazard) posed.

Passive cooling DOES NOT WORK. Manufacturers of external drives and NAS units (LaCie and Maxtor are both culprits, as are others, of palming off passively-cooled drives as reliable backup) are lying to you when they make these claims. We know – we test these things. In the photograph above is an external drive. The two hard disks have been mounted almost touching; there is scarcely a millimetre clearance between the drives and the top cover. There is no active cooling. Small wonder, then, that one of the drives overheated and caught fire.



Enterprise drives and RAID storage systems

One cause of failure that is not widely known relates to the hard disk’s method of error correction. When a modern desktop drive encounters a bad sector (a tiny, “bad” area on the physical recording medium) it will mark that area as bad and will no longer use it. However, this process can take up to 60 seconds. During this reallocation process, the hard drive is effectively hiding itself from the rest of the system. Now, RAID controllers are often programmed to “fail” a non-responsive hard drive – but after only a few seconds of inactivity. As a result, the controller takes out that hard drive and marks it bad. For this reason, you should use only Enterprise-class hard drives for RAID storage.
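The mismatch between the two timings can be put in numbers. The sketch below is purely illustrative: the 60-second figure comes from the paragraph above, while the 8-second controller timeout and the 7-second enterprise recovery limit are assumed values chosen for the example, not the specification of any particular drive or controller.

```python
def controller_drops_drive(recovery_seconds: float, controller_timeout_seconds: float) -> bool:
    """Return True if the drive stays silent for longer than the controller will wait."""
    return recovery_seconds > controller_timeout_seconds


ASSUMED_CONTROLLER_TIMEOUT = 8.0   # assumption: "a few seconds" of tolerated inactivity
DESKTOP_WORST_CASE = 60.0          # from the text: reallocation can take up to 60 seconds
ASSUMED_ENTERPRISE_LIMIT = 7.0     # assumption: enterprise drive gives up and reports quickly

# The desktop drive is still busy when the controller loses patience, so it is failed.
print(controller_drops_drive(DESKTOP_WORST_CASE, ASSUMED_CONTROLLER_TIMEOUT))

# The enterprise drive answers within the window, so it stays in the array.
print(controller_drops_drive(ASSUMED_ENTERPRISE_LIMIT, ASSUMED_CONTROLLER_TIMEOUT))
```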



Firmware failure

A drive’s firmware can be considered to be its “translator” between the operating system and the physical storage medium. It “talks” to the computer and to the drive’s components. Sometimes it loses the plot, though; either because of a problem at the manufacturing stage (perhaps incorrectly programmed) or it encountered some obscure operation of the hard drive that it simply did not understand. Maybe the read/write head / slider was a little optimistic in how high it could fly above the platters (something controlled by the firmware, depending on ambient conditions and other reasons too complex to mention here) and the firmware gave up on itself, rendering the drive inaccessible. Not so long ago, Seagate had a spate of firmware failures with their 7200.11 series hard drives. In their favour, they did correct the issue – but not before many users had emptied their pockets and taken out a second mortgage to have the data professionally recovered.


Physical damage

This is an area that is flying up the charts of Main Causes of Drive Failure.
In an age where anyone old enough to walk is carrying around more computing power than is required for a manned space mission to Mars, mangled laptops and MP3 players are rushing through the doors of data recovery companies. Hard drives aren’t designed to be banged about, or worn on the person during a rugby match. They are designed to be cocooned in peace and quiet (albeit with plenty of ventilation). This picture is a close-up of the inside of a laptop hard drive that was dropped whilst switched off. As you can see, the read/write heads have been severely damaged. (The tiny black rectangle is called a “slider”, and the read/write heads themselves are embedded into its trailing edge.)

And now to issues with laptop and portable computers. In the movies, you will see some office exec walking about with his laptop computer, slamming it down on someone’s desk to show them something on the screen. Unfortunately, this happens in real life, too – and a drive is not built to withstand that sharp knock which, if the drive is running, is very likely to cause a head crash and the media damage that comes with it.

Desktop, tower, server computers – should all be left in situ and not moved when powered on. Not even slid aside gently to gain access to the rear. Always power them down first. Fortunately, most mission-critical servers and enterprise systems are rack-mounted, and situated on concrete floors.

Physical damage also occurs when a component fails; either due to a manufacturing anomaly (for this, read “the use of sub-standard components, bought in as cheaply as possible”), or simply a bad batch of drives manufactured in China, and which did not go through the correct (or any) process of quality control. This is the type of physical damage we experience most often with high-end servers and storage systems. IT Administrators in larger corporations are usually extremely responsible individuals, and it is purely bad luck on their part.

Posted in Glossary | Leave a comment

Unrecoverable Xsan recovered

In early 2010 we successfully performed what we believe to be the first Xsan data recovery procedure that involved working on only one of the two striped RAID arrays, totally independent of the second member of the array and the Metadata server.

Our client, a London University, had a crashed Xsan storage network that consisted of two Infortrend 16-bay storage servers, each configured as RAID 3, and then striped to RAID 30. The Metadata server was a third Infortrend bay, consisting of 24 hard drives. All were connected using fibre channel and gigabit ethernet.

The Xsan system had suffered from a power failure. The Metadata server and one of the Data servers shut down cleanly. However, the second Data server’s uninterruptible power supply had been disconnected without anyone noticing, with the result that two drives were severely damaged by the power spike and subsequent failure.

One drive had already failed from excessive bad sectors, but the system failed to flag this correctly and, after copying the drive to the global spare, it then marked the failed disk as a global spare. This meant that the failure of only one more drive would bring the system down – but we already had a situation with multiple drive failure.

Once we’d repaired the two failed drives and the two with bad sectors, we made secure backups of all the drives and started the long process of the file system rebuild. It had to present identically to the rest of the Xsan as it did before the failure – all logical disk and logical volume ID numbers had to match the original configuration, and this we achieved by forensically editing individual bits on the hard drives to force the new IDs.

We then delivered the recovered RAID array member to our client and, with some final on-site configuration, we managed to bring the entire storage network back online with absolutely no data loss whatsoever.

It’s important to bear in mind that the University’s own system suppliers and consultants had deemed the data unrecoverable, as did Infortrend Technical Support, and even Apple Xsan Technical Support.

This proves that, even if you have been told by the highest authority that your data is not recoverable, this is almost never the case, in our experience. In fact, the statements by the above Technical Support consultants could easily have convinced the client to abandon recovery, and re-install from scratch – with complete loss of data. We were told by everyone that nothing could be done. Our expertise with low-level data recovery and forensics techniques proved them all wrong.

Incorrect RAID drive substitution

When a disk in a modern multi-drive RAID array fails and a hot spare is installed, the controller will “pick” the spare up, mark the failed drive as bad, and rebuild the array using the hot spare. The bad drive should then be replaced as soon as possible.

The problems with drive substitution start occurring when there is no hot spare installed, or if the controller does not automatically proceed as above.

It is then up to the RAID administrator to remove the bad drive, insert a new drive, and manually start the rebuild.

Sometimes, however, for whatever reason, the admin will remove a perfectly good drive and replace it with another, new, perfectly good drive and force a rebuild. Clearly, if the controller allows this to proceed, there is going to be massive corruption and very likely permanent data loss.
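To see why this causes corruption, consider a simplified single-parity array in which parity is the byte-wise XOR of the data drives. This is a toy model with hypothetical helper names, assuming the failed drive returns garbage; it is not a description of any specific controller.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(members, replaced_index):
    """Rebuild the replaced member from every other member, as a forced rebuild would."""
    return xor_blocks([m for i, m in enumerate(members) if i != replaced_index])


# Healthy contents: three data drives plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Drive 1 has failed and now returns garbage (zeros in this toy model).
members = [data[0], b"\x00\x00\x00\x00", data[2], parity]

correct = rebuild(members, 1)  # the failed drive is replaced: its data comes back intact
wrong = rebuild(members, 2)    # a good drive is replaced: rebuilt from garbage

assert correct == data[1]
assert wrong != data[2]
```

In the second case the new drive is filled with values computed from the failed drive's garbage, and the only good copy of that data left the array with the drive that was pulled.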

Full System Rebuild

We understand the meaning of downtime and what it can do to a business. We’ve witnessed the frustration, desperation and feeling of utter futility an IT Administrator can experience, having no idea how long a computer or storage failure will take to be rectified, or if indeed it can be recovered at all.

We aim to remove that unnecessary stress and worry within two minutes of you picking up the ‘phone to us.

In the event your mission-critical computer, server or storage system should fail, we can often offer a Full System Rebuild service.

This entails rebuilding the system to the point just before it failed. All your data, your configuration, settings, accounts, security and even active directory will be intact.

If for example you have a computer controlling a busy CNC mill and it fails, we’ll get your operating system, CNC programs and control data back to where it should be, allowing you to get back to engineering as quickly as possible.

If you have mail servers, database servers or web servers, or if you’re hosting other companies’ websites, we’re almost certainly the quickest and most secure option you have available for emergency data recovery.

Non standard RAID Recovery

Non standard RAID recovery includes the following RAID levels:

  • BeyondRAID by Data Robotics (DROBO)
  • Double Parity RAID (also known as Diagonal Parity, or RAID 6)
  • RAID Z (used by the Sun ZFS file system)
  • RAID Z2 (double-parity RAID-Z, used by the Sun ZFS file system)
  • Drive Extender (Microsoft Windows Home Server)
  • IBM ServeRAID (Proprietary IBM controller supporting RAID 0, RAID 1, RAID 1E, RAID 5, RAID 5E, RAID 00, RAID 10, RAID 1E0 and RAID 50)
  • RAID-DP (double parity RAID by NetApp)
  • RAID-K (Kaleidescape)
  • RAID S (proprietary variant of RAID 5 adapted by EMC Corporation for their Symmetrix storage arrays)
  • RAID 5E (Enhanced RAID 5)
  • RAID 5EE (Integrates the capacity of the hot spare into the array)
  • RAID 6E (no dedicated hot-spare drive)
  • Linux MD RAID 10 (RAID 0 with redundancy)
  • RAID 0+3
  • RAID 30
  • RAID 100 (also known as RAID 10+0)
  • RAID 50 (also known as RAID 5+0)
  • RAID 51 (mirrored RAID 5 arrays)
  • RAID 60 (RAID 6 with striping)

Our RAID Recovery System enables us to locate and extract data and repair failed RAID arrays that have suffered from RAID rebuild failure, severe file system corruption, partial overwrite, and all logical failures.

Perfect RAID recovery track record

The month of August 2010 saw a major milestone for Retrodata. Since June 2009, we have not failed to recover a single RAID storage system – regardless of the method of failure. This included a 56-drive Apple Xsan Storage Area Network with 4 failed drives which was declared non-recoverable by everyone: the top Xsan engineers at Apple, the manufacturer of the storage hardware, and the client’s own IT suppliers. The client had gone so far as to tell us he was about to format the system and start over.

Despite the doom and gloom painted by all, Retrodata managed a perfect recovery, with no data loss.

Another recovery involved a Promise Vtrak with three failed drives and one failing drive. The IT manager swapped out the wrong disk for a RAID rebuild, which then appeared to hang for 24 hours. We told our client to immediately remove power from the JBOD, which was then delivered to us.

The entire server itself was recovered and restored. When it was returned and attached to the Xsan network, it booted immediately and they were back in production. No lengthy data transfers took place; the entire recovery was performed in situ, without the need for their Xserve controller or the metadata server.

We have encountered corrupt XFS RAID systems and corrupt Virtual servers, proprietary RAID devices, numerous failed Linux boxes, corrupted and damaged UNIX RAID arrays, and many extremely damaged RAID arrays and corrupt operating systems and file systems that other data recovery companies have not been able to recover.

We would welcome any Data Recovery company with an apparently terminal RAID storage device to let us assist. This is a far better option than allowing your client’s data to be forever lost.
