In this three-part series, the Chaos Manor gets an upgrade of its data storage with the installation of a new Network Attached Storage system. Along the way, several old systems are deemed retirement candidates. We start with Part 1, where the project and features are discussed amongst the Chaos Manor Advisors:
The Chaos Manor Advisors determined that Dr. Pournelle’s various systems at Chaos Manor needed a cleanup, consolidation, and upgrade. There was a need for a better backup and archiving process, along with some retirement of systems. Consolidating systems and data storage in light of Dr. Pournelle’s mobility problems was another objective, now that the new Chaos Manor wireless network was in place.
One aspect of this consolidation was to create a Network Attached Storage system that had RAID capability to serve as centralized data storage. A backup process was also needed. And there was a need for the archive to be protected against any encrypting-type malware. Although Dr. Pournelle practices ‘safe computing’ that reduces that risk, a protected backup of his data, including past and future books, was deemed to be a good objective for this project. We thought that a project like this would be interesting for Chaos Manor Reviews readers.
Chaos Manor Advisor Eric Pobirs, an experienced technician who works with Dr. Pournelle’s son Alex (as well as doing some freelancing), took the lead on this project. The Advisors discussed the configuration and issues involved in creating this NAS/RAID system.
Eric started out with his general objectives:
Well, the idea was to have capacity wildly in excess of need to reduce the amount of management concern it generates once it has been configured fully. The difference in cost for the somewhat safer lower capacity drives is fairly minor and they’d still be at risk for the Rebuild+URE [more on this below] problem. So doubling up on the 4 TB drives in RAID 6 likely works out better than say, 2 TB drives in RAID 5, for a difference of around $100 for the set.
Part of this was offering the example of just how amazingly cheap this stuff has gotten over the years and how a bit of research can lead to better results without massive expense. Now, some might regard this investment as massive expense but creating organized bytes is Jerry’s livelihood, so this is just insurance. Also, the use of better qualified equipment should win back some of the expenditure in reduced power costs for the house. A few percent here, a few percent there…
After some thought, Eric came up with the outlines of a plan.
NAS and Backups
The main objective of this project was to determine how to configure a backup system for all of the computers at Chaos Manor. Backups are important, and Dr. Pournelle has lots of data: massive amounts of email and his equally massive book files.
After a survey of the possibilities, Eric decided on a Network Attached Storage (NAS) system: the Netgear ReadyNAS 104 4-bay NAS (http://goo.gl/lYAl2p).
Advisor Brian Bilbrey has much experience with large systems, being a senior systems administrator. He discussed the basics of the various types of RAID systems, beginning with an explanation of a “URE”, an ‘Unrecoverable Read Error’:
Magnetic disk storage sizes are now on the same order of magnitude as the quoted bit error rate for reading data from the disk. That is, on a disk holding 4 TB, the chances of hitting a 1-in-10^14 Unrecoverable Read Error are pretty small, because you don’t read a lot from your drive at any given time.
However, if you have an array of five 4 TB disks in a RAID 5 configuration, then you’ve got approximately four disks’ worth of data and one disk’s worth of calculated parity spread across all of the disks. If any ONE of those disks fails, then when you put in a new disk to rebuild that array, ALL 16 TB of data on the surviving disks will be read to rebuild. There’s a significant chance that during that process, a read will fail. At that point, the array cannot be rebuilt. Data done and gone; restore from proper backups.
I recommend RAID 6 for four or more disks, and two- or three-way mirrors for two- or three-disk systems. Yes, you’re “throwing away” storage. Or, to put it another way, you’re managing the risk of data loss. With RAID 6, you can lose a disk, suffer a URE during the rebuild, and still have all your data.
Personally, I also buy Enterprise-grade disks, because there’s usually another factor of 10 added to the URE reliability. For more info, use your favorite search engine and the phrase “URE RAID 5” without the quotes.
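The rebuild risk Brian describes can be put into rough numbers. The Python sketch below is an illustration only: it assumes read errors are independent and occur at exactly the quoted rate (one per 10^14 bits for a consumer drive, one per 10^15 for an enterprise drive), and the function name p_ure_during_read is made up for this example. Real drives often do better or worse than their quoted rate.

```python
import math

TB = 1e12  # bytes in a (decimal) terabyte


def p_ure_during_read(bytes_read: float, bits_per_error: float = 1e14) -> float:
    """Chance of at least one unrecoverable read error while reading bytes_read.

    Assumption: each bit fails independently with probability 1/bits_per_error.
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to avoid rounding problems
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))


# Rebuilding a five-disk RAID 5 of 4 TB drives reads the four surviving disks.
consumer = p_ure_during_read(16 * TB, 1e14)
enterprise = p_ure_during_read(16 * TB, 1e15)

print(f"Consumer drives (1 in 10^14):   {consumer:.0%}")
print(f"Enterprise drives (1 in 10^15): {enterprise:.0%}")
```

Under these simplified assumptions, a 16 TB rebuild read is more likely than not to hit a URE on consumer-rated drives, and the extra factor of 10 on enterprise-rated drives cuts that to roughly one chance in eight.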
With that explanation, Brian continues:
One thing I’m pondering in light of the Rebuild+URE problem is whether a RAID 10 might be safer. This would be a two-drive stripe set mirrored by the second set of two drives. This cuts the raw capacity from 16 TB to a ‘mere’ 8 TB, which is still a vast capacity for storing primarily text files. In this case, recovering from a drive failure is more a matter of copying than a complex rebuild and the NAS should keep working with the intact set until the replacement drive is installed.
The Netgear box will also do RAID 6 with the four drives but as the capacity works out the same I find myself wondering what advantage remains, if any. RAID 10 may have the advantage in continued functionality after the loss of a single drive, whereas I have the impression a RAID 6 would be out of commission until the failed drive is replaced and the volume rebuilt.
In 234 pages the manual has remarkably little to say about drive failures, how to handle them, and how different configurations affect this.
Advisor Peter Glaskowsky agreed with Brian, adding:
To add to Brian’s reply, a RAID 6 array not only keeps working after a single drive failure, it still has redundancy; it becomes the equivalent of a RAID 5 array. Even two drive failures in a RAID 6 array will not stop the array from working.
So if you have an effective RAID 6 option, that’s my recommendation too. I know it’s painful to lose half your capacity, but in the long run, that’s better than losing all your data.
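The capacity trade-off in this discussion is simple arithmetic, sketched below in Python. The function usable_tb is hypothetical and uses the textbook formulas for each RAID level; it ignores filesystem overhead and anything specific to the ReadyNAS firmware.

```python
def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Textbook usable capacity for an array of identical drives."""
    if level == "RAID5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (drives - 1) * drive_tb  # one drive's worth of parity
    if level == "RAID6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb  # two drives' worth of parity
    if level == "RAID10":
        if drives < 4 or drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, 4 or more")
        return drives / 2 * drive_tb  # every drive has a mirror partner
    raise ValueError(f"unknown RAID level: {level}")


# The four-bay NAS with 4 TB drives discussed in this article
for level in ("RAID5", "RAID6", "RAID10"):
    print(level, usable_tb(level, drives=4, drive_tb=4), "TB usable of 16 TB raw")
```

With four 4 TB drives, RAID 6 and RAID 10 both leave 8 TB usable, which is why the choice between them here comes down to failure behavior rather than capacity.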
Brian added some additional thoughts about the various RAID types:
RAID 6: Lose one drive, you’re running degraded, and can rebuild at leisure. IF there’s a bit read failure during the rebuild, you have the second parity to fall back on.
Lose two drives (or lose a second drive during the rebuild after the loss of a first drive) and you’re running degraded, with no backstop. If you lose a third drive while rebuilding against a two-disk failure, you’re dead.
RAID 10 (and friends): Lose one drive, you’re running degraded. Rebuild, and hope there’s no bit read failure.
Lose two drives, and if it’s half of one mirror pair and the OTHER half of the other mirror pair, you’re still running degraded. But after one drive failure, you have a one-in-three chance of catastrophic failure during the rebuild, should there be a bit read error.
The point of spinning storage is to have large data available for immediate access. It is prudent to make periodic copies of this data to spinning rust that is then stored offsite, so you can rebuild from it if you lose the RAID 6.
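Brian’s comparison of two-drive failures can be checked by brute force for a four-drive array. The Python sketch below is illustrative only: the drive labels and helper names are invented for the example, RAID 10 is modeled as two mirror pairs (as in Brian’s description), and all drives are assumed equally likely to fail.

```python
from fractions import Fraction
from itertools import combinations

DRIVES = ("A1", "A2", "B1", "B2")
# Assumed layout: A1/A2 mirror each other, as do B1/B2
MIRROR_PAIRS = (frozenset({"A1", "A2"}), frozenset({"B1", "B2"}))


def raid10_survives(failed: frozenset) -> bool:
    # Data is lost only if both members of some mirror pair are gone
    return not any(pair <= failed for pair in MIRROR_PAIRS)


def raid6_survives(failed: frozenset) -> bool:
    # Dual parity tolerates any two failed drives
    return len(failed) <= 2


two_drive_failures = [frozenset(c) for c in combinations(DRIVES, 2)]
raid10_fatal = sum(not raid10_survives(f) for f in two_drive_failures)
raid6_fatal = sum(not raid6_survives(f) for f in two_drive_failures)

print("RAID 10 fatal fraction:", Fraction(raid10_fatal, len(two_drive_failures)))
print("RAID 6 fatal fraction: ", Fraction(raid6_fatal, len(two_drive_failures)))
```

Of the six possible two-drive failures, two take out a whole mirror pair, which is the one-in-three figure quoted for RAID 10; none of them is fatal for RAID 6.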
One of the considerations of a RAID system is that it is more of a centralized storage area than a full backup/restore solution. As Eric noted:
This article and its links cover the nature of the problem:
In short, drive capacity has advanced far faster than reliability and it may not be possible for reliability to ever be high enough to overcome the odds of a crippling failure. This is why RAID cannot be regarded as a backup but merely a way to centralize data to be backed up.
In the next installment, a system is selected, and installation and configuration are begun. Let us know what you think in the comments below, and share this article with others.