Wednesday, October 25, 2006

NAS/RAID for dummies

On another forum, someone asked for a basic primer on RAID drive arrays and NAS (Network Attached Storage) systems. For the benefit of the rest of the world, here is my response (paraphrased to be more generic, since the original post was to a Mac forum).


NAS basics

First off, despite the way they are marketed, a NAS is not an external hard drive. It is what the name implies: Network Attached Storage. A NAS has a special-purpose computer built in, and it appears to your computer as a file server on your LAN. You attach it to an Ethernet hub or switch, to which your computer is also attached.

If it works properly, your computer should see the NAS as a network file server. On Mac OS X, you should be able to click on the Network icon in the Sidebar of a Finder window, locate it and mount its volume(s) on your desktop. On Windows, you should be able to open the Network Neighborhood/My Network Places icon, locate it and mount its volume(s).

Once the volume is mounted, it should behave like any other volume (hard drive, flash card, etc.), except, of course, that you probably won't be able to boot from it. It's still a network server, not a local hard drive.

When selecting a NAS, be sure it has good support for your operating system's native file-sharing protocol. For Mac OS, this is AppleShare (AFP). For Windows, this is Windows File Sharing (SMB/CIFS). For UNIX, this is NFS. Most operating systems can connect to "foreign" servers, but they won't work at peak efficiency, and may exhibit some quirks (like not supporting your OS's entire range of characters in filenames).

Expect a NAS to be slower than a local hard drive. 100 Mbit/s (Fast) Ethernet is significantly slower than ATA, SATA and FireWire drives. It might even be slower than some USB 2.0 drives. If your NAS supports Gigabit Ethernet (and your LAN is GigE, of course), then you might not notice a significant speed difference.
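To put rough numbers on this, here is a small Python sketch that converts nominal link rates into best-case transfer times. The rates are rounded nominal figures assumed for illustration, the 700 MB file size is arbitrary, and the sketch ignores encoding and protocol overhead, so real-world throughput will be noticeably lower.

```python
# Rounded nominal signaling rates in megabits per second (assumed for
# illustration). Encoding and protocol overhead are ignored, so real
# throughput is noticeably lower.
NOMINAL_MBIT_PER_S = {
    "100M Ethernet": 100,
    "Gigabit Ethernet": 1000,
    "USB 2.0": 480,
    "FireWire 400": 400,
}


def best_case_mb_per_s(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second (8 bits per byte)."""
    return mbit_per_s / 8


def best_case_seconds(file_mb: float, mbit_per_s: float) -> float:
    """Lower bound on the time to move file_mb megabytes over the link."""
    return file_mb / best_case_mb_per_s(mbit_per_s)


if __name__ == "__main__":
    for link, rate in NOMINAL_MBIT_PER_S.items():
        print(f"{link:<16} {best_case_mb_per_s(rate):6.1f} MB/s, "
              f"700 MB file in at least {best_case_seconds(700, rate):5.1f} s")
```

Even in this idealized view, 100M Ethernet tops out at 12.5 MB/s, well below the nominal USB 2.0 and FireWire 400 rates, while Gigabit Ethernet is ten times faster.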


RAID basics

When people speak of RAID, they are typically referring to one of the following technologies:

Name      Description
RAID-0    RAID level 0, aka "striping"
RAID-1    RAID level 1, aka "mirroring"
RAID-5    RAID level 5, striping with distributed parity

RAID levels 2, 3, and 4 are rarely used, and will not be described here.

RAID-0, aka "striping", is not really a RAID system, because it provides no redundancy (the "R" in RAID). Your data is spread across all the drives in the array. If any drive fails, you've lost all the data in the array. (In other words, the overall reliability of the array is less than that of a single drive!) RAID-0 has the advantage of maximizing the storage capacity of the array and increasing performance (since each drive only has to read/write a fraction of the data: half, in a two-drive array), but at the cost of reduced reliability.
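Both points above can be sketched in a few lines of Python. This assumes the simplest possible layout (round-robin striping with a stripe unit of one block; real controllers use larger stripe units) and independent drive failures. The 97% per-drive survival figure is a made-up example, not a real drive statistic.

```python
def raid0_locate(logical_block: int, n_drives: int) -> tuple[int, int]:
    """Map a logical block to (drive index, block offset on that drive).

    Assumes round-robin striping with a stripe unit of one block: block 0
    goes to drive 0, block 1 to drive 1, and so on, wrapping around.
    """
    return logical_block % n_drives, logical_block // n_drives


def raid0_survival(p_drive_survives: float, n_drives: int) -> float:
    """Chance the array survives, assuming independent drive failures.

    Every drive must survive, because losing any one loses all the data.
    """
    return p_drive_survives ** n_drives


if __name__ == "__main__":
    # Logical blocks 0 to 5 on a two-drive array alternate between the drives.
    print([raid0_locate(b, 2) for b in range(6)])
    # Hypothetical 97% per-drive survival: the array does worse than one drive.
    print(round(raid0_survival(0.97, 2), 4))
```

With two drives that each have a 97% chance of surviving, the array's chance is 0.97 × 0.97, about 94%, and it only gets worse as drives are added.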

RAID-1, aka "mirroring", does just what the name implies. Every piece of data is written to both drives, so one is an identical copy of the other. If one drive fails, it is automatically removed from the array, and the remaining drive is used instead. RAID-1 offers redundancy (making the reliability of the array greater than that of a single drive), but at the expense of 50% of your storage capacity, since all data is written to both drives.
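As a toy illustration (not how any real controller is implemented), mirroring can be modeled in Python with each hypothetical "drive" as a dictionary of blocks: writes go to every working drive, and reads can come from any survivor.

```python
class Mirror:
    """Toy RAID-1 set: each "drive" is a dict mapping block number -> bytes."""

    def __init__(self, n_drives: int = 2):
        self.drives = [{} for _ in range(n_drives)]
        self.failed = set()

    def write(self, block: int, data: bytes) -> None:
        # Every write goes to every drive still in the array.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                drive[block] = data

    def fail(self, i: int) -> None:
        # Simulate a failure: the drive is dropped from the array.
        self.failed.add(i)

    def read(self, block: int) -> bytes:
        # Any surviving drive holds a complete copy.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                return drive[block]
        raise OSError("every drive in the mirror has failed")


if __name__ == "__main__":
    m = Mirror()
    m.write(0, b"important data")
    m.fail(0)
    print(m.read(0))  # still readable from drive 1
```

The cost is visible in the model too: two drives' worth of storage holds one drive's worth of data.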

For small RAID arrays (those with only two drives), RAID-1 is the only possible mechanism for providing redundancy. For larger arrays (with more drives), it is inefficient and RAID-5 should be considered instead.

RAID-5 is what large RAID arrays typically use. Error-recovery data is generated for all your data, and the combined data (your data and the recovery data) is striped across three or more drives. If any single drive fails, the entire array remains functional. If two drives fail, however, all data in the array is lost. The capacity of a RAID-5 array is the total capacity of all-but-one of the drives.

For example, suppose you have five drives in a RAID-5 array. When you write data to the array, 25% of your data will go on each of four drives, and error-recovery data will be written to the fifth. (Which drive gets each piece of data is irrelevant. The drive holding the error-recovery data actually rotates from one stripe to the next across all the drives in the array, in order to maximize performance.)
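To visualize the rotation, this Python sketch lists which drive holds data (D) and which holds error-recovery data (P) for each stripe. The rotation rule used here, moving the parity one drive to the left on each stripe, is one simple scheme chosen for illustration; actual controllers use a variety of layouts.

```python
def raid5_layout(n_stripes: int, n_drives: int) -> list[list[str]]:
    """Return, for each stripe, what each drive holds: data block Dk or parity Ps.

    Assumed rotation rule: parity for stripe 0 sits on the last drive and moves
    one drive to the left on each following stripe, wrapping around.
    """
    rows = []
    next_data_block = 0
    for stripe in range(n_stripes):
        parity_drive = (n_drives - 1 - stripe) % n_drives
        row = []
        for drive in range(n_drives):
            if drive == parity_drive:
                row.append(f"P{stripe}")
            else:
                row.append(f"D{next_data_block}")
                next_data_block += 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Five drives: one row per stripe, one column per drive.
    for row in raid5_layout(5, 5):
        print(" ".join(f"{cell:>4}" for cell in row))
```

Over five stripes, each of the five drives holds parity exactly once, so no single drive has to absorb every parity update.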

Now suppose a drive in this array has failed and you try to read the data back. If the failed drive is the one with the error-recovery data, the array can get your data from the remaining four drives (since they are the ones that actually contain it). If the failed drive is holding data, the data on the remaining three drives, combined with the error-recovery data, can be used to reconstruct what was on the failed drive.
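The error-recovery data in RAID-5 is parity computed with exclusive-OR (XOR), which has the handy property that XOR-ing any four of the five blocks in a stripe yields the fifth. A minimal Python sketch, with short byte strings standing in for disk blocks (real blocks are much larger):

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-by-byte XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks: list[bytes]) -> bytes:
    """Error-recovery (parity) block for one stripe of data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks: list[bytes]) -> bytes:
    """Rebuild the single missing block (data or parity) of a stripe.

    Because x ^ x == 0, XOR-ing all the surviving blocks leaves exactly the
    block that is missing.
    """
    return xor_blocks(surviving_blocks)


if __name__ == "__main__":
    data = [b"ABCD", b"EFGH", b"IJKL", b"MNOP"]  # four data drives
    parity = make_parity(data)                   # fifth drive
    # The drive holding b"IJKL" fails; rebuild it from the other four.
    survivors = data[:2] + data[3:] + [parity]
    print(reconstruct(survivors))
```

The same XOR both creates the parity and rebuilds a lost block, which is why one failure is survivable but two are not: with two blocks missing, a single equation cannot recover both.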

In addition to these RAID levels, there are also two common "hybrid" RAID levels: RAID-0+1 and RAID-1+0. These, as the names imply, involve both mirroring and striping, to provide increased capacity with redundancy.

RAID-0+1 is a mirror of a striped drive array. Your data is striped across a set of two or more drives (à la RAID-0), then that striped set is mirrored to another set of identical drives (à la RAID-1). If a single drive fails, its entire striped set gets hosed, but the mirror of the striped set keeps on working. If multiple drives fail and they all belong to the same striped set, the array keeps on working. If drives from different striped sets fail, the entire array is lost.

RAID-1+0 is a striped set of mirrored drives. Each drive is mirrored against a matching drive, and your data is striped across all of the mirrored pairs. If a single drive fails, its mirror partner keeps the array working. If multiple drives fail, but each is in a different mirrored pair, the array keeps on working. If both drives in the same mirrored pair fail, the entire array is lost.
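The practical difference between the two hybrids shows up when two drives fail. This Python sketch enumerates every two-drive failure in a hypothetical four-drive array (drives 0 to 3, grouped as {0, 1} and {2, 3}) and applies the survival rules described above to each layout.

```python
from itertools import combinations

# Hypothetical four-drive arrays, drives numbered 0 to 3.
# RAID-0+1: striped set {0, 1} is mirrored by striped set {2, 3}.
STRIPE_SETS = [{0, 1}, {2, 3}]
# RAID-1+0: mirrored pairs {0, 1} and {2, 3}, with data striped across the pairs.
MIRROR_PAIRS = [{0, 1}, {2, 3}]


def raid01_survives(failed: set[int]) -> bool:
    # Works as long as at least one striped set has no failed drives.
    return any(not (s & failed) for s in STRIPE_SETS)


def raid10_survives(failed: set[int]) -> bool:
    # Works as long as no mirrored pair has lost both of its drives.
    return all(not pair <= failed for pair in MIRROR_PAIRS)


def count_survivable(survives, n_drives: int = 4, n_failed: int = 2) -> int:
    """How many of the possible n_failed-drive failures the layout survives."""
    return sum(survives(set(c)) for c in combinations(range(n_drives), n_failed))


if __name__ == "__main__":
    print("RAID-0+1 survives", count_survivable(raid01_survives), "of 6")
    print("RAID-1+0 survives", count_survivable(raid10_survives), "of 6")
```

Both layouts survive any single failure, but of the six possible two-drive failures, RAID-1+0 survives four while RAID-0+1 survives only two.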

While these hybrid RAID levels are often used for inexpensive software-based RAID arrays, they waste storage because of all the mirroring: half the raw capacity goes to copies. RAID-5 is much more efficient and is preferable where possible.

It should be noted that with all of the RAID levels that support redundancy (all but RAID-0), you can replace a failed drive with a new, blank drive, and the RAID controller will rebuild its contents from the data stored on the other drives in the array. Good RAID systems will let you do this without turning off the power or even interrupting service to users on the network (a feature known as hot swapping).

Of course, when a drive fails, even if the array remains operational, it will be operating without any redundancy: an additional drive failure can trash the entire array. That's why it's very important to keep a spare drive on hand: when one fails, you can quickly swap in the spare. This practice is known as keeping a "cold spare".

This is compared to a "hot spare" drive. A hot spare is a spare drive that is attached to the RAID controller. It is not part of any configured array, but sits idle (and possibly powered-off) until a drive in the array fails. When this happens, the RAID controller immediately removes the failed drive from the array and starts using the hot-spare in its place.

The point of a hot spare is that when a drive fails, the time the array spends without redundancy is reduced to the absolute minimum: the controller starts rebuilding right away, without waiting for someone to notice the failure and swap in a drive.

Of course, once this happens, you will still want to replace the failed drive, in order to make another hot-spare available, in case another drive fails. Because ordering new drives may take a week or two, some large RAID systems will allow two or more hot-spare drives.


Capacities

Clearly, the overall capacity of a RAID array will be a function of the number of drives, the capacity of the drives, the number of drives used as hot-spares, and the RAID level configured. Some examples follow:

An array of only two drives cannot support RAID-5. It must be configured as either RAID-0 (no redundancy) or RAID-1 (mirroring). If the drives hold 500GB each, the RAID-0 array will be able to hold 1TB, and the RAID-1 array will be able to hold 500GB.

An array of three drives can be configured as either RAID-0 or RAID-5. Using our example of 500GB drives, the RAID-0 array will be able to hold 1.5TB (but remember, there's no redundancy, so if any one of the three fails, all data is lost!). The RAID-5 array will be able to hold 1TB (remember, the overall capacity of RAID-5 is that of all-but-one of the drives).

An array of four drives can be configured in four different ways: RAID-0, RAID-0+1, RAID-1+0, or RAID-5. Using 500GB drives, RAID-0 will provide 2TB (but with no redundancy), RAID-0+1 and RAID-1+0 will provide 1TB, and RAID-5 will provide 1.5TB.

If hot-spares are used, just don't count them when figuring out the array capacity.
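The capacity rules above fit in a few lines of Python. This is a sketch under the same assumptions as the examples: identical drives, two-way mirroring for RAID-0+1 and RAID-1+0, and hot spares excluded before anything else is calculated. The function name and level strings are made up for illustration.

```python
def usable_capacity_gb(level: str, n_drives: int, drive_gb: float,
                       hot_spares: int = 0) -> float:
    """Usable capacity of an array of identical drives (hypothetical helper).

    Hot spares are subtracted first, since they hold no data until a drive fails.
    """
    n = n_drives - hot_spares
    if level == "RAID-0":
        minimum, capacity = 2, n * drive_gb          # all drives hold data
    elif level == "RAID-1":
        minimum, capacity = 2, drive_gb              # every drive is a full copy
    elif level in ("RAID-0+1", "RAID-1+0"):
        if n % 2:
            raise ValueError(f"{level} needs an even number of drives")
        minimum, capacity = 4, (n // 2) * drive_gb   # half the drives are mirrors
    elif level == "RAID-5":
        minimum, capacity = 3, (n - 1) * drive_gb    # one drive's worth of parity
    else:
        raise ValueError(f"unknown RAID level: {level}")
    if n < minimum:
        raise ValueError(f"{level} needs at least {minimum} drives, got {n}")
    return capacity


if __name__ == "__main__":
    for level in ("RAID-0", "RAID-0+1", "RAID-1+0", "RAID-5"):
        print(level, usable_capacity_gb(level, 4, 500), "GB")
```

For example, five 500GB drives configured as RAID-5 with one hot spare give `usable_capacity_gb("RAID-5", 5, 500, hot_spares=1)`, which is 1500GB, the same as the four-drive RAID-5 example; asking for RAID-5 on two drives raises an error.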


Conclusion

NAS and RAID terminology may be confusing to novices, but the concepts are not overly complicated. The important points to remember are:

  • NAS devices are network file servers. They run a self-contained operating system, but are otherwise the same as a server you might set up on your own using a separate computer.
  • RAID comes in three main flavors: RAID-0, RAID-1 and RAID-5.
  • RAID-0 maximizes the storage capacity of the array, but offers no redundancy. If one drive in a RAID-0 array fails, all your data is lost.
  • RAID-1 mirrors drives. This increases reliability, but at the expense of storage capacity. The overall capacity of a RAID-1 array is 50% of the aggregate capacity of the drives. It is your only choice if you only have two drives and want redundancy.
  • RAID-5 is a happy medium between RAID-0 and RAID-1, if you have three or more drives. Single-drive failures will not make the array fail, and the overall capacity is greater than that of RAID-1.

Keep in mind, however, that RAID systems are not a substitute for making regular backups of your system. RAID (all levels other than 0) will protect you against single drive failures, but backups protect you from many other things, including:

  • Simultaneous failure of multiple drives (e.g., a bad power surge that fries the entire array).
  • Accidental (or deliberate) deletion or corruption of files.
  • Software bugs, viruses, and worms.
  • Natural disasters (fire, flood, hurricane, etc.).

Friday, October 20, 2006

The reason for integration testing

I am a do-it-yourselfer when it comes to PCs. Nearly all of my PCs have been home-built. Well, not really home-built, since I don't solder chips, but home-assembled: I buy boards, drives, cases, etc., and assemble them into a system.

I do it because it's a hobby, and I enjoy doing the work. Sometimes, however, I read articles about people choosing to build their own PCs for professional work, often because some corporate finance-type person thinks he can save money over buying complete systems from Dell or HP. (Building your own systems hasn't saved money for many years, but many people haven't figured this out yet.)

Similar principles apply (more frequently) to people who perform their own upgrades - adding/replacing memory, video cards or hard drives.

What these people fail to realize is that assembling a computer system for a production environment involves much more than simply slapping parts together. Even if there are no software-compatibility issues (not always the case), there are sometimes obscure problems that only become apparent after extensive integration testing. All the major PC manufacturers test their systems before offering them for sale. Very few hobbyists do much testing beyond what's necessary to get their favorite game up and running. And I doubt many corporate finance people realize the expense (in terms of time spent) needed to thoroughly test a completely custom-designed system.

(FWIW, many corporations have standardized software environments, and even this requires extensive integration testing as PC manufacturers introduce new systems. It may take a month (or more, if there are problems) to ensure that a new computer is compatible with corporate-standard software. Now imagine the time that would be required if the hardware itself also had to be thoroughly tested.)

But this is nothing new. I mention it simply because recent articles report a problem that is an almost textbook case for why integration testing is necessary.

For the last several years, Apple's laptop computers have sported a "sudden motion sensor" (SMS) that senses acceleration and vibration. The system software uses this sensor to detect sudden motion (such as if the computer is dropped) and parks the hard drive heads, so the drive won't be damaged on impact.

Western Digital's new Scorpio line of hard drives also has a motion sensor, serving the same purpose as Apple's.

So what happens when you put a Scorpio drive in a MacBook? The two systems step on each other. When the computer is jostled, the drive parks itself, and the Apple SMS system also sends the command to park the drive. The SMS system gets an error back from the drive (the drive probably goes off-line while it is in this auto-park state), assumes that the drive has failed, and panics the OS kernel.

One perfectly good drive plus one perfectly good computer combine to form an unstable system. This is a problem that very few people could ever predict without actually assembling and testing a completed system. It is something that a manufacturer would (or at least should) test for as a part of deciding what brand/model drive to bundle, but is something the rest of us will not be able to figure out in advance while shopping for a new drive.

Friday, October 06, 2006

Junkscience.com -- 100 things you should know about DDT

This is a fascinating and eye-opening article.

It would appear that the worldwide decision to ban DDT was based almost entirely on political pressure and junk science. The scientific facts (both known today and what was known then) indicate that the alleged dangers do not exist.

But millions of children die from malaria every year as a result of the ban.

Quantum information teleported from light to matter - Yahoo! News

Read the linked Reuters article.

This is incredibly cool. And we're one step closer to Star Trek.