History of RAID
The term RAID was coined in 1987 by David Patterson, Randy Katz and Garth A. Gibson. In their 1988 technical report, “A Case for Redundant Arrays of Inexpensive Disks (RAID),” the three argued that an array of inexpensive drives could beat the performance of the top disk drives of the time. By using redundancy, a RAID array could be more reliable than any one disk drive.
While this report was the first to put a name to the concept, the use of redundant disks was already being discussed by others. Geac Computer Corp.’s Gus German and Ted Grunau first referred to this idea as MF-100. IBM’s Norman Ken Ouchi filed a patent in 1977 for the technology, which was later named RAID 4. In 1983, Digital Equipment Corp. shipped the drives that would become RAID 1, and in 1986, another IBM patent was filed for what would become RAID 5. Patterson, Katz and Gibson also looked at what was being done by companies such as Tandem Computers, Thinking Machines and Maxtor to define their RAID taxonomies.
While the levels of RAID listed in the 1988 report essentially put names to technologies that were already in use, creating common terminology for the concept helped stimulate the data storage market to develop more RAID array products.
What is RAID?
Short for redundant array of independent disks, RAID is a set of hard drives connected and configured to protect data or speed up a computer’s disk storage. RAID is commonly used on servers and high-performance computers; the Drobo storage device is one example of a product built on RAID technology. RAID relies on several techniques, explained below.
- Spanning and software striping: Splitting information and writing it across multiple physical disk drives. RAID 0 uses this technique.
- Mirroring: Duplicating data from one disk drive to another.
- Duplexing: Duplicating the disk drive as well as the disk controller.
- Deferred write (write-back caching): Data is held in cache memory and written to the hard drive when the disk drive becomes available.
- Hot swapping: Failed disk drives can be replaced, and data can be placed back onto the new disk drive, while the remainder of the system is in operation.
- Hot spare: A standby disk drive is automatically initialized into the array when another drive fails.
- Spindle synchronization: The rotation of all disk drives in the array is synchronized, allowing information to be written to all of them at once.
What is RAID used for?
Originally, the term RAID was defined as redundant array of inexpensive disks, but now it usually refers to a redundant array of independent disks. RAID storage uses multiple disks to provide fault tolerance, improve overall performance, and increase storage capacity in a system. The RAID levels described here are:
- RAID 0 – striping
- RAID 1 – mirroring
- RAID 5 – striping with parity
- RAID 6 – striping with double parity
- RAID 10 – combining mirroring and striping
RAID level 0 – Striping
In a RAID 0 system data are split up into blocks that get written across all the drives in the array. By using multiple disks (at least 2) at the same time, this offers superior I/O performance. This performance can be enhanced further by using multiple controllers, ideally one controller per disk.
- RAID 0 offers great performance, both in read and write operations. There is no overhead caused by parity controls.
- All storage capacity is used; none is set aside for redundancy.
- The technology is easy to implement.
- RAID 0 is not fault-tolerant. If one drive fails, all data in the RAID 0 array are lost. It should not be used for mission-critical systems.
RAID 0 is ideal for non-critical storage of data that have to be read/written at a high speed, such as on an image retouching or video editing station.
If you want to use RAID 0 purely to combine the storage capacity of two drives in a single volume, consider mounting one drive in the folder path of the other drive instead. This is supported in Linux, OS X and Windows, and has the advantage that a single drive failure has no impact on the data on the second disk or SSD.
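The block distribution described for RAID 0 can be sketched in a few lines of Python. This is a minimal illustration and not how any particular controller works: the function names `stripe` and `unstripe`, the in-memory lists standing in for drives, and the 4-byte block size are all assumptions made for the example.

```python
def stripe(data: bytes, num_drives: int, block_size: int = 4) -> list[list[bytes]]:
    """Split data into fixed-size blocks and deal them round-robin across drives."""
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least 2 drives")
    drives: list[list[bytes]] = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        # Block 0 goes to drive 0, block 1 to drive 1, and so on, wrapping around.
        drives[block_index % num_drives].append(data[offset:offset + block_size])
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Reassemble the data by reading one block from each drive in turn."""
    out = []
    longest = max(len(drive) for drive in drives)
    for row in range(longest):
        for drive in drives:
            if row < len(drive):
                out.append(drive[row])
    return b"".join(out)
```

Because every drive holds only part of each file, losing any one list in `drives` makes `unstripe` unable to return the original data, which mirrors the lack of fault tolerance noted above.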
RAID level 1 – Mirroring
Data are stored twice by writing them to both the data drive (or set of data drives) and a mirror drive (or set of drives). If a drive fails, the controller uses either the data drive or the mirror drive for data recovery and continues operation. You need at least 2 drives for a RAID 1 array.
- RAID 1 offers excellent read speed and a write-speed that is comparable to that of a single drive.
- If a drive fails, data do not have to be rebuilt; they just have to be copied to the replacement drive.
- RAID 1 is a very simple technology.
- The main disadvantage is that the effective storage capacity is only half of the total drive capacity because all data get written twice.
- Software RAID 1 solutions do not always allow a hot swap of a failed drive. That means the failed drive can only be replaced after powering down the computer it is attached to. For servers that are used simultaneously by many people, this may not be acceptable. Such systems typically use hardware controllers that do support hot swapping.
RAID 1 is ideal for mission-critical storage, for instance for accounting systems. It is also suitable for small servers in which only two data drives will be used.
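The mirroring behavior described for RAID 1 can be illustrated with a small in-memory model. This is a sketch under stated assumptions: the `Mirror` class, its method names, and the dictionaries standing in for drives are hypothetical and do not correspond to any real RAID controller or software RAID API.

```python
class Mirror:
    """Toy RAID 1 model: every write goes to all healthy drives."""

    def __init__(self, num_copies: int = 2):
        # Each drive is modeled as a dict mapping block number -> data.
        self.drives = [dict() for _ in range(num_copies)]
        self.failed = set()

    def write(self, block_no: int, data: bytes) -> None:
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                drive[block_no] = data

    def read(self, block_no: int) -> bytes:
        # Any healthy copy can serve the read.
        for i, drive in enumerate(self.drives):
            if i not in self.failed:
                return drive[block_no]
        raise IOError("all mirrors have failed")

    def fail(self, i: int) -> None:
        self.failed.add(i)
        self.drives[i] = {}

    def replace(self, i: int) -> None:
        # No parity math is needed: copy a surviving mirror to the new drive.
        healthy = [d for j, d in enumerate(self.drives) if j not in self.failed]
        if not healthy:
            raise IOError("no surviving mirror to copy from")
        self.drives[i] = dict(healthy[0])
        self.failed.discard(i)
```

Calling `fail(0)` and then `read` still returns the data from the second copy, and `replace(0)` restores the mirror with a plain copy.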
RAID level 5 – Striping with parity
RAID 5 is the most common secure RAID level. It requires at least 3 drives but can work with up to 16. Data blocks are striped across the drives, and for each stripe a parity checksum of the block data is written to one drive. The parity data are not written to a fixed drive; they are spread across all drives. Using the parity data, the computer can recalculate the data of one of the other data blocks, should those data no longer be available. That means a RAID 5 array can withstand a single drive failure without losing data or access to data. Although RAID 5 can be achieved in software, a hardware controller is recommended. Often extra cache memory is used on these controllers to improve the write performance.
- Read data transactions are very fast while write data transactions are somewhat slower (due to the parity that has to be calculated).
- If a drive fails, you still have access to all data, even while the failed drive is being replaced and the storage controller rebuilds the data on the new drive.
- Drive failures have an effect on throughput, although this is still acceptable.
- This is complex technology. If one of the disks in an array using 4TB disks fails and is replaced, restoring the data (the rebuild time) may take a day or longer, depending on the load on the array and the speed of the controller. If another disk goes bad during that time, data are lost forever.
RAID 5 is a good all-round system that combines efficient storage with excellent security and decent performance. It is ideal for file and application servers that have a limited number of data drives.
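The way parity lets RAID 5 recover a lost block can be shown with XOR, the operation commonly used for single parity. The sketch below is illustrative only: the function names are hypothetical, and the parity rotation in `parity_drive` is one possible placement scheme chosen for the example, not a statement about any specific controller.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_drive(stripe_no: int, num_drives: int) -> int:
    """Pick the drive holding parity for a stripe (assumed rotation scheme)."""
    return (num_drives - 1 - stripe_no) % num_drives


def write_stripe(stripe_no: int, data_blocks: list[bytes], num_drives: int) -> list[bytes]:
    """Return one block per drive: the data blocks plus a parity block."""
    if len(data_blocks) != num_drives - 1:
        raise ValueError("a stripe holds one data block per drive, minus one for parity")
    row = list(data_blocks)
    row.insert(parity_drive(stripe_no, num_drives), xor_blocks(data_blocks))
    return row


def rebuild(row: list[bytes], lost_drive: int) -> bytes:
    """Recompute the block of a failed drive from the surviving blocks."""
    survivors = [block for i, block in enumerate(row) if i != lost_drive]
    return xor_blocks(survivors)
```

XOR-ing all surviving blocks of a stripe yields the missing one, whether it held data or parity. With two blocks missing from the same stripe there is not enough information left, which is why a second failure during a rebuild is fatal for RAID 5.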
RAID level 6 – Striping with double parity
RAID 6 is like RAID 5, but the parity data are written to two drives. That means it requires at least 4 drives and can withstand 2 drives dying simultaneously. The chances that two drives break down at exactly the same moment are of course very small. However, if a drive in a RAID 5 system dies and is replaced by a new drive, it takes hours or even more than a day to rebuild the swapped drive. If another drive dies during that time, you still lose all of your data. With RAID 6, the RAID array will even survive that second failure.
- As with RAID 5, read data transactions are very fast.
- If two drives fail, you still have access to all data, even while the failed drives are being replaced. So RAID 6 is more secure than RAID 5.
- Write data transactions are slower than RAID 5 due to the additional parity data that have to be calculated. In one report I read, the write performance was 20% lower.
- Drive failures have an effect on throughput, although this is still acceptable.
- This is complex technology. Rebuilding an array in which one drive failed can take a long time.
RAID 6 is a good all-round system that combines efficient storage with excellent security and decent performance. It is preferable over RAID 5 in file and application servers that use many large drives for data storage.
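The length of the rebuild window, which is the main argument for RAID 6 on large drives, can be estimated with simple arithmetic. The sketch below assumes a constant sustained rebuild rate; the function name and the 50 MB/s figure in the usage note are hypothetical, and real rates vary with array load and controller speed.

```python
def rebuild_hours(drive_tb: float, rate_mb_per_s: float) -> float:
    """Estimate hours needed to rewrite a whole drive at a constant rate.

    Uses decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    """
    if rate_mb_per_s <= 0:
        raise ValueError("rebuild rate must be positive")
    drive_bytes = drive_tb * 1e12
    seconds = drive_bytes / (rate_mb_per_s * 1e6)
    return seconds / 3600
```

For example, `rebuild_hours(4, 50)` gives roughly 22 hours for a 4 TB drive at an assumed 50 MB/s, which is consistent with a rebuild taking about a day. The larger the drives, the longer the array runs without spare redundancy under RAID 5.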
RAID level 10 – Combining RAID 1 and RAID 0
It is possible to combine the advantages (and disadvantages) of RAID 0 and RAID 1 in one single system. This is a nested or hybrid RAID configuration. It provides security by mirroring all data on secondary drives while using striping across each set of drives to speed up data transfers.
- If something goes wrong with one of the disks in a RAID 10 configuration, the rebuild time is very fast since all that is needed is copying all the data from the surviving mirror to a new drive. This can take as little as 30 minutes for drives of 1 TB.
- Half of the storage capacity goes to mirroring, so compared to large RAID 5 or RAID 6 arrays, this is an expensive way to have redundancy.
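The capacity trade-offs between the levels can be compared with a short calculation. This is a sketch assuming identical drives and the simple layouts described above; the function and table names are hypothetical, and the 4-drive minimum for RAID 10 is the commonly cited figure rather than something stated in the text.

```python
# Minimum number of drives per RAID level (RAID 10 value is an assumption).
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def usable_capacity_tb(level: int, num_drives: int, drive_tb: float) -> float:
    """Return usable capacity in TB for an array of identical drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        return num_drives * drive_tb            # no redundancy
    if level in (1, 10):
        if num_drives % 2:
            raise ValueError("mirrored layouts need an even number of drives")
        return num_drives * drive_tb / 2        # half goes to mirroring
    if level == 5:
        return (num_drives - 1) * drive_tb      # one drive's worth of parity
    return (num_drives - 2) * drive_tb          # RAID 6: two drives' worth of parity
```

With eight 4 TB drives, this gives 32 TB for RAID 0, 28 TB for RAID 5, 24 TB for RAID 6 and 16 TB for RAID 10, which shows why mirroring becomes the expensive option as arrays grow.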