How SSDs larger than 10 TB will change the corporate storage segment

8 and 10 TB hard drives are already widely used in enterprise systems where high storage density matters. So far, production technology only allows such disks to be made with a 7,200 RPM spindle speed, and it is clear that these drives will not see a performance boost from faster platter rotation any time soon. Likewise, there is no reason to believe that hard drives with such a high recording density are any more reliable than their 2 TB counterparts. It would seem that for storage systems the bigger the better, but it looks as if, in the corporate world in particular, hard drives are approaching the point where sheer capacity is no longer an advantage.

In early February, Intel promised to deliver an SSD with a capacity of more than 10 TB within two years. Then in early March, Samsung announced it had begun shipping its first 15.36 TB PM1633a SSD. The last remaining advantage of hard drives, their large capacity, is melting before our eyes like snow under the bright spring sun.

Although there are no real-world deployments of data arrays built on such large SSDs yet, we can already consider why it may be better not to choose 8-10 TB hard drives now, but to aim straight for SSDs of the same size.

The main question is the cost of 1 GB

The only major business advantage of large hard drives is the cost of storing 1 GB of data, and even that advantage fades with the arrival of large SSDs. How can that be, when an 8 TB hard drive costs about $500 and a 16 TB SSD is expected to cost around $8,000, which works out to roughly six cents per gigabyte against about fifty? How can they possibly be compared on price per gigabyte? As they say, the devil is in the details.

If you compare SSD and HDD head-on on price per gigabyte, the SSD loses. But if you look at the cost of a complete project built on SSDs or on HDDs, the SSD may well come out ahead thanks to savings on caching hardware. Let me explain.

Suppose we are dealing with a fashionable Big Data project that collects and processes petabytes of information. How is it implemented in hardware? Roughly like this: two cabinets of disk shelves crammed with 8 TB hard drives store the data, and next to them stand 3-4 shelves of fast flash drives to which data is moved for processing, with the compute nodes attached to those shelves. This is not quite caching in the usual sense: the application itself, or the operating system, decides what to keep on the HDDs and what on the SSDs. It is somewhat reminiscent of the SSD cache that even entry-level NAS devices now have, only a little more complicated.
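To make the idea concrete, here is a toy sketch, not drawn from any real product and far simpler than a production tiering engine, of how an application could decide placement on its own using nothing more than an access-frequency threshold. The threshold and object names are invented purely for illustration:

```python
# Toy placement policy: keep frequently accessed objects on the SSD tier,
# everything else on the HDD tier. Purely illustrative.
from collections import Counter

HOT_THRESHOLD = 100  # accesses per period; an arbitrary assumed cutoff

access_counts = Counter()  # object_id -> access count for the current period

def record_access(object_id: str) -> None:
    access_counts[object_id] += 1

def tier_for(object_id: str) -> str:
    """Decide where an object should live: 'ssd' (hot) or 'hdd' (cold)."""
    return "ssd" if access_counts[object_id] >= HOT_THRESHOLD else "hdd"

if __name__ == "__main__":
    for _ in range(150):                      # a frequently queried dataset
        record_access("daily_report.parquet")
    record_access("archive_2014.tar")         # touched once
    print(tier_for("daily_report.parquet"))   # -> ssd
    print(tier_for("archive_2014.tar"))       # -> hdd
```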

So it turns out that you cannot do without SSDs even if you store your data on large hard drives, unless, of course, all you keep is an archive of online backups. You will still have to split your data into "hot" and "cold".

Cold Data

By choosing large SSD drives, you simplify the infrastructure: you no longer need cache shelves, because SSDs of this size are roughly 1,000 times faster, in terms of IOPS, than server HDDs spinning at 15,000 RPM. For comparison, a 15,000 RPM HDD delivers around 200-300 IOPS depending on the load, while the Samsung PM1633a delivers 200,000 IOPS on reads and 32,000 IOPS on writes. Its sequential read and write speed is about 1,100 MB/s, five times that of a 15K HDD. So there is no longer any need to move data from one medium to another: you can connect compute nodes directly to an SSD shelf. There are already 2U servers that can take 48 2.5-inch drives (Supermicro 2028R-E1CR48L). Filled with SSDs like the Samsung PM1633a, one such server holds 737.28 TB, and a 42U cabinet loaded with these servers gives you 15.4 PB of disk space.
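To sanity-check those capacity figures, here is a quick back-of-the-envelope calculation, assuming decimal terabytes and a 42U rack filled entirely with 2U servers:

```python
# Back-of-the-envelope capacity check (decimal terabytes).
drive_tb = 15.36          # Samsung PM1633a
bays_per_2u_server = 48   # e.g. Supermicro 2028R-E1CR48L
servers_per_rack = 21     # a 42U cabinet filled with 2U servers

per_server_tb = drive_tb * bays_per_2u_server
per_rack_pb = per_server_tb * servers_per_rack / 1000

print(f"Per server: {per_server_tb:.2f} TB")  # 737.28 TB
print(f"Per rack:   {per_rack_pb:.2f} PB")    # ~15.5 PB raw, close to the 15.4 PB above
```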

As a result, instead of two cabinets of hard drives plus 3-4 caching shelves, you get a single cabinet of disk shelves or servers filled with SSDs. In that case, even with the same amount of stored data, the SSDs win on price per gigabyte. And that is before counting the fact that the application no longer has to be written to juggle "hot" data (on a caching device) and "cold" data (on slow HDDs).
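For illustration only, a rough cost-model sketch: the per-drive prices come from the figures above, while the project capacity, the number of caching shelves and, in particular, the price of an all-flash caching shelf are invented assumptions meant only to show the mechanism, not real quotes:

```python
# Rough, illustrative cost model; all numbers are placeholders, not real pricing.
hdd_price, hdd_tb = 500, 8        # per-drive figures quoted above
ssd_price, ssd_tb = 8000, 16      # expected price of a 16 TB class SSD

capacity_tb = 1000                # assumed project size: 1 PB of primary data
cache_shelf_cost = 150_000        # assumed price of one all-flash caching shelf
cache_shelves = 4                 # the 3-4 shelves mentioned earlier

hdd_project = capacity_tb / hdd_tb * hdd_price + cache_shelves * cache_shelf_cost
ssd_project = capacity_tb / ssd_tb * ssd_price   # no separate caching tier needed

for name, cost in (("HDD + flash cache", hdd_project), ("SSD only", ssd_project)):
    print(f"{name}: ${cost:,.0f} total, ${cost / (capacity_tb * 1000):.2f} per GB")
```

Under these assumed numbers the HDD project lands around $0.66 per GB and the SSD-only project around $0.50 per GB; the point is not the specific figures but that the caching tier can erase the HDD's per-drive price advantage.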

RAID arrays can be used again

With large HDDs, RAID arrays are contraindicated: as they say, trouble never comes alone, and if one hard drive in a RAID array fails, a second is probably on its way. Rebuilding RAID 5 on five 1 TB disks takes roughly 6-7 hours, and the larger the disks, the longer the rebuild takes; with today's large drives it can stretch to several days. Needless to say, a second drive failing during the rebuild takes the RAID 5, along with all its data, into the abyss. That is why, when it comes to big data and large disks, simple duplication or distribution of data across different nodes is preferred over RAID arrays, but again at the application level. Technologies for distributing data across physical nodes and drives work on the same principles as RAID, resembling something between RAID 1 and RAID 5, but as a rule their space efficiency is lower.
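A crude estimate shows why rebuild times stretch into days as disks grow: a rebuild essentially rewrites an entire drive, so the time scales with capacity. The ~45 MB/s effective rate below is an assumption, chosen only so that a 1 TB disk lands near the 6-7 hours mentioned above:

```python
# Rebuild essentially rewrites the whole drive, so time scales with capacity.
def rebuild_hours(capacity_tb: float, rate_mb_s: float = 45.0) -> float:
    """Estimated rebuild time at an assumed effective rebuild rate."""
    return capacity_tb * 1_000_000 / rate_mb_s / 3600

for tb in (1, 4, 8, 10):
    print(f"{tb:>2} TB drive: ~{rebuild_hours(tb):.0f} h")
# 1 TB ≈ 6 h, 8 TB ≈ 49 h, 10 TB ≈ 62 h, i.e. days under production load
```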

Large SSDs will not have this problem with array recovery. Their read and write speeds are still limited by the relatively "slow" 12 Gbps SAS interface, which yields a little under a gigabyte per second, so even a 15 TB drive can in principle be rebuilt in hours rather than days. It is not yet clear how modern RAID controllers will behave at such speeds: will their built-in processors have enough horsepower to realize the advantages of SSDs in RAID 5 arrays? But it is already clear that the sluggish HDD speeds are gone, which means there is no need to build elaborate software data distribution: you can use the time-tested, reliable RAID 5, or RAID 6 to be on the safe side. In both cases, space efficiency will be higher than with an attempt to distribute data programmatically across different nodes.
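The efficiency gap is easy to put into numbers. Here is a small sketch comparing the usable-capacity fraction of parity RAID with simple N-way replication, which is roughly what node-level duplication amounts to; the 12-drive shelf is an assumed example:

```python
# Usable-capacity fraction: parity RAID vs simple N-way replication.
def raid5_efficiency(disks: int) -> float:
    return (disks - 1) / disks       # one disk's worth of parity

def raid6_efficiency(disks: int) -> float:
    return (disks - 2) / disks       # two disks' worth of parity

def replication_efficiency(copies: int) -> float:
    return 1 / copies                # every byte stored 'copies' times

n = 12  # assumed 12-drive shelf
print(f"RAID 5 on {n} drives: {raid5_efficiency(n):.0%} usable")       # ~92%
print(f"RAID 6 on {n} drives: {raid6_efficiency(n):.0%} usable")       # ~83%
print(f"2-way replication:    {replication_efficiency(2):.0%} usable")  # 50%
print(f"3-way replication:    {replication_efficiency(3):.0%} usable")  # ~33%
```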

A new era? The very one!

I would compare the significance of SSDs larger than 10 TB to the release of the iPhone or iPad. Just as those gadgets changed how we think about mobility, large SSDs will change how we store and process data. First of all, all data becomes hot: you can store fantastic volumes on a single fast device attached directly to the server host, or even inside the server itself. That means terabyte-scale databases through which full-text search simply flies, face recognition systems that scan months of footage from hundreds of surveillance cameras at once, and, on top of everything, the chance for educational institutions to teach Big Data applications and experiment, as they say, on live hardware.

Do 15K RPM HDDs stand a chance?

Definitely not. The era of such hard drives is coming to an end; for some time manufacturers will keep supplying them as spare parts for installed storage systems and servers, while watching every drop in SSD prices with horror. 3D NAND technology, which made such huge SSDs possible, will keep scaling and getting cheaper. The need for 10K and 15K RPM drives shrinks by the day, so no progress should be expected there.


The only niches left for hard drives are entry-level video surveillance systems and NAS for small businesses. These do not need high speeds, since the Ethernet interface limits them anyway; they need a lot of capacity for little money. So 7,200 RPM SATA drives are not going anywhere in the foreseeable future.

Mikhail Degtyarev (aka LIKE OFF)
10/03/2016

