Just a few years ago, the volume of data produced worldwide exceeded 1 zettabyte. At roughly 1.8 zettabytes, this is the amount of information needed to fill 57.5 billion 32 GB Apple iPads, or over 200 billion HD movies.
According to a study, data volumes will increase 50-fold in the current decade, leaving more than 60% of the generated data without storage space.
Generating new data is much cheaper today than before: the cost of storage and processing has fallen sixfold since 2005. During the same period, IT budgets have increased one and a half times. By 2020, the number of data-generating devices will increase eightfold, from smartphones and higher-resolution cameras to various sensors and smart personal devices. Additional information is derived from data that already exists: first of all backups, but also logs and digital audio and video archives.
The lack of affordable storage space stems from the fact that hardware Data Storage Systems (DSS) have long evolved on the principle of 'faster, higher, stronger'. DSS optimization has been tailored to the needs of companies with large budgets: fast storage for virtualization, super-fast storage for real-time data processing, and smart storage optimized for particular business applications.
A review of approaches to storing large amounts of data would not be complete without mentioning solutions that are based on software but supplied to the market as combined software and hardware sets (appliances). In some cases this allows the solution to be deployed quickly and can suit a smaller company with limited resources. However, a predefined hardware configuration limits the ability to tune the system and, since the price already includes the hardware, sets a higher entry price than pure software solutions. Such an approach also inherits many of the limitations specific to hardware DSS, for example when a server has to be upgraded.
Creating a commercial solution on the basis of open source is clearly a complex and risky experiment. Only a large company or system integrator can take it on: one with enough expertise and resources to deal with the difficulties of installing, integrating, and supporting open source code, and with sufficient commercial motivation to do so. The main motivation of commercial vendors, however, is directed at high-budget areas such as high-speed data storage systems for virtualization or parallel data processing.
The startups focused on providing cloud backup came closer to solving the problem of inexpensive and reliable storage. While some of them dropped out of the race, those that bet on deploying cloud storage in their own data centers on standard components gained a foothold in the market with their cloud services and have made the most progress. Yet, because of very high competition in their main market, they too do not actively promote their storage technology as a standalone solution of the Software Defined Storage class: first, so as not to create competitors, and second, so as not to spread their resources across entirely different lines of business.
As a result, DSS administrators responsible, among other things, for storing backups, logs, video surveillance archives, TV shows, and voice call records face a dilemma. On the one hand, there are convenient but costly solutions that, given a sufficient budget, can cover current needs of 100-150 TB of data reliably and securely.
However, once DSS capacity exceeds the threshold of 150-200 TB, problems of further scalability arise: uniting all hardware into a single file system, freely reallocating space, and replacing hard disks with disks of larger capacity. Extra expenses emerge for migration, costly components, and special software for 'DSS virtualization'.
As a result, in terms of cost of ownership, such a system over time becomes far from optimal for 'cold data'. The other option is to build a DSS in-house on Linux and JBOD. This is possible, and it suits a specialized company such as a hosting or telecom provider that has experienced, qualified specialists who can take responsibility for the operability and reliability of their own solution.
An ordinary small or medium-sized company whose principal business is not related to data storage most probably has no budget for expensive hardware and qualified specialists. An interesting option for such companies may be a software solution that allows a highly reliable and easily extendable DSS to be deployed quickly.
Such a DSS runs on inexpensive standard chassis and drives that can be freely combined with each other, replaced one at a time on a 'hot' system, and expanded in blocks of arbitrary size, from a few terabytes to tens or hundreds of terabytes. This requires essentially only PC assembly skills and an intuitive web interface for configuring and monitoring the entire Data Storage System and its individual components and drives. This development grew out of a cloud storage service for backups, which has now expanded to multiple petabytes in three data centers.
According to a Forrester report, backup volume has been increasing by 100 TB per year in 20% of companies, and the complexity of expanding the DSS to keep up with backup needs has become a problem for 42% of companies.
These figures push professionals to plan for the long term and to estimate the DSS capacity their organization may need over several years.
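As a rough illustration of such planning, the sketch below projects stored volume under a fixed yearly increase and reports the year in which it passes a given capacity threshold. The 100 TB per year growth and the 150 TB threshold are taken from the figures above. The 50 TB starting volume, the linear growth model, and the function names are assumptions made only for this example, so real planning would need measured growth data.

```python
def project_capacity(start_tb, yearly_growth_tb, years):
    """Return the expected stored volume (TB) at the end of each year,
    assuming a constant yearly increase (a simplifying assumption)."""
    return [start_tb + yearly_growth_tb * year for year in range(1, years + 1)]


def years_until_threshold(start_tb, yearly_growth_tb, threshold_tb):
    """Return the first year in which the volume exceeds threshold_tb.

    Returns 0 if the threshold is already exceeded, and None if the
    volume does not grow and so never reaches the threshold.
    """
    if start_tb > threshold_tb:
        return 0
    if yearly_growth_tb <= 0:
        return None
    year = 0
    volume = start_tb
    while volume <= threshold_tb:
        year += 1
        volume += yearly_growth_tb
    return year


START_TB = 50           # hypothetical current volume, not from the article
YEARLY_GROWTH_TB = 100  # backup growth cited for 20% of companies
THRESHOLD_TB = 150      # lower bound of the 150-200 TB scalability threshold

print(project_capacity(START_TB, YEARLY_GROWTH_TB, 5))
print(years_until_threshold(START_TB, YEARLY_GROWTH_TB, THRESHOLD_TB))
```

With these assumed inputs the volume reaches the 150 TB mark after the first year and exceeds it during the second, which suggests that a system sized only for current needs would hit the scalability threshold almost immediately.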