Data dilemma for Google: expensive flash or cheap hard drives?

Google still primarily relies on hard drives for storage, but has ‘drastically improved’ its own storage.

Almost all of Google's products rely on Colossus, a self-built automated data storage system that makes hard disk drives (HDDs) perform much like solid state drives (SSDs). A developer and a storage technologist from the tech giant explained how it works in a blog post.

Colossal system

Colossus is the foundation for YouTube, Gmail, Google's cloud storage services, and other applications. The system manages exabytes of data spread across various data centers. “Most data centers have one cluster and thus one Colossus file system, regardless of how many workloads are running,” the duo writes. The system reaches read speeds of up to 50 TB/s and write speeds of up to 25 TB/s.

Of course, Google uses flash drives alongside disk storage, but relying solely on flash is still too expensive. A combination of hard drives and SSDs is therefore the practical choice. “The challenge is to put the right data on SSDs, while keeping the majority on hard drives.”

Automation

Google solves this challenge with an automated caching system called ‘L4’. It uses machine learning to dynamically decide which data is best suited to SSD and which to HDD.
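To make the placement problem concrete, here is a minimal sketch in Python. It is not Google's L4 algorithm: a simple hand-written rule (reads per gigabyte) stands in for the machine-learning model, and every name in it (`FileStats`, `place_files`, the example files and the capacity figure) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    name: str
    size_gb: float        # assumed to be greater than zero
    reads_last_hour: int  # hypothetical "hotness" signal


def place_files(files, ssd_capacity_gb):
    """Greedy placement: the hottest data per GB goes to SSD until it is full."""
    placement = {}
    remaining = ssd_capacity_gb
    # Rank by reads per GB so small, hot files win over large, cold ones.
    ranked = sorted(
        files,
        key=lambda f: f.reads_last_hour / f.size_gb,
        reverse=True,
    )
    for f in ranked:
        if f.reads_last_hour > 0 and f.size_gb <= remaining:
            placement[f.name] = "SSD"
            remaining -= f.size_gb
        else:
            # Cold data, or data that no longer fits, stays on hard drives.
            placement[f.name] = "HDD"
    return placement


files = [
    FileStats("video_chunk", 40.0, 20),
    FileStats("index_shard", 2.0, 900),
    FileStats("old_backup", 100.0, 0),
    FileStats("mail_log", 5.0, 300),
]
placement = place_files(files, ssd_capacity_gb=10.0)
print(placement)
```

With only 10 GB of SSD in this toy setup, the two small, frequently read files land on flash and the bulk of the data stays on disk, which mirrors the goal quoted above. A real system would have to predict future access rather than count past reads, which is where a learned model comes in.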

However, Colossus has a “big weakness”: its reliance on hard drives. They work well when the same data is frequently updated, but less well for files that regularly receive small additions. “It’s better to completely skip hard drives for those use cases.”

Combining HDDs and SSDs remains a challenging task for many storage hardware suppliers. For now, HDDs are still the cheapest and most established option, but that low cost comes at the expense of the speed SSDs offer.