Google indexes hundreds of thousands of Gigabytes per day

Google just announced that it has rolled out Caffeine, a new indexing system for its search engine. Here are some interesting quotes:

Some background for those of you who don’t build search engines for a living like us: when you search Google, you’re not searching the live web. Instead you’re searching Google’s index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need. (Here’s a good explanation of how it all works.)

What’s even more intriguing is the amount of data they process:

Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.
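The iPod comparison is easy to sanity-check with a little arithmetic. Here is a rough sketch in Python that uses only the figures from the quote above; the function names and the miles-to-meters conversion are my own additions, not anything from Google's announcement.

```python
# Figures taken from the quoted announcement.
TOTAL_STORAGE_GB = 100_000_000  # "nearly 100 million gigabytes"
IPOD_COUNT = 625_000            # "625,000 of the largest iPods"
STACK_MILES = 40                # "more than 40 miles"

# Standard conversion factor (not from the announcement).
METERS_PER_MILE = 1609.344


def implied_ipod_capacity_gb(total_gb=TOTAL_STORAGE_GB, ipods=IPOD_COUNT):
    """Storage each iPod must hold for the quoted count to add up."""
    return total_gb / ipods


def implied_ipod_length_cm(miles=STACK_MILES, ipods=IPOD_COUNT):
    """Length of each iPod if the end-to-end stack spans the quoted distance."""
    return miles * METERS_PER_MILE * 100 / ipods


if __name__ == "__main__":
    print(f"Implied capacity per iPod: {implied_ipod_capacity_gb():.0f} GB")
    print(f"Implied length per iPod: {implied_ipod_length_cm():.1f} cm")
```

The numbers work out to 160 GB and roughly 10.3 cm per iPod, so the comparison appears to be internally consistent with a 160 GB device about ten centimeters long.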

I can’t even fathom that amount of data. The full details are in the official Google announcement.
