Tonight on 60 Minutes I watched Lesley Stahl’s two-part piece on Google¹. I didn’t learn anything new, but I was pleased to see many of the good things I had heard reinforced.
There is one part missing from that story… something I have never seen touched upon.
In order to do what it does, Google must cache the entire Internet. So, if Google says it can search 8,058,044,651 pages (as it does right now), those pages must be stored (possibly more than once) in Google’s server farms.
I would hope they have a sophisticated way of compressing the data for fast indexing and easy storage. Still, that’s a daunting task. And Microsoft, Yahoo and a few others do much the same.
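To get a feel for the scale, here is a back-of-the-envelope sketch. Only the page count comes from Google. The 10 KB average page size, the 3-to-1 compression ratio and the number of stored copies are guesses of mine for illustration, not anything Google has published, so treat the result as a rough order of magnitude at best.

```python
# Page count Google reported on its home page, as quoted above.
PAGES = 8_058_044_651


def estimated_storage_tb(pages, avg_page_kb=10.0, compression_ratio=3.0, copies=1):
    """Rough storage estimate in decimal terabytes (1 TB = 10**9 KB).

    avg_page_kb, compression_ratio and copies are illustrative guesses,
    not published figures.
    """
    total_kb = pages * avg_page_kb * copies / compression_ratio
    return total_kb / 1e9


if __name__ == "__main__":
    # Compare a single stored copy with a hypothetical three copies.
    for copies in (1, 3):
        tb = estimated_storage_tb(PAGES, copies=copies)
        print(f"copies={copies}: roughly {tb:.0f} TB")
```

Change any of the guessed numbers and the total moves in direct proportion, which is really the point: whatever the true figures are, it's a lot of disks.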
Last year, when my webhost had a server crash and lost a few weeks of data, Google’s cache allowed me to go back and reconstruct nearly everything I had lost. And if they do it with my site, they’re doing it with every site, whether the info is good, bad, useful or useless.
¹ – The piece was produced by Rome Hartman, who probably doesn’t remember me. Back in the early 70s I worked for his father, also Rome Hartman, at a radio station in West Palm Beach, Florida. On the weekends, we sometimes went out in the station’s boat as he did fishing reports. He’s done well for himself in spite of having known me.