As some of you might remember, I was hit hard in December by a hacker who made it look like my site was hosting close to 100,000 bogus pages¹!
These phantom webpages were linked to other pages, often on other (probably compromised) sites. Google took note and, at first, began to index them. Then, they removed me entirely, trying to preserve their integrity.
The dust has cleared. Those pages are disappearing from Google’s database. It’s interesting to watch how this happens. With the tools I have, I can see that in nearly real time.
Webpages and links are crawled by Google, Yahoo! and others on a regular basis. Even with hundreds of pages seen daily, individual pages go weeks or more between crawls.
I’ve uploaded a file to my server that lays out a roadmap for the crawlers, so the bad pages will no longer be included. I thought Google would pull them immediately, since it looks at this file every 24 hours. Instead, the process has taken weeks.
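For anyone curious what such a roadmap can look like, here is a minimal sketch. It assumes the file is a standard robots.txt, which the post does not actually specify, and the `/bogus/` directory, the example paths and the `is_allowed` helper are all made up for illustration. It uses Python's built-in robots.txt parser to check which pages a well-behaved crawler would skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion rules: the /bogus/ directory is an assumption,
# not the path the hacker's pages really lived under.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bogus/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a crawler honoring robots_txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for path in ("/bogus/page1.html", "/blog/real-post.html"):
        print(path, "allowed" if is_allowed(ROBOTS_TXT, path) else "blocked")
```

A rule like this only tells crawlers what to skip on their next visit. It does not force a search engine to drop pages it has already indexed, which may be part of why the cleanup takes weeks.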
Google shows that the number of no-longer-existing pages it still has in its files is down to around 10,000. That number is reduced every day as they continue to ‘crawl’ my site.
I think of computers as being fast, often instantaneous, machines. However, when you deal with as much info as Google does, even fast takes a lot of time and instantaneous doesn’t exist.
As of yesterday, around 75% of my website traffic was being drawn by people searching for the crap the bad pages held. People are still finding me when they search for “bs haker free download” or “free mobile mouse key generator virtuallab professional 5.” Soon, that should stop.
By far, the most visited piece on my site today is my “Oops” page, where I send all traffic looking for pages that don’t exist!
I’m not sure where my traffic will be when all this ends. My new pages have been optimized to make them friendlier to search engines, but they’re still under 1% of this site. I’ve added more detailed sitemaps, which help focus the search engines’ resources on the pages I want seen. The templates for my blog have been rewritten to move more important content higher in the page.
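A sitemap is just an XML list of the URLs you would like crawled. As a rough sketch only, not the way this site's sitemaps are actually built, the following generates a bare-bones one in the sitemaps.org format. The `build_sitemap` helper and the example.com URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal sitemap XML string listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # <loc> holds the address the crawler should visit.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages; substitute the ones you want seen.
    pages = [
        "https://example.com/",
        "https://example.com/blog/a-real-post.html",
    ]
    print(build_sitemap(pages))
```

Listing only the real pages gives crawlers a short, clean set of targets, so they need not discover everything by following links.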
Yahoo! has begun to look at my pages in a more aggressive manner. So far this month, they’ve looked at more stuff on geofffox.com than Google. That’s a huge change. Google is still sending more traffic (excluding the traffic looking for hacker-inserted pages), but only by a 3:2 ratio over Yahoo!
Microsoft’s search engines are still mostly MIA. Even with sitemaps and robots files, this month they have seen 1/8 the pages Yahoo! has. Google sends around 100 times more traffic my way than MSN.
It almost looks as if Microsoft isn’t trying. Maybe they’re not.
Why do I care? This site isn’t here to make money.
Having ads on the site and watching how they are placed and work with my content is an education in how the Internet works as a business model. I still have a lot more to learn.
¹ – I originally posted lower numbers and never saw the need to update them as they changed on a daily basis.