The Google Cycle

As some of you might remember, I was hit hard in December by a hacker who made it look like my site was hosting close to 100,000 bogus pages¹!

These phantom webpages were linked to other pages, often on other (probably compromised) sites. Google took note and, at first, began to index them. Then, they removed me entirely, trying to preserve their integrity.

The dust has cleared. Those pages are disappearing from Google’s database. It’s interesting to watch how this happens. With the tools I have, I can see that in nearly real time.

Webpages and links are crawled by Google, Yahoo! and others on a regular basis. Even with hundreds of pages seen daily, individual pages go weeks or more between crawls.

I’ve uploaded a file to my server that lays out a roadmap for the crawlers, so the bad pages will no longer be included. I thought Google would pull them immediately, since it looks at this file every 24 hours. Instead, the process has taken weeks.
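For readers curious what such a roadmap file does, here is a minimal sketch, assuming the file is a standard robots.txt. The rules, the `/bogus/` path, and the example.com URLs are hypothetical stand-ins, since the post does not show the actual file. It uses Python's built-in `urllib.robotparser` to check which URLs a well-behaved crawler would skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the real file on the site is not shown in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /bogus/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A page under the blocked (assumed) path versus an ordinary blog page.
print(is_crawlable(ROBOTS_TXT, "http://example.com/bogus/free-download.html"))
print(is_crawlable(ROBOTS_TXT, "http://example.com/blog/the-google-cycle"))
```

A rule like this only tells crawlers not to fetch the pages again. It does not by itself remove what a search engine has already indexed, which fits with the cleanup taking weeks rather than a day.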

Google shows that the number of nonexistent pages still in its files is down to around 10,000. That number drops every day as they continue to ‘crawl’ my site.
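A rough back-of-the-envelope sketch shows why this takes weeks. The 10,000 figure comes from the numbers above; the rate of 300 pages per day is only an assumption standing in for "hundreds of pages seen daily", and it further assumes every crawled page is one of the stale ones.

```python
import math


def days_to_clear(remaining_pages: int, pages_per_day: int) -> int:
    """Estimate whole days needed to revisit every remaining stale page."""
    if pages_per_day <= 0:
        raise ValueError("pages_per_day must be positive")
    return math.ceil(remaining_pages / pages_per_day)


# 10,000 stale pages at an assumed 300 revisits per day.
print(days_to_clear(10_000, 300))  # prints 34
```

In practice the crawler also spends visits on real pages, so the actual wait would likely be longer than this estimate.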

I think of computers as being fast, often instantaneous, machines. However, when you deal with as much info as Google does, even fast takes a lot of time and instantaneous doesn’t exist.

As of yesterday, around 75% of my website traffic was being drawn by people searching for the crap the bad pages held. People are still finding me when they search for “bs haker free download” or “free mobile mouse key generator virtuallab professional 5.” Soon, that should stop.

By far, the most visited piece on my site today is my “Oops” page, where I send all traffic looking for pages that don’t exist!

I’m not sure where my traffic will be when all this ends. My new pages have been optimized to make them friendlier to search engines, but that’s still under 1% of this site. I’ve added more detailed sitemaps, which help focus the search engine resources to look at pages I want seen. The templates for my blog have been rewritten to move more important content higher in the page.

Yahoo! has begun to look at my pages in a more aggressive manner. So far this month, they’ve looked at more stuff on geofffox.com than Google has. That’s a huge change. Google is still sending more traffic (excluding the traffic looking for hacker-inserted pages), but only by a 3:2 ratio over Yahoo!

Microsoft’s search engines are still mostly MIA. Even with sitemaps and robots files, this month they have seen 1/8 the pages Yahoo! has. Google sends around 100 times more traffic my way than MSN.

It almost looks as if Microsoft isn’t trying. Maybe they’re not.

Why do I care? This site isn’t here to make money.

Having ads on the site and watching how they are placed and work with my content is an education in how the Internet works as a business model. I still have a lot more to learn.

¹ – I originally posted lower numbers and never saw the need to update them, as they changed on a daily basis.

3 thoughts on “The Google Cycle”

  1. Hi Geoff,

    I am one of those who comes by once or twice a month and reads what you have posted.

    This posting tickled my funny bone. I have been doing computers since 1970, one or two years 🙂 I have noted that systems have gotten better at obtaining data (Garbage) on the first go-round than at cleaning it up on the second, third, etc. visits. This appears to apply to all systems, not just search engines.

    What this brings us to is an internet in which we have to be careful about what we trust. The Garbage in Garbage out principle appears to be in full operation in the ‘I want it Now!’ generation that we are currently living in.

    The advantage to being slower on picking things up is you get to filter out some of the garbage.

    Secondly, Microsoft has never been a great pusher of its MSN and Live Search engines. (Maybe they miss a lot of the garbage.) I am hoping that the influence of Yahoo will perk them up a bit.

    Just some thoughts from another HAM. BTW, if interested, http://www.fuller.net/wx/wx.htm is the weather locally. Currently, 2 inches of snow in my back yard and melting.

    Jim Fuller N7VR

  2. Regarding MSN’s lackluster site crawling, if Microsoft buys Yahoo!, then your problem is solved. I’m just not so sure that deal is going to go through, based on the same offer proffered over a year ago.

    I think you made some friends over at Google with your past problem, maybe that will benefit you in the future.

  3. Regarding Jim’s comment (and he has a very pretty weather system page on his site), I think we’ve outgrown the ability of humans to supervise the computers. So, any time something happens that hasn’t been foreseen, the user ends up suffering – in Google’s case, with poorer search results.

    To Gary: I think the opposite might be true. If Microsoft buys Yahoo! it’s probably for the name, not the technology. Microsoft has a very self assured attitude toward their products, even when it’s not deserved.

    As for friends at Google… in my dreams (though those who helped out online were very gracious).
