Defensive Web Programming
Date: 27/7/2004
I had to do some more defensive PHP programming today to stop the crazy GoogleBot from loading random nonexistent URIs on my site and filling the stats mechanism up with junk. It seems to be the worst of the bots in that regard. If it were anything other than Google I'd just block the bot in robots.txt or even by hostname, but I guess I like my site being indexed by Google. ;)
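
The post doesn't show the actual fix, so the following is only a rough sketch of one way this kind of check could look, not what the site really runs. It assumes a reasonably modern PHP (the component argument to parse_url() arrived in 5.1.2) and a hand-maintained list of valid paths. The names $knownPages, is_known_page() and classify_request() are made up for illustration.

```php
<?php
// Hypothetical list of paths the site really serves (not from the original post).
$knownPages = array('/index.php', '/blog.php');

// Returns true only when the request path is one of the known pages.
function is_known_page($requestUri, $knownPages)
{
    // Keep only the path component, dropping any query string.
    $path = parse_url($requestUri, PHP_URL_PATH);
    return is_string($path) && in_array($path, $knownPages, true);
}

// Decides which HTTP status to send and whether the hit belongs in the stats.
function classify_request($requestUri, $knownPages)
{
    if (is_known_page($requestUri, $knownPages)) {
        return array('status' => 200, 'log' => true);
    }
    // Unknown URI: answer 404 and keep it out of the stats.
    return array('status' => 404, 'log' => false);
}
```

A made-up URI such as '/blog.php/scripts/index.php' is not in the list, so classify_request() hands back a 404 with logging switched off, while '/blog.php?id=3' still counts as a normal page load. Answering with a real 404 status also gives a crawler a reason to stop asking for that address.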

I still remember the day that GoogleBot went postal on me and got stuck in a loop loading weird URIs off my site, made up of combinations of bits of path and script names. 61,000 times. That's a lot of page loads in one day.

These days it's just a few pages here and there, but still annoying enough to detect and fix. By the way, I found a bug in PHP too: if you go to a page in the form 'http://site/page.php/', the $PHP_SELF variable has just '/' in it, instead of '/page.php/' or something. Dumb PHP.
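
One possible way around the trailing-slash oddity is to avoid PHP_SELF altogether and read SCRIPT_NAME and PATH_INFO instead. This is only a sketch: what those entries contain depends on the web server and on how PHP is hooked into it, so treat it as an assumption to verify on your own setup. The helpers script_path() and has_extra_path() are hypothetical names, and they take the $_SERVER array as a parameter so they can be tried with sample data.

```php
<?php
// Returns the script path, preferring SCRIPT_NAME over PHP_SELF.
function script_path(array $server)
{
    if (!empty($server['SCRIPT_NAME'])) {
        return $server['SCRIPT_NAME'];
    }
    // Fall back to PHP_SELF only when SCRIPT_NAME is missing or empty.
    return isset($server['PHP_SELF']) ? $server['PHP_SELF'] : '';
}

// True when extra path segments follow the script name, as in '/page.php/'.
function has_extra_path(array $server)
{
    return isset($server['PATH_INFO']) && $server['PATH_INFO'] !== '';
}
```

In a page script you would call script_path($_SERVER) wherever PHP_SELF was used before, and treat has_extra_path($_SERVER) returning true as a sign that the request is one of the junk URIs.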
 