Block spam bots and evil web scrapers

I noticed a huge CPU load on one server today. A quick look at netstat showed that most of the current traffic was coming from someone in Africa. I cross-referenced the IP address from netstat with the access log files of the various sites on the server and saw that it had a User-Agent of ‘DTS Agent’. A quick Google search showed this to be a scraper that harvests email contacts.
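The post doesn't show exactly how the logs were searched. Below is a minimal Python sketch of that cross-referencing step. It assumes the sites log in Apache's standard "combined" format and tallies requests per IP address and User-Agent so the noisiest client stands out. The function name `top_clients` and the sample addresses (taken from the 192.0.2.0/24 and 198.51.100.0/24 documentation ranges) are purely illustrative.

```python
import re
from collections import Counter

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def top_clients(lines, n=5):
    """Count requests per (IP, User-Agent) pair and return the n busiest."""
    counts = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m:  # lines not in combined format are skipped
            counts[(m.group("ip"), m.group("ua"))] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        '192.0.2.10 - - [07/Apr/2009:10:00:00 -0400] "GET / HTTP/1.1" 200 512 "-" "DTS Agent"',
        '192.0.2.10 - - [07/Apr/2009:10:00:01 -0400] "GET /contact HTTP/1.1" 200 734 "-" "DTS Agent"',
        '198.51.100.7 - - [07/Apr/2009:10:00:02 -0400] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    ]
    for (ip, ua), hits in top_clients(sample):
        print(f"{hits:6d}  {ip:15s}  {ua}")
```

To run it against a real log, pass an open file instead of the sample list, for example `top_clients(open(path))`. Here `path` is wherever your host writes the access log; the location varies from setup to setup.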

It was time I added a little something to reduce these spam bots’ ability to see my sites. This assumes mod_rewrite and mod_setenvif are enabled and that your AllowOverride setting permits these directives in .htaccess. With that in place, adding the following code to an .htaccess file in the root of the site structure helps curb evil, good-for-nothing spam bots:

[Last Updated: April 7th 2009]

RewriteEngine On

RewriteCond %{HTTP:User-Agent} (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|ExpresssWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|Pagerabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu|Zeus|DTS\sAgent) [NC]
RewriteRule .* - [F]


ErrorDocument 403 /403.html
 
# FLAG THE REQUEST IF THE UA CONTAINS ANY OF THESE (".*" MATCHES ANYWHERE, NOT JUST AT THE START)
SetEnvIfNoCase ^User-Agent$ .*(aesop_com_spiderman|backweb|bandit|batchftp|bigfoot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) HTTP_SAFE_BADBOT

SetEnvIfNoCase ^User-Agent$ .*(cosmos|crescent|diibot|dittospyder|dragonfly) HTTP_SAFE_BADBOT    
SetEnvIfNoCase ^User-Agent$ .*(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) HTTP_SAFE_BADBOT

SetEnvIfNoCase ^User-Agent$ .*(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(grafula|harvest|hloader|hmview|httrack|humanlinks|ilsebot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(infonavirobot|infotekies|interget|iria|jennybot|jetcar) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(likse|linkscan|linkwalker|lnspiderguy|magnet|mag-net|markwatch) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(net.?vampire|netants|netmechanic|netspider|nextgensearchbot) HTTP_SAFE_BADBOT

SetEnvIfNoCase ^User-Agent$ .*(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(offline.?navigator|openfind|outfoxbot|pagegrabber|pavuk) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(psbot|pump|queryn|recorder|realdownload|reaper|true_robot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(repomonkey|internetseer|sitesnagger|siphon|slysearch|smartdownload) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(turingos|urly.?warning|vacuum|voideye|whacker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|reaper|sauger|site.?quester|whack) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
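SetEnvIfNoCase ^User-Agent$ .*(dts\sagent) HTTP_SAFE_BADBOT

# DENY ANY REQUEST FLAGGED ABOVE (APACHE 2.2 SYNTAX)
# On Apache 2.4, use "Require all granted" plus "Require not env HTTP_SAFE_BADBOT" inside <RequireAll> instead
Order Allow,Deny
Allow from all
Deny from env=HTTP_SAFE_BADBOT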

This code is a cleaned-up version of the code found here and here. You really should look through all the User Agents and make sure you are not blocking someone, or some software, that you would like to keep. The list above is far from complete, but it is a good start.
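One way to do that review is to test User-Agent strings you care about before deploying. The Python sketch below compiles a few of the SetEnvIfNoCase alternations from the list the way Apache applies them, case-insensitive and matching anywhere in the string. It then reports which word would flag a given agent. The helper name `flagged_by` and the "ExampleFeedCollector" agent are hypothetical. Python's `re` is close enough to Apache's PCRE for these simple patterns, but this is not Apache itself.

```python
import re

# A subset of the SetEnvIfNoCase patterns above, copied verbatim.
# The leading ".*" means each pattern can match anywhere in the User-Agent.
PATTERNS = [
    r".*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures)",
    r".*(cosmos|crescent|diibot|dittospyder|dragonfly)",
    r".*(dts\sagent)",
]
BAD_BOT = [re.compile(p, re.IGNORECASE) for p in PATTERNS]  # NoCase -> IGNORECASE


def flagged_by(user_agent):
    """Return the matched words that would set HTTP_SAFE_BADBOT for this agent."""
    hits = []
    for rx in BAD_BOT:
        m = rx.match(user_agent)  # match() plus a leading ".*" acts like a search
        if m:
            hits.append(m.group(1))
    return hits


if __name__ == "__main__":
    for ua in [
        "DTS Agent",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8",
        "ExampleFeedCollector/2.0",  # hypothetical, but imagine it is one you want
    ]:
        print(repr(ua), "->", flagged_by(ua) or "allowed")
```

The last agent shows the risk: generic words like "collector" or "download" catch any agent that merely contains them, which is exactly what the review should look for.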

For a fairly up-to-date list of User Agents you will find a useful User Agent database here.

What is my IP Address

Lately I have often needed to find out my external IP address. Videotron has recently started expiring the lease on my IP address more often than once a year, so I can’t always remember what it is. I have whipped up this little ‘What is my IP Address’ page so that I can easily get my address whenever I want.

Hopefully you will find my What is my IP address page useful too.
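A page like this only needs to echo back the address the request came from. The Python sketch below shows one way to do that using only the standard library. It is not the code behind the page mentioned above. The class name `WhatIsMyIP` and port 8080 are arbitrary choices. To report your public address, it has to run on a machine outside your home network, such as a hosted server. It also reports a proxy's address if a proxy sits in between.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class WhatIsMyIP(BaseHTTPRequestHandler):
    """Reply to every GET with the client's IP address as seen by this server."""

    def do_GET(self):
        ip = self.client_address[0]  # peer address of the TCP connection
        body = (ip + "\n").encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=us-ascii")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8080 is an arbitrary choice for this sketch.
    HTTPServer(("", 8080), WhatIsMyIP).serve_forever()
```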


Edit navigation menu in MediaWiki

First off, my apologies for not posting recently. Things have been crazy here, and to top it off I decided to build a shed in the back yard. It will give me a place to be banished to when I misbehave at home!

Many of my clients use MediaWiki. I have inherited many old MediaWiki installs and have learnt that upgrading MediaWiki today is relatively painless. Part of that experience has been learning how to use MediaWiki itself. When I first started using MediaWiki, which is what Wikipedia uses, I looked at it as some sort of archaic application that would not survive long into the future. Now that I am learning how to actually use the wiki software, I have a newfound respect for it.

Today I learnt how easy it is to customize the navigation sidebar. Apparently a few people have had problems with it, so I felt it was worth posting here. I found the answer here.

Simply enter ‘MediaWiki:Sidebar’ in the search field and click ‘Go’. You can now edit it like a normal page, changing, adding and removing links. Links can point either to other wiki pages or to URLs outside the wiki. Note that by default, only administrators can edit pages in the MediaWiki namespace.
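The sidebar page uses a simple bulleted format. A line starting with a single `*` opens a section heading, and each `**` line beneath it is a link written as `target|label`. The target can be a wiki page, a message key, or a full URL. The Python sketch below is a simplified model of that structure, to show how a sidebar page breaks down. It is not MediaWiki's own parser: it ignores message-key lookup and the special sections such as SEARCH and TOOLBOX. It also simply skips link lines that have no `|`. The function name `parse_sidebar` is hypothetical.

```python
def parse_sidebar(text):
    """Split sidebar text into [(heading, [(target, label), ...]), ...].

    Simplified model: '* heading' opens a section and '** target|label'
    adds a link to the most recent section.
    """
    sections = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("**"):
            if not sections:
                continue  # a link before any heading has nowhere to go
            target, sep, label = line[2:].strip().partition("|")
            if not sep:
                continue  # this sketch skips link lines without a '|'
            sections[-1][1].append((target.strip(), label.strip()))
        elif line.startswith("*"):
            sections.append((line[1:].strip(), []))
    return sections


if __name__ == "__main__":
    sample = """
* navigation
** mainpage|mainpage-description
** Special:RecentChanges|Recent changes
** http://www.example.com|Example site
"""
    for heading, links in parse_sidebar(sample):
        print(heading)
        for target, label in links:
            print("   ", label, "->", target)
```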

Hope this helps someone out!

ThePlanet Data center outage from explosion

Life at The Planet left May with a bang! At some point on the afternoon of Saturday, May 31st, The Planet’s Houston H1 data center suffered a generator meltdown. The fire department quickly determined that the data center needed to be brought offline for a few hours until the situation could be properly assessed. Unfortunately, that was not enough. Shortly after The Planet’s data center was brought offline, there was a major explosion that took out three walls of its power room.

From The Planet forums:

This evening at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room. Thankfully, no one was injured. In addition, no customer servers were damaged or lost.

We have just been allowed into the building to physically inspect the damage. Early indications are that the short was in a high-volume wire conduit. We were not allowed to activate our backup generator plan based on instructions from the fire department.

This is a significant outage, impacting approximately 9,000 servers and 7,500 customers. All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday. Rest assured we are working around the clock.

We are in the process of communicating with all affected customers. We are planning to post updates every hour via our forum and in our customer portal. Our interactive voice response system is updating customers as well.

You can follow the progress of their recovery of the Houston data center on this thread.

Unfortunately, I had a server in the H1 data center in Houston. As of tonight, Sunday at 9:15 PM EST, there is still no sign of life from my server. Fortunately, I had not yet brought the server online for real use; it is more of a plaything for me.

When it comes back, I hope to continue my series of how-tos on getting Exim set up. Exim itself is all working now, but of course, next I want to see how well I can protect myself from spam.

Night all.

Billing software for hosting - a winner!

For a few years now I have used Modernbill’s hosting billing and management software for all my hosting. It has come with a lot of issues and quirks. The problem was that each time I hit a quirk, I ended up having to pay Modernbill to get it fixed. More often than not, the problem wasn’t actually fixed; instead, I was just given a workaround to keep the software going.

Well, a few days ago I sat down to send out an invoice. That, in theory, should take about 10 minutes, including writing in the details and making sure everything is correct. Then I click send and off I go to do my other work.

STOP!

I clicked send and got a Failed message from Modernbill’s cache. Those of you who have used Modernbill probably know what I mean, or so all the forums say. So I checked all my bits and pieces, which looked good, and tried again. No go.

So I started down a path that was two hours long: that all too familiar troubleshooting path of searching Google, reading posts, trying things and coming full circle back to Google. Finally, after my forehead had turned black and blue from pounding it on the desk, I sent a trouble ticket to Modernbill.

Their response, “Should take less than an hour, so $75 will cover the fix.”

Two years ago, my response to that email would have sent shivers down anyone’s spine and would have resulted in me being told to piss off. But no, I kept my cool and sent back a nice response explaining my situation and exactly why I was against paying any more money.

They responded with a nice email saying they understood my dilemma, and that a win-win solution would be to buy a 6-month support plan, after which they could fix the problem. Yeah, that is sort of win-win, but I still lose my money. Keep in mind that at this point I have now spent close to $300 on a software package that never really did everything it was supposed to. At least, I couldn’t get it to do everything it was supposed to.

At this point I am late for a breakfast date with the wife, but still in a foul mood not worthy of breakfast. Back to Google, this time on a trek to find a replacement program for my online billing needs.

I should interject here that I only use the billing software to handle subscriptions, to send invoices quickly for payment by PayPal, and, in theory, to keep track of my profit margins.

After less than 30 minutes, I found a pretty slick-looking replacement app. Admittedly, it was only for cPanel and I have Ensim, but as I mentioned a moment ago, I don’t manage the actual server with this software. Not yet! So I looked at the time: 30 minutes late for breakfast, so what’s 30 more minutes?

Within 30 minutes (30 more minutes that is) I downloaded, installed, configured, imported all my Modernbill clients and accounts, tweaked all the accounts, and sent out my invoice that I started out to do almost three hours ago.

Can you say ... WoW! I did. And I still am.

If you have read this far, then you deserve to know what software I am referring to. WHMCompleteSolution is where I found my reprieve from billing insanity. Not only is their interface slick and really fast, but their pre-sale customer support was amazing. They answered every one of my questions within 30 minutes (there it is again!) and sometimes within 10 minutes. WoW! Their website says this:

“The complete client management, billing & support system”

WHMCompleteSolution is the complete client management solution for Web Hosts & Dedicated Server Providers looking for Online Automated Recurring Billing, Flexible and Easy to Use Client Management and Integrated Client Support Center including Support Tickets, Knowledgebase, Announcements & Live Server Status.

  • Powerful & Flexible Billing & Client Management System
  • Support Ticketing System with Full Email Piping Support
  • Automated Account Creation, Suspension & Termination
  • Payment Tracking, Accounting Features & Statistical Reports
  • Multi-Language Support
  • 100% Customisable using Templates

It is four days later, and I have purchased my full license, complete with refugee pricing for coming from a competitor (you know who, so I won’t repeat myself).

Do yourself a favour: if you are looking for a billing package for a small hosting business, check out WHMCompleteSolution. The 15-day fully functional trial is FREE, so you have nothing to lose but a little time. It might even save your marriage! Speaking of which, the wife did not shoot me, as she had other things to do, and we ended up having a great breakfast at Boccacinos in the West Island of Montreal.

DISCLAIMER: I realize that this solution is not for everyone. A lot of people probably get along well with Modernbill and their support staff. This rant is strictly from my point of view and is not meant to say that MB sucks.

PS: But I really do like this new solution and really didn’t like having to work with MB.
