{deprecated} New Google SearchWiki - Diggification gone wild

This morning I noticed a new feature on Google’s search engine when I was logged into my Google account. I didn’t sign up for this feature, and apparently there is no way to turn it off. If you don’t love it, you will hate Google for adding it.

What I am talking about is the new Google SearchWiki. In a nutshell, you can promote, demote, and comment on search results. However, any promoting you do will only show up for you. This can have its benefits and I see how I might like it. For example, I often do searches for our family history, and get annoyed at having to sift through the same unrelated sites all the time. Using the demote feature, I would now be able to simply demote each link that has nothing to do with my family.

The downside to this is that if one of those sites later adds something relevant about our family, I would never know.

Now of course, we all know Big Brother Google will be watching with great interest to see what we all promote and demote. They will add all these statistics up into a neat little bundle and use it in their efforts for world domination…err…did I say world domination? What I meant was, they will use this data to better serve the users of their search engine.

My final thoughts? Hopefully Google will offer a way to turn this feature off, as it is just too invasive for my teeny tiny little mind to handle today.

Happy searching!

Addendum: You can also see what others have done with search results by clicking on the ‘See all notes for this SearchWiki’ link. For example, if you search for ‘hosting’ and then click on that link at the bottom of the page, you can see what others have promoted or demoted, much like Digg does for news stories.

You can also view or remove your own changes by clicking on the ‘See all my SearchWiki notes’ link.

The last feature lets you add URLs to your SearchWiki search results. For example, if you search for the keyword ‘hosting’, you can click on the ‘Add a result’ link at the bottom of the page and add a URL. From then on, whenever you search for ‘hosting’, the results will include the URL you just added. Most interesting here is that anyone who clicks on the ‘See all notes for this SearchWiki’ link will now see my URL. Yeah sure, a small victory for me, but the SEO folks will spam this to death, making it yet another useless feature.

Block spam bots and evil web scrapers

I noticed a huge CPU load on one server today. A quick look at netstat showed that most of the current traffic was coming from someone in Africa. I cross-referenced the IP address from netstat with the access logs of the various sites on the server and saw that it had a User-Agent of ‘DTS Agent’. A quick Google search showed this to be a scraper for email contacts.

It was time I added a little something to make it harder for these spam bots to see my sites. Simply adding the following code to an .htaccess file in the root of the site would help to curb evil, good-for-nothing spam bots:

[Last Updated: April 7th 2009]

RewriteEngine On

RewriteCond %{HTTP:User-Agent} (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|Express\sWebPictures|ExtractorPro|EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|PageGrabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|True_Robot|turingos|Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|WebReaper|WebSauger|Website\seXtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|Web\sSucker|WebWhacker|WebZip|Wget|Widow|WWW-Collector-E|WWWOFFLE|Xaldon\sWebSpider|Xenu's|Zeus|DTS\sAgent) [NC]
RewriteRule .* - [F]


ErrorDocument 403 /403.html
 
# IF THE UA CONTAINS ANY OF THESE
SetEnvIfNoCase ^User-Agent$ .*(aesop_com_spiderman|backweb|bandit|batchftp|bigfoot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) HTTP_SAFE_BADBOT

SetEnvIfNoCase ^User-Agent$ .*(cosmos|crescent|diibot|dittospyder|dragonfly) HTTP_SAFE_BADBOT    
SetEnvIfNoCase ^User-Agent$ .*(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) HTTP_SAFE_BADBOT

SetEnvIfNoCase ^User-Agent$ .*(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(grafula|harvest|hloader|hmview|httrack|humanlinks|ilsebot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(infonavirobot|infotekies|interget|iria|jennybot|jetcar) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(likse|linkscan|linkwalker|lnspiderguy|magnet|mag-net|markwatch) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(net.?vampire|netants|netmechanic|netspider|nextgensearchbot) HTTP_SAFE_BADBOT

SetEnvIfNoCase ^User-Agent$ .*(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(offline.?navigator|openfind|outfoxbot|pagegrabber|pavuk) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(psbot|pump|queryn|recorder|realdownload|reaper|true_robot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(repomonkey|internetseer|sitesnagger|siphon|slysearch|smartdownload) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(turingos|urly.?warning|vacuum|voideye|whacker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|reaper|sauger|site.?quester|whack) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
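SetEnvIfNoCase ^User-Agent$ .*(dts\sagent) HTTP_SAFE_BADBOT

# Assumption: the lines above only set HTTP_SAFE_BADBOT; without a rule
# that acts on it they block nothing. Apache 2.2-style access control:
Order Allow,Deny
Allow from all
Deny from env=HTTP_SAFE_BADBOT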

This code is a cleaned-up version of the code found here and here. Note that you really should look through all the User Agents and make sure you are not blocking someone or some software that you would like to keep. The list above is far from complete, but it is a good start.

For a fairly up-to-date list of User Agents, there is a useful User Agent database here.
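
One way to act on that advice before deploying is to run a few User-Agent strings through the same kind of pattern and see which ones would be refused. The sketch below is a rough illustration in Python, not part of the rules themselves: it uses only a small subset of the names from the list above, and the function name `is_blocked` and the sample strings (including the made-up ‘SitePerformanceMonitor’) are invented for the example. Python’s regex syntax is close to, but not identical to, what Apache uses, so treat the result as a sanity check only.

```python
import re

# A small subset of the names from the blocklist above, for illustration only.
BLOCKED_NAMES = [
    r"EmailCollector",
    r"HTTrack",
    r"RMA",
    r"Teleport\sPro",
    r"Wget",
    r"DTS\sAgent",
]

# Unanchored and case-insensitive, mirroring the RewriteCond with [NC].
BLOCKED_RE = re.compile("(?:" + "|".join(BLOCKED_NAMES) + ")", re.IGNORECASE)


def is_blocked(user_agent: str) -> bool:
    """Return True if the User-Agent contains any of the blocked names."""
    return BLOCKED_RE.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "DTS Agent",
        "Wget/1.11.4",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/3.0.8",
        "SitePerformanceMonitor/1.0",  # hypothetical agent you might want to keep
    ]
    for ua in samples:
        print("BLOCK" if is_blocked(ua) else "allow", ua)
```

The last sample is the interesting one: because the match is unanchored and ignores case, a short entry such as ‘RMA’ also catches the letters ‘rma’ inside ‘Performance’. Short names in the list deserve a second look for exactly this reason.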

What is my IP Address

Lately I have been wanting an easy way to find my outside IP address. Videotron has recently started expiring the lease on my IP address more often than once a year, so I can no longer keep track of what my address is. I have whipped up this little ‘What is my IP Address’ page so that I can easily get my address whenever I want.

Hopefully you will find my What is my IP address page useful too.

What is my IP Address.
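
For anyone who would rather host such a page themselves, the idea fits in a few lines. The following is a minimal sketch using Python’s standard-library WSGI support; it is not the code behind the page linked above, and the function name `what_is_my_ip` and port 8000 are arbitrary choices. It simply echoes the address the request came from, so it has to run on a server outside your own network, and behind a proxy it would report the proxy’s address instead.

```python
from wsgiref.simple_server import make_server


def what_is_my_ip(environ, start_response):
    """WSGI app that replies with the client's address as plain text."""
    # REMOTE_ADDR is the peer address the server saw for this connection.
    ip = environ.get("REMOTE_ADDR", "unknown")
    body = (ip + "\n").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local testing.
    with make_server("", 8000, what_is_my_ip) as server:
        server.serve_forever()
```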

Edit navigation menu in MediaWiki

First off, my apologies for not posting recently. Things have been crazy here, and to top it off I decided to build a shed in the back yard. It will give me a place to be banished to when I misbehave at home!

Many of my clients use MediaWiki. I have inherited many old MediaWiki installs and have learnt that upgrading MediaWiki today is relatively painless. Part of my experience is learning how to use MediaWiki. When I first started using MediaWiki, which is what Wikipedia uses, I looked at it as some sort of archaic application that would not survive much into the future. Now that I am learning how to actually use the wiki software, I have a newfound respect for it.

Today I learnt how easy it is to fix the navigation sidebar. Apparently a few people have had problems so I felt it worth posting here. I found the answer here.

Simply enter ‘MediaWiki:Sidebar’ in the search field and click ‘Go’. You can now edit this as a normal page, changing, adding and removing links. Links can either point to other wiki pages or point to pages outside the wiki.
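
For reference, each section on that page is a line starting with one asterisk, and each link under it is a line starting with two asterisks in the form `target|link text`, where the target can be a wiki page name or a full URL. The Python sketch below just builds such text from a dictionary so the format is easy to see; the helper name `build_sidebar` and the section and link names are invented for the example, and the output would still be pasted into the sidebar page by hand.

```python
def build_sidebar(sections):
    """Build sidebar text from {heading: [(target, link_text), ...]}."""
    lines = []
    for heading, links in sections.items():
        # One asterisk starts a section heading.
        lines.append(f"* {heading}")
        for target, text in links:
            # Two asterisks start a link, written as target|link text.
            lines.append(f"** {target}|{text}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sections and links, for illustration only.
    print(build_sidebar({
        "navigation": [
            ("Main Page", "Home"),
            ("Help:Contents", "Help"),
        ],
        "External": [
            ("https://www.mediawiki.org/", "MediaWiki site"),
        ],
    }))
```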

Hope this helps someone out!

ThePlanet Data center outage from explosion

Life at The Planet left May with a bang! At some point on Saturday afternoon, May 31st, The Planet’s Houston H1 data center suffered a generator meltdown. The fire department quickly determined that the data center needed to be brought offline for a few hours until the situation could be properly assessed. Unfortunately, it was not enough. Shortly after The Planet’s data center was brought offline, there was a major explosion which took out three walls of the data center’s power room.

From The Planet forums:

This evening at 4:55pm CDT in our H1 data center, electrical gear shorted, creating an explosion and fire that knocked down three walls surrounding our electrical equipment room. Thankfully, no one was injured. In addition, no customer servers were damaged or lost.

We have just been allowed into the building to physically inspect the damage. Early indications are that the short was in a high-volume wire conduit. We were not allowed to activate our backup generator plan based on instructions from the fire department.

This is a significant outage, impacting approximately 9,000 servers and 7,500 customers. All members of our support team are in, and all vendors who supply us with data center equipment are on site. Our initial assessment, although early, points to being able to have some service restored by mid-afternoon on Sunday. Rest assured we are working around the clock.

We are in the process of communicating with all affected customers. We are planning to post updates every hour via our forum and in our customer portal. Our interactive voice response system is updating customers as well.

You can follow the progress of the Houston data center recovery on this thread.

Unfortunately I had a server in the H1 data center in Houston. As of tonight, Sunday at 9:15 PM EST, there is still no sign of life from my server. Fortunately I had not brought the server online for real usage yet. It is more of a plaything for me.

When it comes back I hope to continue my list of how-tos and lessons learned on getting Exim set up. That is all working now, but of course I want to see how well I can protect myself from spam.

Night all.
