Really Stopping Scrapers

by Kabitzin on July 20, 2010 in Blogging Tips

[Image caption: First kill, woot!]

Background:

If you’ve been following my Twitter, you know I recently lashed out against a scraper who was stealing our content, removing the copyright link from the feed, and hotlinking our images. In the past, I had tolerated scrapers: as long as they kept the copyright link, the effort of fighting them outweighed the benefits.  Fighting scrapers in the real world usually centers on making sure everyone knows that your site is the real source of the content, tracking who is scraping it, and making the scrapers want to take your content off their site.  DMCA notices are, in my experience, useless because of the high burden placed on the victim and the ability of the scraper to simply ignore them.  Even though “it’s obvious” which site is the source, ISPs and webhosts don’t want to make that call or do the research.  Furthermore, I didn’t want to turn off hotlinking entirely or switch the feed to summaries, because that screws with legit fans reading our posts in feed-readers. Fortunately, rather than whitelisting the sites that are allowed to load your images, you can blacklist specific scrapers.

[Image caption: Sorry Raichu, but no amount of crying will wash the image from your mind's eye]

The strategy I used has been employed before against hotlinkers, but anime blogs have an advantage over some other types of blogs when fighting scrapers: the images are both numerous and very important to each post. Most scrapers simply hotlink because it’s easy, and this makes them vulnerable to .htaccess protection that swaps the image they were trying to steal for an image of your choosing.

Fighting Back:

The process is quite simple.  Create a .htaccess file with the code below, and upload it to the directory where your images reside (probably /wp-content/uploads).  It will protect all your sub-directories, too.

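# Stop Hotlinking
RewriteEngine On
# Match requests whose Referer is one of the scraper sites (http or https,
# with or without a subdomain). Every condition except the last carries OR.
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?animeblogonline\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?resuck\.com/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?anime1\.info/ [NC,OR]
RewriteCond %{HTTP_REFERER} ^https?://(.+\.)?kokidokom\.wordpress\.com/ [NC]
# Send any matching image request to the replacement image instead.
RewriteRule .*\.(jpe?g|gif|bmp|png)$ http://annoying_image_of_your_choosing.jpg [NC,L]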

You can add as many sites as you want, one RewriteCond line per site.  Every condition except the last needs the OR flag; without it the conditions are ANDed together and the rule will never fire.  Notice you have to use a backslash to escape the periods in each domain name.  Remember to insert your own URL for the image you want to redirect the scraper to.  I think you may have to match image formats, to avoid confusing the browser.  Depending on what FTP program you are using, you may have to name the file htaccess.txt, upload it, and then rename it to .htaccess to get the file to upload correctly.
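If the blacklist keeps growing, hand-escaping every domain gets error-prone. Below is a minimal Python sketch that builds the same kind of rules from a plain list of domains. The function name build_hotlink_rules and the example.com replacement URL are made up for illustration, so treat this as one possible helper rather than a required step. It leans on re.escape to add the backslashes, and its pattern accepts both http and https referers.

```python
import re


def build_hotlink_rules(domains, replacement_url):
    """Return .htaccess text that redirects image requests referred by `domains`.

    `domains` is a list of bare domain names such as "resuck.com";
    `replacement_url` is the image the scraper's page should load instead.
    """
    if not domains:
        raise ValueError("need at least one domain to block")

    lines = ["# Stop Hotlinking", "RewriteEngine On"]
    for i, domain in enumerate(domains):
        # re.escape adds the backslash in front of each period.
        escaped = re.escape(domain)
        # Every condition except the last one is chained with OR.
        flags = "[NC]" if i == len(domains) - 1 else "[NC,OR]"
        lines.append(
            f"RewriteCond %{{HTTP_REFERER}} ^https?://(.+\\.)?{escaped}/ {flags}"
        )
    lines.append(f"RewriteRule .*\\.(jpe?g|gif|bmp|png)$ {replacement_url} [L]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical replacement image; substitute your own URL.
    print(build_hotlink_rules(["resuck.com", "anime1.info"],
                              "http://example.com/blink.gif"))
```

Paste the printed text into your .htaccess file and upload it as described above.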

Additional Thoughts:

[Image caption: Target acquired]

You can redirect the scraper to a disgusting image or to an annoying blinking image hosted on someone else’s site. I’ve seen first-hand that nice warning images do not stop scrapers: they have no problem stealing animeblogger.net posts and images, warnings and all.  Blinking animated images are supremely annoying and don’t get curious readers into trouble.

This hotlink protection is a good tool for limiting the damage scrapers can do to you.  There is not much you can do to protect your RSS feed without hurting your regular visitors, but this method prevents scrapers from stealing your bandwidth.  It’s also a good way to show any readers who happen upon the scraper site that it is a phony (although that is usually fairly easy to tell, so this is less of a concern in most cases).  Redirecting to deeply disturbing or explicit pictures (hosted elsewhere, of course) is extremely cruel but gives you the option of complaining to the scraper’s webhost that they were displaying material that violates the webhost’s TOS.  This is pretty toxic stuff, but sometimes you have to fight poison with poison (one of my favorite Iron Monkey quotes).

The best part about the .htaccess redirect is that it only serves up the nasty stuff to the specific sites you mention, so rather than build a wall that keeps a lot of people out, you defend yourself by dropping laser-guided bombs on your enemies.  Plus, scrapers often have to remove the defaced posts manually, and the lazier they were in the first place, the harder your images will be to remove.  I cannot tell you how much I enjoyed watching the scraper that inspired this article try to remove our posts one by one, only to get shut down by WordPress before the horror had been completely exorcised.
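To see how targeted the matching is before uploading anything, you can try sample Referer strings against the same kind of pattern. This is a rough Python sketch, assuming Python's regex engine behaves like Apache's for a pattern this simple. The is_blocked name and the BLOCKED list are illustrative only.

```python
import re

# Illustrative blacklist; substitute the domains you actually block.
BLOCKED = ["animeblogonline.com", "resuck.com"]


def is_blocked(referer, domains=BLOCKED):
    """Mimic the RewriteCond test: True if the Referer is a listed site."""
    for domain in domains:
        # Optional subdomain, escaped domain, then the slash that starts the path.
        pattern = r"^https?://(.+\.)?" + re.escape(domain) + "/"
        # re.IGNORECASE plays the role of the NC flag.
        if re.match(pattern, referer, re.IGNORECASE):
            return True
    return False


if __name__ == "__main__":
    samples = [
        "http://www.resuck.com/some-post/",   # scraper page: gets the swap
        "http://www.google.com/reader/view/", # feed reader: untouched
        "",                                   # no Referer sent: untouched
    ]
    for referer in samples:
        print(repr(referer), is_blocked(referer))
```

Requests with an empty or unlisted Referer fall through, which is why regular visitors and feed-reader users keep seeing the real images.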

Suddenly, the Uncommon Uses feature in Feedburner becomes so much more useful.  Most scrapers don’t credit the author anyway, so this process might not ruin your good name as quickly as you think.  Is this childish?  Yes.  Is it deeply satisfying and amusing?  Yes.



16 comments

ghostlightning July 20, 2010 at 5:53 am

Epic win. Full marks. Good job.


quillon July 20, 2010 at 6:37 am

I guess my referer settings hides those shock images from my eyes.

http://img706.imageshack.us/img706/4407/animeblogonline12796217.png


Nazarielle July 20, 2010 at 7:08 am

Is it deeply satisfying and amusing? Yes.

Very entertaining, well worth the effort imo :3


Mentar July 20, 2010 at 7:54 am

*lol*

^_^


Gudo July 20, 2010 at 11:13 am

Really useful post. It says something about the quality of your aniblog that your content is worth stealing. And it also says something that you’re clever enough to put a stop to it and explain to others how to follow suit. Quality.


TJ July 20, 2010 at 1:17 pm

Haha, good job. Scrapers are pretty annoying, but good to know you can fight back (in a self-hosted environment).

I can’t do much about it on a Blogger blog other than a copyright link in the feed, but at least hotlinkers aren’t stealing my own bandwidth.


Josh S. July 20, 2010 at 4:37 pm

Holy crap (cringe). I saw the replaced pictures on those other scraper sites. It revolted me and made me laugh at the same time. Don’t think that’s ever happened before.


Steven Den Beste July 20, 2010 at 9:20 pm

What I use is an animated GIF that’s 1024*1024 and blinks alternately blue and red. Supremely annoying, I can tell you. And though it’s lots of pixels, the file itself isn’t very large at all. (I keep it on my own server.)


Kabitzin July 20, 2010 at 11:11 pm

I like that idea! At least curious readers won’t get into trouble that way.


7 July 20, 2010 at 11:25 pm

You have my thanks for saving me the trouble of having to deal with these bastards under the guise of Kokidokom.


Yi July 21, 2010 at 7:14 am

Haha, good job!


Kabitzin July 25, 2010 at 8:37 am

So far I’ve gotten posts taken off two scraper sites. The rest don’t seem to care what displays on their site. Interestingly, I have seen some of the comments from our posts also scraped and put on the scraper sites!


Darknives July 26, 2010 at 8:59 am

I wish I had someone to take care of that too in my blog lol.
Anyway, nice to know people are fighting this scum.


asdfjkl; July 26, 2010 at 4:13 pm

This image is much more annoying than your swirls:

http://sadpanda.us/images/176138-6R7AUSL.gif

(or you could make it bigger)


Kabitzin October 9, 2010 at 3:43 pm

I recently checked and I think the blinking was finally enough to get us taken off a few more scraper sites!


Jacob Chapel August 1, 2010 at 2:42 pm

Yeah, this is a great way to keep people from hotlinking your images and possibly humiliate them.

If you are having trouble with scraper bots, the ones that just scrape your site and post it elsewhere with no attribution, there is an interesting PHP script you can run that blocks most of them, as well as all kinds of unsavory people and bots. It is called ZB Block and it has done wonders blocking all kinds of spammers, scrapers, and unsavory types on my forums. I’m not affiliated with it in any way, just really enjoying it and thought I would share.

I like the site, keep up the good work. I may have to bookmark this place.

