What is Hotlinking and how do I prevent it?

If you notice a surge in the traffic coming to your site, it may not be because you are receiving more visitors. In fact, it could be the complete opposite - someone is using your images/files and your bandwidth for their visitors and their web site!


Unfortunately, this practice, known as hotlinking, is common among some types of sites. However, a property of HTTP requests makes it easy to spot, and therefore (relatively) easy to prevent.


Throughout this article, experience with (or knowledge of) regular expressions and Apache's configuration files will be useful. We explain how each part works and how you can modify it to suit your needs, but some parts may still be confusing at first.

If you are confused by any part of this, or would like assistance, please Submit a Ticket to our Support Team who will do their best to help you out.

For information on how Apache's configuration files work, please see our article How do .htaccess files work in Plesk?


Referring


Every modern browser, when making a request to a web site, sends a few lines of information to the server, giving details about what it's for, where it's from and what it can handle.


For example, suppose we open the browser and enter www.jabwebsolutions.co.uk (our home page) into the address bar. It would send the following information to the server:


GET / HTTP/1.1
Host: www.jabwebsolutions.co.uk
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;...

Normally it sends more than this (around ten or more lines, depending on the request). What we've shown are the most important parts (what it wants, from which site and who's making the request).


Once the browser has received the page, it processes it to see whether anything else needs to be downloaded (e.g. images). If so, a new request must be made for each one. The example below is for the 'Sign Up' image at the top of each of our pages (notice that this request has an extra line: Referer).


GET /img/common/head/signup.gif HTTP/1.1
Host: www.jabwebsolutions.co.uk
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US;...
Referer: http://www.jabwebsolutions.co.uk/

This is an important addition, and vital to preventing hotlinking. When a browser requests an image (or when you click on a link to move from page to page), it sends a Referer value to say where it has come from. If someone is 'hotlinking' to your site, then the Referer won't contain your domain name!


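Also, in a normal browser the Referer can only be switched off; a web page can't make the browser send a different value. This is what makes Referer so useful in preventing hotlinking. (Custom clients can send any Referer they like, so treat this as a deterrent against casual hotlinking rather than as access control.)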
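
As a rough illustration of the check described above, the Python sketch below pulls the Referer line out of a raw request and classifies it. The names referer_of, classify and OWN_HOSTS are hypothetical and exist only for this example; this is not how Apache does it internally.

```python
from urllib.parse import urlsplit

# Hypothetical: the host names that count as "our" site.
OWN_HOSTS = {"www.jabwebsolutions.co.uk", "jabwebsolutions.co.uk"}


def referer_of(raw_request: str) -> str:
    """Return the Referer value from a raw HTTP request, or '' if absent."""
    for line in raw_request.splitlines()[1:]:  # skip the request line
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "referer":
            return value.strip()
    return ""


def classify(referer: str) -> str:
    """Label a Referer as 'empty', 'own site' or 'hotlink'."""
    if not referer:
        return "empty"
    host = (urlsplit(referer).hostname or "").lower()
    return "own site" if host in OWN_HOSTS else "hotlink"


# The image request shown earlier, shortened to the relevant lines.
request = (
    "GET /img/common/head/signup.gif HTTP/1.1\n"
    "Host: www.jabwebsolutions.co.uk\n"
    "Referer: http://www.jabwebsolutions.co.uk/\n"
)
print(classify(referer_of(request)))
```

A request with no Referer line at all is classified as 'empty', which is the case a typed-in address or a bookmark produces.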


Rejecting


Now that we know what to target in a hotlinking request, we need a way to prevent the image from being downloaded. Any request referred from your own site is acceptable, so we only need to block requests where the Referer is not from your domain.


Apache has a very powerful tool called mod_rewrite. This module, available on all our servers, allows you to configure the server to dynamically re-write a URL based on a set of conditions and values. While normally based on the URL, these conditions can also include information about the request - this is what we're going to use here.


To reject hotlinking, we need two things:



  • A .htaccess file with the configuration settings below (tailored to your domain); and

  • The location of the page you are going to redirect the request to (and hence return instead of the image).


It is important that you choose this page carefully. When the browser makes the request for an image, and it receives a web page instead, then the image won't be displayed, as the content returned isn't a valid image. However, the content of the page you return won't be displayed either. All they'll see is a broken image placeholder.


While the standard option is to just redirect them to your home page, if you run a website where the home page is generated by a very big script (e.g. you're using Mambo to run your website), returning that page every time instead of an image massively increases the workload on the server. It takes much more processing power to produce and send a dynamically-generated web page than to just send an image off the hard drive.


Not only that, complex web pages with lots of text, tables, etc., can be bigger in size than many images and initially may even increase bandwidth usage! A good option would be to use the following HTML page instead (for example, saving it as hotlink.html in your httpdocs/ directory), changing http://www.domain.tld to your website's address/home page:


<html>
<head>
<title>Redirecting...</title>
<meta http-equiv="refresh" content="0;url=http://www.domain.tld">
</head>
<body></body>
</html>

This works in two ways. First, if the request is for an image to display on a web site (i.e. using the <img /> tag), then when the browser receives the web page, the image won't be displayed. As the file is much smaller than the image, it can save bandwidth, while using much less processing time than a dynamically-generated page.


Secondly, if they are sent to view the image directly (i.e. using the <a /> tag), they'll see the web page and be automatically redirected to your home page instead.


Configuration


Be careful when making changes involving a .htaccess file. If you make a mistake, you can temporarily disable your entire website until the file is removed or the error corrected. Only do this if you're sure you know what you are doing.

For assistance or advice please Submit a Ticket to our Support Team.


In order to tell Apache to rewrite hotlinking requests, you need to create a .htaccess file with the following contents:


<IfModule mod_rewrite.c>
RewriteEngine On
RewriteCond %{REQUEST_FILENAME} .*(jpe?g|png|gif|swf)$ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?domain\.tld [NC]
RewriteRule (.*) /hotlink.html
</IfModule>

The first and last lines form an if statement. It prevents the Rewrite* directives from being processed if the mod_rewrite module hasn't been loaded into Apache. If the directives were processed without mod_rewrite, Apache wouldn't understand them and would just return a 500 error.


The second line simply makes sure the mod_rewrite engine is switched on and running, otherwise the rules will be ignored.


The third line defines which files the redirection will apply to. Here we're saying we only want files that end in jpg (or jpeg), png, gif or swf (Flash files) to be redirected if the Referer isn't valid. If someone links to a normal web page inside your site, we want them to view it as normal. This will only stop other sites from using your images and hence your bandwidth.


The fourth and fifth lines control which requests are re-written. The first case is the empty case. When you type an address into the address bar and click Go, or you click on a bookmark, there is no previous page, so the browser doesn't send a Referer header (as we saw earlier). Therefore, if someone has disabled the Referer from being sent with their request, or directly requests an image, the image can still be shown.


At the moment, the configuration allows the empty case through (it says: re-write the URL if the Referer is NOT empty and does not start with our domain). If you comment out the fourth line (put a # at the start of it), only Referers which start with your domain are allowed through, and the empty case is blocked as well.


The second case defines what your domain name is, and therefore which Referers should NOT be redirected: these are the sites you want to be able to use the images.
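
To make the combined effect of the third, fourth and fifth lines easier to follow, here is a minimal Python sketch that imitates the decision. It is only a model: is_rewritten and its allow_empty parameter are hypothetical names, the [NC] flag is approximated with re.IGNORECASE, and domain.tld is the same placeholder used in the configuration.

```python
import re

# The patterns from the three RewriteCond lines.
# [NC] (no case) is modelled with re.IGNORECASE.
FILE_PATTERN = re.compile(r".*(jpe?g|png|gif|swf)$", re.IGNORECASE)
OWN_SITE = re.compile(r"^https?://(www\.)?domain\.tld", re.IGNORECASE)


def is_rewritten(filename: str, referer: str, allow_empty: bool = True) -> bool:
    """Return True if the request would be sent to /hotlink.html."""
    if not FILE_PATTERN.search(filename):
        return False  # third line: not a protected file type
    if allow_empty and referer == "":
        return False  # fourth line: an empty Referer is let through
    if OWN_SITE.search(referer):
        return False  # fifth line: the Referer is our own site
    return True


# Request from one of our own pages
print(is_rewritten("/img/logo.gif", "http://www.domain.tld/index.html"))
# Request from a page on another site (a hotlink)
print(is_rewritten("/img/logo.gif", "http://other.example/page.html"))
# Empty case: typed address, bookmark or Referer switched off
print(is_rewritten("/img/logo.gif", ""))
# Empty case with the fourth line commented out
print(is_rewritten("/img/logo.gif", "", allow_empty=False))
```

Passing allow_empty=False corresponds to commenting out the fourth line, so requests without a Referer are re-written too.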



The third, fourth and fifth lines all use a system called Regular Expressions (a powerful pattern matching tool, common in Unix and Linux as well as modern programming languages including Perl, PHP and Python). Split into different lines, here is what each part of the expression in the last RewriteCond means:


!             # Inversion - make the result true when we
              # don't match the following pattern
^             # At Start - the pattern must be at the start
              # of the string (i.e. nothing is before it)
https?://     # All Referers start with http:// (or https://
              # if we're on a secure site). The ? tells
              # mod_rewrite that the preceding character
              # may or may not be there
(www\.)?      # Look for www. (the period has a special
              # meaning in regular expressions - it means
              # match anything except a new line - so we
              # put a \ in front to 'escape' it and switch
              # off its special meaning). The ? has the
              # same effect as in the https?:// above, but
              # we've grouped the text in ( ) so the ?
              # applies to everything in the ( ). Without
              # the ( ) it would only affect the period
domain\.tld   # This is the domain we're looking for. Again,
              # the . has to be escaped each time we use it
              # (don't forget to change it to your domain)
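
If you would like to try the expression before putting it on a live site, one option is to test it against sample Referers. The sketch below does this in Python, on the assumption that Python's re module treats these simple constructs the same way as Apache's regular expression engine. The own_site_strict variant is a hypothetical addition, not part of the configuration above.

```python
import re

# The pattern from the last RewriteCond, without the leading "!".
own_site = re.compile(r"^https?://(www\.)?domain\.tld", re.IGNORECASE)

# Hypothetical stricter variant: the host name must end here, i.e. be
# followed by "/", ":" (a port number) or the end of the string.
own_site_strict = re.compile(
    r"^https?://(www\.)?domain\.tld([/:]|$)", re.IGNORECASE
)

samples = [
    "http://domain.tld/",
    "https://www.domain.tld/gallery.html",
    "http://WWW.DOMAIN.TLD/",
    "http://forum.domain.tld/",
    "http://domainxtld/",
    "http://domain.tld.other.example/",
]

for referer in samples:
    loose = bool(own_site.search(referer))
    strict = bool(own_site_strict.search(referer))
    print(f"{referer:40} loose={loose} strict={strict}")
```

The last sample is worth noting: because the expression only checks how the Referer starts, a host such as domain.tld.other.example also passes the original pattern. If that matters to you, an ending such as ([/:]|$) is one way to close the gap.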

Now that we have defined what a valid Referer for our site is, we need to tell mod_rewrite what to do. This is the sixth line. Again, it uses a regular expression to match the URLs the redirect applies to (the pattern (.*) means match any value of any length, so it catches all URLs), and tells mod_rewrite to re-write them to /hotlink.html.


Variations & Tips


Q. I have more than one domain name hosting the site (I have one or more domain aliases). How do I include them all?


A. The difficulty of this depends on the names of the different domains you want to include.

For example, if you have the same actual domain name, but different extensions for it, or the same extension but different domain names, then you just need the single OR case. If you have different domain names and extensions, you'll need to start stacking the OR cases to build up a regular expression.


An example of how this is done is shown in the first RewriteCond statement, where we match the different file extensions that the rule will be run for (see above). An OR statement is simply two (or more) options separated by the pipe character, "|", and surrounded by brackets (for easier reading, and for grouping the sets of ORs), e.g. (a|b) reads as match a OR b.


Conditions can also be stacked, so one of the options in the OR case can itself be another OR statement: (a|(b|c)), although that's the same as (a|b|c) in logic terms.


Using this construct, we can create groups for the RewriteCond statement:


(www\.)?example\.(com|net|org)
# matches example.com, example.net and example.org (with
# or without www. at the start)

(www\.)?(example\.com|testing\.com)
(www\.)?(example|testing)\.com
# both would match either example.com or testing.com
# (again, with or without www. at the start)

(www\.)?(example\.(com|net|co\.uk)|testing\.(net|org))
# would match example.com, example.net, example.co.uk,
# testing.net or testing.org (with or without www.
# at the start)
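
The groups above can be checked against a list of host names before they go into the RewriteCond. The following Python sketch is one way to do that; allowed_hosts is a hypothetical helper, the host list is made up for the example, and it assumes Python's re module behaves like Apache's engine for these patterns.

```python
import re


def allowed_hosts(group: str, hosts: list[str]) -> list[str]:
    """Return the hosts whose http:// Referer matches the given group."""
    # Wrap the group in the same prefix the RewriteCond uses.
    pattern = re.compile(r"^https?://" + group, re.IGNORECASE)
    return [h for h in hosts if pattern.search(f"http://{h}/")]


# Made-up host names to test against.
hosts = [
    "example.com", "www.example.net", "example.co.uk",
    "testing.com", "www.testing.org", "example.org", "other.example",
]

print(allowed_hosts(r"(www\.)?example\.(com|net|org)", hosts))
print(allowed_hosts(r"(www\.)?(example|testing)\.com", hosts))
print(allowed_hosts(
    r"(www\.)?(example\.(com|net|co\.uk)|testing\.(net|org))", hosts
))
```

Each call prints the hosts that would be treated as valid Referers, so anything missing from the output would be redirected to the hotlink page.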

Q. What if I wanted to allow another site to link to my images?


A. This is the same as if you have different domain aliases on your site. You just need to include the domain as part of the list, building the OR statements as necessary.


Q. How do I prevent hotlinking from a sub-domain?


A. As a sub-domain doesn't have the www. at the start, you can replace the (www\.)? bit at the start with the sub-domain. And, just like the main site, you can build up the domain list using the OR statement. For example:


forum\.example\.com
# just matches the forum.example.com

forum\.example\.(com|org|net)
# matches forum.example.com, forum.example.org
# or forum.example.net
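
The patterns above match only the sub-domain. If the main site and the sub-domain should both be allowed, one possible approach is to make the sub-domain part of the optional group. The combined group below is an assumption for illustration, not part of the original configuration, and is checked here with Python's re module.

```python
import re

# Hypothetical combined group: the bare domain, www. or forum.
group = r"((www|forum)\.)?example\.com"
pattern = re.compile(r"^https?://" + group, re.IGNORECASE)

for host in ["example.com", "www.example.com",
             "forum.example.com", "blog.example.com"]:
    # True means the Referer is accepted, False means it is redirected
    print(host, bool(pattern.search(f"http://{host}/")))
```

Any other sub-domain, such as the made-up blog.example.com, is still treated as a hotlink.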