How To Resolve The Referrer Spam Problem

As promised, we have been working on the Analytics referral spam problem that we discussed previously here at Freelance SEO Essex. Referrer spam is now clogging up the Google Analytics accounts of most websites, and if you use this tool regularly, you are probably already wondering how to get rid of the spam and return accuracy to your reports.

If you’ve read up on the topic, you might already have concluded that it isn’t as straightforward as it sounds. Advice is conflicting and often confusing. We’re going to try to make it simpler.

To help develop a solution we spoke with Mat Bennett from OKO Digital. Many of OKO’s clients rely on advertising as their main revenue stream and therefore must provide accurate data to their advertisers, so Mat and his team have developed a track record of improving the quality of data that comes out of Google Analytics.

Referrer spam comes in multiple flavours

One of the reasons for the confusion is that there are two types of referrer spam plaguing site owners’ analytics, and methods that stop one don’t necessarily stop the other. Both types share the same goal: to fill your reports with fake referrers so that you check out the spammer’s website. The method differs, though.

Type 1 referrer spam: solid spam

The first type of referrer spam hits your site with automated requests that simulate a user calling your pages. This passes the fake referrer into your tracking.

The good news is that this type is easier to control: because the bots are hitting your site, you can use your server to control their requests. The bad news is that this type of referrer spam might actually be slowing your site down by creating extra load.
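
Because solid spam really does reach your server, it leaves a trace in your access logs, whereas ghost spam never does. The sketch below, assuming Apache’s standard “combined” log format and a hypothetical offender list (`spammer.example` is a made-up domain), counts logged requests whose Referer header belongs to a listed domain.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Apache "combined" format: host ident user [time] "request" status size "referrer" "agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def spam_hits(log_lines, offenders):
    """Count logged requests per listed domain, matched on the Referer header."""
    counts = Counter()
    for line in log_lines:
        match = COMBINED.match(line)
        if not match:
            continue  # not a combined-format line
        host = urlsplit(match.group("referrer")).hostname
        if host is None:
            continue  # "-" means the request carried no referrer
        for domain in offenders:
            # Match the listed domain itself or any subdomain of it.
            if host == domain or host.endswith("." + domain):
                counts[domain] += 1
    return counts


sample_log = [
    '203.0.113.5 - - [10/Oct/2015:13:55:36 +0000] "GET / HTTP/1.1" 200 5120 '
    '"http://spammer.example/" "Mozilla/5.0"',
    '198.51.100.7 - - [10/Oct/2015:13:56:01 +0000] "GET /blog/ HTTP/1.1" 200 8331 '
    '"https://www.google.com/" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2015:13:57:12 +0000] "GET /about/ HTTP/1.1" 200 4410 '
    '"-" "Mozilla/5.0"',
]
print(spam_hits(sample_log, ["spammer.example"]))
```

Referrers that turn up here are candidates for a server-side block. A spam referrer that appears in Analytics but never in these logs is ghost spam.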

Type 2 referrer spam: ghost spam

Ghost spam differs because there is nothing really there. Rather than requesting your web page, the spammers send fake hits straight to Google Analytics (typically via its Measurement Protocol) and bypass your website entirely.

This does mean that they are not creating load on your server, but it also means that they are bypassing a lot of the systems that you could otherwise use to defend your site against the problem.

Methods that won’t work (but might still be worth trying!)

Block using robots.txt

Robots.txt is a file that sits on your server and tells automated crawlers (aka robots) whether you want them on your website. It isn’t a solution to referrer spam, as most bad bots don’t honour the instructions in a robots.txt file. It might still be worth adding the spammers to your robots.txt, though, as doing so could remove a few of them and keep out other annoying bots that skew your analytics and create load on your server.
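
The difference between a polite crawler and a spam bot can be sketched with Python’s standard `urllib.robotparser` module. The robots.txt content and bot names below are hypothetical; the point is that the rules only take effect when a crawler chooses to check them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named bot is refused, everyone else is allowed.
ROBOTS_TXT = """\
User-agent: ExampleSpamBot
Disallow: /

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_crawler(agent, url):
    """A well-behaved crawler asks robots.txt first and respects the answer."""
    return "fetched" if parser.can_fetch(agent, url) else "skipped"


def spam_bot(agent, url):
    """A spam bot never consults robots.txt, so its rules cannot stop it."""
    return "fetched"


url = "https://www.example.com/"
print(polite_crawler("ExampleSpamBot", url))  # skipped
print(polite_crawler("FriendlyBot", url))     # fetched
print(spam_bot("ExampleSpamBot", url))        # fetched
```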

Use the referral exclusion list in analytics

Lots of blog and forum posts recommend using the “referral exclusion list” in Google Analytics to tackle this problem. The name suggests that this is the tool for the job, but it won’t actually help. If you use this method, you will stop seeing the spammers in your referral reports, but the fake visits they create will be classed as direct traffic and will still make all of your data inaccurate.

This tool is actually designed to keep your reports accurate when you run your site across multiple domains, so that visitors moving between your own domains aren’t recorded as new referrals.
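
A toy model (not Google Analytics’ actual attribution code) shows why excluding a referrer does not clean your data: the visits are relabelled rather than removed. `spammer.example` is a hypothetical spam domain.

```python
from collections import Counter

# Toy sessions, labelled by traffic source; "spammer.example" is hypothetical.
sessions = ["google", "spammer.example", "spammer.example", "(direct)"]


def apply_referral_exclusion(sources, excluded):
    """Toy model: an excluded referrer is reported as direct, not dropped."""
    return ["(direct)" if source in excluded else source for source in sources]


after = apply_referral_exclusion(sessions, {"spammer.example"})
print(Counter(after))
print(len(after) == len(sessions))  # True: the fake sessions are still counted
```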

Use the “bot filtering” in Analytics

The bot filtering feature in Google Analytics offers the option to “exclude all hits from known bots and spiders”. This sounds like the perfect solution, and probably would be if Analytics “knew” a few more bots. It’s a good feature to enable, but, frustratingly, it doesn’t remove the big offenders that are currently affecting most people’s websites.

What does work then?

Because we’re dealing with multiple issues, we need to take multiple steps to solve them. The three-step approach below will eliminate the worst problems that webmasters are experiencing today. This is an arms race, though: referrer spam appears to work for some of these firms, so we can expect their methods to adapt to the steps that webmasters take to stop them.

Block known offenders in .htaccess

This method will not stop the ghosts and isn’t even needed if you only want to remove the fake referrals from Analytics. Its main benefit is that it will stop the bots that might slow your site down. The principle is simple: instruct your server to block traffic originating from known bad players.

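For sites hosted on Apache servers (which covers a large share of sites), this can easily be done by modifying the .htaccess file. Simply create a list of offenders and add a rewrite rule to your .htaccess file: one RewriteCond line per offending domain, matched against the referrer, followed by a RewriteRule that refuses the request. Please note the slight format change on the last item: every condition except the last ends with [NC,OR], while the last ends with [NC] only.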
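
With a long list, writing these rules by hand is error-prone. This Python sketch builds them from a list of domains, assuming Apache with mod_rewrite enabled; the helper names and domains are hypothetical. It applies the last-item change automatically, so every RewriteCond except the final one carries [NC,OR], and the [F] flag makes Apache answer matching requests with 403 Forbidden.

```python
def escape_domain(domain):
    """Escape the dots so each one matches a literal '.' in the referrer."""
    return domain.replace(".", r"\.")


def htaccess_block(offenders):
    """Build mod_rewrite rules that refuse requests from listed referrers."""
    if not offenders:
        return ""
    lines = ["RewriteEngine On"]
    for i, domain in enumerate(offenders):
        # All conditions but the last are OR-ed with the next one;
        # the last condition takes [NC] alone.
        flags = "[NC]" if i == len(offenders) - 1 else "[NC,OR]"
        lines.append(f"RewriteCond %{{HTTP_REFERER}} {escape_domain(domain)} {flags}")
    lines.append("RewriteRule .* - [F]")
    return "\n".join(lines)


print(htaccess_block(["spammer.example", "another-spammer.example"]))
```

Paste the printed block into your .htaccess file, then load a few pages of your site to confirm nothing legitimate is being refused.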

If you want to speed up the process and have defences waiting for other common offenders, you could use the Piwik referrer spam list, a community-maintained list of common bad players.

Insist on a valid hostname

This method blocks ghost referrals very effectively and has the advantage that it also works against new spammers you haven’t yet added to your list. Ghost referrals work by sending hits to random tracking IDs. The spammers are not targeting your site directly; they fire hits at IDs at random, hoping to find a live site at some point. When a real visitor loads one of your pages, the hit is recorded against your website’s hostname. As ghost spammers never visit your site, they don’t know your hostname and so can’t include it (their hits typically show a missing or made-up hostname), which gives us something to filter on.
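
The logic of the hostname check can be modelled in a few lines of Python. The hostnames and hits below are hypothetical, and Google Analytics applies the check with a regular expression in a filter rather than a set lookup, but the effect is the same: any hit not recorded against one of your own hostnames is discarded, including spam from domains that are on no list yet.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    hostname: str  # hostname recorded with the hit; "(not set)" if missing
    source: str    # referral source shown in reports


# Hypothetical: the hostnames where your tracking code actually runs.
VALID_HOSTNAMES = {"www.example.com", "shop.example.com"}


def keep_real_hits(hits):
    """Keep only hits recorded against one of your own hostnames."""
    return [hit for hit in hits if hit.hostname in VALID_HOSTNAMES]


hits = [
    Hit("www.example.com", "google"),
    Hit("(not set)", "ghost-spammer.example"),    # ghost hit with no hostname
    Hit("made-up.example", "new-ghost.example"),  # ghost hit from an unlisted domain
    Hit("shop.example.com", "(direct)"),
]
print([hit.source for hit in keep_real_hits(hits)])  # ['google', '(direct)']
```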

In Google Analytics, go to Admin > All Filters > Add new filter, then create a Custom filter of type Include with the Filter Field set to Hostname. In the Filter Pattern box, carefully add every hostname where you use analytics (your main website, any subdomains you use, your third-party shopping cart, etc.), separated by a pipe (|). Replace the “.” in each hostname with “\.”, then save.
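
Escaping and joining the hostnames by hand is easy to get wrong. This sketch builds the pattern from a list (the hostnames and helper name are hypothetical) and checks it with Python’s `re` module. Wrapping the alternatives in `^(...)$` is an optional choice made here, not part of the steps above: it stops the pattern matching a fake hostname that merely contains one of yours.

```python
import re


def hostname_pattern(hostnames):
    """Escape each dot, join with pipes and anchor the whole alternation."""
    escaped = [host.replace(".", r"\.") for host in hostnames]
    return "^(" + "|".join(escaped) + ")$"


pattern = hostname_pattern(["www.example.com", "shop.example.com"])
print(pattern)  # ^(www\.example\.com|shop\.example\.com)$

# Check the pattern against real and spoofed hostnames before pasting it in.
for host in ["www.example.com", "shop.example.com", "www.example.com.fake.example"]:
    print(host, bool(re.search(pattern, host)))
```

If you prefer an unanchored pattern, drop the `^(` and `)$`; it will then also match any hostname that contains one of yours.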

Finally, filter out the known bad players

To deal with the “solid” spam, we’ll create another filter to remove sites that we know to be spamming. Like the .htaccess method, this relies on a list that you might have to update from time to time. Again, the Piwik referrer spam list referenced above can be used for this.
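
Assuming you have saved a copy of the Piwik list as plain text with one domain per line, a small parser can tidy it before use. The helper name and sample text are hypothetical; the sketch lowercases entries, skips blank lines and `#` comments, and drops duplicates while keeping the original order.

```python
def load_offenders(text):
    """Parse a one-domain-per-line list into a clean, de-duplicated list."""
    domains, seen = [], set()
    for line in text.splitlines():
        domain = line.strip().lower()
        if not domain or domain.startswith("#"):
            continue  # skip blank lines and comments
        if domain not in seen:
            seen.add(domain)
            domains.append(domain)
    return domains


sample = """
# hypothetical excerpt
Spammer.example
another-spammer.example

spammer.example
"""
print(load_offenders(sample))  # ['spammer.example', 'another-spammer.example']
```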

This time we want to exclude traffic based on campaign source. Create a Custom filter of type Exclude, set the Filter Field to Campaign Source, and paste your escaped list of domains, separated by pipes, into the Filter Pattern box.
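
Google Analytics has historically capped the Filter Pattern field at 255 characters, so a long spam list usually has to be split across several exclude filters. This sketch treats that limit as an assumption (it is a parameter you can change) and splits an escaped, pipe-separated list into patterns that each fit; the domains and helper name are made up.

```python
MAX_PATTERN_LENGTH = 255  # assumed limit on a Google Analytics filter pattern


def exclude_patterns(domains, limit=MAX_PATTERN_LENGTH):
    """Split escaped domains into pipe-joined patterns that each fit the limit."""
    patterns, current = [], ""
    for domain in domains:
        escaped = domain.replace(".", r"\.")
        if len(escaped) > limit:
            raise ValueError(f"{domain} is too long for a single pattern")
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) <= limit:
            current = candidate
        else:
            patterns.append(current)  # this pattern is full; start a new one
            current = escaped
    if current:
        patterns.append(current)
    return patterns


domains = [f"spammer{i}.example" for i in range(40)]
for pattern in exclude_patterns(domains):
    print(len(pattern), pattern)
```

Each printed pattern would go into its own Exclude filter on Campaign Source.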

Job done (for now)

The above steps should keep your reports clean and accurate and also stop those spammers slowing things down for your users. It’s an ongoing battle, though. Methods change and more spammy domains will be used in the coming months, so it’s important to keep your lists up to date and always keep an eye out for anything new or strange happening in Analytics.

If you need help setting this up, contact FSE Online today. As well as cleaning up your analytics reports, we can help you to generate more business leads from your website.

Here at FSE we plan to continue the fight against spam (not the meat) and will keep you all posted as and when we find anything new and worth sharing. Feel free to subscribe to our blog so you don’t miss our next update!
