How to Get Rid of Language Spam in Google Analytics

Have you recently noticed spammy entries in the language report on the Google Analytics Audience Overview page? Over the past few months, we’ve seen an uptick in language spam, starting around the 2016 U.S. presidential election season. Whatever your political leanings, we can all agree that everyone wants less spam in their GA account, right? I’ll show you how to filter language spam out of your reports with an advanced segment so you can steer clear of skewed data.

What is Language Spam?

In general, spam isn’t new to Google Analytics. In fact, 90% of all sites using GA were affected by referrer spam in 2015.

Bots either visit your website and mimic actual users, or skip your site entirely and send fake hits straight to Google Analytics’ servers using your tracking ID.

Once the data is recorded in the GA view, there’s nothing you can do to permanently remove it. An unfortunate circumstance, but it is what it is.

On a small to medium-sized business website, this traffic can account for a huge percentage of all daily sessions, which causes inconsistencies and issues for reporting and digital marketing campaign tracking.

How to Get Rid of Language Spam…Mostly

You may not be able to get rid of the traffic data, but you can set up an Advanced Segment to exclude the language spam visits while you’re working with your data.

There are a lot of ways to skin a cat, but in this case, the easiest way to segment out the language spam is to exclude traffic where the language value is 10 or more characters long, since real language codes such as en-us are much shorter.

Add a segment by clicking +Add Segment below the report name in Google Analytics.

Click the red +New Segment button, then find the Language field on the Demographics tab. Choose the “does not match regex” option from the dropdown menu and set the value to .{10,}
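To see what the .{10,} value does, here is a minimal Python sketch of the same test. It assumes Google Analytics applies the pattern as a partial match, which Python’s re.search approximates. The is_language_spam helper and the sample language values are made up for illustration and are not part of Google Analytics.

```python
import re

# Same pattern as the segment value: any run of 10 or more characters.
SPAM_PATTERN = re.compile(r".{10,}")

def is_language_spam(language: str) -> bool:
    """Return True if the segment would exclude this language value."""
    # re.search looks for the pattern anywhere in the string (partial match).
    return SPAM_PATTERN.search(language) is not None

# Hypothetical sample values, not taken from a real report.
samples = ["en-us", "pt-br", "zh-cn", "(not set)", "spam.example Visit this site now!"]
for value in samples:
    verdict = "excluded" if is_language_spam(value) else "kept"
    print(f"{value!r}: {verdict}")
```

Note that (not set) is only 9 characters long, so the segment keeps it.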

When you add this Advanced Segment, you’ll see that it excludes visits associated with language spam values.

TIP: Before you use this Advanced Segment, run a language report (Audience > Geo > Language) and make sure you don’t have any legitimate traffic with language values that are 10 characters or longer.
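If your language report is long, that check can be scripted. The sketch below assumes you have exported the report into a mapping of language value to sessions. The data and the would_be_excluded helper are hypothetical.

```python
def would_be_excluded(report: dict[str, int], min_length: int = 10) -> list[tuple[str, int]]:
    """List the (language, sessions) rows the segment would drop, busiest first."""
    # For single-line values, len(value) >= 10 is equivalent to matching .{10,}
    rows = [(lang, sessions) for lang, sessions in report.items() if len(lang) >= min_length]
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical export of the language report (language value -> sessions).
report = {
    "en-us": 1200,
    "es-419": 85,
    "(not set)": 40,
    "free-traffic.example click here!": 310,
}

for lang, sessions in would_be_excluded(report):
    print(f"{sessions:>6}  {lang}")
```

Anything in the printed list that looks like a real language code is a sign the threshold needs adjusting.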

If you do, you can increase the minimum number of characters in the pattern (.{12,}, .{20,}, or .{40,}) or experiment with excluding language values that contain unusual characters (e.g. exclamation points).
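Those variations can be compared side by side before you change the segment. The following sketch is illustrative only: the length_pattern helper, the sample values, and the choice of ! as the unusual character are assumptions, and Python’s re module stands in for the regex engine Google Analytics uses.

```python
import re

def length_pattern(min_chars: int) -> str:
    """Build a pattern such as .{12,} for a given minimum length."""
    return f".{{{min_chars},}}"

# Hypothetical language values, not taken from a real report.
values = ["en-us", "zh-hant-tw", "Great deal! Visit spam.example today!"]

# Length thresholds from the text, plus a pattern for exclamation points.
patterns = [length_pattern(n) for n in (10, 12, 20, 40)] + ["!"]

for pattern in patterns:
    excluded = [v for v in values if re.search(pattern, v)]
    print(f"{pattern:<8} excludes {excluded}")
```

Here the 10-character value zh-hant-tw is caught by .{10,} but not by .{12,}, which is exactly the kind of case worth checking for in your own report.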

Whatever you end up doing, just make sure to verify what you’re excluding in the language report. Good luck!

Additional reference material:

Google Analytics Segments & Regular Expressions

https://support.google.com/analytics/answer/3124493?hl=en&ref_topic=3123779

https://support.google.com/analytics/answer/1034324?hl=en

Chris Koss
Chris is the web analytics lead at Forthea. He graduated from the University of Michigan (Go Blue!) and spends his time buried in Google Analytics and Tag Manager, setting up tracking on client sites. Day-to-day work might include setting up eCommerce tracking, fixing cross-domain visit challenges, or processing tens of thousands of rows of data to investigate discrepancies between various third-party digital tools. In his spare time, Chris and his wife might be at the dog park with their dogs, or he might be trying to break 90 on the golf course. Ask him to talk about his favorite Excel functions, including VLOOKUP, SUMIFS, and INDEX(MATCH).
