Have you recently noticed spammy sources of traffic in your language report on the Google Analytics Audience Overview page? Over the past few months, we've seen an uptick in language spam that began around the 2016 U.S. Presidential Election season. Regardless of your political leanings, we can agree that everyone wants less spam in their GA account, right? We'll show you how to filter language spam out of your reports with an advanced segment and help you steer clear of skewed data.

GA spam language

What is Language Spam?

In general, spam isn't new to Google Analytics. In fact, 90% of all sites using GA were affected by referrer spam in 2015. Bots either visit your website and mimic actual users, or skip your site entirely and send fake hits straight to Google Analytics' servers under your tracking ID. Once the data is recorded in the GA view, there’s nothing you can do to permanently remove it. An unfortunate circumstance, but it is what it is. On a small to medium-sized business website, this traffic can account for a huge percentage of all daily sessions, which skews reporting and digital marketing campaign tracking.

How to Get Rid of Language Spam...Mostly

You may not be able to get rid of the traffic data, but you can set up an Advanced Segment to exclude the language spam visits while you’re working with your data. There are a lot of ways to skin a cat, but in this case, the easiest way to segment out the language spam is to exclude traffic where the language value is 10 characters or longer. Add a segment by clicking +Add Segment under the report name heading in Google Analytics. Click the red +New Segment box and you'll find the Language field under the Demographics tab. Choose the does not match regex option from the dropdown menu and set the value to .{10,}
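To see what the pattern does before you save the segment, here is a minimal Python sketch. It assumes Python's re.search behaves like the segment's partial regex matching for this simple pattern, and the sample language values (including the spam-like string) are hypothetical, not taken from a real report.

```python
import re

# The pattern from the segment: any run of ten or more characters.
SPAM_PATTERN = re.compile(r".{10,}")


def is_kept(language: str) -> bool:
    """Return True when a 'does not match regex' condition would keep this value."""
    # search() looks for the pattern anywhere in the string (partial match).
    return SPAM_PATTERN.search(language) is None


# Hypothetical language values, as they might appear in the language report.
samples = [
    "en-us",
    "pt-br",
    "zh-cn",
    "(not set)",
    "Vote for Example! example.invalid",  # made-up spam-like value
]

kept = [value for value in samples if is_kept(value)]
print(kept)
```

Normal language codes are only a handful of characters long, so they pass through, while the long spam string is dropped.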

When you add this Advanced Segment, you'll see that it excludes visits associated with language spam values. 
TIP: Before you use this Advanced Segment filter, run a language report (Audience > Geo > Language) and make sure you don’t have any legitimate traffic with language values of 10 characters or more. If you do, you can increase the minimum length in the filter (.{12,}, .{20,}, or .{40,}) or experiment with excluding language values that contain unusual characters (e.g. exclamation points). Whatever you end up doing, just make sure to verify what you’re excluding in the language report. Good luck!
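If you export the language report, a quick script can do that check for you. This is only a sketch: the rows below are hypothetical (language value, sessions) pairs, and the excluded_values helper is an illustrative name, not part of any Google Analytics tool.

```python
import re


def excluded_values(rows, min_length=10):
    """Return the (language, sessions) rows a .{min_length,} filter would exclude."""
    pattern = re.compile(r".{%d,}" % min_length)
    return [(lang, sessions) for lang, sessions in rows if pattern.search(lang)]


# Hypothetical export of the language report: (language value, sessions).
rows = [
    ("en-us", 1200),
    ("zh-hans-cn", 35),  # a legitimate-looking value that is exactly 10 characters
    ("Example spam message! example.invalid", 80),  # made-up spam-like value
]

# At the default length the legitimate-looking value is caught too.
print(excluded_values(rows, 10))

# Raising the threshold to 12 leaves only the spam-like row.
print(excluded_values(rows, 12))
```

Review the first list by eye: anything that looks like a real language code is a sign to raise the threshold before relying on the segment.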

Additional reference material: 
Google Analytics Segments & Regular Expressions 
