12 Jan '17

Have you recently noticed spammy sources of traffic in the language report on the Google Analytics Audience Overview page? Over the past few months, we've seen an uptick in language spam that started around the 2016 U.S. Presidential Election season. Regardless of your political leanings, we can agree that everyone wants less spam in their GA account, right? We'll show you how to keep language spam out of your reports with an advanced segment and help you steer clear of skewed data.

What is Language Spam?

In general, spam isn't new to Google Analytics. In fact, 90% of all sites using GA were affected by referrer spam in 2015. Bots either visit your website and mimic actual users, or skip your site entirely and send fake hits straight to Google Analytics using your tracking ID. Language spam is the same trick applied to the language dimension: instead of a normal language code, the hit carries a message or a URL that the spammer wants you to see in your reports. Once the data is recorded in the GA view, there’s nothing you can do to permanently remove it. An unfortunate circumstance, but it is what it is. On a small to medium-sized business website, this traffic can account for a huge percentage of all daily sessions, which skews reporting and digital marketing campaign tracking.

How to Get Rid of Language Spam...Mostly


You may not be able to get rid of the traffic data, but you can set up an Advanced Segment to exclude the language spam visits while you’re working with your data. There are a lot of ways to skin a cat, but in this case, the easiest way to segment out the language spam is to exclude traffic where the language value is 10 or more characters long.

Add a segment by clicking +Add Segment under the report name heading in Google Analytics. Click the red +New Segment box and you'll see the Language field under the Demographics tab. Choose the "does not match regex" option from the dropdown menu and set the value to .{10,}

When you add this Advanced Segment, you'll see that it excludes visits associated with language spam values.

TIP: Before you use this Advanced Segment, run a language report (Audience > Geo > Language) and make sure you don’t have any legitimate traffic with language values of 10 or more characters. If you do, you can increase the number of characters in the regex (.{12,} .{20,} .{40,}) or experiment with excluding language values that contain unusual characters (e.g. exclamation points). Whatever you end up doing, just make sure to verify what you’re excluding in the language report.

Good luck!

Additional reference material: Google Analytics Segments & Regular Expressions
https://support.google.com/analytics/answer/3124493?hl=en&ref_topic=3123779
https://support.google.com/analytics/answer/1034324?hl=en
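
If you'd like to try the length check offline before building the segment, here is a minimal Python sketch of the same regex. It assumes GA's regex conditions match anywhere in the value (hence search rather than a full match). The function names and the sample language values are made up for illustration; swap in the values from your own language report export.

```python
import re


def segment_pattern(min_length: int = 10) -> "re.Pattern[str]":
    """Build the same regex used in the segment, e.g. .{10,} for min_length=10."""
    return re.compile(r".{%d,}" % min_length)


def is_excluded(language: str, min_length: int = 10) -> bool:
    """Return True if the segment would exclude this language value."""
    # search() mirrors the assumption that GA matches the regex anywhere in the value.
    return segment_pattern(min_length).search(language) is not None


# Hypothetical sample values, not real report data.
sample_languages = [
    "en-us",
    "pt-br",
    "(not set)",                   # 9 characters, so it stays in
    "zh-hans-cn",                  # a plausible 10-character locale tag
    "Vote for X! spam.example",    # stand-in for a spam message
]

kept = [value for value in sample_languages if not is_excluded(value)]
excluded = [value for value in sample_languages if is_excluded(value)]

print("kept:", kept)
print("excluded:", excluded)
```

Notice that a legitimate value like the 10-character locale tag lands in the excluded list at the default threshold. Calling is_excluded(value, min_length=12) keeps it, which is the same adjustment as changing the segment value to .{12,} in GA.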

About the Author Chris

Chris graduated from the University of Michigan (Go Blue!) and spends his time buried in Google Analytics and Tag Manager setting up tracking for client sites. Day-to-day work might include setting up e-commerce tracking, fixing cross-domain visit challenges, or processing tens of thousands of rows of data for an advanced analytics project. In his spare time, Chris might be at the park with his wife and their dogs or trying to break 90 on the golf course. Ask him to talk about his favorite Excel functions, including VLOOKUP, SUMIFS, and INDEX(MATCH).
