August 22

What the Recent Google Data Leak Reveals: Sandbox Filters, Shadow Bans, and More

In late May, Google’s confidential data made headlines when around 2,500 internal documents were leaked online. The first hint of the leak came in an anonymous email sent to Rand Fishkin, co-founder of Moz, on May 5th, but the story only gained traction weeks later.

Photo credit: ryanhanley.com

Now that the initial buzz has died down, let's explore what this leak uncovered and how it might impact SEO professionals like you.

What we learned about domain authority metrics

For a long time, Google denied the use of a “siteAuthority” metric, claiming it wasn’t used to influence search rankings. The leaked documents suggest otherwise, revealing that this metric does play a role in assessing a site’s authority.

Although the documents don’t fully clarify how it’s used or its impact, this is still a major revelation.

What we learned about click data and search rankings

It’s not exactly breaking news: many search engines have long used click data to deliver better search results, and click-tracking systems such as NavBoost and Glue have been in place since around 2005.

But a raw click count isn’t the whole story. The leaked documents reveal that clicks are treated as “votes” cast by users (the “voters”), and that the following are tracked as well:

  • Failed clicks;
  • Clicks by location (regions and cities);
  • Duration of clicks during a browsing session.

The last point shows that search engines also monitor how long users stay on a site — something Google had previously denied.
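One way to picture the kind of per-result click record described above is as a simple data structure. This is a toy sketch: the field names and the scoring formula are invented for illustration and are not Google’s actual attributes.

```python
from dataclasses import dataclass


@dataclass
class ClickRecord:
    """Illustrative per-result click signals; all field names are hypothetical."""
    query: str
    url: str
    good_clicks: int      # clicks followed by a long stay on the page
    failed_clicks: int    # clicks followed by a quick return to the results
    region: str           # coarse location of the searcher
    dwell_seconds: float  # how long the visit lasted


def vote_score(r: ClickRecord) -> float:
    """Toy 'vote' score: reward engaged clicks, penalize failed ones."""
    total = r.good_clicks + r.failed_clicks
    if total == 0:
        return 0.0
    return r.good_clicks / total


record = ClickRecord("google data leak", "https://example.com",
                     good_clicks=8, failed_clicks=2,
                     region="US", dwell_seconds=95.0)
print(vote_score(record))  # 0.8
```

The point of the sketch is simply that a click is not binary: what happens after the click (dwell time, a quick bounce back, the searcher’s location) feeds back into how that “vote” is counted.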

Sources: Topvisor and Twitter

Google Search engineer Paul Haahr listed NavBoost on his resume as early as 2019, despite Google’s public denials. Gary Illyes, another Google employee, had stated at SMX West 2016 that using clicks directly for rankings would be a mistake.

These contradictory denials could lead to legal trouble for Google, particularly since Western users are sensitive about privacy.

What we learned about the Sandbox filter

In August 2019, Google denied the existence of the “Sandbox” filter — a mechanism that puts new sites under extra scrutiny, temporarily preventing them from ranking well in search results. The leaked documents, however, include a “hostAge” attribute that appears to serve exactly this purpose for fresh sites.

The Sandbox effect usually lasts around three months. If a site follows Google’s guidelines during this period, the filter is eventually lifted.
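As a rough mental model of the effect described above, you can think of a dampening multiplier applied to young sites. The 90-day window and the linear ramp here are assumptions for illustration; the leaked documents do not spell out the actual mechanics.

```python
from datetime import date

SANDBOX_DAYS = 90  # assumed length of the probation window


def sandbox_dampening(first_seen: date, today: date) -> float:
    """Return a multiplier applied to a new site's ranking score.

    Sites younger than SANDBOX_DAYS get their score scaled down,
    easing back toward full weight as they age. Purely illustrative.
    """
    age = (today - first_seen).days
    if age >= SANDBOX_DAYS:
        return 1.0
    return max(age, 0) / SANDBOX_DAYS


print(sandbox_dampening(date(2024, 1, 1), date(2024, 6, 1)))  # 1.0 (site is past the window)
print(sandbox_dampening(date(2024, 5, 1), date(2024, 6, 1)))  # ≈ 0.344 (31 of 90 days)
```

In practice this just means a new, guideline-compliant site should expect muted rankings for its first few months regardless of content quality.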

What we learned about linking to authoritative sources

Links from high-authority sites, such as major news websites, carry more weight in search rankings. To boost your site’s ranking, link to reputable resources and primary sources.

For instance, if you’re writing about the Google data leak, link to the source of the leaked documents. Google evaluates the “weight” of links based on its internal metric called Homepage Trust.
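The idea of tiered link value can be sketched as follows. The trust scores, the news-site bonus, and the cap are all made up for illustration — the actual “Homepage Trust” values and formula are not public.

```python
def link_weight(source_trust: float, is_news_site: bool = False) -> float:
    """Toy link-weight model: a link counts more when the linking site's
    trust score (0..1) is higher, with a bonus for major news sites.
    All numbers are invented for illustration."""
    base = source_trust
    if is_news_site:
        base *= 1.5  # assumed bonus for high-authority news outlets
    return min(base, 1.0)  # cap the weight at 1.0


# A link from a trusted news site counts close to the maximum:
print(link_weight(0.8, is_news_site=True))  # 1.0
# The same link from a low-trust site counts for far less:
print(link_weight(0.3))  # 0.3
```

The takeaway is directional, not numeric: not all backlinks are equal, and the authority of the linking page matters more than the raw link count.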

What we learned about keywords in titles and descriptions

Including relevant keywords in your titles and descriptions is still crucial. Google checks how well your titles match user queries.

Also, the freshness of your content matters — not just the publication date, but also when it was last updated. It’s a good practice to note when content was updated.
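A crude way to illustrate title-to-query matching is plain token overlap. Real ranking systems are far more sophisticated (stemming, synonyms, embeddings), so treat this only as a mental model of “how well does the title match the query.”

```python
def title_match(title: str, query: str) -> float:
    """Fraction of query words that appear in the title (case-insensitive)."""
    title_words = set(title.lower().split())
    query_words = query.lower().split()
    if not query_words:
        return 0.0
    hits = sum(1 for w in query_words if w in title_words)
    return hits / len(query_words)


print(title_match("What the Google Data Leak Reveals", "google data leak"))  # 1.0
```

Even this toy version makes the practical advice concrete: a title that contains the query’s actual words scores higher than one that merely alludes to the topic.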

What we learned about new sites and trust

New sites often get tagged as “small personal sites,” which can affect their ranking and trust level.

Source: API Reference

The leaked documents don’t clarify whether this tag applies to all new sites or just to single-page sites and mini-blogs.

Additional resources

To dive deeper into the leaked data, Rand Fishkin’s original write-up and the leaked API reference itself are the best places to start.

This leak provides new insights into Google’s practices and could influence how you approach your SEO strategy. Stay updated and adjust your tactics accordingly!
