Google admits massive document leak related to search algorithm is authentic
Google has confirmed that a massive leak of some 2,500 internal documents related to its search engine is authentic – and one expert said the trove shows that “Google tells us one thing and they do another” when it comes to its mysterious algorithms.
The tech giant has been secretive about how its search engine works even as it has wielded outsize influence over the flow of information, traffic and ad revenue online.
Some details appeared to contradict past public statements by Google employees regarding which factors are and are not used to calculate rankings.
For example, a Google Search employee said in 2016 that the company doesn’t “have a website authority score.”
The company has also explicitly denied using Chrome data in search rankings.
Information in the documents, however, suggests that Google considers click rates, data from its Chrome web browser, website size and a factor called “domain authority” – a measure of a website’s importance or relevance on a particular subject – to guide rankings.
“The main takeaway here is Google tells us one thing and they do another,” iPullRank CEO Michael King, who published the first analysis of the trove, told The Post.
“These documents give us clarity on that,” King added. “We don’t have the recipe that Google is using for search, but we now have a really clear indication of what the ingredients are.”
Some experts, including the trade publication Search Engine Land, have noted the documents mention modules that suggest Google implements “whitelists” for certain topics, including searches related to elections (IsElectionAuthority) and the COVID-19 pandemic (IsCovidLocalAuthority).
King said the references are likely Google’s attempt to identify “quality sources” on a given subject.
Details about how the whitelists may operate are scant, but Google has faced allegations of exhibiting a left-wing bias for years. A recent analysis by media company AllSides found that 63% of articles on Google News were from left-leaning outlets, compared to just 6% from right-leaning sources.
An analysis by right-leaning watchdog Media Research Center detailed 41 alleged instances of “election interference” at the online search giant since 2008.
The report cited data from Dr. Robert Epstein, who once testified to the Senate Judiciary Committee that “biased search results generated by Google’s search algorithm” shifted “at least 2.6 million votes to Hillary Clinton.”
Google has long denied it is biased against conservative viewpoints and has said Epstein’s research is “widely debunked.”
The leaked search documents allegedly contain more than 14,000 ranking factors that Google considers when organizing websites – from news outlets like The Post to small business owners and beyond.
The internal data reportedly surfaced on the online code repository GitHub in March, but it did not receive public scrutiny until search engine optimization (SEO) experts Rand Fishkin and Michael King obtained the documents and posted separate breakdowns.
Google tacitly confirmed that the documents are real – though it warned that they lacked important context and shouldn’t be used by the public to glean any insights about how search works.
“We would caution against making inaccurate assumptions about Search based on out-of-context, outdated or incomplete information,” Google spokesperson Davis Thompson said in a statement.
“We’ve shared extensive information about how Search works and the types of factors that our systems weigh, while also working to protect the integrity of our results from manipulation,” the statement added.
Google also warned that the documents are not a comprehensive, relevant or up-to-date view of its Search ranking algorithm.
It’s still unclear if Google has actually implemented any of the ranking factors detailed in the documents or was merely testing or experimenting with them. Some may have never been used at all.
Even if they were in use, it’s essentially impossible to assess how important they are in crafting what users see in search results.
The documents did not reveal how the ranking features are weighted.
The leaked documents provide an interesting yet incomplete view of the company’s inner workings on search, according to Barry Schwartz, a prominent SEO expert and owner of the web consultancy RustyBrick.
Schwartz said the documents are best seen as a signal of “what Google is thinking about” as it relates to online search.
“How Google does that around certain factors like links and content quality and authority and authors – all of that’s in there,” Schwartz said. “The question is, we don’t know what they’re weighted, how important are these signals, are they used at all. That’s the issue with this.”
Nevertheless, the documents amount to “the biggest leak that we’ve ever seen come out of Google for search,” according to King.
“This is the biggest, most transparent that we’ve ever seen into how Google functions,” King said.