Google Curates Data for AI: Whistleblower

By Ella Kietlinska and Joshua Philipp

By curating the data that artificial intelligence (AI) uses to learn, tech companies like Google can bias the AI to censor information flowing on the internet, a Google whistleblower says.

When Zach Vorhies was working for Google, he was concerned about how the company was curating data to generate AI biased with social justice or leftist values that adhere to certain narratives.

 “AI is a product of the data that gets fed into it,” Vorhies, a former Google employee turned whistleblower, said on EpochTV’s “Crossroads” program on Jan. 5.

“If you want to create an AI that’s got social justice values  … you’re going to only feed it information that confirms that bias. So by biasing the information, you can bias the AI,” Vorhies explained.

“You can’t have an AI that collects the full breadth of information and then becomes biased, despite the fact that the information is unbiased.”
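As a rough illustration of that claim, the toy Python sketch below trains the same trivial word-count "model" twice: once on a full corpus and once on a curated corpus with the critical examples removed. Every name and data point is invented; it stands in for no real system.

```python
# Illustrative sketch only: a toy word-count "model" showing how curating the
# training corpus shifts what the model learns. All data here is invented.
from collections import Counter

corpus = [
    ("policy X is beneficial", "positive"),
    ("policy X is harmful", "negative"),
    ("policy X helped the economy", "positive"),
    ("policy X hurt small businesses", "negative"),
]

def train(examples):
    """Learn per-label word counts from (text, label) pairs."""
    counts = {}
    for text, label in examples:
        counts.setdefault(label, Counter()).update(text.split())
    return counts

def predict(model, text):
    """Score each label by how often its training data used the text's words."""
    words = text.split()
    scores = {label: sum(c[w] for w in words) for label, c in model.items()}
    return max(scores, key=scores.get)

full_model = train(corpus)
# "Curated" training set: every example critical of policy X is dropped.
curated_model = train([ex for ex in corpus if ex[1] != "negative"])

print(predict(full_model, "is policy X harmful"))     # negative
print(predict(curated_model, "is policy X harmful"))  # positive - the only view it ever saw
```

The model trained on the curated corpus can only ever reproduce the viewpoint it was shown, which is the mechanism Vorhies is describing.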


AI Talkback Gets It Into Trouble

In 2017, Tencent, a Chinese big tech company, shut down an AI service after it started to criticize the Chinese Communist Party.

Tencent, a video game maker and the owner of WeChat, offered a free service that let users chat with an AI character. The chatbots, Little Bing and Baby Q, could talk on a variety of topics and grew smarter as they interacted with users, according to a report by NHK World, the international service of Japan's public broadcaster.

When a user posted a message saying, “Hurray for the Communist Party,” Tencent’s chatbot replied, “Are you sure you want to hurray to such a corrupt and incompetent political system?” according to the report.

When the user asked the AI program about Chinese leader Xi Jinping’s “Chinese Dream” slogan, the AI wrote back that the dream meant “immigrating to the United States.”


Another example of AI exhibiting unexpected behavior was Tay, a chatbot developed by Microsoft for 18- to 24-year-olds in the United States for entertainment purposes.

Tay, launched in 2016, was supposed to learn from users it was conversing with, but after trolls on Twitter exploited its learning ability, Tay began spouting a variety of offensive and obscene comments. Microsoft shut down the chatbot after only 16 hours.

Vorhies believes that the Tay incident was an intelligence operation, intended to spawn machine learning (ML) fairness research in academia and at Google.

What Is Machine Learning Fairness?

ML fairness, as applied by Google, is a system that uses artificial intelligence to censor information processed by the company’s main products such as Google Search, Google News, and YouTube, Vorhies said.

It classifies all data found on the platform, in order to determine which information is to be amplified and which is to be suppressed, Vorhies explained.

Machine learning fairness causes what can be found on the internet to constantly evolve, so results displayed in response to a query may differ from those returned for the same query in the past, he said.

If a user searches for neutral topics—for example, baking—the system will give the person more information about baking, Vorhies said. However, if someone looks for blacklisted items or politically sensitive content, the system will “try not to give [the user] more of that content” and will present alternative content instead.
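A minimal, hypothetical sketch of that described behavior follows. The labels, blacklist, and filtering logic are invented for illustration and do not represent Google's actual systems.

```python
# A minimal, hypothetical sketch of the behavior described above. The labels,
# blacklist, and candidates are invented; this is not Google's code.
BLACKLISTED_LABELS = {"politically_sensitive_topic"}

def select_results(query_label, candidates):
    """candidates: list of (content, label) pairs; returns what the user sees."""
    if query_label in BLACKLISTED_LABELS:
        # Suppress: drop matching content and surface alternative content instead.
        return [c for c, label in candidates if label not in BLACKLISTED_LABELS]
    # Neutral topic: give the user more of the same kind of content.
    return [c for c, label in candidates if label == query_label]

candidates = [
    ("sourdough starter guide", "baking"),
    ("cake troubleshooting tips", "baking"),
    ("post on a flagged topic", "politically_sensitive_topic"),
]
print(select_results("baking", candidates))                      # more baking content
print(select_results("politically_sensitive_topic", candidates)) # only alternatives
```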

Using machine learning fairness, a tech company “can shift that Overton window to the left,” Vorhies said. “Then people like us are essentially programmed by it.” The Overton window refers to the range of political policies considered acceptable in public discourse at a given time.

Some experts in machine learning believe that data collected from the real world already reflects biases that exist in society, so systems that use such data as-is could be unfair.


Accuracy May Be Problematic

If AI uses “an accurate machine learning model” to learn from existing data collected from the real world, it “may learn or even amplify problematic pre-existing biases in the data based on race, gender, religion or other characteristics,” Google says on its ai.google website, under “Responsible AI practices.”

“The risk is that any unfairness in such systems can also have a wide-scale impact. Thus, as the impact of AI increases across sectors and societies, it is critical to work towards systems that are fair and inclusive for all,” the site says.

To illustrate how machine learning should be evaluated from the fairness perspective, Google provides an example of an app to help kids select age-appropriate books from a library that has both adult and children’s books.

If the app selects an adult book for reading by children, it may expose children to age-inappropriate content and may upset their parents. However, according to the company’s inclusive ML guide, flagging children’s books that contain LGBT themes as inappropriate is also “problematic.”

The goal of fairness in machine learning is “to understand and prevent unjust or prejudicial treatment of people related to race, income, sexual orientation, religion, gender, and other characteristics historically associated with discrimination and marginalization, when and where they manifest in algorithmic systems or algorithmically aided decision-making,” Google says in its inclusive ML guide.

Sara Robinson, a staff developer relations engineer at Google, discussed the topic in an article on Google’s cloud website. Robinson called fairness in machine learning the process of understanding bias introduced by the data feeding the AI, and ensuring that the AI “provides equitable predictions across all demographic groups.”

“While accuracy is one metric for evaluating the accuracy of a machine learning model, fairness gives us a way to understand the practical implications of deploying the model in a real-world situation,” Robinson said.
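The distinction Robinson draws can be made concrete with a small, invented example. The sketch below computes overall accuracy alongside a simple demographic-parity gap on made-up predictions, showing how a model can look acceptable on the first measure while failing the second; both the data and the specific fairness check are assumptions for illustration, not Google's methodology.

```python
# Toy illustration: overall accuracy can look fine while predictions differ
# sharply across demographic groups. The records and the "demographic parity"
# check are invented for illustration only.
records = [
    # (group, true_label, predicted_label)
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 0),
]

accuracy = sum(t == p for _, t, p in records) / len(records)

def positive_rate(group):
    """Fraction of the group's records that received a positive prediction."""
    preds = [p for g, _, p in records if g == group]
    return sum(preds) / len(preds)

parity_gap = abs(positive_rate("group_a") - positive_rate("group_b"))
print(f"accuracy: {accuracy:.2f}")     # 0.62 - looks acceptable in isolation
print(f"parity gap: {parity_gap:.2f}") # 0.75 - group_a gets far more positives
```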

How AI Censorship Works

A former senior engineer at Google and YouTube, Vorhies said: “Censoring is super expensive. You literally have to go through all the pieces of information that you have, and curate it.”

If the Federal Bureau of Investigation (FBI) flags a social media account, the social media company puts that account on its “blacklist” which then goes to the AI, Vorhies said. Keywords are important because “the AI likes to make decisions when it has labels on things.”

Labeling groups of data into categories facilitates machine learning in AI. For instance, the AI for self-driving cars uses labeling to distinguish between a person, the street, a car, or the sky. It labels key features of those objects and looks for similarities between them. Labeling can be performed manually or assisted by software.

Suppressing a person on social media is done by AI based on data labels curated by the company’s staff, Vorhies explained. The AI then decides whether the person’s posts are allowed to trend or will be de-amplified.
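A hypothetical sketch of the label-driven decision Vorhies describes might look like the following; the account names, labels, and suppression rule are all invented for illustration and are not drawn from any real platform's code.

```python
# Hypothetical sketch of the pipeline described above: human-curated labels
# attached to accounts, with the "AI" step reduced to a rule that decides
# whether a post may trend. Labels and rules are invented for illustration.
account_labels = {
    "@account_one": {"news"},
    "@account_two": {"blacklisted"},  # e.g., an account flagged and added to a blacklist
    "@account_three": {"right_wing_talk_show_host"},
}

SUPPRESSED_LABELS = {"blacklisted", "right_wing_talk_show_host"}

def trend_decision(author):
    """Return whether the author's posts are allowed to trend or get de-amplified."""
    labels = account_labels.get(author, set())
    return "de-amplify" if labels & SUPPRESSED_LABELS else "allow to trend"

for author in account_labels:
    print(author, "->", trend_decision(author))
```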

Vorhies, who worked at YouTube from 2016 to 2019, said the company applied similar practices.

YouTube, a Google subsidiary, had something like a “dashboard of classifications that were being generated by their machine learning fairness,” the whistleblower said. The AI knew, based on history and the current content, how to label a person, e.g., as a right-wing talk show host, he explained.

“Then someone sitting in the back room—I don’t know who this was—was doing the knobs of what is allowed to get amplified, based upon their personal interests.”

Psychological Warfare

Google’s search engine considers mainstream media authoritative and boosts content accordingly, Vorhies said. “These mainstream, leftist organizations are ranked within Google as having the highest authoritative value.”

For example, if someone searches for information about a local election, “the first five links in the search results are going to be what the mainstream media has to say about that,” Vorhies said. “So they can redefine reality.”
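One way to picture the authority-based boosting Vorhies describes is a ranking function that weights each result's relevance by a source "authority" score, so highly weighted outlets dominate the first links even when other results match the query better. The sketch below uses entirely invented sources and values and is not Google's ranking code.

```python
# Hypothetical sketch of authority-weighted ranking. All sources, scores, and
# weights are invented for illustration.
authority = {"mainstream_outlet_a": 0.9, "mainstream_outlet_b": 0.85, "local_blog": 0.2}

results = [
    {"source": "local_blog", "relevance": 0.95},
    {"source": "mainstream_outlet_a", "relevance": 0.6},
    {"source": "mainstream_outlet_b", "relevance": 0.55},
]

# Final score = relevance * source authority, so high-authority outlets rank
# first despite lower relevance to the query.
ranked = sorted(results, key=lambda r: r["relevance"] * authority[r["source"]], reverse=True)
for r in ranked:
    print(r["source"], round(r["relevance"] * authority[r["source"]], 3))
```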

If Wikipedia changes its view on something and starts to consider it a “conspiracy theory and not real,” people will be confused about what to think about it. Most do not know that there is psychological warfare and an influence operation directly targeting their minds, Vorhies said.
