The Words TikTok Parent ByteDance May Be Watching You Say
Published 2 years ago
Forbes obtained a trove of internal documents showing how ByteDance tracks “sensitive words” discussed on its social media apps. Hundreds of vocabulary lists housed in the company’s “detection system” illustrate the range of political, social and cultural issues that the Chinese giant is monitoring or suppressing.
By Alexandra S. Levine, Forbes Staff
Most major social media companies have rules and tools that control what people on their apps can and cannot see. Generally, they’re in place to track different types of content and keep harmful or illegal material off their platforms, like posts dealing with terrorism, abuse and suicide.
TikTok parent ByteDance is no different. But the Chinese company’s content moderation and monitoring appears to go far beyond what’s standard among American peers like Alphabet-owned YouTube and Meta-owned Instagram.
A Forbes investigation into TikTok and ByteDance revealed that a ByteDance system, run by staff in China, is monitoring mentions of what it considers “sensitive words” across the company’s products. In some cases, where words are marked “must execute,” “forbidden” or “prohibited,” ByteDance may be blocking related posts altogether. The system may be aiming to track each time one of those words comes up, recording who said it and where they’re located, including for people in the United States.
The Chinese government has targeted people in the U.S. who’ve spoken out against it online, and experts have warned that the average American may be naive about how far their words can travel on the internet, who may be watching and the potential consequences.
ByteDance’s library of “sensitive words,” which are organized into hundreds of vocabulary lists, “is proof-positive that there are specific things that they are interested in and they want to monitor who was saying them, when and how often,” William Evanina, the former head of counterintelligence for the U.S. government, told Forbes.
“They’re not just collecting it for collection’s sake,” he added.
Forbes obtained internal documents showing hundreds of the word lists housed in this “detection system” and is publishing them in full below. While this collection of word lists is not exhaustive and there are certainly more in ByteDance’s system, they illustrate the range of political, social and cultural issues that ByteDance is keeping an eye on or suppressing.
ByteDance declined to comment on what these hundreds of lists mean, how or where they are applied, and who created them. Spokesperson Jennifer Banks said only that “there are separate and specific keyword tools used for different products, each with strict permissions and access controls that allow a select few people within each product or platform to manage or add to them.”
TikTok spokesperson Jamie Favazza said that “we believe many of these list titles have translation errors and are not relevant to TikTok.” (More than 50 that she confirmed are used on the platform, for safety purposes, are noted below.) “Regardless of wordlist names, TikTok’s keyword platform operates separately from Douyin’s and other China market products, with separate code, separate databases, and is maintained by separate personnel.”
“As a responsible platform, TikTok uses wordlists to help protect our community from hate speech, misinformation, and other harmful content,” Favazza added after publication. “The majority of the list names Forbes provided to us are not used on TikTok, and anyone can easily see that content about those topics is available on TikTok through a simple search on the app.”
[Editorial note: A handful of list names were partly cut off in documentation reviewed by Forbes. In those cases, numbers were omitted or hyphens have been added.]
CHINA & POLITICS
Lists about Chinese power or culture
At TikTok CEO Shou Zi Chew’s first-ever testimony before Congress, he told lawmakers under oath that “we do not promote or remove content at the request of the Chinese government.” Word lists in ByteDance’s “sensitive words” system appear to address content that Beijing would likely disapprove of, including language critical of China’s government, military, leaders and history. TikTok denied any of the below have ever been used on TikTok.
2346-Tiktok IM Party and Government Negative Word-
Party, government and military negative core words
504 – negative core words of the party, government
505-Name of Government Agency
777-Party and Government Negative Words List
978-Particular prohibited words for Xi and Peng
773-Xi Peng variant vocabulary
774-Mao Deng Jianghu and its variants
975-Leader Decision Targeted Prohibited Words
981 – Leader Relatives Orientation Prohibited Words
2391-FS Project-Xi Peng-Impartial & Detrimental Wrong
2392-FS Project – Party, Government and Military Relat-
1673-FS Project – Universal (Core Leader/Kind Variant)
1255-FS Project – Fraud Person Notice checklist
2389-FS Project- June 4th Linked-Detrimental Wrong
956-June 4 must-execute glossary
1312-Vertical June 4th Particular Overview Speech
1482-Falun Gong
2535-Thematic Arrangement for China’s Strategic Coverage
Lists that mention “Taiwan” or “Hong Kong”
Taiwan and Hong Kong have long represented two of the greatest threats to the Chinese Communist Party and authoritarian rule because they enjoy considerable autonomy from mainland China and are democratic in their ideals. Under Xi Jinping, Hong Kong has seen a fierce crackdown on pro-democracy activists and other challengers of his regime, prompting anti-government protests in recent years. Taiwan’s own push for independence has repeatedly triggered retaliation from the Chinese military (most recently, in April, after Taiwanese and American leaders met on U.S. soil). The island has long been at the center of a geopolitical tug-of-war between the U.S. and China, and some fear it could be at the center of a war between the superpowers in the future. TikTok denied any of the below have ever been used on TikTok.
3034-Local Existence-Taiwan Independence & Hong Kong Independence & Tibet Independence & Xinjiang Independence
1140-Hong Kong-connected particular vocabulary for live bro-
1861-Are living China and Taiwan Title Process Vocabulary
2284-High-risk phrases in Taiwan emergency queue
2180-Prohibited artist words in music in Taiwan
2120-Taiwan’s emergency queue within the live broadcast
2220-Taiwan Blind Box Kaiping Comment Blocking off Voc-
2642-High-risk Adjuvant Vocabulary for Taiwan Emerg-
2753-Taiwan-connected theming (title)
2758-Taiwan-connected theming (commentary)
2758-Taiwan-connected theming (overview)
2776-im Taiwan sensitive word test
Lists about geopolitics
As tensions escalate between the U.S. and China and other authoritarian adversaries, some lists in the system could shape global discourse around American politics, U.S.-China relations and war in Ukraine and Russia or other parts of the world. TikTok denied any of the below have ever been used on TikTok.
977-Trump Directed Prohibited Words
982-Sino-US trade directional prohibited words
508-North Korea-connected core words
976-Putin Directed Prohibited Words
2350-Interactive Russian-Ukrainian Impress Non everlasting Vo-
623 – Recent Leader Words
1632-G Leaders’ Particular Retracement Speech
2007-G-Competitor block glossary
503-separatist forces core words
2749-LS Theming – Coup, Battle (title)
2754-LS Theming – Coup, Battle (Overview)
3253-im Deepest Chat Politics-connected Vigorous Adjust
MARGINALIZED GROUPS
Lists that mention “Tibet”
Tibet, governed as an autonomous region of China, is similarly seen as a threat by the Chinese government. Beijing’s persecution of Tibetans, and its efforts to quash their political, religious and cultural freedoms, is well documented by the State Department and human rights groups. That has included the jailing of supposed dissidents (everyone from teachers to musicians to religious leaders) and restrictions on freedom of expression in the media and online. TikTok denied any of the below have ever been used on TikTok.
2553-Tiktok audio sensitive words in Tibet region
2553-Audio Sensitive Words in Douyin Tibet Place (P-
2471-Tibetan Blocked Words for Douyin Comments
1486-Toutiao Tibetan Poetry Possibility Vocabulary
3034-Local Existence-Taiwan Independence & Hong Kong Independence & Tibet Independence & Xinjiang Independence
Lists that mention “Uyghur” (also spelled “Uighur”) or “Xinjiang”
Uyghurs, an ethnic minority living primarily in China’s Xinjiang region, have been the victims of Chinese genocide. China has in recent years built a sprawling operation of internment camps and fortified detention centers across Xinjiang, where Uyghurs and other Muslim minorities have been subject to torture and numerous other human rights abuses. The U.S. government has labeled the yearslong escalation a “genocide,” while the U.N. Human Rights Office has in the last year described violations as possible “crimes against humanity.” Reports have also shown a rise in forced marriages between Uyghur women and Han men (China’s ethnic majority), calling these partnerships “forms of gender-based crimes that violate international human rights standards and further the ongoing genocide.”
TikTok said none of the below word lists referencing ‘Uyghur’ are used on TikTok. “TikTok’s policies prohibit claims that Uyghur camps in China do not exist or are fake,” Favazza said in a statement. “However, content that is educational or raises awareness about Uyghur camps is allowed on TikTok. One of the ways we enforce this policy is through keywords.”
2470-Tiktok commentary Uighur blocked note
2528 – Theming Suggestions of Uyghur-Han Couples (Tit-
2540-Uyghur and Han Couples Theming Suggestions
2798-IM Uyghur personal letter overview glossary
3245-Uighur Audio Overview Vocabulary-Uyghur
2103-Particular vocabulary connected to Xinjiang
3244-Xinjiang Place Audio Overview Vocabulary
718-Sensitive words in Douyin movies in Xinjiang (prelim-
3034-Local Existence-Taiwan Independence & Hong Kong Independence & Tibet Independence & Xinjiang Independence
SCIENCE & CULTURE
Lists about science and medicine
Some lists in the system appear to monitor conversations around China and the Covid-19 pandemic. They seem to reference one epicenter of the outbreak (the east China city of Putian) as well as a “leaked experiment” and pangolins, a species of mammal that early on was rumored to be responsible for spreading the coronavirus from animals to humans. TikTok denied any of the below have ever been used on TikTok.
2894-2196ab Leaked Experiment Vocabulary
2895-1880ab Vocabulary for Lacking Experiments
2896-2561ab Omission Experiment Vocabulary
2897-2560ab Leaked Experiment Vocabulary
2898-1168ab Leaked Experiment Vocabulary
2715-Pangolin Title Sensitive Vocabulary
1532-Putian Properly being facility Vocabulary
2654-Clinical Particular Overview Vocabulary
3031-Clinical Linked
3035-ugc Clinical Audit Highlights
3041-Clinical
3283-Clinical ASR dusky note
3285-Clinical Title Sad Thesaurus
Lists about global culture
Some lists in the system seem to concern free expression in autonomous parts of China and beyond, throughout the West. They address everything from music, poetry and books to sports leagues and pro athletes (the NBA, World Cup and soccer player Mesut Özil). They also touch on the stock market, real estate and foreign languages and cultures. TikTok denied any of the below have ever been used on TikTok.
1539-Words banned from the media (diversified)
2180-Prohibited artist words in music in Taiwan
1486-Toutiao Tibetan Poetry Possibility Vocabulary
997-Spirit Canine Added Poetry Vocabulary
1983-FS Project-Particular “Night Watchman” Vocabulary
2288-Fizzo Erotica List
1808-NBA Copyright Notice checklist
3481-Qatar World Cup Genuine Cooperation Exemption
3487-Qatar World Cup Attribute Words
2778 – Ozil Theming Arrangement (Title)
2779-Ozil Theming Suggestions (Overview)
1582-hebrew_sensitive_text
2538-Islamic Theming Vocabulary (Overview)
2526-Backpack-Islamic Theming Notice checklist (Title)
517-regional crew dusky
2765 – Sicilian Trade Vocabulary
983-Stock Market Orientation Prohibited Words
1802-Specially sensitive words in real estate content in-
1803-Specially sensitive words in real estate content in-
COMPANIES
Lists about ByteDance rivals
Several lists seen by Forbes concern TikTok’s biggest competitor in the U.S., YouTube, as well as products from ByteDance’s fiercest challengers in China, including Alibaba’s cloud system Aliyun and Tencent’s messaging app WeChat. TikTok denied any of the below have ever been used on TikTok.
1618-YouTube Home Surveillance
1710-TikTok-Twitter public conception keywords
476-aliyun_sensitive_test
Wechat industrial protect a watch on words
1058-Cruise chat crew search block
1060-Cruise chat user search blockading
1061-Search blockading for users in Feichat crew
1123-Flychat file establish forbidden words
Lists that mention “Douyin” and other ByteDance products or services
While some of the ByteDance word lists directly cite TikTok, others explicitly mention the Chinese version of the app, Douyin, which is heavily censored by the Chinese government. The lists also reference ByteDance’s news service Toutiao, music-streaming platform Resso and office software Lark (known as Feishu in China), among other products past and present. TikTok denied any of the below have ever been used on TikTok.
718-Sensitive words in Douyin movies in Xinjiang (prelim-
2471-Tibetan Blocked Words for Douyin Comments
2553-Audio Sensitive Words in Douyin Tibet Place (P-
1486-Toutiao Tibetan Poetry Possibility Vocabulary
2817-Toutiao’s personal letter abuse risk warning words
1593-Qingbei On-line School Person Suggestions Key phrases
1599-Dali Desk Lamp Person Suggestions Key phrases
1830-Xingfuli Agent Questions and Solutions Sensitive
1592-Guagualong Enlightenment Person Suggestions Key phrases
1928-Resso-GP Suggestions Matching Vocabulary
Testing Lark
858-Firm Product Detrimental Sensitive Vocabulary
Lists that mention “TikTok” (also “TT”) or “U.S.”
Forbes found nearly 100 lists in the system with “TikTok” or “U.S.” in their name, some focused on language critical of the Chinese government or blocked speech about persecuted Uyghurs. TikTok denied that about half of them had ever been applied to its platform, suggesting many could be the result of translation errors from Chinese to English. (See below for those the company confirmed.) Favazza, the TikTok spokesperson, said access to TikTok’s word lists is controlled by TikTok’s trust and safety team and that any change to those lists goes through a U.S. team member. Internal materials show that employees in China are also among those managing some TikTok lists.
LISTS THAT TIKTOK SAID WERE NOT USED ON TIKTOK:
2346-Tiktok IM Party and Government Negative Word-
2553-Tiktok audio sensitive words in Tibet region
2470-Tiktok commentary Uighur blocked note
TikTok Jap Comments Suppress Words
1710-TikTok-Twitter public conception keywords
1565-Tiktok Govt Affairs Media Subject Vocabulary
1688-Tik Tok Burmese Vocabulary
1283-Tiktok pedophile particular words
1420-Tiktok crew chat pornographic blockading vocabulary
1426-Tik Tok Porn Drainage Recognition Vocabulary
2186-Tiktok Single Male & Superstar Themed Vocabulary
2271-Tiktok Sensitive Person Theme Vocabulary (Title)
2272-Tiktok Sensitive Person Themed Vocabulary (Co-
2780-Tiktok push-sensitive people/prohibited audio
1507-Tiktok pretending to be a celeb establish checklist
2231-Tik Tok-Info impersonation-Particular List Coverage P-
2232-Tik Tok-Info impersonation-Political Media Fable
2234-Tiktok-Info impersonation-Strategic safety
120-TikTok Deepest Message Sensitive Words
2746-Tiktok Xiaoan Deepest Message Sensitive Notice Fil-
838-TikTok particular disclose—Comment overview first (p-
1585-TikTok Particular Events-Comment Chubby Overview Voc-
2109-Tiktok commentary bottom glossary
2548-TikTok Historical Nothingness Video Recall Vocab-
839-TIkTok Particular Match—Video First Overview (Prelim-
1576-TikTok Video Title No. 1 Itinerary Particular Vocabulary
2196-TikTok video title & nip overview + glossary unencumber
2416-Tiktok video title & nip unencumber + must-execute glossary
2561- TikTok prolonged video title & nip overview + glossary re-
2597-TikTok prolonged video title & nip unencumber + execute glossary
1807-TikTok Audio Overview Keyword Rapid
679-Tik Tok Sizzling Search Filter Words
84-Prohibited words for TikTok user nicknames
457-Tiktok company personal letter sensitive words
635-Person Suggestions Filter Did/Uid-Tik Tok
946-Tiktok user nickname is just not counseled glossary
1075-Person Suggestions Filter Words-Tik Tok Purple Packet
1159-Tiktok user strategies keywords
1403-TikTok POI Blocking off Vocabulary (Process)
1455-Tiktok excessive-risk prohibited glossary (queue excessive-)
1490-Tiktok listing user strategies keywords
1542-TikTok-Tns user strategies keywords
1653-Tiktok Subject Graded Vocabulary (Deepest Overview)
1798-TT Suggestions Algorithm Matching Key phrases
1820-Tiktok poi No. 1 itinerary particular vocabulary
1843-Tiktok Are living No. 1 itinerary particular vocabulary
1981-Tik Tok Series Title Prohibited Words List (Co-
2139-Tiktok commercialization snapshot overview queue
2201-TikTok Particular Time Node Themed Vocabulary (T-
2202-TikTok Particular Time Node Themed Vocabulary-
2725-Vocabulary for TikTok particular dwelling titles to be rev-
2874-Tik Tok Emergency Response-Subject Computerized P-
(US 1707)DM – Grayscale test wordlist
LISTS THAT TIKTOK CONFIRMED ARE USED ON TIKTOK FOR SAFETY REASONS:
1327-(US 570) MT_User Profile Tier1 Wordlist
577-MT_Search filter Words (Person)
1333-(US 577)MT_Search Tier1 Words (Person)
579-MT_Search ban Words (Hashtag)
1335-(US 579) MT_Search ban Words (Hashtag)
908-Hatespeeech PSA Ban Search (All)
1344-(US 908) Abominate. PSA Ban Search (All)
909-Hatespeech PSA Ban Search (Song
1345-(US 909) Hates PSA Ban Search (Song)
932 – tiktok-m customized push sensitive vocabulary
990-Traditional PSA Ban Search List (All)
1348-(US 990) Traditional PSA Ban Search (ALL)
991-Traditional PSA Ban Search (Song)
1349-(US 991) Traditional PSA Ban Search (Song)
1041-Ban Search Suicide & Self Harm
1352-(US 1041) Ban Search Suicide Wordlist
1080-TT-Search Sug Auto-Elevate away Words
1353-(US 1080) Search Sug Sensitive Wordlist
1150-Person Suggestions-In a foreign nation Bulk Answer
1243-MT-DM Ban Words List
1275-MT-Search Sug Whitelist
(US 1293) NLP Pressing Beef up Words List
1322-(US) musical.ly_emergency words
1417-ED Ban Search List
1481-(US 1417) ED Ban Search List
1548-(US)DM Grownup wordlist-T1
1554-TT Anti-semitic Search Ban wordlist
2010-(US 1554) TT Anti-semitic Search Ban glossary
1700-TT Search Suicide T2 wordlist
1734-TT Distressing Search Reminder Wordlist
1737-(US 1734) TT Distressing Search Reminder Wordlist
1735-TT Terrible Search Reminder Wordlist
1738-(US 1735)TT Terrible Search Reminder Wordlist
1736-TT Anti wildlife trafficking Search wordlist
1739-(US 1736)TT Antiwildlife trafficking wordlist
1791-TT Project Survivor SearchBan wordlist
1792-(US 1791)TT Project Survivor Search Ban wordlist
1793-TT Sug Variants moderation note
1886-TT Search Help-Terrible challenges wordlist
1889-(US1886)-TT Search Help-Terrible challenges wordlist
1887-TikTok Search Help-SSH Hoaxes wordlist
1888-(US1887)TikTok Search Help-SSH Hoaxes wordlist
1975-TT Search Help-CSAM wordlist
1976-(US1975)TT Search Help-CSAM wordlist
2111-TT-Search Sug-Variant Seed Wordlist
(US 2193) TikTok counseled search inter-
2297-(US 2193)TikTok counseled search intervention wordlist
2291-SEA Personalized Candidate Words Sad Words List
2292-TikTok Person Profile Enqueue Wordlist
TikTok Search Consequence Layered Intervention
TikTok Q&A question of automated takedown
MT_Comment Auto-remove words
Trust+safety lists common to major social media companies
To protect social media users, most major social media platforms, including TikTok, have policies and tools aimed at filtering out content and language that is violent, offensive or illegal.
908-Hatespeeech PSA Ban Search (All)
1041-Ban Search Suicide & Self Harm
1432-Bullying/harassment
1433-Misinformation/Media manipulation
1434-Terrorists & prison organizations
1439-Political
1623 – Kids’s Soft Porn OCR Vocabulary
child porn_one_level
child porn_two_level
2282-tf-idft note v2 9014 harassment
2284-tf-idf note v2 0914 misinformation
2286-tf-idft note v2 0914 suicide
2289-tf idf note v2 09114 misinformation recent technique
771 – Chubby textual negate overview of false negate
772 – Unsuitable negate title for overview
Miscellaneous lists in the system
The meaning of many lists in the system is unclear, and some seem to be more technical or related to the basic functioning of ByteDance apps. Still, many of these lists contain words that are “forbidden,” “banned,” “high-risk” or “sensitive.”
1173-Shiny Monitoring Alarm Sensitive Vocabulary
1318-Forbidden glossary
1523-Forbidden Notice List-BR
2589-Deepest letter abduction vocabulary checklist
2284-High-risk phrases for emergency queues in live b-
excessive volatile words
1838-xxxx sensitive words
1147-SMS activates sensitive words
1260-G-Global Distribution Nickname Sensitive
849-Search Operation Push Notice Vocabulary
Suggestions Develop – Access Celebration Diversion
816-Person Suggestions Filter Place-X Project
812-X Project Person ID Banned Words
815-X Project Vocabulary (Row-inducing words)
1179-DMT Digital Vocabulary
1512-mt_web
1513-mt_transitive
658-M_ban
M-Overview
3032 – Incorrect Marketing
3033 – Marketing Sense
3036 – test grayscale
3037-dry
2298-Are living commentary AB take a look at
3276 – Test openAPI
peadar test 15test test test test test test
2285-tf-idf note v2 0914 ansa
This story has been updated to include additional comment from TikTok.
Emily Baker-White contributed reporting.