Google Translate’s Latest Update Is Reshaping The World — for Better or Worse

COMMENTARY

Written by Marvin Nauendorff

Published July 2, 2024


If you have opened Google Translate in the past few days, you may have noticed a large number of new languages. Google Translate, owned by Alphabet, has received its biggest update to date, adding 110 new languages, including many minority and endangered ones. As one of the largest global corporations, with a near-monopoly in this field, Google bears responsibility toward these languages and their communities.

Before this update, Google Translate supported 133 languages, so the expansion nearly doubles its offering, to 243. The new languages have more than 614 million speakers combined, around 8% of the global population. The update includes both major world languages and those spoken by small indigenous communities, with about 25% of the new languages from Africa, such as Fon, Kikongo, Luo, Ga, Swati, Venda, and Wolof. In 2022, Google announced the 1,000 Languages Initiative, which aims to develop AI models supporting the 1,000 most spoken languages worldwide.
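For readers who want to check these figures, here is a minimal sketch of the arithmetic in Python. The language and speaker counts come from the paragraph above; the world population of 8 billion is a rounded assumption of ours and does not come from Google's announcement.

```python
# Figures quoted in the article.
PREVIOUS_LANGUAGES = 133
NEW_LANGUAGES = 110
NEW_SPEAKERS = 614_000_000

# Assumption: a rounded world population, not a figure from the article.
WORLD_POPULATION = 8_000_000_000

# Total languages supported after the update.
total_languages = PREVIOUS_LANGUAGES + NEW_LANGUAGES

# Relative growth of the language list ("nearly doubling").
growth = NEW_LANGUAGES / PREVIOUS_LANGUAGES

# Share of the world population that speaks one of the new languages.
speaker_share = NEW_SPEAKERS / WORLD_POPULATION

print(f"Languages after the update: {total_languages}")
print(f"Growth of the language list: {growth:.0%}")
print(f"Share of world population: {speaker_share:.1%}")
```

Under that population assumption, the speaker share comes out just under 8%, and the list grows by a little over four fifths, which is what "nearly doubling" means here.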

Why This Is Important

Most minority languages are persistently marginalized and underrepresented. Their inclusion in Google Translate helps to preserve them and makes them more accessible to younger generations. This is particularly important because many minority languages are in danger of disappearing due to a lack of resources and attention.

Adding minority languages to Google Translate can be a valuable resource for educational institutions and for individuals who wish to learn and write in their native languages. Students and researchers can use the tool to translate learning materials and documents, which supports literacy and education within these communities.

As with any language, the accuracy of Google Translate translations depends on the availability of sufficient data. For minority languages, which often have multiple varieties, it is particularly important to ensure quality training of the models. 

This can be challenging, but with the support of linguists and native speakers, the accuracy of translations will improve over time. A key factor in successfully bringing minority languages to Google Translate is the involvement of the communities that speak them. Community members can provide feedback, help with translations, and participate in developing language resources. This collaboration is essential for creating authentic and accurate translations.

By using new technological tools, minority language speakers can take a fuller part in the digital age, access more resources, and join the global dialogue. This promotes social and economic development and creates new opportunities for young people in these communities.

Adding languages like Tibetan, whose speakers have faced prolonged oppression, to Google Translate offers a measure of freedom. For Tibetans in exile, it provides a vital tool for maintaining their native language in environments where most people do not even know it exists, helping them adapt and communicate more effectively.

However, not all communities want visibility and understanding in the digital space.

Interactive generative graphic displaying the 110 new languages on Google Translate. Created by Marvin Nauendorff, 2024.

Why It's Not That Easy

Adding languages to Google Translate is a step towards digital freedom and independence for many language communities, but it has also drawn criticism. The Romani language, or rromani ćhib, an Indo-Aryan language spoken in Europe since the Middle Ages, has been the focus of recovery and recognition efforts by activists and linguists. However, some Romani communities consider it a closed or "secret" language and prefer that it remain exclusive to Romani people. Speaking the language is seen as an integral part of Romani identity, and teaching it to non-Romani people is discouraged. Google has not addressed this controversy, though organizations such as the European Roma Grassroots Organisations (ERGO) have eagerly welcomed the update.

The Google Translate update arrived so quickly that many Romani communities did not have time to properly voice their opinions. Adding a language to Google Translate greatly influences its community, and it is crucial to listen to the communities’ perspectives.

The Technical Side

For this update, Google used an AI model called PaLM 2, short for "Pathways Language Model 2." PaLM 2 learns languages by analyzing large amounts of text written in them. One of its core features is zero-shot learning, which allows it to translate between languages even when it has not seen specific example translations for them. It uses patterns and similarities between languages to make educated guesses. PaLM 2 is particularly efficient at learning closely related languages. For example, it can quickly learn to translate between similar languages such as Awadhi and Hindi, or between different French creoles.

Full List of Added Languages

Disclaimer: We strive to offer precise and culturally respectful information regarding the world's languages and language varieties. If you notice any inaccuracies or culturally insensitive content in this list, please let us know.

  1. Abkhaz is a Northwest Caucasian language spoken mainly in Abkhazia, where it is an official language, with about 190,000 speakers worldwide.

  2. Acehnese is an Austronesian language spoken by the Acehnese people in the Aceh region of Sumatra, Indonesia.

  3. Acholi is a Luo Nilotic language spoken by the Acholi people in northern Uganda and South Sudan.

  4. Afar is a Cushitic language spoken by the Afar people in Djibouti, Eritrea, and Ethiopia.

  5. Alur is a Southern Luo dialect spoken by the Alur people in northwestern Uganda and northeastern Democratic Republic of the Congo.

  6. Avar is a Northeast Caucasian language spoken by the Avar people in Dagestan, Russia.

  7. Awadhi is an Indo-Aryan language spoken in the Awadh region of Uttar Pradesh, India.

  8. Balinese is an Austronesian language spoken by the Balinese people on the Indonesian island of Bali.

  9. Baluchi is an Iranian language spoken by the Baloch people in Pakistan, Iran, and Afghanistan.

  10. Baoulé is a Kwa language spoken by the Baoulé people in Côte d'Ivoire.

  11. Bashkir is a Turkic language spoken by the Bashkir people in Bashkortostan, Russia.

  12. Batak Karo is an Austronesian language spoken by the Karo people in North Sumatra, Indonesia.

  13. Batak Simalungun is an Austronesian language spoken by the Simalungun people in North Sumatra, Indonesia.

  14. Batak Toba is an Austronesian language spoken by the Toba Batak people in North Sumatra, Indonesia.

  15. Bemba is a Bantu language spoken by the Bemba people in Zambia.

  16. Betawi is a Malay-based creole language spoken by the Betawi people in Jakarta, Indonesia.

  17. Bikol is an Austronesian language spoken in the Bicol Region of the Philippines.

  18. Breton is a Celtic language spoken in Brittany, France.

  19. Buryat is a Mongolic language spoken by the Buryat people in Russia, Mongolia, and China.

  20. Cantonese is a Sinitic language spoken in several parts of Mainland China, Hong Kong, Macau, and among overseas Chinese communities.

  21. Chamorro is an Austronesian language spoken by the Chamorro people in Guam and the Northern Mariana Islands.

  22. Chechen is a Northeast Caucasian language spoken by the Chechen people in Chechnya, Russia.

  23. Chuukese is an Austronesian language spoken in the Chuuk State of the Federated States of Micronesia.

  24. Chuvash is a Turkic language spoken by the Chuvash people in the Chuvash Republic, Russia.

  25. Crimean Tatar is a Turkic language spoken by the Crimean Tatars in Crimea and diaspora communities.

  26. Dari, also known as Afghan Persian, is one of the two official languages of Afghanistan.

  27. Dinka is a Nilotic language spoken by the Dinka people in South Sudan.

  28. Dombe is a Bantu language, closely related to Tonga, spoken in western Zimbabwe; detailed information on its speakers is limited.

  29. Dyula is a Mande language spoken in Côte d'Ivoire, Burkina Faso, and Mali.

  30. Dzongkha is the national language of Bhutan, belonging to the Sino-Tibetan language family.

  31. Faroese is a North Germanic language spoken in the Faroe Islands.

  32. Fijian is an Austronesian language spoken in Fiji, where it is an official language.

  33. Fon is a Gbe language spoken mainly in Benin.

  34. Friulian is a Romance language spoken in the Friuli region of northeastern Italy.

  35. Fulani, also known as Fula, is a Niger-Congo language spoken by the Fulani people across West Africa.

  36. Ga is a Kwa language spoken in and around Accra, Ghana.

  37. Hakha Chin is a Kuki-Chin language spoken in Chin State, Myanmar.

  38. Hiligaynon is an Austronesian language spoken in the Western Visayas region of the Philippines.

  39. Hunsrik is a Germanic language, derived from the Hunsrückisch dialect of German, spoken in southern Brazil.

  40. Iban is an Austronesian language spoken by the Iban people in Malaysia and Indonesia.

  41. Jamaican Patois is an English-based creole language spoken in Jamaica.

  42. Jingpo is a Sino-Tibetan language spoken by the Jingpo people in Myanmar and China.

  43. Kalaallisut, or West Greenlandic, is an Inuit language spoken in Greenland, where it is the official language.

  44. Kanuri is a Saharan language spoken by the Kanuri people in Nigeria, Niger, Chad, and Cameroon.

  45. Kapampangan is an Austronesian language spoken in the Pampanga province of the Philippines.

  46. Khasi is an Austroasiatic language spoken in the Meghalaya state of India.

  47. Kiga, also known as Chiga, is a Bantu language spoken by the Kiga people in Uganda.

  48. Kikongo is a Bantu language spoken in the Democratic Republic of the Congo, Republic of the Congo, and Angola.

  49. Kituba is a creole language based on Kikongo, spoken in the Democratic Republic of the Congo and Republic of the Congo.

  50. Kokborok is a Sino-Tibetan language spoken by the Tripuri people in the Indian state of Tripura.

  51. Komi is a Uralic language spoken by the Komi people in Russia.

  52. Latgalian is a Baltic language spoken in the Latgale region of Latvia.

  53. Ligurian is a Romance language spoken in the Liguria region of Italy.

  54. Limburgish is a West Germanic language spoken in the Limburg region of the Netherlands and Belgium.

  55. Lombard is a Romance language spoken in the Lombardy region of Italy.

  56. Luo is a Nilotic language spoken by the Luo people in Kenya and Tanzania.

  57. Madurese is an Austronesian language spoken on the island of Madura and in parts of East Java, Indonesia.

  58. Makassar is an Austronesian language spoken in the South Sulawesi region of Indonesia.

  59. Malay (Jawi): Malay written in the Jawi script is used in Brunei, Malaysia, Indonesia, and southern Thailand.

  60. Mam is a Mayan language spoken by the Mam people in Guatemala and Mexico.

  61. Manx is a Goidelic Celtic language spoken historically on the Isle of Man, which became extinct as a first language in 1974 but has since been revived and is now spoken as a second language by a small number of people.

  62. Marshallese, or Kajin M̧ajeļ, is an Austronesian language spoken in the Marshall Islands, consisting of two major dialects, Ralik and Ratak.

  63. Marwadi, also known as Marwari, is a Rajasthani language spoken in the Indian state of Rajasthan, known for its rich literary tradition.

  64. Mauritian Creole is a French-based creole language spoken in Mauritius, which developed from the contact between French colonizers and enslaved Africans.

  65. Meadow Mari is a Uralic language spoken by the Mari people in the Mari El Republic, Russia.

  66. Minang, or Minangkabau, is an Austronesian language spoken by the Minangkabau people of West Sumatra, Indonesia, who are known for their matrilineal culture.

  67. Nahuatl (Eastern Huasteca): Eastern Huasteca Nahuatl is a variety of Nahuatl spoken in the Huasteca region of Mexico, part of the Uto-Aztecan language family.

  68. Ndau is a Bantu language spoken by the Ndau people in Mozambique and Zimbabwe.

  69. Ndebele (South): Southern Ndebele is a Bantu language spoken by the Ndebele people in South Africa.

  70. Nepalbhasa (Newari): Nepalbhasa, also known as Newari, is a Sino-Tibetan language spoken by the Newar people in the Kathmandu Valley of Nepal.

  71. N'Ko: N'Ko is a script devised for the Manding languages of West Africa, particularly used for writing the Bambara, Mandinka, and Dioula languages.

  72. Nuer is a Nilotic language spoken by the Nuer people in South Sudan and Ethiopia.

  73. Occitan is a Romance language spoken in southern France, Italy's Occitan Valleys, Monaco, and the Aran Valley in Spain.

  74. Ossetian is an Eastern Iranian language spoken by the Ossetian people in the Caucasus region, primarily in North Ossetia-Alania (Russia) and South Ossetia (Georgia).

  75. Pangasinan is an Austronesian language spoken in the Pangasinan province of the Philippines.

  76. Papiamento is a creole language spoken in the Caribbean islands of Aruba, Bonaire, and Curaçao, combining elements of Spanish, Portuguese, Dutch, African languages, and Arawakan.

  77. Portuguese (Portugal): European Portuguese is the variety of Portuguese, a Romance language, spoken in Portugal; Portuguese as a whole is known for its global influence and extensive literature.

  78. Punjabi (Shahmukhi): Punjabi in Shahmukhi script is written in a variant of the Persian script and is used primarily by Punjabi speakers in Pakistan.

  79. Q'eqchi' is a Mayan language spoken by the Q'eqchi' people in Guatemala and Belize.

  80. Romani is an Indo-Aryan language spoken by the Romani people across Europe and the Americas, with numerous dialects reflecting its diverse speaker population.

  81. Rundi, also known as Kirundi, is a Bantu language spoken in Burundi and neighboring countries.

  82. Sami (North): Northern Sami is a Uralic language spoken by the Sami people in northern Norway, Sweden, and Finland.

  83. Sango is a creole language based on the Ngbandi language, spoken in the Central African Republic as a national language.

  84. Santali is a Munda language spoken by the Santal people in India, Bangladesh, and Nepal.

  85. Seychellois Creole, also known as Seselwa, is a French-based creole language spoken in the Seychelles.

  86. Shan is a Tai language spoken by the Shan people in Myanmar, Thailand, and China.

  87. Sicilian is a Romance language spoken on the island of Sicily and in parts of southern Italy.

  88. Silesian is a West Slavic language spoken by the Silesian people in Poland and the Czech Republic.

  89. Susu is a Mande language spoken by the Susu people in Guinea and Sierra Leone.

  90. Swati, or siSwati, is a Bantu language spoken by the Swazi people in Eswatini and South Africa.

  91. Tahitian is an Austronesian language spoken in French Polynesia.

  92. Tamazight refers to the Berber languages spoken by the Berber people in North Africa.

  93. Tamazight (Tifinagh): Tamazight written in the Tifinagh script is used for writing Berber languages, especially in Morocco.

  94. Tetum is an Austronesian language spoken in East Timor and parts of Indonesia.

  95. Tibetan is a Sino-Tibetan language spoken by the Tibetan people, with several dialects and a rich literary tradition.

  96. Tiv is a Tivoid (Southern Bantoid) language spoken by the Tiv people in Nigeria.

  97. Tok Pisin is an English-based creole language spoken in Papua New Guinea.

  98. Tongan is an Austronesian language spoken in Tonga.

  99. Tswana, or Setswana, is a Bantu language spoken by the Tswana people in Botswana and South Africa.

  100. Tulu is a Dravidian language spoken by the Tulu people in the southwestern part of Karnataka, India.

  101. Tumbuka is a Bantu language spoken in Malawi, Zambia, and Tanzania.

  102. Tuvan is a Turkic language spoken by the Tuvan people in the Republic of Tuva, Russia.

  103. Udmurt is a Uralic language spoken by the Udmurt people in Russia.

  104. Venda, or Tshivenda, is a Bantu language spoken by the Venda people in South Africa.

  105. Venetian is a Romance language spoken in the Veneto region of Italy.

  106. Waray is an Austronesian language spoken in the Eastern Visayas region of the Philippines.

  107. Wolof is a Niger-Congo language spoken in Senegal, Gambia, and Mauritania.

  108. Yakut, or Sakha, is a Turkic language spoken by the Yakut people in the Sakha Republic, Russia.

  109. Yucatec Maya is a Mayan language spoken by the Maya people in the Yucatán Peninsula of Mexico.

  110. Zapotec is an Oto-Manguean language spoken by the Zapotec people in Oaxaca, Mexico.

Written by

Marvin Nauendorff

Imagery

Marvin Nauendorff

Edited by

Alice Pol

Cite this Article

Nauendorff, Marvin. 2024. "Google Translate’s Latest Update Is Reshaping The World – for Better or Worse." Linguaphile Magazine, July 2, 2024. https://www.linguaphilemagazine.org/editorial/google-translate-update.

Bibliography

  1. Catalan News. 2024. "Recovering the Romani Identity Through Language." Accessed June 30, 2024. https://www.catalannews.com/society-science/item/recovering-the-romani-identity-through-language.

  2. ERGO Network. 2024. "Romani." LinkedIn. Accessed June 30, 2024. https://www.linkedin.com/posts/ergo-network_romani-activity-7212428163358863360-y2jJ?utm_source=share&utm_medium=member_desktop.

  3. Google. 2024. "Google Translate Adds New Languages." Google Blog. Accessed June 30, 2024. https://blog.google/intl/en-in/google-translate-new-languages-2024/.

  4. Google. 2024. "Google Translate." Accessed June 30, 2024. https://translate.google.com/.

  5. Google. 2024. "Translate Help." Google Support. Accessed June 30, 2024. https://support.google.com/translate/answer/15139004.

  6. Google. 2024. "3 ways AI is scaling helpful technologies worldwide." Google Blog. Accessed June 30, 2024. https://blog.google/technology/ai/ways-ai-is-scaling-helpful/.

  7. Szalai, Andrea. 2006. "Romani: A Linguistic Introduction, Yaron Matras. Cambridge University Press, Cambridge (2002), (pp. viii–xiii and 1–291, hardback, ISBN 0 521 63165 3)." Lingua 116: 2238–2253. https://doi.org/10.1016/j.lingua.2005.03.010.

  8. @florida.florian. 2024. "Romani on Google Translate? #romani #language #linguistics #culture #gypsy #googletranslate #roma." YouTube video, June 28, 2024. https://www.youtube.com/watch?v=QcInDX8bfR4.
