How geoTLDs promote European minority languages online
The digital world reflects our global diversity, but not equally: while a few languages dominate online, countless others struggle for representation. GeoTLDs can help! For regions with minority languages, having dedicated online spaces where these languages can be used, protected, and shared is a celebration of human communication. Read on as we explore the data behind languages on the web, together with Dr. Daniel Pimienta and his research.
Around two years ago, we wrote a blog article about languages on the web. Dr. Daniel Pimienta, a language researcher, reached out to learn more about the data behind that article and how we determine languages on the web. His enthusiasm for language data led to a fantastic collaboration! Daniel has since dug deep into our database, extracting all kinds of insights. Based on his report and recently published research, we’re sharing Daniel’s findings about European minority languages on the internet.
Online languages across the world
Depending on your global location, you may mostly be presented with content in your own language—thanks to geotracking—and this is certainly true for most dominant languages like English, Spanish, or Mandarin Chinese. But what if you speak a minority language? Can you access digital content in that language? Well, based on Dr. Pimienta’s research, it depends on which minority language you’re looking for.
Previously, we showed that English is the dominant primary language online, with approximately 51% of websites created in English. This prominence is particularly striking when set against other major languages: Chinese, for instance, accounted for roughly 10% of websites, despite having the most native speakers of any language worldwide. German followed with 7%, while Spanish and Japanese each comprised about 4% of online content.
Another approach to measuring languages on websites is to calculate each language's share of all the language versions found across the web, rather than counting only each website's primary language. This method accounts for the fact that many websites offer content in several languages. Because it looks at the overall distribution of languages across content, it gives a different view of the multilingual web and can produce different proportions for each language.
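To make the difference between the two measurements concrete, here is a minimal sketch in Python. The sample sites, the field names, and both function names are made up for illustration, and the counting rule for the second metric (one count per language per site) is our assumption, not necessarily the exact method used in the research.

```python
from collections import Counter

# Hypothetical sample data: each site has one primary language
# and a list of every language it offers content in.
SITES = [
    {"domain": "example-one.test", "primary": "en", "languages": ["en"]},
    {"domain": "example-two.test", "primary": "en", "languages": ["en", "es"]},
    {"domain": "example-three.test", "primary": "es", "languages": ["es", "eu"]},
    {"domain": "example-four.test", "primary": "de", "languages": ["de", "en"]},
]


def primary_language_share(sites):
    """Share of sites whose primary language is each language."""
    counts = Counter(site["primary"] for site in sites)
    total = len(sites)
    return {lang: n / total for lang, n in counts.items()}


def overall_language_share(sites):
    """Share of all (site, language) pairs that belong to each language.

    Assumption: a language is counted once per site that offers it.
    """
    counts = Counter(lang for site in sites for lang in set(site["languages"]))
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


primary = primary_language_share(SITES)
overall = overall_language_share(SITES)
for lang in sorted(overall):
    print(f"{lang}: primary={primary.get(lang, 0.0):.2f} overall={overall[lang]:.2f}")
```

In this toy sample, Basque (`eu`) never appears as a primary language, so the first metric misses it entirely, while the second metric still picks it up.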
While the internet offers a platform for countless voices, many languages struggle for representation.
Yiddish, Pashto, Lao, Amharic, and Punjabi are among the least common languages online.
This scarcity is especially worrying for languages that UNESCO classifies as 'endangered', since it threatens their preservation and transmission to future generations.
GeoTLDs: carving digital niches for local languages
In an effort to promote cultural and linguistic communities online, geographical top-level domains (geoTLDs) were introduced in 2012. These domains aim to provide localized digital identities, fostering a sense of belonging and representation.
For the present analysis, we focus on 10 European geoTLDs that are associated with a specific language, not just with a cultural or geographical area. The analysis looks at two aspects: how well the associated language is represented on websites under the TLD, and the prevalence of multilingualism, i.e., whether a website is offered in more than one language.
We see in Figure 1 that .eus, the TLD associated with the Basque Country in Spain, has the highest share of multilingualism. This means that .eus websites are the most likely to be offered in more than one language; as you would expect, Spanish and Basque are the most common combination, but not the only one. Next in line for multilingualism are .cymru, one of the TLDs for Wales, and .gal, representative of Galicia, with 63% and 60% of websites being offered in multiple languages, respectively. At the other end, .frl (associated with Friesland), .irish, and .scot exhibit the lowest share of multilingual websites.
Next, the analysis investigated the relationship between each geoTLD and how many of its websites are available in the associated minority language. The three TLDs with the highest representation of their language are .cat, .gal, and .cymru, with 72% of websites offered in Catalan, 66% in Galician, and 45% in Welsh, respectively. In contrast, only 1% of .scot websites use Scottish Gaelic. Ultimately, this highlights that the impact of geoTLDs on language promotion varies, and is likely shaped by the level of community engagement and the presence of supportive policies.
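For readers who want to see how the two per-TLD figures could be computed, here is a rough Python sketch. The domain names and language lists are invented, the function name `tld_language_stats` is hypothetical, and the TLD-to-language mapping uses ISO 639-1 codes as an assumption; it illustrates the idea rather than the pipeline used in the study.

```python
from collections import defaultdict

# Assumed mapping from geoTLD to its associated language (ISO 639-1 codes).
TLD_LANGUAGE = {"eus": "eu", "cat": "ca", "scot": "gd"}

# Invented sample: (domain, languages detected on the site).
SITES = [
    ("example-a.eus", ["eu", "es"]),
    ("example-b.eus", ["es", "en"]),
    ("example-c.cat", ["ca"]),
    ("example-d.cat", ["ca", "es", "en"]),
    ("example-e.scot", ["en"]),
    ("example-f.scot", ["en", "gd"]),
]


def tld_language_stats(sites, tld_language):
    """Per TLD: share of multilingual sites and share of sites
    that offer the TLD's associated language."""
    totals = defaultdict(lambda: {"sites": 0, "multilingual": 0, "minority": 0})
    for domain, languages in sites:
        tld = domain.rsplit(".", 1)[-1]
        if tld not in tld_language:
            continue  # skip TLDs outside the analysis
        langs = set(languages)
        row = totals[tld]
        row["sites"] += 1
        row["multilingual"] += len(langs) > 1
        row["minority"] += tld_language[tld] in langs
    return {
        tld: {
            "multilingual_share": row["multilingual"] / row["sites"],
            "minority_language_share": row["minority"] / row["sites"],
        }
        for tld, row in totals.items()
    }


for tld, stats in tld_language_stats(SITES, TLD_LANGUAGE).items():
    print(f".{tld}: {stats}")
```

Keeping the two shares separate matters: in this toy data every .eus site is multilingual, yet only half of them actually include Basque.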

Finally, the research examined how well each TLD is represented relative to its region. It looked at how the overall size of the TLD compares with the population it serves, and at how well the TLD reflects multilingualism, considering both the number of people who speak more than one language and the number who speak the primary language. The table in Figure 2 summarizes the findings. Both .eus and .cat perform best in terms of multilingualism and promoting their respective languages, but are only moderately represented given their number of speakers. In contrast, .corsica is well represented online given its (speaker) population, but scores low on multilingualism and on representation of the Corsican language (Corsu).

Why linguistic diversity online matters
The internet is a living, evolving entity that shapes and is shaped by its users. Linguistic diversity online supports cultural preservation and inclusivity, enabling individuals to access information in their native or new languages. While geoTLDs offer a route toward greater linguistic diversity online, significant differences remain in how well that diversity is promoted. The dominance of a few major languages often sidelines others, creating a more uniform digital environment. Addressing this requires coordinated action: governments can enact policies to support minority languages in digital spaces; developers can build tools that make multilingual content creation easier; and communities themselves can drive change by producing and promoting content in underrepresented languages, ensuring these voices are not lost in the online landscape.
The internet's linguistic landscape is a reflection of our world's diversity and disparities. While English and a handful of other languages dominate, countless others strive for recognition and representation. By acknowledging these imbalances and championing initiatives that promote linguistic inclusivity, we can aspire towards a digital world that truly mirrors the rich tapestry of human language and culture.

This blog post was written in collaboration with Dr. Daniel Pimienta, founder of the Observatory of the Linguistic and Cultural Diversity on the Internet (OBDILCI). As the current head of OBDILCI, Dr. Pimienta develops indicators that measure the presence of various languages on the web. His work challenges biases in language data and advocates for multilingualism in digital spaces. For more information on the current study and methods, access the publication here.