Abstract
The advent of generative artificial intelligence at the user level, particularly through the development of Large Language Models (LLMs), prompts us to reflect on the proliferation of biases in the construction, development, use, and representation of these models, which are built on linguistic data. This article first reviews the AI initiatives developed for Spanish in Latin America and Spain, with special attention to linguistic resources and LLMs. The composition of the current major LLMs for Spanish is examined and compared with that of LLMs for other peninsular languages (Catalan, Basque, Galician, and Valencian). Subsequently, the term Digital Linguistic Bias (DLB) is introduced to identify the linguistic hybridity generated by AI at both an interlinguistic level (e.g., in relation to the English base used to train these models) and an intralinguistic level (in relation to the different varieties of the language). Finally, it is suggested that a digitally aware user can intervene to mitigate the effects of DLB. In conclusion, the article emphasizes the need for coordinated action by institutional agents to preserve the diversity of the Spanish-speaking linguistic heritage in the development of LLMs.
Translated title of the contribution | The Digital Linguistic Bias (DLB) in Artificial Intelligence: Implications for Large Language Models in Spanish |
---|---|
Original language | Spanish |
Pages (from-to) | 623-647 |
Number of pages | 25 |
Journal | Lengua y Sociedad. Revista de Lingüística Teórica y Aplicada |
Volume | 23 |
Issue number | 2 |
Publication status | Published - 30 Dec 2024 |
Keywords
- Artificial Intelligence
- Large Language Models
- Digital Linguistic Bias
- language diversity
- dialectal variation
- Spanish