El Sesgo Lingüístico Digital (SLD) en la inteligencia artificial: implicaciones para los modelos de lenguaje masivos en español

Translated title of the contribution: The Digital Linguistic Bias (DLB) in Artificial Intelligence: Implications for Large Language Models in Spanish

Javier Muñoz-Basols, Maria del Mar Palomares Marin, Francisco Moreno Hernández

Research output: Contribution to journal › Article › peer-review

Abstract

The advent of generative artificial intelligence at the user level, particularly through the development of Large Language Models (LLMs), prompts us to reflect on the proliferation of biases in the construction, development, use, and representation of these models based on linguistic data. This article first reviews the initiatives developed for Spanish in the field of AI in Latin America and Spain, with special attention to linguistic resources and LLMs. The composition of the current major LLMs for Spanish is examined and compared with that of LLMs for other peninsular languages (Catalan, Basque, Galician, and Valencian). Subsequently, the term Digital Linguistic Bias (DLB) is introduced to identify the linguistic hybridity generated by AI both at an interlinguistic level (e.g., in relation to the English base used to train these models) and at an intralinguistic level (in relation to the different varieties of the language). Finally, it is suggested that a digitally aware user can intervene to mitigate the effects of the DLB. In conclusion, the article emphasizes the need for coordinated action by institutional agents to preserve the diversity of the Spanish-speaking linguistic heritage in the development of LLMs.

Original language: Spanish
Pages (from-to): 623-647
Number of pages: 25
Journal: Lengua y Sociedad. Revista de Lingüística Teórica y Aplicada
Volume: 23
Issue number: 2
Publication status: Published - 30 Dec 2024

Keywords

  • Artificial Intelligence
  • Large Language Models
  • Digital Linguistic Bias
  • language diversity
  • dialectal variation
  • Spanish
