TY - GEN
T1 - Overview of the CLEF 2008 Multilingual Question Answering Track
AU - Forner, Pamela
AU - Peñas, Anselmo
AU - Agirre, Eneko
AU - Alegria, Iñaki
AU - Forăscu, Corina
AU - Moreau, Nicolas
AU - Osenova, Petya
AU - Prokopidis, Prokopis
AU - Rocha, Paulo
AU - Sacaleanu, Bogdan
AU - Sutcliffe, Richard
AU - Tjong Kim Sang, Erik
PY - 2009
Y1 - 2009
AB - The QA campaign at CLEF 2008 [1] was largely the same as the one proposed the previous year. The results and analyses reported by the previous year's participants suggested that the changes introduced in that campaign had led to a drop in system performance, so for this year's competition it was decided to replicate the previous exercise almost exactly. Following the previous year's experience, some question-answer pairs were grouped into clusters. Each cluster was characterized by a topic (not given to participants), and the questions in a cluster contained co-references between one of them and the others. Moreover, as in the previous year, systems were given the possibility of searching for answers in Wikipedia as a document corpus besides the usual newswire collection. In addition to the main task, three further exercises were offered: the Answer Validation Exercise (AVE) and Question Answering on Speech Transcriptions (QAST), which continued the previous year's successful pilots, together with the new Word Sense Disambiguation for Question Answering (QA-WSD) exercise. As a general remark, the main task still proved very challenging for participating systems. In a rough comparison with the previous year's results, the best overall accuracy dropped significantly from 42% to 19% in the multilingual subtasks, but increased slightly in the monolingual subtasks, going from 54% to 63%.
UR - http://www.scopus.com/inward/record.url?scp=70549090211&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-04447-2_34
DO - 10.1007/978-3-642-04447-2_34
M3 - Conference contribution
AN - SCOPUS:70549090211
SN - 3642044468
SN - 9783642044465
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 262
EP - 295
BT - Evaluating Systems for Multilingual and Multimodal Information Access - 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008, Revised Selected Papers
T2 - 9th Workshop of the Cross-Language Evaluation Forum, CLEF 2008
Y2 - 17 September 2008 through 19 September 2008
ER -