Abstract
This paper presents an extractive summarization approach for multilingual clinical case reports, submitted to the MultiClinSUM 2025 shared task. We focused on selecting the ten most important sentences from each report while preserving the original text to ensure factual consistency. We compare four extractive techniques (graph-based, concept-based, topic-based and clustering-based summarization) on English, Spanish, French and Portuguese. Our experiments show that clustering-based summarization using multilingual BERT consistently outperforms the other methods in all four languages, with the highest semantic similarity observed for English. This suggests that multilingual BERT embeddings are effective at capturing the central meaning of clinical texts across different languages.
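The clustering-based selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes each sentence has already been embedded (for example with multilingual BERT), and it assumes k-means with one cluster per output sentence and a pick of the sentence nearest each centroid, since the abstract does not specify the clustering algorithm. The function name `select_sentences` is hypothetical, and `k = 10` mirrors the ten-sentence budget described above.

```python
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    # L2-normalise so Euclidean distance tracks cosine similarity
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v] if norm > 0 else list(v)


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    # Squared Euclidean distance between two vectors
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _kmeans(vectors: List[List[float]], k: int, iterations: int = 50) -> List[List[float]]:
    # Deterministic farthest-point initialisation (assumption: chosen for reproducibility)
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(_dist2(v, c) for c in centroids))
        centroids.append(list(farthest))
    for _step in range(iterations):
        # Assignment step: each sentence vector joins its nearest centroid
        clusters: List[List[List[float]]] = [[] for _ in range(k)]
        for v in vectors:
            j = min(range(k), key=lambda i: _dist2(v, centroids[i]))
            clusters[j].append(v)
        # Update step: move each centroid to the mean of its members (keep it if empty)
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)] if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments stable, so means unchanged
            break
        centroids = new_centroids
    return centroids


def select_sentences(embeddings: Sequence[Sequence[float]], k: int = 10) -> List[int]:
    """Hypothetical helper: indices (in document order) of k representative sentences."""
    if len(embeddings) <= k:
        return list(range(len(embeddings)))
    vectors = [_normalize(e) for e in embeddings]
    chosen: set = set()
    for c in _kmeans(vectors, k):
        # Pick the not-yet-chosen sentence closest to this centroid
        candidates = [i for i in range(len(vectors)) if i not in chosen]
        chosen.add(min(candidates, key=lambda i: _dist2(vectors[i], c)))
    # Return in original order so the summary reads like the source report
    return sorted(chosen)
```

Because each selected index points back to an unmodified sentence, the summary is a verbatim subset of the report, which is what preserves factual consistency in an extractive setup. Sorting the indices keeps the clinical narrative in its original chronological order.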
| Original language | English |
|---|---|
| Pages (from-to) | 534-543 |
| Number of pages | 10 |
| Journal | CEUR Workshop Proceedings |
| Volume | 4038 |
| Publication status | Published - 2025 |
| Event | 26th Working Notes of the Conference and Labs of the Evaluation Forum, CLEF 2025, Madrid, Spain, 9 Sep 2025 to 12 Sep 2025 |
Keywords
- Clinical case reports
- Clinical text summarization
- Extractive summarization
- Multilingual text
- Sentence selection