Subjective Scoring Framework for VQA Models in Autonomous Driving

Kaavya Rekanar, Abbirah Ahmed, Reenu Mohandas, Ganesh Sistu, Ciaran Eising, Martin Hayes

Research output: Contribution to journal › Article › peer-review

Abstract

The development of vision-and-language transformer models has paved the way for Visual Question Answering (VQA) models and related research. Metrics exist to assess the general accuracy of VQA models, but subjective assessment of the answers a model generates is necessary for an in-depth understanding, and a framework for such assessment is required. This work develops a novel scoring system based on the subjectivity of the question and analyses the answers provided by the model using multiple types of natural language processing models (bert-base-uncased, nli-distilBERT-base, all-mpnet-base-v2, and GPT-2) together with a sentence similarity benchmark metric (cosine similarity). A case study is also presented, applying the proposed subjective scoring framework to three prominent VQA models — ViLT, ViLBERT, and LXMERT — on an automotive dataset. The proposed framework aids in analyzing the shortcomings of the discussed VQA models from a driving perspective, and the results help determine which model would work best when fine-tuned on a driving-specific VQA dataset.
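The core comparison step the abstract describes — scoring a model's answer against a reference by cosine similarity of their sentence embeddings — can be sketched as below. This is a minimal illustration, not the paper's actual pipeline: the embedding vectors here are hypothetical placeholders standing in for the output of an encoder such as all-mpnet-base-v2.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two sentence-embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 3-d embeddings for a reference answer and a model's answer.
# In practice these would come from a sentence encoder (e.g. all-mpnet-base-v2
# produces 768-d vectors); the values below are illustrative only.
reference_answer = np.array([0.20, 0.80, 0.10])
model_answer = np.array([0.25, 0.75, 0.05])

score = cosine_similarity(reference_answer, model_answer)
```

A score near 1 indicates the two answers point in nearly the same direction in embedding space, i.e. are semantically similar; a score near 0 indicates unrelated answers.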

Original language: English
Pages (from-to): 141306-141323
Number of pages: 18
Journal: IEEE Access
Volume: 12
DOIs
Publication status: Published - 2024

Keywords

  • Semantic analysis
  • VQA models
  • scoring framework
  • subjective assessment

