Vehicle Feature VQA: Visual Question Answering for Vehicle Feature

Authors

  • Pa Pa Tun
  • Khin Mar Soe

Keywords:

Visual Question Answering, Vehicle Feature, ResNet50, Convolutional Neural Networks, feature extraction, Long Short-Term Memory, BLEU Score

Abstract

Visual Question Answering (VQA) automatically produces predicted answers to questions about real-world images. In this paper, we propose a VQA dataset for vehicle features, designed to capture knowledge about vehicles. We develop a VQA model that uses ResNet50, a Convolutional Neural Network (CNN), for image feature extraction and Long Short-Term Memory (LSTM) networks for question feature extraction and answer generation. The experimental results report the training loss, evaluation loss, BLEU score, and VQA accuracy at 20 and 30 epochs. At 20 epochs, the model achieved a training loss of 1.1949, an evaluation loss of 1.7953, a BLEU score of 0.6180, and a VQA accuracy of 0.0493, and predicted one correct answer for a question-image pair. At 30 epochs, the model predicted five correct answers out of fifteen test items for vehicle feature questions and images, with a training loss of 0.8780, an evaluation loss of 1.6634, a BLEU score of 0.6775, and a VQA accuracy of 0.0627.

Author Biographies

  • Pa Pa Tun

    Faculty of Computer Science, University of Computer Studies Yangon, Yangon, Myanmar

  • Khin Mar Soe

    Faculty of Computer Science, University of Computer Studies Yangon, Yangon, Myanmar

References

[1] S. Chowdhury and B. Soni, "eaVQA: An Experimental Analysis on Visual Question Answering Models," in Proc. 18th International Conference on Natural Language Processing, 2021, pp. 550-554.

[2] Z. Wang and S. Ji, "Learning Convolutional Text Representations for Visual Question Answering," in Proc. 2018 SIAM International Conference on Data Mining, 2018, pp. 594-602.

[3] S. Antol et al., "VQA: Visual Question Answering," in Proc. 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 2015, pp. 2425-2433.

[4] Y. Goyal, T. Khot, D. Summers-Stay, D. Batra and D. Parikh, "Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering," in Proc. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 2017, pp. 6325-6334.

[5] J. Guo et al., "From Images to Textual Prompts: Zero-shot Visual Question Answering with Frozen Large Language Models," in Proc. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 2023, pp. 10867-10877.

[6] E. Borisova, N. Rauscher, and G. Rehm, "SciVQA 2025: Overview of the First Scientific Visual Question Answering Shared Task," in Proc. Fifth Workshop on Scholarly Document Processing (SDP 2025), Vienna, Austria, 2025, pp. 182-210.

[7] C. Zhou, G. Chen, X. Bai, and M. Dong, "On the Human-level Performance of Visual Question Answering," in Proc. 31st International Conference on Computational Linguistics, Abu Dhabi, UAE, 2025, pp. 4109-4113.

[8] Z. Zhang, "Enhanced Textual Feature Extraction for Visual Question Answering: A Simple Convolution Approach," arXiv:2405.00479v2 [online], pp. 1-12, https://arxiv.org/html/2405.00479v2 [accessed 11 Nov. 2024].

[9] N. D. Huynh, M. R. Bouadjenek, S. Aryal, I. Razzak, and H. Hacid, "Visual Question Answering: From Early Developments to Recent Advances -- A Survey," arXiv preprint arXiv:2501.03939 [online], https://arxiv.org/abs/2501.03939 [accessed 11 Jan. 2025].

[10] S. Gautam, V. Thambawita, M. Riegler, P. Halvorsen, and S. Hicks, "Medico 2025: Visual Question Answering for Gastrointestinal Imaging," arXiv preprint arXiv:2508.10869 [online], https://arxiv.org/abs/2508.10869 [accessed 14 Aug. 2025].

[11] I. Allaouzi, M. B. Ahmed, and B. Benamrou, "An Encoder-Decoder Model for Visual Question Answering in the Medical Domain," CEUR-WS.org [online], vol. 2380, pp. 124-132, https://ceur-ws.org/Vol-2380/paper_124.pdf [9-12 Sept. 2019].

[12] R. Pal, S. Kar, and D. K. Prasad, "NorVivqA: Visual Question Answering for Visually Impaired in Norwegian Language," CEUR-WS.org [online], vol. 3975, pp. 3-13, https://ceur-ws.org/Vol-3975/paper3.pdf [17-18 June 2025].

Published

2026-02-21

Section

Articles

How to Cite

Pa Pa Tun, & Khin Mar Soe. (2026). Vehicle Feature VQA: Visual Question Answering for Vehicle Feature. International Journal of Computer (IJC), 57(1), 107-115. https://ijcjournal.org/InternationalJournalOfComputer/article/view/2484