Reproducing "Show, attend and tell: Neural image caption generation with visual attention"

Liu, Haixia; Brailsford, Tim

doi:10.1088/1742-6596/2589/1/012012

Authors

Dr Haixia Liu Haixia.Liu@uwe.ac.uk
Senior Lecturer in Computer Science

Tim Brailsford Tim.Brailsford@uwe.ac.uk
Professor of Computer Science

Abstract

This paper replicates the experiment presented by Xu et al. [1] and
examines the errors in the generated captions. The analysis of these errors aims to provide
deeper insight into their underlying causes. The study also includes follow-up experiments
that investigate whether these errors can be rectified in a post-processing stage. Image
recognition and object detection models, as well as a computational language probability model,
were explored. The findings presented in this paper aim to contribute to the overarching
objective of Explainable Artificial Intelligence (XAI) by identifying potential pathways to
improve image captioning.

Presentation Conference Type	Conference Paper (published)
Conference Name	The 16th International Conference on Computer and Electrical Engineering
Acceptance Date	Apr 30, 2023
Online Publication Date	Sep 13, 2023
Publication Date	Sep 13, 2023
Deposit Date	May 17, 2023
Publicly Available Date	Oct 3, 2023
Journal	Journal of Physics: Conference Series
Print ISSN	1742-6588
Electronic ISSN	1742-6596
Publisher	IOP Publishing
Peer Reviewed	Peer Reviewed
Volume	2589
Issue	1
Article Number	012012
Series ISSN	1742-6596
DOI	https://doi.org/10.1088/1742-6596/2589/1/012012
Public URL	https://uwe-repository.worktribe.com/output/10797312