Deep Learning based Automatic Image Caption Generation

This paper aims at generating captions automatically by learning the contents of an image. At present, images are annotated with human intervention, a task that becomes nearly impossible for huge commercial databases. The image is given as input to a deep convolutional neural network (CNN) encoder, which produces a "thought vector" capturing the features and objects in the image; a recurrent neural network (RNN) decoder then translates these features into a sequential, meaningful description of the image. In this paper, we systematically analyze different deep neural network-based image caption generation approaches and pretrained models, and conclude on the most efficient model with fine-tuning. The analyzed models include variants both with and without the 'attention' mechanism, which optimizes the caption-generating ability of the model.
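The CNN-encoder/RNN-decoder pipeline described above can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the paper's exact configuration: the layer sizes, vocabulary size, and the tiny convolutional stack are assumptions chosen for brevity (a real system would typically use a pretrained CNN such as ResNet as the encoder).

```python
# Minimal sketch of a CNN-encoder / RNN-decoder captioning model.
# All dimensions below are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn

class CNNEncoder(nn.Module):
    """Encodes an image into a fixed-size 'thought vector'."""
    def __init__(self, embed_dim=256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # global pooling -> (B, 32, 1, 1)
        )
        self.fc = nn.Linear(32, embed_dim)

    def forward(self, images):                 # images: (B, 3, H, W)
        x = self.features(images).flatten(1)   # (B, 32)
        return self.fc(x)                      # (B, embed_dim)

class RNNDecoder(nn.Module):
    """Unrolls an LSTM over the thought vector to emit word logits."""
    def __init__(self, vocab_size=1000, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, thought_vector, captions):   # captions: (B, T) token ids
        # Feed the image embedding as the first step of the sequence,
        # followed by the embedded caption tokens (teacher forcing).
        words = self.embed(captions)               # (B, T, embed_dim)
        inputs = torch.cat([thought_vector.unsqueeze(1), words], dim=1)
        hidden, _ = self.lstm(inputs)              # (B, T+1, hidden_dim)
        return self.out(hidden)                    # (B, T+1, vocab_size)

encoder = CNNEncoder()
decoder = RNNDecoder()
images = torch.randn(2, 3, 64, 64)          # dummy batch of 2 RGB images
captions = torch.randint(0, 1000, (2, 5))   # dummy 5-token captions
logits = decoder(encoder(images), captions)
print(logits.shape)  # torch.Size([2, 6, 1000])
```

At training time the logits would be compared against the ground-truth caption with cross-entropy; at inference the decoder is run step by step, feeding each predicted word back in. An attention variant would replace the single pooled vector with per-region CNN features that the decoder reweights at every step.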