We implement an encoder-decoder architecture using CNNs and sequential models to generate image captions.

image-captioning-transduction-models

We implement RNNs (vanilla and LSTM) and use them to train a model that can generate novel captions for images.
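As a rough sketch of the decoder side, an LSTM conditioned on a pooled image feature might look like the following in PyTorch. This is illustrative only; the class name, layer sizes, and the 1280-dimensional feature input are assumptions, not the repository's exact code.

```python
import torch
import torch.nn as nn

class CaptionDecoder(nn.Module):
    """Minimal sketch of an LSTM caption decoder conditioned on a pooled CNN feature.
    Names and sizes are illustrative assumptions, not the repository's implementation."""

    def __init__(self, feat_dim=1280, embed_dim=256, hidden_dim=512, vocab_size=10000):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hidden_dim)   # image feature -> initial hidden state
        self.init_c = nn.Linear(feat_dim, hidden_dim)   # image feature -> initial cell state
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)     # hidden state -> vocabulary logits

    def forward(self, image_feats, captions):
        # image_feats: (batch, feat_dim) pooled CNN activations
        # captions:    (batch, seq_len) token ids of the ground-truth caption
        h0 = self.init_h(image_feats).unsqueeze(0)       # (1, batch, hidden_dim)
        c0 = self.init_c(image_feats).unsqueeze(0)
        x = self.embed(captions)                         # (batch, seq_len, embed_dim)
        hidden, _ = self.lstm(x, (h0, c0))
        return self.out(hidden)                          # (batch, seq_len, vocab_size) logits
```

Training such a decoder would minimize per-token cross-entropy between these logits and the next ground-truth token (teacher forcing); at inference time, sampled tokens are fed back in one step at a time to produce a novel caption.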

We use the Microsoft COCO dataset, which has become the standard testbed for image captioning. It consists of 80,000 training images and 40,000 validation images, each annotated with 5 captions written by workers on Amazon Mechanical Turk.

MobileNet is used for image feature extraction. For the vanilla RNN and LSTM decoders, we use the pooled CNN feature activations.
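Pooled features of this kind can be extracted along the following lines with torchvision (version 0.13 or later assumed). The choice of MobileNetV2, the preprocessing, and the `pooled_feature` helper are illustrative assumptions, not a transcript of the notebook.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained MobileNetV2 backbone; keep only the convolutional features and
# global-average-pool them into a single 1280-d vector per image.
backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT).features.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def pooled_feature(image_path):
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)  # (1, 3, 224, 224)
    fmap = backbone(img)                        # (1, 1280, 7, 7) conv feature map
    return fmap.mean(dim=(2, 3)).squeeze(0)     # (1280,) pooled activation
```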

Please check the notebook file for a detailed explanation of the overall process.
