
Visual Question Answering through Modal Dialogue

We’re already seeing incredible applications of object detection in our daily lives. One such interesting application is Visual Question Answering (VQA), a new and fast-growing problem in computer vision in which the data consists of open-ended questions about images. To answer these questions, an effective system needs an understanding of “vision, language and common-sense.”

Before proceeding further, I would highly encourage you to quickly read the full VQA post here.

Try it now on FloydHub

Run

Click this button to open a Workspace on FloydHub that will train this model.

Do remember to execute run_me_first_floyd.sh inside a terminal every time you restart your workspace to install the relevant dependencies.


This post will first dig into the basic theory behind the Visual Question Answering task. Then, we’ll discuss and build two approaches to VQA: the “bag-of-words” and the “recurrent” model. Finally, we’ll provide a tutorial workflow for training your own models and setting up a REST API on FloydHub to start answering questions about your own images. The project code is in Python (Keras + TensorFlow). You can view my experiments directly on FloydHub, as well as the code (along with the weight files and data) on GitHub.
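
To make the first approach concrete before we dive in, here is a minimal Keras sketch of the “bag-of-words” baseline: the question is collapsed into a single vector (for example, summed word embeddings), concatenated with pre-extracted CNN image features, and passed through an MLP that classifies over the most frequent answers. This sketch uses the tf.keras API, and the feature dimensions, layer sizes and number of answer classes are illustrative assumptions rather than the exact configuration in this repository.

     # Illustrative "bag-of-words" VQA baseline (dimensions are assumptions):
     # a summed word-embedding question vector is concatenated with
     # pre-extracted CNN image features and fed to an MLP answer classifier.
     from tensorflow.keras.models import Sequential
     from tensorflow.keras.layers import Dense, Dropout

     IMG_DIM = 4096       # e.g. VGG-16 fc7 image features
     WORD_DIM = 300       # e.g. summed word2vec vectors for the question
     NUM_ANSWERS = 1000   # classify over the top-k most frequent answers

     model = Sequential([
         Dense(1024, activation='tanh', input_shape=(IMG_DIM + WORD_DIM,)),
         Dropout(0.5),
         Dense(1024, activation='tanh'),
         Dropout(0.5),
         Dense(NUM_ANSWERS, activation='softmax'),
     ])
     model.compile(optimizer='rmsprop',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])
     model.summary()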

Since I've already preprocessed the data & stored everything in a FloydHub dataset, here's what we're going to do:

  • Check out the preprocessed data from the VQA Dataset.
  • Build & train two VQA models using Keras & TensorFlow (a sketch of the recurrent variant follows this list).
  • Assess the models on the VQA validation sets.
  • Run the models to generate some really cool predictions.
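
As noted in the list above, here is a similarly hedged Keras sketch of the recurrent variant: an LSTM reads the question word embeddings token by token, its final state is fused with the image features, and an MLP predicts the answer. Again, all dimensions are illustrative assumptions, not the repository's exact settings.

     # Illustrative "recurrent" VQA model (dimensions are assumptions):
     # an LSTM encodes the question token-by-token; its final state is
     # concatenated with the image features before the answer classifier.
     from tensorflow.keras.models import Model
     from tensorflow.keras.layers import Input, LSTM, Dense, Dropout, Concatenate

     IMG_DIM = 4096       # pre-extracted CNN image features
     WORD_DIM = 300       # per-token word embeddings (e.g. word2vec)
     MAX_QLEN = 30        # questions padded/truncated to this length
     NUM_ANSWERS = 1000

     question = Input(shape=(MAX_QLEN, WORD_DIM))
     image = Input(shape=(IMG_DIM,))

     q_enc = LSTM(512)(question)              # final hidden state of the LSTM
     merged = Concatenate()([q_enc, image])   # fuse language & vision
     x = Dropout(0.5)(Dense(1024, activation='tanh')(merged))
     answer = Dense(NUM_ANSWERS, activation='softmax')(x)

     model = Model(inputs=[question, image], outputs=answer)
     model.compile(optimizer='rmsprop',
                   loss='categorical_crossentropy',
                   metrics=['accuracy'])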

Serving Models on FloydHub

I've created a separate repository here to serve models, since it avoids the overhead of pushing the entire training code & data to FloydHub over & over again.
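
For context, serving on FloydHub boils down to a small Flask app that loads the trained model and exposes a REST endpoint. The snippet below is a hypothetical sketch of such an app.py, not the actual code in the serving repository: the model filename vqa_model.h5, the /predict route, and the assumption that the client sends already-extracted features are all placeholders.

     # Hypothetical app.py for serving a trained VQA model as a REST API.
     # The model filename, route and input format are placeholders.
     import numpy as np
     from flask import Flask, request, jsonify
     from tensorflow.keras.models import load_model

     app = Flask(__name__)
     model = load_model('vqa_model.h5')   # assumed path to the saved model

     @app.route('/predict', methods=['POST'])
     def predict():
         # Assumes the client has already extracted image features and
         # embedded the question, sent as one flat list under "features".
         payload = request.get_json()
         x = np.asarray(payload['features'], dtype='float32')[None, :]
         probs = model.predict(x)[0]
         return jsonify({'answer_index': int(np.argmax(probs))})

     if __name__ == '__main__':
         app.run(host='0.0.0.0', port=5000)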


For Offline Execution

Follow these instructions to run different (or all) sections of this project locally. You will need an NVIDIA GPU to train these models.

  1. Clone the project, replacing VQAMD with the name of the directory you are creating:

     $ git clone https://github.com/sominwadhwa/vqa_floyd.git VQAMD
     $ cd VQAMD
    
  2. Make sure you have Python 3.5.x running on your local system. If you do, skip this step. In case you don't, head here.

  3. virtualenv is a tool for creating isolated 'virtual' Python environments. It is advisable to create one here as well (to avoid installing the prerequisites into the system root). Run the following within the project directory:

     $ [sudo] pip install virtualenv
     $ virtualenv --system-site-packages VQAMD
     $ source VQAMD/bin/activate
    

To deactivate later, once you're done with the project, just type deactivate.

  4. Install the pre-requisites from requirements.txt & run tests/init.py to check if all the required packages were correctly installed:

     $ pip install -r requirements.txt
     $ bash run_me_first_on_floyd.sh
     $ python tests/init.py
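
If you're curious what such a sanity check looks like, the snippet below is a hypothetical stand-in for tests/init.py: it simply tries to import each dependency and reports anything missing. The package list is an assumption, not the repository's actual requirements.

     # Hypothetical stand-in for tests/init.py: try importing each required
     # package and report anything that is missing. The package list below
     # is an assumption; substitute the contents of requirements.txt.
     import importlib

     REQUIRED = ['numpy', 'scipy', 'sklearn', 'keras', 'tensorflow', 'spacy']

     for pkg in REQUIRED:
         try:
             importlib.import_module(pkg)
             print('[OK]   {}'.format(pkg))
         except ImportError as err:
             print('[FAIL] {}: {}'.format(pkg, err))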
    

Contributing to VQA

I welcome contributions to this little project. If you have any new ideas or approaches that you'd like to incorporate here, feel free to open up an issue.

Please refer to the project's style guidelines and the guidelines for submitting patches and additions. In general, we follow the "fork-and-pull" Git workflow.

  1. Fork the repo VQAMD on GitHub
  2. Clone the project to your own machine
  3. Commit changes to your own branch
  4. Push your work back up to your fork
  5. Submit a Pull request so that we can review your changes

NOTE: Be sure to merge the latest from "upstream" before making a pull request!

Issues

Feel free to submit issues and enhancement requests.