
Visual Question Generation

This repo will soon be adapted to the Visual Question Generation (VQG) task.


Contents

  • Setup
  • Dataset
  • Architecture
  • Training
  • TODO List
  • References

Setup

Install the COCO Python API for data preparation.


Dataset

Given the VQA dataset's annotations and questions files, prepare_data.py generates a dataset file (.txt) in the following format:

image_name \t question \t answer

  • image_name is the image file name from the COCO dataset
  • question is a comma-separated sequence of tokens
  • answer is a string (label)
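
As a sanity check, the generated file can be read back with a few lines of Python. This is a minimal sketch (not part of the repo), assuming exactly the tab-separated format described above:

def load_dataset(path):
    """Yield (image_name, question_tokens, answer) triples from a dataset .txt file."""
    with open(path) as f:
        for line in f:
            image_name, question, answer = line.rstrip("\n").split("\t")
            yield image_name, question.split(","), answer

# Example usage:
for image_name, tokens, answer in load_dataset("./Data/processed/helper_val2014.txt"):
    print(image_name, tokens, answer)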

Sample Execution:

$ python3 prepare_data.py --balanced_real_images -s val \
-a ./Data/raw/v2_mscoco_val2014_annotations.json \
-q ./Data/raw/v2_OpenEnded_mscoco_val2014_questions.json \
-o ./Data/processed/helper_val2014.txt \
-v ./Data/processed/vocab_count_5_K_1000.pickle -c 5 -K 1000  # vocab flags (for training set)

The script stores the dataset file at the path given by -o and the corresponding vocab file at the path given by -v.
For validation/test sets, omit the vocabulary flags -v, -c, and -K.
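
The exact contents of the vocab pickle are defined by prepare_data.py; the sketch below only illustrates how a count-thresholded question vocabulary (-c) and a top-K answer label set (-K) are typically built from triples such as those yielded by load_dataset above. All names here are hypothetical:

from collections import Counter

def build_vocab(samples, min_count=5, K=1000):
    # Hypothetical sketch: keep question words seen at least min_count times,
    # and use the K most frequent answers as classification labels.
    word_counts, answer_counts = Counter(), Counter()
    for _, tokens, answer in samples:
        word_counts.update(tokens)
        answer_counts[answer] += 1
    word2idx = {"<pad>": 0, "<unk>": 1}
    for word, count in word_counts.items():
        if count >= min_count:
            word2idx[word] = len(word2idx)
    answer2idx = {ans: i for i, (ans, _) in enumerate(answer_counts.most_common(K))}
    return word2idx, answer2idx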


Architecture

Baseline

The architecture can be summarized as:

Image --> CNN_encoder --> image_embedding
Question --> LSTM_encoder --> question_embedding

(image_embedding * question_embedding) --> MLP_Classifier --> answer_logit

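A minimal PyTorch sketch of this pipeline follows. It is illustrative only: the repo's actual CNN backbone, layer sizes, and fusion details live in the model code, and everything below (the ResNet-18 backbone, 1024-d embeddings) is an assumption.

import torch
import torch.nn as nn
import torchvision.models as models

class BaselineVQA(nn.Module):
    """Sketch: CNN image encoder, LSTM question encoder,
    element-wise fusion, MLP classifier over K answer classes."""
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=1024, K=1000):
        super().__init__()
        cnn = models.resnet18(weights=None)                 # assumed backbone
        cnn.fc = nn.Linear(cnn.fc.in_features, hidden_dim)  # Image --> image_embedding
        self.cnn = cnn
        self.word_emb = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, K))

    def forward(self, image, question):
        img_emb = self.cnn(image)                        # (B, hidden_dim)
        _, (h_n, _) = self.lstm(self.word_emb(question))
        q_emb = h_n[-1]                                  # (B, hidden_dim)
        return self.classifier(img_emb * q_emb)          # answer_logit (B, K)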


Hierarchical Co-Attention

The architecture can be summarized as:

Image --> CNN_encoder --> image_embedding
Question --> Word_Emb --> Phrase_Conv_MaxPool --> Sentence_LSTM --> question_embedding

ParallelCoAttention( image_embedding, question_embedding ) --> MLP_Classifier --> answer_logit

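The parallel co-attention step follows Lu et al. [2]: an affinity matrix between image regions and question tokens drives attention over both modalities. Below is a simplified sketch; dimension names, initialization, and the reduction to a single attended vector per modality are assumptions, not the repo's exact implementation.

import torch
import torch.nn as nn

class ParallelCoAttention(nn.Module):
    def __init__(self, d, k=512):
        super().__init__()
        self.W_b = nn.Parameter(torch.randn(d, d) * 0.01)  # affinity weights
        self.W_v = nn.Linear(d, k, bias=False)
        self.W_q = nn.Linear(d, k, bias=False)
        self.w_hv = nn.Linear(k, 1, bias=False)
        self.w_hq = nn.Linear(k, 1, bias=False)

    def forward(self, V, Q):
        # V: (B, N, d) image region features; Q: (B, T, d) question features
        C = torch.tanh(Q @ self.W_b @ V.transpose(1, 2))   # (B, T, N) affinity
        H_v = torch.tanh(self.W_v(V) + C.transpose(1, 2) @ self.W_q(Q))
        H_q = torch.tanh(self.W_q(Q) + C @ self.W_v(V))
        a_v = torch.softmax(self.w_hv(H_v), dim=1)         # attention over regions
        a_q = torch.softmax(self.w_hq(H_q), dim=1)         # attention over tokens
        v_hat = (a_v * V).sum(dim=1)                       # attended image embedding
        q_hat = (a_q * Q).sum(dim=1)                       # attended question embedding
        return v_hat, q_hat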


Training

Run the following script for training:

$ python3 main.py --mode train --expt_name K_1000_Attn --expt_dir ./results_log \
--train_img ./Data/raw/train2014 --train_file ./Data/processed/vqa_train2014.txt \
--val_img ./Data/raw/val2014 --val_file ./Data/processed/vqa_val2014.txt \
--vocab_file ./Data/processed/vocab_count_5_K_1000.pickle --save_interval 1000 \
--log_interval 100 --gpu_id 0 --num_epochs 50 --batch_size 160 -K 1000 -lr 1e-4 --opt_lvl 1 --num_workers 6 \
--run_name O1_wrk_6_bs_160 --model attention

Specify --model_ckpt (filename.pth) to load a model checkpoint from disk and resume training or run inference.
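
The checkpoint round trip is standard PyTorch; here is a hypothetical sketch (the real checkpoint contents and paths are defined by main.py):

import torch
import torch.nn as nn

model = nn.Linear(10, 2)                        # stand-in for the VQA model
torch.save(model.state_dict(), "ckpt.pth")      # what periodic saving produces
model.load_state_dict(torch.load("ckpt.pth"))   # what --model_ckpt restores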

Select the architecture by using --model ('baseline', 'attention').

Note: setting num_cls (K) = 2 is equivalent to the binary 'yes/no' setup; for K > 2, the answer set is open-ended.

TODO List


  • Baseline & HieCoAttn
  • VQA w/ BERT
  • Attention Visualization

References

[1] VQA: Visual Question Answering
[2] Hierarchical Question-Image Co-Attention for Visual Question Answering
