Low accuracy when training your MAC model on the CLEVR dataset #4

Open
xiaohythu opened this issue Dec 22, 2019 · 17 comments

Comments

@xiaohythu

As the image shows, the training accuracy is 0.7 and the validation accuracy is 0.549. Both are much lower than what the MAC network at https://github.com/stanfordnlp/mac-network achieves. Any suggestions?

@xiaohythu
Author

I followed your training command exactly:
scripts/train/mac_flatqa.sh --data_dir $DATA/sqoop-variety_1-repeats_30000 --checkpoint_path model.pt --num_iterations 100000
The only change I made was setting the feature dimension to [1024, 14, 14].

@rizar
Owner

rizar commented Dec 22, 2019

How long have you been training the model?

@xiaohythu
Author

As my command shows, --num_iterations is 100000.

@xiaohythu
Author

Training takes about 10 hours.

@rizar
Owner

rizar commented Dec 22, 2019

OK, I will run this experiment later today myself.

@xiaohythu
Author

Thank you for your reply. I'm waiting for your results.

@rizar
Owner

rizar commented Dec 23, 2019 via email

@rizar
Owner

rizar commented Dec 23, 2019

While I am tinkering with my setup, could one of you try running this experiment several times (say, 5), please?
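A minimal sketch for running the repeated experiment, reusing the command and flags from the original report and varying only --checkpoint_path so that the runs do not overwrite each other. It adds no other flags (for instance, it does not assume the script accepts a seed option); the `make_commands` helper and the `run_{i}.pt` naming are hypothetical.

```python
import shlex
import subprocess
from typing import List


def make_commands(num_runs: int, prefix: str = "run") -> List[str]:
    """Hypothetical helper: the reported training command, once per run.

    Only --checkpoint_path changes between runs, so checkpoints do not
    overwrite each other. $DATA is left for the shell to expand.
    """
    commands = []
    for i in range(num_runs):
        checkpoint = shlex.quote(f"{prefix}_{i}.pt")
        commands.append(
            "scripts/train/mac_flatqa.sh "
            "--data_dir $DATA/sqoop-variety_1-repeats_30000 "
            f"--checkpoint_path {checkpoint} "
            "--num_iterations 100000"
        )
    return commands


def launch(commands: List[str]) -> None:
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        # shell=True so that $DATA is expanded by the shell.
        subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    for cmd in make_commands(5):
        print(cmd)
    # launch(make_commands(5))  # uncomment to actually start training
```

Printing the commands first makes them easy to check before launching; comparing the final validation accuracy of each run then shows whether the low accuracy happens every time or only occasionally.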

@rizar
Owner

rizar commented Dec 23, 2019

I could not reproduce your issue. I have just trained 10 models, and they all worked fine. Can you please try running the experiment many times and tell me if the issue occurs all the time, or some of the time?

@xiaohythu
Author

Did you change your setup, code, or training command?

@xiaohythu
Author

I still get the low performance I described in my question. Could you share more details about your training setup? Mine is CUDA 10.1 and torch 1.3.1.

@xiaohythu
Author

Before running your MAC model, I use ResNet-101 to extract features from the CLEVR images and save them to an .h5 file. I also preprocess the questions. Is this the right procedure?
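For reference, the [1024, 14, 14] feature shape from the original report is what the third residual stage (torchvision's `layer3`) of ResNet-101 produces for a 224×224 input, which I believe is the setting of the usual CLEVR feature-extraction recipe; that this is exactly what this repo expects is an assumption. The sketch below only does the shape arithmetic, not the extraction itself.

```python
# Standard convolution output-size formula: floor((n + 2p - k) / s) + 1.
def conv_out(n: int, kernel: int, stride: int, padding: int) -> int:
    return (n + 2 * padding - kernel) // stride + 1


# Output channels of each ResNet-101 bottleneck stage (expansion factor 4).
STAGE_CHANNELS = {1: 256, 2: 512, 3: 1024, 4: 2048}
# Spatial stride of each stage's first block (stage 1 keeps the resolution).
STAGE_STRIDES = {1: 1, 2: 2, 3: 2, 4: 2}


def resnet_feature_shape(image_size: int, stage: int) -> tuple:
    """Shape (C, H, W) of ResNet-101 features after `stage` for a square image."""
    n = conv_out(image_size, kernel=7, stride=2, padding=3)  # conv1
    n = conv_out(n, kernel=3, stride=2, padding=1)           # max pooling
    for s in range(1, stage + 1):
        # In torchvision's Bottleneck, the 3x3 conv (padding 1) carries the stride.
        n = conv_out(n, kernel=3, stride=STAGE_STRIDES[s], padding=1)
    return (STAGE_CHANNELS[stage], n, n)


print(resnet_feature_shape(224, stage=3))  # (1024, 14, 14)
print(resnet_feature_shape(224, stage=4))  # (2048, 7, 7)
```

Features taken from the last stage instead (2048×7×7) would not match the [1024, 14, 14] setting, so the stage used during extraction is worth double-checking.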

@xiaohythu
Author

Hi rizar!
When I try to reproduce your MAC model on the CLEVR dataset, the following error occurs:
Traceback (most recent call last):
  File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 1271, in <module>
    main(args)
  File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 393, in main
    train_loop(args, train_loader, val_loader)
  File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 530, in train_loop
    for batch in train_loader:
  File "/home/xhy/anaconda3/envs/sqoop1/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 264, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/xhy/anaconda3/envs/sqoop1/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 264, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/xhy/systematic-generalization-sqoop-master/vr/data.py", line 130, in __getitem__
    program_json = self.program_converter.prefix_to_list(program_json_seq)
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 109, in prefix_to_list
    return self.tree_to_list(self.prefix_to_tree(program_prefix))
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 105, in prefix_to_tree
    return helper()
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 103, in helper
    'inputs': [helper() for _ in range(self.get_num_inputs(cur))],
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 137, in get_num_inputs
    return self._vocab['program_token_arity'][f]
KeyError: 'program_token_arity'

It seems that the CLEVR data differs from your SQOOP dataset. Can you give me some instructions?
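The traceback shows why the missing key is fatal: `prefix_to_tree` rebuilds the program tree recursively, and each token's arity decides how many child subtrees to consume. Below is a minimal, self-contained sketch of that idea, modeled on the frames in the traceback rather than copied from vr/programs.py; the example tokens and arity values are hypothetical.

```python
from typing import Dict, List


def prefix_to_tree(program_prefix: List[str], arity: Dict[str, int]) -> dict:
    """Rebuild a program tree from its prefix (Polish-notation) token list.

    `arity` maps each token to its number of inputs; it plays the role of
    vocab['program_token_arity'] in the traceback above.
    """
    tokens = iter(program_prefix)

    def helper() -> dict:
        cur = next(tokens)
        # The arity lookup decides how many child subtrees to consume.
        # If the token (or the whole arity table) is missing, this raises
        # KeyError, as get_num_inputs does in the traceback.
        return {
            "function": cur,
            "inputs": [helper() for _ in range(arity[cur])],
        }

    return helper()


# Hypothetical tokens and arities, for illustration only.
ARITY = {"intersect": 2, "filter_color[red]": 1, "filter_shape[cube]": 1, "scene": 0}
tree = prefix_to_tree(
    ["intersect", "filter_color[red]", "scene", "filter_shape[cube]", "scene"], ARITY
)
print(tree["function"], len(tree["inputs"]))  # intersect 2

try:
    prefix_to_tree(["scene"], {})  # no arity information at all
except KeyError as err:
    print("KeyError:", err)  # KeyError: 'scene'
```

Without an arity table, a flat prefix sequence cannot be split back into subtrees, which is why the vocabulary must provide one.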

@xiaohythu
Author

I have trained the MAC model on CLEVR more than 10 times, and all the results are similar to what I reported in my question. I believe you changed something in training, but I did not! I need help.

@rizar
Owner

rizar commented Dec 27, 2019

I am sorry to hear the code doesn't work for you. For now, all I can do is give some extra information about the environment. I run the code in a Docker image based on "nvidia/cuda:9.1-cudnn7-devel-ubuntu16.04", and I build the conda environment inside the image. Here is the output of conda list:

(sysgen) dzmitry@a574659fd138:/workspace$ conda list
# packages in environment at /home/dzmitry/miniconda2/envs/sysgen:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
blas                      1.0                         mkl  
ca-certificates           2019.11.27                    0  
certifi                   2019.11.28               py36_0  
cffi                      1.13.2           py36h2e261b9_0  
cuda90                    1.0                  h6433d27_0    pytorch
cudatoolkit               10.1.243             h6bb024c_0  
freetype                  2.9.1                h8a8886c_1  
h5py                      2.9.0            py36h7918eee_0  
hdf5                      1.10.4               hb1b8bf9_0  
intel-openmp              2019.4                      243  
jpeg                      9b                   h024ee3a_2  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_0  
mkl                       2019.4                      243  
mkl-service               2.3.0            py36he904b0f_0  
mkl_fft                   1.0.15           py36ha843d7b_0  
mkl_random                1.1.0            py36hd6b4f25_0  
ncurses                   6.1                  he6710b0_1  
ninja                     1.9.0            py36hfd86e86_0  
nmn-iwp                   0.1                       <pip>
numpy                     1.17.4           py36hc1035e2_0  
numpy-base                1.17.4           py36hde5b4d6_0  
olefile                   0.46                       py_0  
openssl                   1.1.1d               h7b6447c_3  
pillow                    6.2.1            py36h34e0f95_0  
pip                       19.3.1                   py36_0  
pycparser                 2.19                       py_0  
python                    3.6.9                h265db76_0  
pytorch                   1.3.1           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
readline                  7.0                  h7b6447c_5  
scipy                     1.3.2            py36h7c811a0_0  
setuptools                42.0.2                   py36_0  
six                       1.13.0                   py36_0  
sqlite                    3.30.1               h7b6447c_0  
termcolor                 1.1.0                    py36_1  
tk                        8.6.8                hbc83047_0  
torchvision               0.4.2                py36_cu101    pytorch
tqdm                      4.40.2                     py_0  
wheel                     0.33.6                   py36_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.3.7                h0b5b093_0  

I can give you more info on Monday.
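Comparing two conda list outputs by eye is error-prone. A minimal, standard-library sketch that parses conda list output and reports the packages whose versions differ; it assumes the default whitespace-separated Name/Version/Build/Channel layout shown above, with each environment's output saved to a text file (for example via conda list > file.txt). The helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_conda_list(lines: Iterable[str]) -> Dict[str, str]:
    """Map package name -> version from `conda list` output lines."""
    versions = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) >= 2:
            versions[fields[0]] = fields[1]
    return versions


def diff_envs(a: Dict[str, str], b: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (package, version_in_a, version_in_b) for every package that differs."""
    missing = "<missing>"
    return [
        (name, a.get(name, missing), b.get(name, missing))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    ]


# Three lines copied from the listing above.
sample = """
python                    3.6.9                h265db76_0
pytorch                   1.3.1           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
torchvision               0.4.2                py36_cu101    pytorch
""".splitlines()
print(parse_conda_list(sample))
# {'python': '3.6.9', 'pytorch': '1.3.1', 'torchvision': '0.4.2'}
```

With both environments saved, diff_envs(parse_conda_list(open(path_a)), parse_conda_list(open(path_b))) lists the mismatches; path_a and path_b are placeholders.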

@xiaohythu
Author

As I mentioned in this issue, an error occurred:
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 137, in get_num_inputs
    return self._vocab['program_token_arity'][f]
KeyError: 'program_token_arity'
I guess the CLEVR vocab.json is different from the one in your SQOOP dataset. How should I solve this?

@rizar
Owner

rizar commented Jan 7, 2020

I have looked at both vocab.json files, and both seem to contain a program_token_arity key. Can you please tell me which keys your vocab.json file has and where you got it from?
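To answer which keys a given vocab.json contains, a short check like the following could help. It is a sketch that assumes the file holds a single JSON object at the top level; apart from program_token_arity, which comes from the traceback, no key names are assumed, and the function names are hypothetical.

```python
import json
from typing import List, Tuple

REQUIRED_KEY = "program_token_arity"  # the key reported missing in the KeyError


def summarize_vocab(vocab: dict) -> Tuple[List[str], bool]:
    """Print each top-level key of a vocab dict and report whether REQUIRED_KEY exists."""
    keys = sorted(vocab)
    for key in keys:
        value = vocab[key]
        size = len(value) if isinstance(value, (dict, list)) else "n/a"
        print(f"{key}: {type(value).__name__}, {size} entries")
    has_arity = REQUIRED_KEY in vocab
    print(f"{REQUIRED_KEY} present: {has_arity}")
    return keys, has_arity


def inspect_vocab(path: str) -> Tuple[List[str], bool]:
    """Load a vocab.json file (assumed to be a top-level JSON object) and summarize it."""
    with open(path) as f:
        return summarize_vocab(json.load(f))
```

Calling inspect_vocab("path/to/vocab.json") (a placeholder path) prints every top-level key with its type and size, followed by whether the arity table is present.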
