Low accuracy when training your MAC model on the CLEVR dataset #4

xiaohythu · 2019-12-22T07:39:14Z

As the image shows, the training accuracy is 0.7 and the validation accuracy is 0.549. Both are much lower than what the MAC network in https://github.com/stanfordnlp/mac-network achieves. Any suggestions?

xiaohythu · 2019-12-22T07:44:53Z

I just followed your training command, changing only the feature dimension to [1024, 14, 14]:
scripts/train/mac_flatqa.sh --data_dir $DATA/sqoop-variety_1-repeats_30000 --checkpoint_path model.pt --num_iterations 100000

rizar · 2019-12-22T14:52:04Z

How long have you been training the model?

xiaohythu · 2019-12-22T14:58:22Z

> How long have you been training the model?

As my running command shows, num_iterations is 100000

xiaohythu · 2019-12-22T15:00:26Z

> How long have you been training the model?

Training took about 10 hours.

rizar · 2019-12-22T15:02:41Z

OK, I will run this experiment later today myself.

xiaohythu · 2019-12-22T15:03:34Z

> OK, I will run this experiment later today myself.

Thank you for your reply. I look forward to your results.

rizar · 2019-12-23T13:49:26Z

I am working on it. I presume I broke the model at some point, or maybe a PyTorch change is to blame. If I don't find the issue today, this will have to wait until January though.

On Sun, 22 Dec 2019 at 23:37, songyy14 wrote:
> > OK, I will run this experiment later today myself.
>
> Any updates? I have run into the same problem too.

rizar · 2019-12-23T14:50:49Z

While I am tinkering with my setup, could one of you try to run this experiment multiple (like 5) times, please?

rizar · 2019-12-23T16:07:31Z

I could not reproduce your issue. I have just trained 10 models, and they all worked fine. Can you please try running the experiment many times and tell me if the issue occurs all the time, or some of the time?
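
One way to answer the "all the time or some of the time" question is to record the final validation accuracy of each run and summarize them. A minimal sketch, assuming the accuracies are copied by hand from the training logs; the `summarize_runs` helper and the 0.9 threshold are illustrative and not part of this repository:

```python
import statistics


def summarize_runs(val_accuracies, threshold=0.9):
    """Summarize final validation accuracies from repeated training runs.

    `threshold` is an arbitrary cut-off for calling a run a failure;
    pick whatever value separates "worked" from "did not work" for you.
    """
    if not val_accuracies:
        raise ValueError("need at least one run")
    failed = [acc for acc in val_accuracies if acc < threshold]
    return {
        "runs": len(val_accuracies),
        "mean": statistics.mean(val_accuracies),
        # The sample standard deviation is undefined for a single run.
        "stdev": statistics.stdev(val_accuracies) if len(val_accuracies) > 1 else 0.0,
        "failed": len(failed),
        "failure_rate": len(failed) / len(val_accuracies),
    }
```

A failure rate of 1.0 would suggest a deterministic problem (data or environment), while something in between would point to seed-dependent behaviour.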

xiaohythu · 2019-12-24T03:20:36Z

> I could not reproduce your issue. I have just trained 10 models, and they all worked fine. Can you please try running the experiment many times and tell me if the issue occurs all the time, or some of the time?

Did you change your setup, code, or running command?

xiaohythu · 2019-12-24T11:52:40Z

I still get the low performance I described in the question. Could you share more details about your training setup? My setup is CUDA 10.1 and torch 1.3.1.

xiaohythu · 2019-12-24T13:34:10Z

Before running your MAC model, I used ResNet-101 to extract features from the CLEVR dataset and saved them to an .h5 file. I also preprocessed the questions. Is this the correct approach?
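
As a sanity check on the [1024, 14, 14] feature dimension mentioned earlier, the shape can be worked out from the ResNet-101 architecture. This is a rough sketch, assuming square 224x224 inputs and features taken from the third residual stage (`layer3` in torchvision's naming); nothing in this thread confirms that this is the layer the repository expects, and `expected_feature_shape` is a hypothetical helper:

```python
# (channels, total stride) at the output of each residual stage of a
# bottleneck ResNet such as ResNet-101, using torchvision's layer names.
RESNET101_STAGES = {
    "layer1": (256, 4),
    "layer2": (512, 8),
    "layer3": (1024, 16),
    "layer4": (2048, 32),
}


def expected_feature_shape(image_size, stage="layer3"):
    """Return (channels, height, width) for a square input image."""
    channels, stride = RESNET101_STAGES[stage]
    # With the standard paddings, each stride-2 step rounds up,
    # so the spatial side is the ceiling of image_size / stride.
    side = (image_size + stride - 1) // stride
    return (channels, side, side)


print(expected_feature_shape(224))  # (1024, 14, 14)
```

If the arrays stored in the .h5 file have a different per-image shape, the features were probably taken from another layer or from images of another size.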

xiaohythu · 2019-12-24T17:45:07Z

> I could not reproduce your issue. I have just trained 10 models, and they all worked fine. Can you please try running the experiment many times and tell me if the issue occurs all the time, or some of the time?

Hi rizar!
When I try to reproduce your MAC model on the CLEVR dataset, the following error occurs:
Traceback (most recent call last):
  File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 1271, in <module>
    main(args)
  File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 393, in main
    train_loop(args, train_loader, val_loader)
  File "/home/xhy/systematic-generalization-sqoop-master/scripts/train_model.py", line 530, in train_loop
    for batch in train_loader:
  File "/home/xhy/anaconda3/envs/sqoop1/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 264, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/xhy/anaconda3/envs/sqoop1/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 264, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/home/xhy/systematic-generalization-sqoop-master/vr/data.py", line 130, in __getitem__
    program_json = self.program_converter.prefix_to_list(program_json_seq)
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 109, in prefix_to_list
    return self.tree_to_list(self.prefix_to_tree(program_prefix))
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 105, in prefix_to_tree
    return helper()
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 103, in helper
    'inputs': [helper() for _ in range(self.get_num_inputs(cur))],
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 137, in get_num_inputs
    return self._vocab['program_token_arity'][f]
KeyError: 'program_token_arity'

It seems that the CLEVR dataset differs from your SQOOP dataset. Can you give me some guidance?

xiaohythu · 2019-12-26T09:36:48Z

I have trained the MAC model on the CLEVR dataset more than 10 times. All the results are similar to what I reported in my question. I believe you changed something in your training setup that I did not. I need help.

rizar · 2019-12-27T23:45:33Z

I am sorry to hear the code doesn't work for you. For now, all I can do is give some extra info about the environment. I run the code in a Docker image based on "nvidia/cuda:9.1-cudnn7-devel-ubuntu16.04", and I build the conda environment inside the image. Here is the output of conda list:

(sysgen) dzmitry@a574659fd138:/workspace$ conda list
# packages in environment at /home/dzmitry/miniconda2/envs/sysgen:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main  
blas                      1.0                         mkl  
ca-certificates           2019.11.27                    0  
certifi                   2019.11.28               py36_0  
cffi                      1.13.2           py36h2e261b9_0  
cuda90                    1.0                  h6433d27_0    pytorch
cudatoolkit               10.1.243             h6bb024c_0  
freetype                  2.9.1                h8a8886c_1  
h5py                      2.9.0            py36h7918eee_0  
hdf5                      1.10.4               hb1b8bf9_0  
intel-openmp              2019.4                      243  
jpeg                      9b                   h024ee3a_2  
libedit                   3.1.20181209         hc058e9b_0  
libffi                    3.2.1                hd88cf55_4  
libgcc-ng                 9.1.0                hdf63c60_0  
libgfortran-ng            7.3.0                hdf63c60_0  
libpng                    1.6.37               hbc83047_0  
libstdcxx-ng              9.1.0                hdf63c60_0  
libtiff                   4.1.0                h2733197_0  
mkl                       2019.4                      243  
mkl-service               2.3.0            py36he904b0f_0  
mkl_fft                   1.0.15           py36ha843d7b_0  
mkl_random                1.1.0            py36hd6b4f25_0  
ncurses                   6.1                  he6710b0_1  
ninja                     1.9.0            py36hfd86e86_0  
nmn-iwp                   0.1                       <pip>
numpy                     1.17.4           py36hc1035e2_0  
numpy-base                1.17.4           py36hde5b4d6_0  
olefile                   0.46                       py_0  
openssl                   1.1.1d               h7b6447c_3  
pillow                    6.2.1            py36h34e0f95_0  
pip                       19.3.1                   py36_0  
pycparser                 2.19                       py_0  
python                    3.6.9                h265db76_0  
pytorch                   1.3.1           py3.6_cuda10.1.243_cudnn7.6.3_0    pytorch
readline                  7.0                  h7b6447c_5  
scipy                     1.3.2            py36h7c811a0_0  
setuptools                42.0.2                   py36_0  
six                       1.13.0                   py36_0  
sqlite                    3.30.1               h7b6447c_0  
termcolor                 1.1.0                    py36_1  
tk                        8.6.8                hbc83047_0  
torchvision               0.4.2                py36_cu101    pytorch
tqdm                      4.40.2                     py_0  
wheel                     0.33.6                   py36_0  
xz                        5.2.4                h14c3975_4  
zlib                      1.2.11               h7b6447c_3  
zstd                      1.3.7                h0b5b093_0

I can give you more info on Monday.
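
To compare another machine against the listing above without conda, the key package versions can also be read from Python. A small sketch, assuming Python 3.8 or newer for `importlib.metadata` (on older interpreters `pip list` gives the same information); the `installed_versions` helper is illustrative, and note that the conda package `pytorch` is normally registered under the distribution name `torch`:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The package selection below is just the subset most likely to matter.
    wanted = ["torch", "torchvision", "numpy", "h5py", "scipy"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:12s} {version or 'not installed'}")
```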

xiaohythu · 2019-12-28T01:42:40Z

As I mentioned earlier in this issue, an error occurred:
  File "/home/xhy/systematic-generalization-sqoop-master/vr/programs.py", line 137, in get_num_inputs
    return self._vocab['program_token_arity'][f]
KeyError: 'program_token_arity'
I guess the vocab.json for CLEVR is different from the one for your SQOOP dataset. How should I solve this?

rizar · 2020-01-07T21:07:48Z

I have looked at both vocab.json files, and they both seem to have program_token_arity keys in them. Can you please tell me what keys you have in your vocab.json file and also where you got it from?
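
A quick way to list those keys is sketched below. It assumes vocab.json holds a single JSON object at the top level; the `describe_vocab` helper is hypothetical and the file path is a placeholder to adjust:

```python
import json


def describe_vocab(path, required=("program_token_arity",)):
    """Return (sorted top-level keys, required keys that are missing)."""
    with open(path, encoding="utf-8") as f:
        vocab = json.load(f)
    keys = sorted(vocab)
    missing = [key for key in required if key not in vocab]
    return keys, missing
```

Calling `describe_vocab("vocab.json")` and posting both returned lists would show whether `program_token_arity` is really absent from the file that the data loader reads.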
