
multi GPU support? #20

Open
Minotaur-CN opened this issue Nov 8, 2017 · 9 comments


Minotaur-CN commented Nov 8, 2017

Dear All,

I have a problem using multiple GPUs to train the model. How do I set up multi-GPU training to speed up the training procedure?

I tried the following methods:

  1. set GPUdeviceNumber=2 or GPUdeviceNumber=0,1
  2. add os.environ["CUDA_VISIBLE_DEVICES"] = "2"

but neither worked.

Thanks for your reply!
@michalfaber


anatolix commented Nov 8, 2017

The config has a GPUdeviceNumber variable, but it looks like it is never actually used, i.e. Keras calls TensorFlow and TensorFlow decides which cards to use.

On my machine TF always uses both unless it is limited via environment variables.
Check nvidia-smi; maybe you are already using both. If not, check the environment variable CUDA_VISIBLE_DEVICES.
NB: if you want to override CUDA_VISIBLE_DEVICES from Python code, it has to be done before importing Keras. After importing it is already too late; the session has already been created.
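
For reference, a minimal sketch of overriding the variable from Python before Keras creates its TensorFlow session (the value "0,1" is just an example):

```python
import os

# Must be set before the first keras/tensorflow import; once the TF
# session exists, changing the variable has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # example: expose GPUs 0 and 1

import keras  # imported only after the environment variable is set
```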

@Minotaur-CN
Author

Thanks for your advice! @anatolix
nvidia-smi showed processes on both GPUs, but only one was actually running at around 90% during training; the other GPU stayed at 0%.

So TF can see both GPUs and Keras starts a process on each, but only one is actually doing the work.

@ksaluja15

export CUDA_VISIBLE_DEVICES=GPU_ID in the terminal would make sure that only that GPU is used.
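
To confirm which devices TensorFlow actually sees after setting the variable, a quick check like this works on TF 1.x (device_lib is a TF-internal module, but it is commonly used for this):

```python
from tensorflow.python.client import device_lib

# Lists the CPU plus every GPU visible to TensorFlow; with
# CUDA_VISIBLE_DEVICES=0 only one GPU entry should appear.
print([d.name for d in device_lib.list_local_devices()])
```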

@Minotaur-CN
Author

Thanks for your reply! @ksaluja15
Yes, it works for one GPU. How do I utilize both GPUs for training?

@anatolix

You could patch the model a bit using this feature of Keras:
https://keras.io/utils/#multi_gpu_model
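
Roughly, the usage looks like this (a sketch assuming Keras >= 2.0.9 with the TensorFlow backend and 2 GPUs; the toy model and random data are just placeholders for the project's own model and training data):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

# Build the usual single-GPU model first (a tiny toy model here).
model = Sequential([Dense(32, activation='relu', input_shape=(100,)),
                    Dense(1)])

# Replicate it on 2 GPUs: each batch is split into sub-batches, one per
# GPU, and the outputs are concatenated back together on the CPU.
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='adam', loss='mse')

# Toy data just to show the call; batches are divided across the GPUs.
x = np.random.random((256, 100))
y = np.random.random((256, 1))
parallel_model.fit(x, y, batch_size=64, epochs=1)
```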

@Minotaur-CN
Author

@anatolix
Cool method!
I tested the examples in the link. It works fine.

Thank you very much! lol


anatolix commented Nov 15, 2017

I've implemented it in my fork.
Set use_multiple_gpus in train_pose.py to the number of cards.

A side effect is that weight_LY_SX_loss is renamed to concatenate_ZZZ_loss, but the total loss is calculated correctly.

For my setup it doesn't give a large speed boost: one of the cards is always underloaded and the load jumps around. But I have 2 different cards, so maybe yours will do better.

It needs the latest Keras and TF 1.4.
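
This is not the fork's actual code, just a rough sketch of how such a switch could be wired into a training script; the concatenate_*_loss names appear because multi_gpu_model merges the per-GPU outputs through Concatenate layers:

```python
from keras.utils import multi_gpu_model

use_multiple_gpus = 2  # hypothetical switch: number of cards, 1 = single GPU

model = get_training_model()  # placeholder for the project's model builder

if use_multiple_gpus > 1:
    # The outputs are merged through Concatenate layers here, which is why
    # the weight_*_loss metrics show up as concatenate_*_loss instead.
    model = multi_gpu_model(model, gpus=use_multiple_gpus)

# Compile and train the (possibly wrapped) model exactly as before.
model.compile(optimizer='adam', loss='mse')
```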


Minotaur-CN commented Nov 16, 2017

Thanks for your advice, @anatolix.
I implemented it as you described (https://keras.io/utils/#multi_gpu_model) and the loss decreases faster.
weight_LY_SX_loss changed to concatenate_ZZZ_loss, and TensorBoard now shows more loss entries than before.

The two GPUs still do not run at full speed, and training is not as fast as with the Caffe code. Maybe it is the data preprocessing, or maybe the multi-GPU setup is not working well.

@anatolix

It looks mostly like a multi-GPU problem. Data preprocessing could feed 5 GPUs at its current speed.
