
multi GPU support? #20

Open
Minotaur-CN opened this issue Nov 8, 2017 · 9 comments


Minotaur-CN commented Nov 8, 2017

Dear All,

I have a problem using multiple GPUs to train the model. How do I set up multi-GPU training to speed up the training procedure?

I tried the following methods:

  1. set GPUdeviceNumber=2 or GPUdeviceNumber=0,1
  2. add os.environ["CUDA_VISIBLE_DEVICES"] = "2"

but neither worked.

Thanks for your reply!
@michalfaber


anatolix commented Nov 8, 2017

The config has a GPUdeviceNumber variable, but it looks like it is never actually used, i.e. Keras calls TensorFlow and TensorFlow decides which cards to use.

On my machine TF always uses both unless it is limited via environment variables.
Check nvidia-smi; maybe you are already using both. If not, check the environment variable CUDA_VISIBLE_DEVICES.
NB: if you want to override CUDA_VISIBLE_DEVICES from Python code, it has to be done before importing Keras. After importing it is already too late; the session has already been created.
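
For reference, a minimal sketch of overriding the variable from Python before Keras creates its TensorFlow session (the value "0,1" is just an example):

```python
import os

# Must be set before the first keras/tensorflow import; once the TF
# session exists, changing the variable has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"  # example: expose GPUs 0 and 1

import keras  # imported only after the environment variable is set
```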

@Minotaur-CN
Author

Thanks for your advice! @anatolix
nvidia-smi showed processes on both GPUs, but only one was actually running at around 90% during training; the other GPU stayed at 0%.

So TF can see both GPUs and Keras starts a process on each, but only one is actually doing the work.

@ksaluja15

export CUDA_VISIBLE_DEVICES=GPU_ID in the terminal would make sure that only that GPU is used.
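
To confirm which devices TensorFlow actually sees after setting the variable, a quick check like this works on TF 1.x (device_lib is a TF-internal module, but it is commonly used for this):

```python
from tensorflow.python.client import device_lib

# Lists the CPU plus every GPU visible to TensorFlow; with
# CUDA_VISIBLE_DEVICES=0 only one GPU entry should appear.
print([d.name for d in device_lib.list_local_devices()])
```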

@Minotaur-CN
Author

Thanks for your reply! @ksaluja15
Yes, it works for one GPU. How do I utilize both GPUs for training?

@anatolix

You could patch the model a bit using this feature of Keras:
https://keras.io/utils/#multi_gpu_model
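
Roughly, the usage looks like this (a sketch assuming Keras >= 2.0.9 with the TensorFlow backend and 2 GPUs; the toy model and random data are just placeholders for the project's own model and training data):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import multi_gpu_model

# Build the usual single-GPU model first (a tiny toy model here).
model = Sequential([Dense(32, activation='relu', input_shape=(100,)),
                    Dense(1)])

# Replicate it on 2 GPUs: each batch is split into sub-batches, one per
# GPU, and the outputs are concatenated back together on the CPU.
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='adam', loss='mse')

# Toy data just to show the call; batches are divided across the GPUs.
x = np.random.random((256, 100))
y = np.random.random((256, 1))
parallel_model.fit(x, y, batch_size=64, epochs=1)
```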

@Minotaur-CN
Author

@anatolix
Cool method!
I tested the examples in the link. It works fine.

Thank you very much! lol


anatolix commented Nov 15, 2017

I've implemented it in my fork.
Set use_multiple_gpus in train_pose.py to the number of cards.

A side effect is that weight_LY_SX_loss is renamed to concatenate_ZZZ_loss, but the total loss is calculated correctly.

For my setup it doesn't give a large speed boost: one of the cards is always underloaded and the load jumps around. But I have 2 different cards, so maybe yours will do better.

It needs the latest Keras and TF 1.4.
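
This is not the fork's actual code, just a rough sketch of how such a switch could be wired into a training script; the concatenate_*_loss names appear because multi_gpu_model merges the per-GPU outputs through Concatenate layers:

```python
from keras.utils import multi_gpu_model

use_multiple_gpus = 2  # hypothetical switch: number of cards, 1 = single GPU

model = get_training_model()  # placeholder for the project's model builder

if use_multiple_gpus > 1:
    # The outputs are merged through Concatenate layers here, which is why
    # the weight_*_loss metrics show up as concatenate_*_loss instead.
    model = multi_gpu_model(model, gpus=use_multiple_gpus)

# Compile and train the (possibly wrapped) model exactly as before.
model.compile(optimizer='adam', loss='mse')
```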


Minotaur-CN commented Nov 16, 2017

Thanks for your advice, @anatolix.
I implemented it as you described (https://keras.io/utils/#multi_gpu_model) and the loss decreases faster.
weight_LY_SX_loss changed to concatenate_ZZZ_loss, and TensorBoard now shows more loss entries than before.

The two GPUs still do not run at full speed, and training is not as fast as with the Caffe code. Maybe it is the data preprocessing, or maybe the multi-GPU setup is not working well.

@anatolix

It looks mostly like a multi-GPU problem. Data preprocessing could feed 5 GPUs at its current speed.
