
Learning rate decay depends on batch size (was: Training never achieves results of original caffe models) #39

Open
anatolix opened this issue Nov 27, 2017 · 7 comments

@anatolix commented Nov 27, 2017

Update: I've probably found the problem; see the last comments.

I tried to train the model with both the original C++ augmentation (rmpe_server) and my own Python implementation (py_rmpe_server), and it never trains correctly.

To prove my point, this is the demo.py output with weights.best.h5 converted from Caffe:
[image: canonical caffe]
Note the perfect joint matching and the absence of additional unconnected points.

@anatolix (Author)

These are the results of training with the C++ rmpe_server, one picture for every 10 generations:
[images: cpp10, cpp20, cpp30, cpp40, cpp50]

There are perfect skeletons, but also additional points:
[image: cpp60]

Near perfect, but there are double dots on the legs of the center guy in the background:
[image: cpp70]

Later models are more overfitted:
[images: cpp80, cpp90, cpp100, cpp110, cpp120]

@anatolix (Author)

These are my results with py_rmpe_server so far:
[images: py10, py20, py30, py40]

@anatolix (Author) commented Nov 27, 2017

The question is: can this project achieve the results of the original training at all?
Maybe it gets caught somewhere around the 70th generation, or maybe it was never achieved at all?
Do the original authors know some magic trick? Do we have the same learning rate, initialization, etc.?

@anatolix (Author)

I checked the val_stage6_Lx losses, which influence quality. For the C++ augmentation the best are:

gen 117 for the L2 loss
[image: cpp107]

gen 107 for the L1 loss
[image: cpp117]

They are almost perfect again. Maybe the training step decay made them better.

@anatolix (Author) commented Nov 27, 2017

Maybe I've found the problem:

0 4e-05
1 4e-05
2 4e-05
3 4e-05
...
50 4e-05
51 4e-05
52 1.3320000000000001e-05
53 1.3320000000000001e-05
...
102 1.3320000000000001e-05
103 1.3320000000000001e-05
104 4.435560000000001e-06
105 4.435560000000001e-06
...
148 4.435560000000001e-06
149 4.435560000000001e-06

Note: we actually only reach a small enough learning rate after generation 100.

The original code has 25 generations, each twice the size of ours (meta.write_number: 121000); by the way, why? Did they have more images, or did they do more augmentation?
The learning rate changed after epoch 17 (ours would be 36) and training finished by epoch 25 (ours at 50).
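
For reference, here is a minimal sketch of a Caffe-style step decay in which the drop point is counted in optimizer iterations rather than epochs. All constants (base_lr, gamma, stepsize, num_samples) are assumed values for illustration, not the repo's actual ones; the point is only the mechanism.

```python
import math

# Assumed constants, chosen only to illustrate the mechanism.
base_lr = 4e-5        # initial learning rate, as in the schedule printed above
gamma = 0.333         # multiplicative decay factor
stepsize = 136000     # decay step, counted in optimizer iterations (assumed)
num_samples = 52000   # training samples per epoch (assumed)

def step_decay(epoch, batch_size):
    # The lr is dropped every `stepsize` iterations. Since iterations_per_epoch
    # shrinks as the batch size grows, the *epoch* at which the first drop
    # happens moves with the batch size.
    iterations_per_epoch = num_samples // batch_size
    iterations = epoch * iterations_per_epoch
    return base_lr * math.pow(gamma, math.floor(iterations / stepsize))

for bs in (10, 20, 40):
    first_drop = next(e for e in range(1000) if step_decay(e, bs) < base_lr)
    print("batch size %2d -> lr first drops at epoch %d" % (bs, first_drop))
```

With these assumed numbers, doubling the batch size roughly doubles the epoch at which each drop happens, which is why a schedule tuned for the original batch size decays far too late (or too early) when training with a different one.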

@anatolix (Author)

And all of this is affected by the batch size. I have batch size = 20, so this is probably where my training problems come from.

@anatolix changed the title from "Training never achieves results of original caffe models" to "Learning rate decay depends on batch size (was: Training never achieves results of original caffe models)" on Nov 27, 2017
@anatolix (Author) commented Nov 27, 2017

I probably need help here: should the learning rate be changed with the batch size?
On one hand, a larger batch size means a larger sum of gradients (I don't remember whether they use the sum or the mean).
On the other hand, a larger batch means less stochasticity.
I haven't found an answer on Google.
Was the dependency of the lr decay on the batch size intentional?
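
Here is a small numerical sketch of the sum-vs-mean point (NumPy; the per-sample gradient values are made up purely for illustration):

```python
import numpy as np

# With *mean* reduction over the batch, the batch gradient's magnitude stays
# roughly independent of the batch size (only its variance shrinks), while with
# *sum* reduction it grows linearly with the batch size, which effectively
# multiplies the learning rate by the batch size.
rng = np.random.default_rng(0)
per_sample_grads = rng.normal(loc=1.0, scale=1.0, size=80_000)  # fake gradients

for batch_size in (20, 40, 80):
    grads = per_sample_grads.reshape(-1, batch_size)
    print("batch %2d  |mean-reduced| %.2f   |sum-reduced| %.1f"
          % (batch_size,
             np.abs(grads.mean(axis=1)).mean(),
             np.abs(grads.sum(axis=1)).mean()))
```

For mean-reduced losses the common heuristic seems to be the linear scaling rule (Goyal et al. 2017): scale the lr proportionally to the batch size, usually with a warmup. Either way, the decay schedule is probably better expressed in samples seen rather than in raw iteration counts, so that changing the batch size does not move the drops around.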
