Accuracy is ~80% after 350 epochs #4

Open
ChesterAiGo opened this issue Aug 4, 2017 · 5 comments
ChesterAiGo commented Aug 4, 2017

Hi @vibrantabhi19,

Thank you for sharing your code! It's been very helpful for understanding All-CNN.

In addition, I trained your model last night for 350 epochs, but found that its validation accuracy (val_acc) plateaued at about 0.81 after epoch 49 and stayed there to the end.

Any ideas? :) 👍

The model I used:

```python
from keras.models import Sequential
from keras.layers import Conv2D, Activation, Dropout, GlobalAveragePooling2D
from keras.optimizers import SGD

model = Sequential()

# Block 1: three 3x3 convs with 96 filters; the last conv downsamples
# with stride 2 (replacing max pooling, as in the All-CNN paper)
model.add(Conv2D(96, (3, 3), padding="same", input_shape=(32, 32, 3)))
model.add(Activation('relu'))
model.add(Conv2D(96, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(96, (3, 3), padding="same", strides=2))
model.add(Dropout(0.5))

# Block 2: same pattern with 192 filters
model.add(Conv2D(192, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(192, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(192, (3, 3), padding="same", strides=2))
model.add(Dropout(0.5))

# Block 3: a 3x3 conv, then two 1x1 convs; the final 1x1 conv maps to
# the 10 CIFAR-10 classes
model.add(Conv2D(192, (3, 3), padding="same"))
model.add(Activation('relu'))
model.add(Conv2D(192, (1, 1), padding="valid"))
model.add(Activation('relu'))
model.add(Conv2D(10, (1, 1), padding="valid"))

# Global average pooling instead of fully connected layers
model.add(GlobalAveragePooling2D())
model.add(Activation('softmax'))

sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
```
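
For completeness, a minimal training setup matching the run described above might look like this (the batch size and preprocessing here are assumptions; only the 350 epochs come from my report):

```python
from keras.datasets import cifar10
from keras.utils import to_categorical

# Load CIFAR-10, scale pixels to [0, 1], and one-hot encode the labels.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# 350 epochs, validating on the test split (reported above as val_acc).
model.fit(x_train, y_train,
          batch_size=32,
          epochs=350,
          validation_data=(x_test, y_test))
```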


iabhi7 commented Aug 4, 2017

Hi @ChesterAiGo,
Thanks.
As far as I can tell, you should try a different set of learning parameters; maybe use Adam as your optimizer, since the network does not seem to be converging.
Also, the original paper uses a schedule S = {e1, e2, e3} in which the learning rate γ is multiplied by a fixed factor of 0.1 after e1, e2, and e3 epochs respectively (where e1 = 200, e2 = 250, e3 = 300). A sketch of that schedule follows below.
Maybe you can have a go at that.
What's your training accuracy? Comparing it with the validation accuracy would show whether the model is overfitting.
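
That schedule can be wired up with Keras' LearningRateScheduler callback; here is a minimal sketch, assuming the Keras 2 API. The milestones (200, 250, 300) and factor 0.1 come from the paper; the base rate 0.01 matches the SGD setup above, and everything else is illustrative, not this repository's exact code:

```python
from keras.callbacks import LearningRateScheduler
from keras.optimizers import Adam

def step_decay(epoch):
    # Multiply gamma by 0.1 after each milestone epoch (paper's schedule).
    lr = 0.01
    for milestone in (200, 250, 300):
        if epoch >= milestone:
            lr *= 0.1
    return lr

lr_schedule = LearningRateScheduler(step_decay)

# Alternatively, swap the optimizer for Adam with default settings:
# model.compile(loss='categorical_crossentropy',
#               optimizer=Adam(), metrics=['accuracy'])

# Then pass the callback to training:
# model.fit(x_train, y_train, epochs=350, callbacks=[lr_schedule], ...)
```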

ChesterAiGo (Author) commented

Hi @vibrantabhi19

Thanks for your prompt reply! I will try different optimizers, and also try varying γ during training (I think that's probably the cause).

In addition, there was something very interesting about the accuracies: the training accuracy kept increasing steadily (from epoch 1 to epoch 350), while the validation accuracy became stable after epoch 49 (it was not increasing, but it was not decreasing either.. that's weird xD).

It looked something like this:

Epoch 1: Val: 0.1, Train: 0.1
...
Epoch 49: Val: 0.8, Train: 0.8
...
Epoch 350: Val: 0.8, Train: 0.94

Thanks again ! :)


iabhi7 commented Aug 4, 2017

Oh, that's weird; the network shouldn't be able to overfit, since we are already using a dropout of 0.5.
And since the network is converging (train_acc = 0.94 is proof of that), I don't think trying different optimizers will help; go ahead with the experiment anyway and post your results here.
I will try investigating on my end (the same code has worked for a lot of people, so I can't figure out the exact error).


marcj commented Sep 23, 2017

I can confirm that using the original code (with the fix in #5) and removing the multi_gpu code gives an accuracy above 81%. My best after 350 epochs with this repository's code was 90.88%, and it already cracked 90% at epoch 140.

See accuracy (as CSV):

[screenshot: accuracy plot, 2017-09-23]

And loss (as CSV):

[screenshot: loss plot, 2017-09-23]

The learning rate decay produced this (as CSV):

[screenshot: learning rate decay plot, 2017-09-23]

See also full console log.

All source code and weights are here: https://aetros.com/marcj/keras:all-conv/view/refs/aetros/job/92fcd671c6814c375edd404a65edc66c00ba5aec or in the analytics tool at https://trainer.aetros.com/model/marcj/keras:all-conv/job/92fcd671c6814c375edd404a65edc66c00ba5aec (requires login first).

Hyperparameters and other information:

[screenshot: hyperparameters, 2017-09-23]

So what I can say: I cannot reproduce getting stuck at 81%. @ChesterAiGo, you can fork my model at https://aetros.com/marcj/keras:all-conv and run it on your hardware, so we have all the information needed to debug it.

However, I'd also like to know why this code does not reproduce the results from the linked paper, and what concretely is needed to achieve 95.59% on CIFAR-10 using All-CNN.

JaeDukSeo commented

These are some sexy plots: 90 percent accuracy!
