Build a Traffic Sign Recognition Project
The goals / steps of this project are the following:
- Load the data set (see below for links to the project data set)
- Explore, summarize and visualize the data set
- Design, train and test a model architecture
- Use the model to make predictions on new images
- Analyze the softmax probabilities of the new images
- Summarize the results with a written report
Here I will consider the rubric points individually and describe how I addressed each point in my implementation.
1. Provide a Writeup / README that includes all the rubric points and how you addressed each one. You can submit your writeup as markdown or pdf. You can use this template as a guide for writing the report. The submission includes the project code.
You're reading it! And here is a link to my project code.
1. Provide a basic summary of the data set. In the code, the analysis should be done using python, numpy and/or pandas methods rather than hardcoding results manually.
I used the Python `len()` function and the NumPy `shape` attribute to calculate summary statistics of the traffic signs data set:
- Number of training examples = 34799
- Number of validation examples = 4410
- Number of testing examples = 12630
- Image data shape = (32, 32, 3)
- Number of unique classes/labels = 43
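As a minimal sketch of how these statistics reduce to `len()` and NumPy calls (assuming the pickled splits were loaded into `X_train`/`y_train`, `X_valid`/`y_valid` and `X_test`/`y_test`):

```python
import numpy as np

n_train = len(X_train)                 # 34799
n_validation = len(X_valid)            # 4410
n_test = len(X_test)                   # 12630
image_shape = X_train[0].shape         # (32, 32, 3)
n_classes = len(np.unique(y_train))    # 43
```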
Here is an exploratory visualization of the data set: a bar chart showing how the training and validation images are distributed among the classes.
From the plot below, it's clear that some classes are under-represented in the training data. It's also noteworthy that the percentage of images belonging to a given class is similar across the training and validation sets. For example, "Speed limit (20km/h)" makes up only about 1% of the training images while "Speed limit (30km/h)" makes up about 5.7%, and these percentages are similar in the validation set.
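A sketch of how such a distribution chart can be produced with NumPy and matplotlib (using the same assumed split names as above):

```python
import numpy as np
import matplotlib.pyplot as plt

# Per-class example counts, normalized so both splits are comparable
train_dist = np.bincount(y_train, minlength=n_classes) / len(y_train)
valid_dist = np.bincount(y_valid, minlength=n_classes) / len(y_valid)

ids = np.arange(n_classes)
plt.bar(ids - 0.2, train_dist, width=0.4, label='training')
plt.bar(ids + 0.2, valid_dist, width=0.4, label='validation')
plt.xlabel('class id')
plt.ylabel('fraction of split')
plt.legend()
plt.show()
```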
Looking at a few examples from random classes, it's clear that many images are quite similar and could be seen as slight transformations of a single image.
As a first step, I decided to convert the images to grayscale because this reduces processing and memory requirements.
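One common way to do this in NumPy uses the standard luminance weights (the notebook may use `cv2.cvtColor` instead; `X_train` is the assumed array name):

```python
import numpy as np

# Weighted sum over the RGB channel axis, keeping a trailing channel
# dimension so the result has shape (N, 32, 32, 1)
X_train_gray = np.sum(X_train * [0.299, 0.587, 0.114],
                      axis=3, keepdims=True).astype(np.float32)
```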
I decided to generate additional data to help balance the training data set and prevent over-fitting. To add more data to the data set, I used the transformations described in the Traffic Sign Recognition with Multi-Scale Convolutional Networks paper (Sermanet & LeCun):
Samples are randomly perturbed in position ([-2, 2] pixels), in scale ([0.9, 1.1] ratio) and in rotation ([-15, +15] degrees).
For the 26 classes whose sample count was below the statistical mean, between 119 and 160 images each were added, for a total of 4,078 augmented training examples.
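A hypothetical sketch of one such perturbation using OpenCV (the function name and the use of `cv2` are my assumptions; the notebook's implementation may differ):

```python
import numpy as np
import cv2

def random_perturb(img):
    """Randomly shift ([-2, 2] px), scale ([0.9, 1.1]) and rotate
    ([-15, +15] degrees) a single image, as in Sermanet & LeCun."""
    rows, cols = img.shape[:2]
    angle = np.random.uniform(-15, 15)
    scale = np.random.uniform(0.9, 1.1)
    dx, dy = np.random.uniform(-2, 2, size=2)
    # Combined rotation + scaling matrix about the image center
    M = cv2.getRotationMatrix2D((cols / 2, rows / 2), angle, scale)
    M[:, 2] += (dx, dy)  # add the random translation
    return cv2.warpAffine(img, M, (cols, rows),
                          borderMode=cv2.BORDER_REPLICATE)
```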
Here are some examples of augmented images alongside the corresponding original images:
In addition, a balanced subset of the training data was generated, as shown in IPython cell #4.
My final model consisted of the following layers (please refer to notebook cell #12 for the exact dimensions):
Layer | Description |
---|---|
Input | 32x32x1 Grayscale image |
Convolution | 1x1 stride, VALID padding |
ReLU | |
Max pooling | 2x2 stride, VALID padding |
Convolution 3x3 | 1x1 stride, VALID padding |
ReLU | |
Dropout | drop 20% |
Max pooling | 2x2 stride, VALID padding |
Convolution 3x3 | 1x1 stride, VALID padding |
ReLU | |
Dropout | drop 50% |
Fully connected | |
ReLU | |
Dropout | drop 20% |
Fully connected | classification layer |
Softmax | |
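As a sketch only, the table above can be restated with `tf.keras` (the project itself uses low-level TensorFlow; all filter counts and the fully-connected width below are illustrative assumptions, since the exact dimensions live in notebook cell #12):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 1)),                  # grayscale input
    tf.keras.layers.Conv2D(16, 3, padding='valid', activation='relu'),
    tf.keras.layers.MaxPooling2D(2, strides=2, padding='valid'),
    tf.keras.layers.Conv2D(32, 3, padding='valid', activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.MaxPooling2D(2, strides=2, padding='valid'),
    tf.keras.layers.Conv2D(64, 3, padding='valid', activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(43),                          # classification logits
    tf.keras.layers.Softmax(),
])
```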
To train the model, I used a relatively small learning rate of 0.0005, suitable for training over a large number of epochs (typically 1000).
BATCH_SIZE was set to 500, and TensorFlow's built-in Adam optimizer was used instead of plain gradient descent.
The weight-initialization hyperparameters were set to:
- mu = 0
- sigma = 0.1
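A hedged sketch of this training setup in the same `tf.keras` style as the model above (in the actual notebook, `mu` and `sigma` presumably feed a truncated-normal weight initializer in low-level TensorFlow):

```python
import tensorflow as tf

EPOCHS = 1000
BATCH_SIZE = 500
LEARNING_RATE = 0.0005
mu, sigma = 0, 0.1

# Truncated-normal initializer; it would be passed as kernel_initializer=
# to each Conv2D/Dense layer when the model sketched above is built
initializer = tf.keras.initializers.TruncatedNormal(mean=mu, stddev=sigma)

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LEARNING_RATE),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
```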
My final model results were:
- training set accuracy of 100%
- validation set accuracy of 96.1%
- test set accuracy of 93.9%
An iterative approach was chosen as follows:
- The LeNet architecture was used as an initial step due to its strong capability to classify "glyphs" with high accuracy, a concept quite similar to the symbols displayed on traffic signs; only the number of classes was adjusted from 10 to 43. This network couldn't achieve more than 90% validation accuracy, and considerably less on the test set.
- It seems the LeNet architecture couldn't capture enough detail to separate 43 classes, and at the same time it overfitted the training set quickly.
- Also, the model was underfitting when normalized images were used: accuracy couldn't exceed 60% in either the training or validation set.
- In order to capture more detail, more convolutional layers were added, with additional max-pooling layers in between, because max pooling is a less aggressive downsampling operation, while convolutional layers help the network tolerate insignificant variations such as distortion, translation and rotation. With this, the model achieved 92% validation accuracy.
- An important aspect was using several dropout layers to prevent overfitting, which was key to exceeding 92% validation accuracy.
- Due to memory and available-data limitations, each training epoch was broken into three phases to help counter under-represented classes: augmented data, then balanced data, then the full training data (see the sketch after this list).
- Training on the augmented data and on the class-balanced training subset helped the model generalize well and eventually reach 93.9% test accuracy over a large number of epoch phases, as shown in the plot below.
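The three-phase epoch described above might look like the following sketch (the `X_augmented`/`X_balanced` array names are assumptions, and `model`, `EPOCHS` and `BATCH_SIZE` refer to the earlier sketches; the notebook's actual loop may differ):

```python
# One epoch phase visits the augmented set, the class-balanced subset,
# and finally the full training set
for epoch in range(EPOCHS):
    for X, y in [(X_augmented, y_augmented),
                 (X_balanced, y_balanced),
                 (X_train_gray, y_train)]:
        model.fit(X, y, batch_size=BATCH_SIZE, epochs=1, verbose=0)
```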
1. Choose five German traffic signs found on the web and provide them in the report. For each image, discuss what quality or qualities might make it difficult to classify.
I looked for traffic-sign photos from Wuppertal, Germany, where I'm currently located. Here are five German traffic signs that I found on the web:
The Pedestrians sign image might be really difficult to classify because it is circular, while this sign is normally triangular (the outer sign shape is shown in the feature-map visualizations).
Also, the Stop sign image shows some distortion: someone appears to have written on the sign with a black marker or something similar.
Finally, one image contains two signs, "Speed limit 60" and "Wild animals crossing", very close to each other; therefore the speed-limit sign is partially included in the 32x32 image for the "Wild animals crossing" sign.
Extracted 32x32 images can be found here
2. Discuss the model's predictions on these new traffic signs and compare the results to predicting on the test set. At a minimum, discuss what the predictions were, the accuracy on these new predictions, and compare the accuracy to the accuracy on the test set (OPTIONAL: Discuss the results in more detail as described in the "Stand Out Suggestions" part of the rubric).
Here are the results of the prediction:
Image | Prediction |
---|---|
Wild animals crossing | Wild animals crossing |
Pedestrians | Keep right |
Speed limit (60km/h) | Speed limit (60km/h) |
Stop | Stop |
General caution | General caution |
The model was able to correctly guess 4 of the 5 traffic signs, which gives an accuracy of 80%.
This falls short of the 93.9% accuracy on the test set (and 96.1% on the validation set), although five images is too small a sample for a reliable comparison.
It's noteworthy that the 32x32 image preparation can affect the network's ability to classify the signs correctly: scaling an image down from a higher resolution affects accuracy negatively.
Therefore, it might be better to use high-resolution images taken from a distance, rather than close-ups in which the sign covers a huge area that must be scaled down to the network's input size (or to increase the network's input size, if possible).
This note could be important in a future phase concerning the car-camera setup and its integration with the network.
The code for making predictions with my final model is located in the 11th cell of the IPython notebook.
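A sketch of the top-five extraction (assuming `new_images` holds the five preprocessed 32x32x1 web images and `model` is the sketch above; `tf.nn.top_k` is the standard call for this):

```python
import tensorflow as tf

# Model output is already softmax probabilities (see the Softmax layer above)
probs = model.predict(new_images)    # shape (5, 43)
top5 = tf.nn.top_k(probs, k=5)       # .values and .indices per image
```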
For the first image, the model is sure that this is a Wild animals crossing sign (probability of 97.2%), which ranks first among the top five softmax probabilities:
Probability | Prediction |
---|---|
0.972 | Wild animals crossing |
0.028 | Double curve |
0.000 | Dangerous curve to the left |
0.000 | Road narrows on the right |
0.000 | Right-of-way at the next intersection |
For the second image, the model is not quite sure about the sign, which is classified as a Keep right sign with only 43.4% probability. This is probably due to the unexpected circular shape.
Probability | Prediction |
---|---|
0.434 | Keep right |
0.236 | Speed limit (30km/h) |
0.094 | End of all speed and passing limits |
0.052 | No vehicles |
0.045 | Go straight or right |
The model was relatively sure about the third image, classifying it as a Speed limit (60km/h) sign with 65.9% probability; the image is indeed a Speed limit (60km/h) sign.
Probability | Prediction |
---|---|
0.659 | Speed limit (60km/h) |
0.152 | Speed limit (80km/h) |
0.148 | No passing for vehicles over 3.5 metric tons |
0.018 | Speed limit (30km/h) |
0.017 | Go straight or left |
Despite the distortion of the Stop sign in the fourth image, the model classified it successfully as a Stop sign with very high certainty.
Probability | Prediction |
---|---|
1.000 | Stop |
0.000 | Speed limit (30km/h) |
0.000 | Traffic signals |
0.000 | Right-of-way at the next intersection |
0.000 | General caution |
The fifth image showed a General caution sign; it was also classified correctly with high certainty.
Probability | Prediction |
---|---|
1.000 | General caution |
0.000 | Traffic signals |
0.000 | Bumpy road |
0.000 | Pedestrians |
0.000 | Keep right |
The network seems to extract features related to the shape, edges and internal content of the sign. Below are the network's different "filters" (feature maps) for three signs that differ in shape and in the symbol/content they display.