To make the TensorFlow input pipeline run efficiently, first save the data in TFRecord files.

- Create one directory and copy all the images into it. We call it `image_dir`.
- Create an `image_list` txt file. The format is like:

  ```
  COCO_val2014_000000320715.jpg 8
  COCO_val2014_000000379048.jpg 2
  COCO_val2014_000000014562.jpg 9
  ...
  ```

  Tip: create two files, one for training and one for evaluation.
- Create an `image_label` txt file. The format is like:

  ```
  1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  1 0 1 1 1 1 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
  ...
  ```

  In the above example, assume there are 35 labels in total. Each line corresponds to one image in the `image_list` txt file. Each label has a fixed index: the value 1 means the image has that label, 0 means it does not. The number in the second column of `image_list` is how many labels the image has, i.e. the number of 1s in the corresponding line (8, 2, and 9 above).

  Tip: create two files, one for training and one for evaluation.
- Create the `tfrecords` files. The model reads its data in the TFRecord format. Just run the script:

  ```
  python create_tfrecord.py \
      --image_dir="/path/to/images_dir" \
      --imglist_file="/path/to/image_list_file" \
      --imglabel_file="/path/to/image_label_file" \
      --output_file="/path/to/xx.tfrecords" \
      --gpu="1"
  ```

  Tip: create `train.tfrecords` and `eval.tfrecords` separately. `read_tfrecord.py` is just a tool script that reads data back from the tfrecords, for testing purposes.
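For orientation, the core of packing one (image, label vector) pair into a TFRecord looks roughly like the snippet below. This is a minimal sketch assuming TensorFlow 1.x, not the actual `create_tfrecord.py`: the feature keys (`image/encoded`, `image/labels`) and the way the two txt files are parsed here are assumptions and may differ from the real script.

```python
import os
import tensorflow as tf

def write_tfrecord(image_dir, imglist_file, imglabel_file, output_file):
    """Sketch: pack (image bytes, multi-hot label vector) pairs into a TFRecord file."""
    with open(imglist_file) as f_list, open(imglabel_file) as f_label:
        img_lines = f_list.read().splitlines()
        label_lines = f_label.read().splitlines()

    writer = tf.python_io.TFRecordWriter(output_file)
    for img_line, label_line in zip(img_lines, label_lines):
        filename = img_line.split()[0]                 # e.g. COCO_val2014_000000320715.jpg
        labels = [int(v) for v in label_line.split()]  # the 0/1 multi-hot vector for this image
        with open(os.path.join(image_dir, filename), 'rb') as f:
            encoded_jpg = f.read()

        example = tf.train.Example(features=tf.train.Features(feature={
            'image/encoded': tf.train.Feature(
                bytes_list=tf.train.BytesList(value=[encoded_jpg])),
            'image/labels': tf.train.Feature(
                int64_list=tf.train.Int64List(value=labels)),
        }))
        writer.write(example.SerializeToString())
    writer.close()
```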
- This library does image feature extraction with a pre-trained resnet_50 model. I have downloaded the network definition files (`resnet_utils.py`, `resnet_v2.py`) from ResNet V2 50; you still need to download the pre-trained checkpoint.
- You can also change the file `multi_label_classification_model.py` to use resnet_101 or other models. Find the networks and pre-trained models from here.
- The model is defined in the file `multi_label_classification_model.py`. I just pick one endpoint of the pre-trained model, then add three conv2d layers at the end (a sketch of this idea follows this list).
- Input image processing is very important. The logic is (see the preprocessing sketch after this list):
  - In training, first resize the image to a larger size, then random-crop it to the target size and apply some image augmentations. Finally, use this randomly generated image for training.
  - In evaluation, just resize the image to the target size.
  - In inference, first resize the image to a larger size, then use the 10-crop evaluation method: for one image, pass 10 crops (top-left, top-right, bottom-left, bottom-right, center, and their mirrors) through the model, and take the mean or max of the 10 outputs.
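To make the "one endpoint plus three conv2d layers" idea concrete, here is a minimal sketch of such a head on top of the slim ResNet V2 50 definition. It is not the actual `multi_label_classification_model.py`: the chosen endpoint (`resnet_v2_50/block4`), the layer widths, and the scope names are assumptions.

```python
import tensorflow as tf
import resnet_v2  # the downloaded slim network definition file

slim = tf.contrib.slim

def build_model(images, num_labels, is_training):
    """Sketch: take one ResNet V2 50 endpoint, add three conv2d layers, output per-label scores."""
    with slim.arg_scope(resnet_v2.resnet_arg_scope()):
        # global_pool=False keeps a spatial feature map to stack conv layers on
        net, end_points = resnet_v2.resnet_v2_50(
            images, num_classes=None, is_training=is_training, global_pool=False)

    features = end_points['resnet_v2_50/block4']  # hypothetical endpoint choice

    net = slim.conv2d(features, 512, [3, 3], scope='head_conv1')
    net = slim.conv2d(net, 512, [3, 3], scope='head_conv2')
    logits = slim.conv2d(net, num_labels, [1, 1], activation_fn=None,
                         normalizer_fn=None, scope='head_conv3')

    logits = tf.reduce_mean(logits, axis=[1, 2])  # collapse the spatial dimensions
    probs = tf.sigmoid(logits)                    # multi-label: independent sigmoid per label
    return logits, probs
```

For multi-label training, the matching loss is typically a per-label sigmoid cross-entropy (e.g. `tf.nn.sigmoid_cross_entropy_with_logits`) rather than a softmax.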
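And a rough sketch of the three preprocessing paths described above; the concrete sizes and the set of augmentations are illustrative, not necessarily what this repo's image-processing code does:

```python
import tensorflow as tf

def preprocess_for_training(image, target_size=224, resize_size=256):
    """Resize larger, random-crop to the target size, then apply simple augmentations."""
    image = tf.image.resize_images(image, [resize_size, resize_size])
    image = tf.random_crop(image, [target_size, target_size, 3])
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=32.0 / 255.0)
    return image

def preprocess_for_eval(image, target_size=224):
    """Just resize to the target size."""
    return tf.image.resize_images(image, [target_size, target_size])

def ten_crops(image, target_size=224, resize_size=256):
    """Four corners plus center, and their horizontal mirrors (10 crops in total)."""
    image = tf.image.resize_images(image, [resize_size, resize_size])
    offsets = [(0, 0),
               (0, resize_size - target_size),
               (resize_size - target_size, 0),
               (resize_size - target_size, resize_size - target_size),
               ((resize_size - target_size) // 2, (resize_size - target_size) // 2)]
    crops = [tf.image.crop_to_bounding_box(image, y, x, target_size, target_size)
             for y, x in offsets]
    crops += [tf.image.flip_left_right(c) for c in crops]
    return tf.stack(crops)  # shape [10, target_size, target_size, 3]

# At inference time, the 10 crops are fed through the model as one batch and the
# per-label scores are reduced with tf.reduce_mean or tf.reduce_max over the crop axis.
```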
- Config model. In `train.py`, modify the params of the `ModelConfig` creation.
- Config train. In `train.py`, modify the params of `TrainConfig`.
- Run the script:

  ```
  python train.py
  ```
After starting the training script, start the evaluation script and let it run in parallel with training:
- Config model. In `evaluate.py`, modify the params of the `ModelConfig` creation.
- Config eval. In `evaluate.py`, modify the params of `EvalConfig`.
- Run the script:

  ```
  python evaluate.py
  ```
Use TensorBoard to monitor the training process. When the model starts to overfit, stop training, choose one good checkpoint, and use that checkpoint to run inference on the test dataset:
- Config model. In `inference.py`, modify the params of the `ModelConfig` creation.
- Config inference. In `inference.py`, modify the params of `InferenceConfig`.
- Implement the `get_test_image_list` function in `inference.py`; it should return a list of image paths (see the sketch after this list), like:

  ```
  [
      '/path/to/img1.jpg',
      '/path/to/img2.jpg',
      ...
  ]
  ```

- Run the script:

  ```
  python inference.py
  ```
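A minimal `get_test_image_list` can simply glob a directory of test images; the directory path below is only a placeholder:

```python
import glob
import os

def get_test_image_list():
    # Placeholder directory; point this at your own test images.
    test_dir = '/path/to/test_images'
    return sorted(glob.glob(os.path.join(test_dir, '*.jpg')))
```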
After inference, the model produces a score (confidence) value for each label. It's time to choose threshold values that decide whether a specific label belongs to an image or not. The method is:
- Use the trained model to run inference on the evaluation dataset. This produces the scores for the evaluation dataset.
- Use `threshold_calibration.py` to compute the optimal threshold for each label (a sketch of the idea follows this list).
- Apply the computed optimal thresholds to the inference results on the test dataset.
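The exact criterion used by `threshold_calibration.py` is defined in that script. As a sketch of the general idea, one common approach is to sweep candidate thresholds per label on the evaluation-set scores and keep the value that maximizes that label's F1:

```python
import numpy as np

def calibrate_thresholds(scores, labels, candidates=np.linspace(0.05, 0.95, 19)):
    """scores: [num_images, num_labels] sigmoid outputs on the evaluation set.
    labels:  [num_images, num_labels] ground-truth 0/1 matrix (from image_label).
    Returns one threshold per label."""
    num_labels = scores.shape[1]
    thresholds = np.full(num_labels, 0.5)
    for k in range(num_labels):
        best_f1 = -1.0
        for t in candidates:
            pred = scores[:, k] >= t
            tp = np.sum(pred & (labels[:, k] == 1))
            precision = tp / max(np.sum(pred), 1)
            recall = tp / max(np.sum(labels[:, k] == 1), 1)
            f1 = 2 * precision * recall / max(precision + recall, 1e-8)
            if f1 > best_f1:
                best_f1, thresholds[k] = f1, t
    return thresholds

# Applying the calibrated thresholds to the test-set scores:
# predictions = (test_scores >= thresholds).astype(np.int32)
```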