## Training
```
$ python train_tcga.py --dataset=[DATASET_NAME]
```
You will need to adjust the `--num_classes` option if the dataset contains more than 2 positive classes, or only 1 positive class and 1 negative class (binary classifier). See the next section for details.
Useful arguments:
```
[--num_classes]         # Number of positive classes (excluding the negative class).
[--feats_size]          # Size of feature vector (depends on the CNN backbone).
[--thres]               # List of thresholds for the classes returned by the training function.
[--embedder_weights]    # Path to the embedder weights file (saved by SimCLR). Use 'ImageNet' if an ImageNet-pretrained embedder is used.
[--aggregator_weights]  # Path to the aggregator weights file.
[--bag_path]            # Path to a folder containing folders of patches.
[--patch_ext]           # File extension of patches.
[--map_path]            # Path of output attention maps.
```
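`--thres` holds one threshold per class. The sketch below shows one common way such per-class thresholds are applied to a bag's predicted class probabilities. It assumes that a probability at or above its class threshold counts as a positive call; `apply_thresholds` is a hypothetical helper for illustration, not a function from `train_tcga.py`.

```python
from typing import List, Sequence


def apply_thresholds(probs: Sequence[float], thres: Sequence[float]) -> List[int]:
    """Return a 0/1 decision for each positive class.

    Hypothetical helper (not part of train_tcga.py). Assumes probs[i] is the
    predicted probability of positive class i and thres[i] is its threshold.
    """
    if len(probs) != len(thres):
        raise ValueError("need exactly one threshold per positive class")
    # A class is called positive when its probability reaches its own threshold.
    return [int(p >= t) for p, t in zip(probs, thres)]


decision = apply_thresholds([0.82, 0.10], [0.5, 0.4])
print(decision)  # [1, 0]
```

Under this sketch, a bag for which no class reaches its threshold would simply receive all zeros, i.e. it would be treated as negative.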
Data is organized in two folders, `WSI` and `datasets`. The `WSI` folder contains the images, and `datasets` contains the computed features.
```
root
|-- WSI
|   |-- DATASET_NAME
|   |   |-- CLASS_1
|   |   |   |-- SLIDE_1.svs
|   |   |   |-- ...
|   |   |-- CLASS_2
|   |   |   |-- SLIDE_1.svs
|   |   |   |-- ...
```
Once patch extraction is performed, a `single` folder or a `pyramid` folder will appear.
```
root
|-- WSI
|   |-- DATASET_NAME
|   |   |-- single
|   |   |   |-- CLASS_1
|   |   |   |   |-- SLIDE_1
|   |   |   |   |   |-- PATCH_1.jpeg
|   |   |   |   |   |-- ...
|   |   |   |   |-- ...
|   |   |-- pyramid
|   |   |   |-- CLASS_1
|   |   |   |   |-- SLIDE_1
|   |   |   |   |   |-- PATCH_LOW_1
|   |   |   |   |   |   |-- PATCH_HIGH_1.jpeg
|   |   |   |   |   |   |-- ...
|   |   |   |   |   |-- ...
|   |   |   |   |   |-- PATCH_LOW_1.jpeg
|   |   |   |   |   |-- ...
|   |   |   |   |-- ...
```
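In the `pyramid` layout, each low-magnification patch `PATCH_LOW_k.jpeg` sits next to a folder `PATCH_LOW_k/` that holds its high-magnification patches. The sketch below pairs them by reading that layout directly. It assumes the `.jpeg` extension shown in the tree; `pair_pyramid_patches` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import Dict, List


def pair_pyramid_patches(slide_dir: Path) -> Dict[Path, List[Path]]:
    """Map each low-magnification patch to its high-magnification patches.

    Hypothetical helper. Assumes SLIDE/PATCH_LOW_k.jpeg sits next to a folder
    SLIDE/PATCH_LOW_k/ containing PATCH_HIGH_*.jpeg files.
    """
    pairs: Dict[Path, List[Path]] = {}
    if not slide_dir.is_dir():
        return pairs
    # Only files match "*.jpeg"; the PATCH_LOW_k folders themselves are skipped.
    for low in sorted(slide_dir.glob("*.jpeg")):
        high_dir = slide_dir / low.stem  # folder named after the low patch
        pairs[low] = sorted(high_dir.glob("*.jpeg")) if high_dir.is_dir() else []
    return pairs


# Example: count the high-magnification patches under each low-magnification patch.
slide = Path("WSI/DATASET_NAME/pyramid/CLASS_1/SLIDE_1")
for low, highs in pair_pyramid_patches(slide).items():
    print(low.name, len(highs))
```

A low-magnification patch without a matching folder maps to an empty list instead of raising, so incomplete extractions are easy to spot.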
Once feature computation is performed, a `DATASET_NAME` folder will appear inside the `datasets` folder.
```
root
|-- datasets
|   |-- DATASET_NAME
|   |   |-- CLASS_1
|   |   |   |-- SLIDE_1.csv
|   |   |   |-- ...
|   |   |-- CLASS_2
|   |   |   |-- SLIDE_1.csv
|   |   |   |-- ...
|   |   |-- CLASS_1.csv
|   |   |-- CLASS_2.csv
|   |   |-- DATASET_NAME.csv
```
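Each slide's CSV under `datasets/DATASET_NAME/CLASS_*/` is one bag, with one row per instance (patch) feature vector. The sketch below loads one bag with the standard `csv` module and checks the row width against the value passed as `--feats_size`. It assumes comma-separated floats. Whether a file starts with a header row depends on how the features were written, so the hypothetical `has_header` parameter toggles that; `load_bag` itself is also hypothetical.

```python
import csv
from pathlib import Path
from typing import List


def load_bag(csv_path: Path, feats_size: int, has_header: bool = False) -> List[List[float]]:
    """Read one bag CSV: each row is the feature vector of one instance.

    Hypothetical helper. feats_size should match the --feats_size argument.
    """
    with open(csv_path, newline="") as f:
        rows = list(csv.reader(f))
    if has_header:
        rows = rows[1:]  # drop an assumed header line
    bag = [[float(x) for x in row] for row in rows if row]
    for i, feats in enumerate(bag):
        if len(feats) != feats_size:
            raise ValueError(
                f"row {i} has {len(feats)} values, expected feats_size={feats_size}"
            )
    return bag
```

A width mismatch usually means the features were computed with a different CNN backbone than the one `--feats_size` was set for.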
- For each bag, there is a .csv file where each row contains the feature of an instance. The .csv file is named `bagID.csv` and placed in a folder named `dataset-name/category/`.
- There is a `dataset-name.csv` file with two columns: the first column contains the paths to all `bagID.csv` files, and the second column contains the bag labels.
- Labels.
  - For a binary classifier, use `1` for positive bags and `0` for negative bags. Use `--num_classes=1` at training.
  - For a multi-class classifier (`N` positive classes and one optional negative class), use `0` to `N-1` for the positive classes. If you have a negative class (not belonging to any of the positive classes), use `N` as its label. Use `--num_classes=N` (`N` equals the number of positive classes) at training.
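The labeling rules can be sketched as code that maps class folder names to labels, returns the matching `--num_classes` value, and writes the two-column `DATASET_NAME.csv` index. This is a minimal sketch under stated assumptions. `encode_labels` and `write_index` are hypothetical helpers, not part of the repository. The index is written without a header row, and with paths as built from `dataset_dir`, because the exact format the training script expects is not specified here.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional, Tuple


def encode_labels(
    positive: List[str], negative: Optional[str] = None
) -> Tuple[Dict[str, int], int]:
    """Return (class name -> label, value for --num_classes)."""
    if len(positive) == 1:
        # Binary classifier: positive bags are 1, negative bags are 0.
        labels = {positive[0]: 1}
        if negative is not None:
            labels[negative] = 0
        return labels, 1
    # Multi-class: positive classes get 0..N-1; the optional negative class gets N.
    labels = {name: i for i, name in enumerate(positive)}
    if negative is not None:
        labels[negative] = len(positive)
    return labels, len(positive)


def write_index(dataset_dir: Path, labels: Dict[str, int]) -> Path:
    """Write DATASET_NAME.csv: column 1 = bag CSV path, column 2 = bag label.

    Assumption: no header row, and paths are written as built from dataset_dir.
    """
    index_path = dataset_dir / f"{dataset_dir.name}.csv"
    with open(index_path, "w", newline="") as f:
        writer = csv.writer(f)
        for class_name, label in labels.items():
            # Only bag CSVs inside the class folder; CLASS_k.csv files are skipped.
            for bag_csv in sorted((dataset_dir / class_name).glob("*.csv")):
                writer.writerow([bag_csv.as_posix(), label])
    return index_path


labels, num_classes = encode_labels(["CLASS_1", "CLASS_2"])
print(labels, num_classes)  # {'CLASS_1': 0, 'CLASS_2': 1} 2
```

Binary datasets are handled as a special case because the binary rule (`1`/`0`, `--num_classes=1`) differs from applying the multi-class rule with `N=1`.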
If you use the code or results in your research, please use the following BibTeX entry.
```
@inproceedings{li2021dual,
  title={Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning},
  author={Li, Bin and Li, Yin and Eliceiri, Kevin W},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14318--14328},
  year={2021}
}
```