raycaohmu/DSMIL-code

DSMIL: Dual-stream multiple instance learning networks for tumor detection in Whole Slide Image

Training

  $ python train_tcga.py --dataset=[DATASET_NAME]

Adjust the --num_classes option if the dataset contains more than 2 positive classes, or only 1 positive class and 1 negative class (binary classifier). See the Labels notes under "Feature vector csv files explanation" for details.

Useful arguments:

[--num_classes]         # Number of non-negative classes.
[--feats_size]          # Size of feature vector (depends on the CNN backbone).
[--thres]               # List of thresholds for the classes returned by the training function.
[--embedder_weights]    # Path to the embedder weights file (saved by SimCLR). Use 'ImageNet' if ImageNet pretrained embedder is used.
[--aggregator_weights]  # Path to the aggregator weights file.
[--bag_path]            # Path to a folder containing folders of patches.
[--patch_ext]           # File extension of patches.
[--map_path]            # Path of output attention maps.
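
The --thres option holds one threshold per class. As a rough illustration of how such per-class thresholds can be used to turn bag scores into 0/1 predictions, here is a minimal sketch. The helper name `apply_thresholds` and the `>=` comparison are assumptions made for illustration, not code taken from this repository.

```python
def apply_thresholds(scores, thresholds):
    """Binarize per-class scores: 1 where a score reaches its class threshold.

    Hypothetical helper; the repository scripts may apply --thres differently.
    """
    if len(scores) != len(thresholds):
        raise ValueError("need exactly one threshold per class")
    return [1 if s >= t else 0 for s, t in zip(scores, thresholds)]


# Two classes, each with its own threshold.
predictions = apply_thresholds([0.7, 0.2], [0.5, 0.5])
```

With the values above, the first class is predicted positive and the second negative.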

Folder structures

Data is organized in two folders, WSI and datasets. The WSI folder contains the images, and the datasets folder contains the computed features.

root
|-- WSI
|   |-- DATASET_NAME
|   |   |-- CLASS_1
|   |   |   |-- SLIDE_1.svs
|   |   |   |-- ...
|   |   |-- CLASS_2
|   |   |   |-- SLIDE_1.svs
|   |   |   |-- ...
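
To check that the slides are laid out as shown above, a short script can walk the WSI folder and list the slides found for each class. This is a minimal sketch: the function name `list_slides` is hypothetical, and it assumes slides use the .svs extension as in the tree above. The `single` and `pyramid` folders are skipped so the listing still works after patch extraction.

```python
from pathlib import Path


def list_slides(root, dataset_name, ext=".svs"):
    """Map each class folder name to the sorted slide file names inside it."""
    dataset_dir = Path(root) / "WSI" / dataset_name
    skip = {"single", "pyramid"}  # patch folders, not class folders
    slides = {}
    for class_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        if class_dir.name in skip:
            continue
        slides[class_dir.name] = sorted(
            f.name
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() == ext
        )
    return slides
```

Calling `list_slides("root", "DATASET_NAME")` returns a dictionary keyed by class folder name, which makes it easy to spot an empty or misplaced class folder before running patch extraction.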

Once patch extraction is performed, a single folder or a pyramid folder will appear.

root
|-- WSI
|   |-- DATASET_NAME
|   |   |-- single
|   |   |   |-- CLASS_1
|   |   |   |   |-- SLIDE_1
|   |   |   |   |   |-- PATCH_1.jpeg
|   |   |   |   |   |-- ...
|   |   |   |   |-- ...
|   |   |-- pyramid
|   |   |   |-- CLASS_1
|   |   |   |   |-- SLIDE_1
|   |   |   |   |   |-- PATCH_LOW_1
|   |   |   |   |   |   |-- PATCH_HIGH_1.jpeg
|   |   |   |   |   |   |-- ...
|   |   |   |   |   |-- ...
|   |   |   |   |   |-- PATCH_LOW_1.jpeg
|   |   |   |   |   |-- ...
|   |   |   |   |-- ...

Once feature computing is performed, a DATASET_NAME folder will appear inside the datasets folder.

root
|-- datasets
|   |-- DATASET_NAME
|   |   |-- CLASS_1
|   |   |   |-- SLIDE_1.csv
|   |   |   |-- ...
|   |   |-- CLASS_2
|   |   |   |-- SLIDE_1.csv
|   |   |   |-- ...
|   |   |-- CLASS_1.csv
|   |   |-- CLASS_2.csv
|   |   |-- DATASET_NAME.csv

Feature vector csv files explanation

  1. For each bag, there is a .csv file in which each row contains the feature vector of one instance. The file is named "bagID.csv" and placed in a folder named "dataset-name/category/".

  2. There is a "dataset-name.csv" file with two columns where the first column contains the paths to all bagID.csv files, and the second column contains the bag labels.

  3. Labels.

     For a binary classifier, use 1 for positive bags and 0 for negative bags. Use --num_classes=1 at training.
     For a multi-class classifier (N positive classes and one optional negative class), use 0 to N-1 for the positive classes. If you have a negative class (not belonging to any of the positive classes), use N as its label. Use --num_classes=N (N equals the number of positive classes) at training.
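
The labeling rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the repository's own code: `encode_labels` and `write_index` are hypothetical helper names, and the index file is written without a header row, which is an assumption (adjust it if the training script expects one).

```python
import csv
from pathlib import Path


def encode_labels(positive_classes, negative_class=None):
    """Return (class name -> label mapping, value to pass as --num_classes)."""
    n = len(positive_classes)
    if n == 0:
        raise ValueError("at least one positive class is required")
    if n == 1:
        # Binary case: 1 for positive bags, 0 for negative bags.
        mapping = {positive_classes[0]: 1}
        if negative_class is not None:
            mapping[negative_class] = 0
        return mapping, 1
    # Multi-class case: positives get 0..N-1, the optional negative gets N.
    mapping = {name: i for i, name in enumerate(positive_classes)}
    if negative_class is not None:
        mapping[negative_class] = n
    return mapping, n


def write_index(dataset_dir, mapping):
    """Write dataset-name.csv: one row per bag with its CSV path and label."""
    dataset_dir = Path(dataset_dir)
    index_path = dataset_dir / f"{dataset_dir.name}.csv"
    rows = []
    for class_name, label in mapping.items():
        for bag_csv in sorted((dataset_dir / class_name).glob("*.csv")):
            rows.append([bag_csv.as_posix(), label])
    with open(index_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)  # assumption: no header row
    return index_path
```

For example, `encode_labels(["tumor"], "normal")` gives the binary labels and `--num_classes=1`, while `encode_labels(["A", "B", "C"], "neg")` assigns 0 to 2 to the positive classes, 3 to the negative class, and `--num_classes=3`. The class names here are placeholders.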

Citation

If you use the code or results in your research, please use the following BibTeX entry.

@inproceedings{li2021dual,
  title={Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning},
  author={Li, Bin and Li, Yin and Eliceiri, Kevin W},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={14318--14328},
  year={2021}
}


About

This is a reproduction of the DSMIL method.
