[multiresolutionimageinterface] Speed up patch loading #251

prerakmody · 2022-11-24T13:22:04Z

Hi,
I am attempting to write an as-fast-as-possible (TensorFlow/Python) dataloader for WSI patches. I searched the issues for keywords like "fast", "speed", and "accelerate", but did not find any best practices.

This is what I have tried for the CAMELYON16 dataset. Maybe the maintainers/community can provide some insights?

# Import the ASAP library first!
import sys
sys.path.append('C:\\Program Files\\ASAP 2.1\\bin')
import multiresolutionimageinterface as mir
import numpy as np
reader = mir.MultiResolutionImageReader()

# Step 1 - Loop over random anchor points "pre-selected" from whole-slide images

# res = {patient_key1: {KEY_POINTS: [[x1,y1], [x2,y2], ....]}}
patch_width   = ...
patch_height  = ...
patient_level = ...

for patient_key in res:

    path_img  = ...
    path_mask = ...
    wsi_img   = reader.open(str(path_img))
    wsi_mask  = reader.open(str(path_mask))
    ds_factor = wsi_mask.getLevelDownsample(patient_level)

    # Step 2 - Loop over points for a particular patient
    for point in res[patient_key][KEY_POINTS]:

        # Scale the anchor point by the downsample factor before reading
        x0 = int(point[0] * ds_factor)
        y0 = int(point[1] * ds_factor)
        wsi_patch_mask = np.array(wsi_mask.getUCharPatch(x0, y0, patch_width, patch_height, patient_level))
        wsi_patch_img  = np.array(wsi_img.getUCharPatch(x0, y0, patch_width, patch_height, patient_level))

        yield (wsi_patch_img, wsi_patch_mask)

Full code can be found here: https://gist.github.com/prerakmody/9237b618c804ca9b99c1fd21e30de496

My concern is that I am loading many patches from the same patient (with some randomization), and only after a fixed set of N patches has been loaded from a patient do I move on to the next one. Is it possible to speed up patch loading for a single patient? Or should I load the whole image at once, even though that may lead to memory overflow?

GeertLitjens · 2022-11-24T17:24:35Z

There are a couple of things you can try:

1. Use multiprocessing and get patches from several images at once.
2. Sample all patches once and write them to disk in a fast format for your DL library of choice (e.g. TFRecords for TensorFlow).
3. Try to avoid reading across tile boundaries; the underlying TIFF files are tiled. If you request a region that is the same size as the tile size but starts at the center point of a tile, four tiles have to be read to construct the requested region. This is not always possible and depends on your use case, of course.
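The tile-boundary point can be illustrated with a little integer arithmetic. The sketch below is only an illustration: `tiles_touched`, `snap_to_tile`, and the tile size of 512 are hypothetical and not part of ASAP, the real tile size depends on how the TIFF was written, and all coordinates are assumed to be expressed in the pixel grid of the level being read.

```python
def tiles_touched(x, y, width, height, tile_size):
    """Count the tiles overlapped by the region (x, y, width, height).

    All values are in pixels at the level being read; tiles are assumed
    to be square with edge length tile_size.
    """
    first_col = x // tile_size
    last_col = (x + width - 1) // tile_size
    first_row = y // tile_size
    last_row = (y + height - 1) // tile_size
    return (last_col - first_col + 1) * (last_row - first_row + 1)


def snap_to_tile(x, y, tile_size):
    """Move an anchor point to the top-left corner of the tile containing it."""
    return (x // tile_size) * tile_size, (y // tile_size) * tile_size


TILE = 512  # hypothetical tile size, chosen only for this example

aligned = tiles_touched(0, 0, TILE, TILE, TILE)                    # starts on a tile corner
centered = tiles_touched(TILE // 2, TILE // 2, TILE, TILE, TILE)   # starts at a tile center
snapped = snap_to_tile(700, 1300, TILE)
print(aligned, centered, snapped)  # 1 4 (512, 1024)
```

Snapping anchor points this way trades some randomness in patch position for fewer tile reads, so whether it is acceptable depends on the sampling scheme.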


prerakmody · 2022-11-29T16:23:50Z

Thanks for the suggestion!
I attempted option 1, as it is feasible for my pipeline. However, since .getUCharPatch() is already fast (less than 0.1 s per access; test code), I did not see any significant reduction in loading time. Note that I used the tf.data.Dataset API. It looks like the overhead of multiprocessing costs more time than it saves.

Below is a histogram of 2000 patch accesses using .getUCharPatch() (on different (x, y) coordinates and WSIs); the x-axis is time in seconds.
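A per-access timing like the one described above could be collected with a small harness along these lines. This is a minimal sketch, not the test code from the post: `time_calls` and `fake_read_patch` are hypothetical names, and the stand-in reader only allocates a buffer, so in practice it would be replaced by a function that calls .getUCharPatch() on an opened slide.

```python
import statistics
import time


def time_calls(read_patch, coords):
    """Return the wall-clock duration in seconds of each read_patch(x, y) call."""
    durations = []
    for x, y in coords:
        start = time.perf_counter()
        read_patch(x, y)
        durations.append(time.perf_counter() - start)
    return durations


def fake_read_patch(x, y):
    """Hypothetical stand-in for a real patch read; it only allocates a buffer."""
    return bytes(256 * 256 * 3)


# 200 arbitrary anchor points, used only to exercise the harness
coords = [(i * 256, i * 256) for i in range(200)]
durations = time_calls(fake_read_patch, coords)
summary = {
    "n": len(durations),
    "median_s": statistics.median(durations),
    "max_s": max(durations),
    "total_s": sum(durations),
}
print(summary)
```

Comparing the median per-access time with the per-task cost of handing work to another process gives a rough idea of whether parallel loading is likely to pay off.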
