[multiresolutionimageinterface] Speed up patch loading #251

Open
prerakmody opened this issue Nov 24, 2022 · 2 comments

Comments

@prerakmody

Hi,
I am attempting to write an as-fast-as-possible (TensorFlow/Python) dataloader for WSI patches. I searched the issues for keywords like "fast", "speed", and "accelerate", but did not find any best practices.

This is what I have tried for the CAMELYON16 dataset. Maybe the maintainers/community can provide some insights?

# Import ASAP lib first!
import sys
sys.path.append('C:\\Program Files\\ASAP 2.1\\bin')
import multiresolutionimageinterface as mir
import numpy as np  # needed for np.array below
reader = mir.MultiResolutionImageReader()

# Step 1 - Loop over random anchor points "pre-selected" from whole-slides-images

# res = {patient_key1: KEY_POINTS: [[x1,y1], [x2,y2], ....]}
patch_width   = ...
patch_height  = ...
patient_level = ...
 
for patient_key in res:
    
    path_img  = ...
    path_mask = ...
    wsi_img   = reader.open(str(path_img)) 
    wsi_mask  = reader.open(str(path_mask))
    ds_factor = wsi_mask.getLevelDownsample(patient_level)
    
    # Step 2 - Loop over points for a particular patient
    for point in res[patient_key][KEY_POINTS]:
        
        # Scale the anchor point inside the call; ds_factor is a float, so cast the coordinates to int
        wsi_patch_mask  = np.array(wsi_mask.getUCharPatch(int(point[0] * ds_factor), int(point[1] * ds_factor), patch_width, patch_height, patient_level))
        wsi_patch_img   = np.array(wsi_img.getUCharPatch( int(point[0] * ds_factor), int(point[1] * ds_factor), patch_width, patch_height, patient_level))

        yield(wsi_patch_img, wsi_patch_mask)

Full code can be found here

My concern is that I load many patches from the same patient (with some randomization) and, once a fixed set of N patches has been loaded from that patient, move on to the next patient. Is it possible to speed up patch loading for a single patient? Or should I load the whole image at once, even though that may exceed available memory?
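One possible middle ground between per-patch reads and loading the whole slide is to group the anchor points by the tile they fall in, so that each tile could be read once and several patches cropped from it. The sketch below only shows the grouping step in plain Python. The function name `group_points_by_tile`, the tile size, and the sample points are all hypothetical, and this approach is not something confirmed in this thread.

```python
from collections import defaultdict


def group_points_by_tile(points, tile_size):
    """Group (x, y) anchor points by the tile that contains them.

    Returns {(tile_x, tile_y): [(dx, dy), ...]}, where (tile_x, tile_y) is the
    tile's top-left corner and (dx, dy) is the point's offset inside the tile.
    """
    tiles = defaultdict(list)
    for x, y in points:
        tile_x = (x // tile_size) * tile_size
        tile_y = (y // tile_size) * tile_size
        tiles[(tile_x, tile_y)].append((x - tile_x, y - tile_y))
    return dict(tiles)


# Hypothetical anchor points, all expressed at the same level
points = [(10, 20), (500, 30), (1030, 40), (1100, 2050)]
tiles = group_points_by_tile(points, tile_size=1024)
for origin, offsets in sorted(tiles.items()):
    print(origin, offsets)
```

Each tile would then be read once, with a region of (tile_size + patch_width) by (tile_size + patch_height) so that patches anchored near the tile border still fit, and the individual patches cut out by array slicing. Whether this beats per-patch reads depends on how densely the points cluster, which is not measured here.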

@GeertLitjens
Collaborator

GeertLitjens commented Nov 24, 2022 via email

@prerakmody
Author

Thanks for the suggestion!
I attempted option 1, as it is feasible for my pipeline. But since .getUCharPatch() is already so fast (less than 0.1 s per access; test code), I did not see any significant improvement. Note that I used the tf.data.Dataset API. It looks like the overhead of multiprocessing adds more time than it saves.

Below is a histogram of 2000 patch accesses using .getUCharPatch() (at different (x, y) coordinates and on different WSIs). The x-axis is time in seconds.
[image: histogram of patch access times]
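For anyone who wants to collect per-access timings like the ones above, here is a minimal sketch using only the standard library. `fake_get_patch` is a hypothetical stand-in so the snippet runs without ASAP installed; it is not the real reader, and the request list is made up.

```python
import statistics
import time


def time_calls(fn, args_list):
    """Return the wall-clock duration in seconds of each fn(*args) call."""
    durations = []
    for args in args_list:
        start = time.perf_counter()
        fn(*args)
        durations.append(time.perf_counter() - start)
    return durations


def fake_get_patch(x, y, width, height, level):
    """Hypothetical stand-in for a patch read; replace with the real call."""
    return bytes(width * height * 3)


# 200 hypothetical (x, y, width, height, level) requests
requests = [(i * 256, i * 256, 256, 256, 0) for i in range(200)]
durations = time_calls(fake_get_patch, requests)
print(f"n={len(durations)} median={statistics.median(durations):.6f}s "
      f"max={max(durations):.6f}s total={sum(durations):.6f}s")
```

Comparing the summed per-call time against the wall-clock time of a full pass through the dataloader gives a rough idea of how much of the total is spent in patch reads and how much is pipeline overhead.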
