
error when using wsinfer outputs -- no matching files #17

Open
kaczmarj opened this issue Apr 12, 2023 · 7 comments

Comments

@kaczmarj
Member

kaczmarj commented Apr 12, 2023

hi @lthealy - i am running the tumor-til analysis pipeline on wsinfer outputs. i'm getting an error that "no predictions had exact pairs".

i have attached a tar file with a small dataset (one slide) to reproduce this error.

data.tar.gz

the dataset has the following folder structure:

data
├── samples.csv
├── tils
│   └── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.csv
└── tumor
    └── TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.csv

here is the error:

[1] "TIL Algorithm (Threshold): Frontiers InceptionV4: 0.1"
[1] "=========== Params after R parsing, if any misalignment please check your flags ==========="
$algorithm
[1] "inceptionv4"

$tilDir
[1] "/data/results-tils"

$tilThresh
[1] 0.1

$cancDir
[1] "/data/results-tumor"

$cancThresh
[1] 0.5

$sampFile
[1] ""

$outputFile
[1] "output.csv"

$outputDir
[1] "/data/results-tilalign/"

$writePNG
[1] TRUE

$sampInfo
[1] "/data/sample_info.csv"

 . . . Dropping low_res and color- files . . . 
 . . . Checking for tumor/lymph pairs . . . 
 . . . All files have pairs . . . 
Error: No predictions had exact pairs. Please ensure lymph and cancer pairs have the exact same name.
Execution halted
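The error message implies that TIL and tumor files are matched by exact file name, as in the tils/ and tumor/ folders above. A minimal sketch of that kind of matching, written as a hypothetical helper `pair_by_exact_name()` (not a function in this repository; the real pairing logic in commandLineAlign.R is not shown in this thread and may differ):

```r
# Hypothetical helper (not part of commandLineAlign.R): pair TIL and tumor
# prediction files whose file names match exactly.
pair_by_exact_name <- function(til_dir, tumor_dir) {
  tils  <- list.files(til_dir)
  tumor <- list.files(tumor_dir)

  paired   <- intersect(tils, tumor)
  unpaired <- c(setdiff(tils, tumor), setdiff(tumor, tils))

  # Report files that have no exact-name partner instead of silently dropping them
  if (length(unpaired) > 0) {
    warning("Files without an exact-name partner: ", paste(unpaired, collapse = ", "))
  }
  if (length(paired) == 0) {
    stop("No predictions had exact pairs. Please ensure lymph and cancer pairs have the exact same name.")
  }

  # One row per slide: full paths to the matching TIL and tumor files
  data.frame(
    til   = file.path(til_dir, paired),
    tumor = file.path(tumor_dir, paired),
    stringsAsFactors = FALSE
  )
}
```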
@lthealy
Collaborator

lthealy commented Apr 12, 2023

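I think I see it. We have a grep call at line 76 in commandLineAlign.R that only keeps files whose names start with "prediction", in an effort to drop the "color-" files. For WSInfer outputs that grep returns no entries, so the pair check passes ("All files have pairs") only because both lists are empty, and the later step then finds no exact pairs at all.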

WSInfer never has the prediction/color prefix, so I'll add a check: if grep("^prediction") returns no matches, skip that trim. Sound good?
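The failure mode can be reproduced in a few lines of R. This is a minimal sketch: the file name comes from the attached dataset, and the `%in%` pairing test is an assumption standing in for whatever check commandLineAlign.R actually performs.

```r
# WSInfer-style file name: no "prediction" prefix
tils <- "TCGA-3C-AALI-01Z-00-DX1.F6E9A5DF-D8FB-45CF-B4BD-C6B76294C291.csv"
canc <- tils

# Old filter: keep only names starting with "prediction".
# grep() finds no match, and indexing with integer(0) yields character(0).
tils_kept <- tils[grep("^prediction", tils)]
canc_kept <- canc[grep("^prediction", canc)]
print(length(tils_kept))  # prints [1] 0

# Assumed pairing test: every TIL file has a tumor partner.
# Over empty vectors this is vacuously TRUE, since all(logical(0)) is TRUE.
all_paired <- all(tils_kept %in% canc_kept)
print(all_paired)  # prints [1] TRUE

# ...but the set of exact pairs is empty.
print(length(intersect(tils_kept, canc_kept)))  # prints [1] 0
```

This matches the log above: "All files have pairs" is printed, followed by "No predictions had exact pairs".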

@lthealy
Collaborator

lthealy commented Apr 12, 2023

Changes made to the TIL and Canc sections (shown below for TIL only):
Old Code:

tils = tils[grep("^prediction", tils)]
writeLines(" . . . Dropping low_res and color- files . . . ")
if(any(grepl("low_res", tils))){
   tils = tils[-grep("low_res", tils)]
}

New Code:

if(length(grep("^prediction", tils))>0){ ## WSInfer outputs lack prefix, older outputs have prefix. 
   tils = tils[grep("^prediction", tils)]
}

writeLines(" . . . Dropping low_res and color- files . . . ")
if(any(grepl("low_res", tils))){
   tils = tils[-grep("low_res", tils)]
}

@kaczmarj
Member Author

is there a different path in the code to deal with wsinfer outputs? we would want to take that path if we detect that the files are from wsinfer. there are at least three things we can test:

  1. like you say, the lack of a prediction- prefix
  2. the presence of .csv suffixes
  3. the presence of a header in the CSV files

there should also be a message printed saying that it has found wsinfer outputs and will use those.

my only worry about assuming we have wsinfer outputs whenever no files have the prediction- prefix is that if there are no files at all (or the user passed a nested directory), the resulting error will be confusing.
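A minimal sketch of the three checks plus the suggested message, written as a hypothetical R helper `looks_like_wsinfer()` (not a function in this repository). The header test is a rough heuristic that assumes the older prediction files are header-less and numeric; it treats a first line with any non-numeric field as a header.

```r
# Hypothetical helper (not part of commandLineAlign.R): guess whether `dir`
# holds WSInfer model outputs using the three checks listed above.
looks_like_wsinfer <- function(dir) {
  files <- list.files(dir)  # non-recursive; subdirectories are listed too
  if (length(files) == 0) {
    stop("No files found in ", dir, ". Check the path (is it a nested directory?)")
  }

  # Check 1: no file uses the older "prediction" prefix
  no_prefix <- !any(grepl("^prediction", files))

  # Check 2: every entry has a .csv suffix (a nested results root fails here)
  csv_files <- files[grepl("\\.csv$", files)]
  all_csv <- length(csv_files) == length(files)

  # Check 3 (heuristic): the first line of one CSV contains a non-numeric
  # field, which we take to mean it is a header row
  has_header <- FALSE
  if (length(csv_files) > 0) {
    first_line <- readLines(file.path(dir, csv_files[1]), n = 1)
    if (length(first_line) == 1) {
      fields <- strsplit(first_line, ",")[[1]]
      has_header <- any(is.na(suppressWarnings(as.numeric(fields))))
    }
  }

  is_wsinfer <- no_prefix && all_csv && has_header
  if (is_wsinfer) {
    message("Found WSInfer outputs in ", dir, "; using the WSInfer CSV format.")
  }
  is_wsinfer
}
```

An empty directory stops with an explicit message, and a results root that contains subdirectories returns FALSE rather than being mistaken for WSInfer outputs, which addresses the nested-directory concern.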

@lthealy
Collaborator

lthealy commented Apr 12, 2023

Yes, that'll just require a little shuffling but should be just as straightforward. Currently WSInfer detection happens after parsing (and really is just a .csv suffix check). See the lymphFormatCsv object for that detection.

@lthealy
Collaborator

lthealy commented Apr 14, 2023

Question @kaczmarj: does WSInfer write any log files into the output directory, something we would have to drop from the glob before running? I don't think so, but I wanted to make sure.

@kaczmarj
Copy link
Member Author

kaczmarj commented Apr 14, 2023 via email

@kaczmarj
Copy link
Member Author

here is a tree of wsinfer outputs. keep in mind that run_metadata_20230225T122426.json includes a timestamp, so the actual name will differ across runs.

results-wsinfer
├── masks
│   ├── TCGA-3L-AA1B-01Z-00-DX1.jpg
│   ├── TCGA-4N-A93T-01Z-00-DX1.jpg
│   ├── TCGA-4T-AA8H-01Z-00-DX1.jpg
│   ├── TCGA-5M-AAT4-01Z-00-DX1.jpg
│   ├── TCGA-5M-AAT5-01Z-00-DX1.jpg
│   ├── TCGA-5M-AAT6-01Z-00-DX1.jpg
│   ├── TCGA-5M-AATE-01Z-00-DX1.jpg
│   ├── TCGA-A6-2671-01Z-00-DX1.jpg
│   ├── TCGA-A6-2672-01Z-00-DX1.jpg
│   └── TCGA-A6-2674-01Z-00-DX1.jpg
├── model-outputs
│   ├── TCGA-3L-AA1B-01Z-00-DX1.csv
│   ├── TCGA-4N-A93T-01Z-00-DX1.csv
│   ├── TCGA-4T-AA8H-01Z-00-DX1.csv
│   ├── TCGA-5M-AAT4-01Z-00-DX1.csv
│   ├── TCGA-5M-AAT5-01Z-00-DX1.csv
│   ├── TCGA-5M-AAT6-01Z-00-DX1.csv
│   ├── TCGA-5M-AATE-01Z-00-DX1.csv
│   ├── TCGA-A6-2671-01Z-00-DX1.csv
│   ├── TCGA-A6-2672-01Z-00-DX1.csv
│   └── TCGA-A6-2674-01Z-00-DX1.csv
├── patches
│   ├── TCGA-3L-AA1B-01Z-00-DX1.h5
│   ├── TCGA-4N-A93T-01Z-00-DX1.h5
│   ├── TCGA-4T-AA8H-01Z-00-DX1.h5
│   ├── TCGA-5M-AAT4-01Z-00-DX1.h5
│   ├── TCGA-5M-AAT5-01Z-00-DX1.h5
│   ├── TCGA-5M-AAT6-01Z-00-DX1.h5
│   ├── TCGA-5M-AATE-01Z-00-DX1.h5
│   ├── TCGA-A6-2671-01Z-00-DX1.h5
│   ├── TCGA-A6-2672-01Z-00-DX1.h5
│   └── TCGA-A6-2674-01Z-00-DX1.h5
├── process_list_autogen.csv
├── run_metadata_20230225T122426.json
└── stitches
    ├── TCGA-3L-AA1B-01Z-00-DX1.jpg
    ├── TCGA-4N-A93T-01Z-00-DX1.jpg
    ├── TCGA-4T-AA8H-01Z-00-DX1.jpg
    ├── TCGA-5M-AAT4-01Z-00-DX1.jpg
    ├── TCGA-5M-AAT5-01Z-00-DX1.jpg
    ├── TCGA-5M-AAT6-01Z-00-DX1.jpg
    ├── TCGA-5M-AATE-01Z-00-DX1.jpg
    ├── TCGA-A6-2671-01Z-00-DX1.jpg
    ├── TCGA-A6-2672-01Z-00-DX1.jpg
    └── TCGA-A6-2674-01Z-00-DX1.jpg
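Given this layout, a `*.csv` glob inside model-outputs/ picks up only the per-slide predictions, while process_list_autogen.csv and the timestamped run_metadata JSON sit one level up. A minimal sketch as a hypothetical helper `wsinfer_csvs()` (not part of this repository), assuming the user may pass either the results root or model-outputs/ itself:

```r
# Hypothetical helper: list WSInfer per-slide prediction CSVs.
# Accepts either the WSInfer results root or its model-outputs/ folder.
wsinfer_csvs <- function(dir) {
  sub <- file.path(dir, "model-outputs")
  if (dir.exists(sub)) {
    dir <- sub  # descend so process_list_autogen.csv at the root is not picked up
  }
  # Only *.csv files match; run_metadata_<timestamp>.json never does,
  # whatever its timestamp, and the .jpg/.h5 files live in other folders
  list.files(dir, pattern = "\\.csv$", full.names = TRUE)
}
```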
