[clinical] Fix naming of the tables associated with analysis results #115

Open
fedorov opened this issue Jan 2, 2025 · 7 comments
Labels
bug Something isn't working

Comments

@fedorov
Member

fedorov commented Jan 2, 2025

See related discussion in https://nciimagingdat-m8a6349.slack.com/archives/C045VQUAZ0S/p1734563139760659.

The convention introduced in v20 of assigning names as <collection_id>_bamf_<cancer_location>_<modality>_segmentation should be reverted.

I suggest that for analysis results we assign the name as <analysis_results_collection_id> (if there is no name assigned to the individual table), and add an analysis_results_collection_id column to the columns_metadata table.

@fedorov fedorov added the bug Something isn't working label Jan 2, 2025
@bcli4d
Member

bcli4d commented Jan 2, 2025

An issue with your proposal is that, for both the liver and lung cancer collections, there are two csv files. For example, for lung cancer there is one file corresponding to CT segs and one corresponding to FDG PET/CT segs. Similarly, for liver cancer there is one for CT and one for MR.

So we'd have to merge these files, and, in the case of lung cancer, the files have slightly different columns: the CT seg file has a 'SeriesInstanceUID' column, while the FDG PET/CT file has PTSeriesInstanceUID and CTSeriesInstanceUID columns.

Also, probably need to add a column to differentiate modalities?
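
To illustrate what such a merge would involve, here is a minimal sketch using only the Python standard library. The helper `merge_rows`, the `modality` column name, and all row values are hypothetical placeholders; only the column names SeriesInstanceUID, PTSeriesInstanceUID and CTSeriesInstanceUID come from the files described above.

```python
def merge_rows(tables):
    """Merge per-modality row lists into one table with a union of columns.

    `tables` maps a modality label to a list of row dicts.
    """
    # Collect the union of column names, preserving first-seen order.
    columns = []
    for rows in tables.values():
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Hypothetical discriminator column that tells the source files apart.
    if "modality" not in columns:
        columns.append("modality")

    merged = []
    for modality, rows in tables.items():
        for row in rows:
            # Columns absent from a source file are filled with None.
            out = {name: row.get(name) for name in columns}
            out["modality"] = modality
            merged.append(out)
    return columns, merged


# Placeholder values, not real UIDs.
ct_rows = [{"SeriesInstanceUID": "1.2.3"}]
pet_ct_rows = [{"PTSeriesInstanceUID": "4.5.6", "CTSeriesInstanceUID": "7.8.9"}]

columns, merged = merge_rows({"CT": ct_rows, "FDG PET/CT": pet_ct_rows})
```

Every merged row ends up with empty values in the columns that belong only to the other file, which is the main cost of merging files with different schemas.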

@fedorov
Member Author

fedorov commented Jan 2, 2025

As with any collection, we will need custom rules for how to assign the suffix of the table. I think those rules should follow the conventions established by the creators of those tables, not by us. This way it will be easier for users who look at the collection on the page the DOI points to to reconcile the organization of the accompanying files with what we have in BQ.

As another general rule, we do not harmonize and do not modify the tables we ingest, other than making sure PatientID in the images matches dicom_patient_id field in the ingested tables.
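
A minimal sketch of that PatientID check, assuming the table rows and the image PatientID values have already been loaded into memory. The function name `unmatched_patient_ids` and the sample IDs are hypothetical; only the dicom_patient_id field name comes from the text above.

```python
def unmatched_patient_ids(table_rows, image_patient_ids):
    """Return dicom_patient_id values from the table that match no image PatientID."""
    known = set(image_patient_ids)
    table_ids = {row["dicom_patient_id"] for row in table_rows}
    return sorted(table_ids - known)


# Placeholder IDs for illustration only.
image_patient_ids = ["P001", "P002", "P003"]
table_rows = [{"dicom_patient_id": "P001"}, {"dicom_patient_id": "P004"}]

missing = unmatched_patient_ids(table_rows, image_patient_ids)
```

An empty result would mean every patient in the ingested table can be joined to the imaging data.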

I hadn't looked at the specifics for this collection, but doing so now: at https://zenodo.org/records/13244892 there are a number of zip files, and (I assume, since I did not look at every zip file) each zip file contains a file named qa-results.csv that we need to ingest. I suggest we use the pattern BAMF_AIMI_Annotations_<zip file prefix>_qa_results, such as BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results.
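
The pattern could be applied mechanically along these lines. This is only a sketch: the function `qa_results_table_name` is hypothetical, the zip file name used below is a guess rather than a name taken from the Zenodo record, and replacing non-alphanumeric characters with underscores is an assumption made so that the result matches the example above.

```python
import re
from pathlib import PurePosixPath


def qa_results_table_name(zip_name, prefix="BAMF_AIMI_Annotations"):
    """Build a table name of the form <prefix>_<zip file prefix>_qa_results."""
    # Drop the ".zip" extension to get the zip file prefix.
    stem = PurePosixPath(zip_name).stem
    # Assumption: normalize any run of non-alphanumeric characters to "_".
    suffix = re.sub(r"[^0-9A-Za-z]+", "_", stem).strip("_")
    return f"{prefix}_{suffix}_qa_results"


# Hypothetical zip file name, for illustration only.
table_name = qa_results_table_name("lung-fdg-pet-ct.zip")
```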

This also reminds me that we should revisit rules for assigning IDs to the analysis results - it is quite confusing that collection_id and analysis_results_collection_id follow different conventions.

@bcli4d
Member

bcli4d commented Jan 2, 2025

Which specific conventions are not consistent? There are potentially several.

@bcli4d
Member

bcli4d commented Jan 2, 2025

The original_collections_metadata table has:

  • collection_name: Collection name as used externally by IDC webapp
  • collection_id: Collection ID as used internally by IDC webapp

and analysis_results_metadata has:

  • ID: Results ID
  • Title: Descriptive title

We could go to:

  • collection_name: Collection name as used externally by IDC webapp
  • collection_id: Collection ID as used to identify collections in other BQ tables
  • collection_title: Descriptive title

and

  • analysis_name: Analysis name as used externally by IDC webapp
  • analysis_id: Analysis ID as used to identify analysis results in other BQ tables
  • analysis_title: Descriptive title

So, for BAMF:
analysis_name: BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results
analysis_id: bamf_aimi_annotations_lung_fdg_pet_ct_qa_results
analysis_title: Image segmentations produced by BAMF under the AIMI Annotations initiative

Then, instead of BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results, we'd use bamf_aimi_annotations_lung_fdg_pet_ct_qa_results as the table name.
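
The relationship between the three proposed fields can be sketched as a small record type. The class `AnalysisResult` is hypothetical, and deriving analysis_id by lowercasing analysis_name is an assumption based only on the BAMF example above, not a confirmed IDC rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisResult:
    analysis_name: str   # name as used externally by the IDC webapp
    analysis_title: str  # descriptive title

    @property
    def analysis_id(self) -> str:
        # Assumption: the ID used in other BQ tables is the lowercased name.
        return self.analysis_name.lower()


bamf = AnalysisResult(
    analysis_name="BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results",
    analysis_title="Image segmentations produced by BAMF under the AIMI Annotations initiative",
)
```

Deriving the ID from the name, rather than storing both, would keep the two from drifting apart, at the cost of ruling out IDs that differ from the name in anything but case.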

@bcli4d
Member

bcli4d commented Jan 29, 2025

I'm working on restructuring the BAMF file names...
The table_metadata clinical BQ table has one or more rows per collection, where each row describes a table that contains data for patients in that collection. Should each bamf_aimi_table be associated with every collection that has patients in that bamf_aimi_table? Or should bamf_aimi_annotations be in the collection_id column?
For example, table_metadata currently has this partial row:

anti_pd_1_lung bigquery-public-data.idc_v20_clinical.anti_pd_1_lung_bamf_lung_ct_segmentation segmentation of Lung and Nodules (3mm-30mm) from CT scans

Should this now be replaced by:

anti_pd_1_lung bigquery-public-data.idc_v20_clinical.bamf_aimi_annotations_lung_ct_qa_results segmentation of Lung and Nodules (3mm-30mm) from CT scans

with a similar row for each of the other collections (lung_pet_ct_dx, etc.) that have patients in bamf_aimi_annotations_lung_ct_qa_results? Or should those rows be deleted and replaced by one row per bamf_aimi_annotations table:

bamf_aimi_annotations bigquery-public-data.idc_v20_clinical.bamf_aimi_annotations_lung_ct_qa_results segmentation of Lung and Nodules (3mm-30mm) from CT scans
bamf_aimi_annotations bigquery-public-data.idc_v20_clinical.bamf_aimi_annotations_lung_fdg_pet_ct_qa_results segmentation of Lungs and FDG-avid lesions in the lung from FDG PET/CT scans
...
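
The two options can be compared side by side with a short sketch. The function names `rows_per_collection` and `rows_per_analysis` are hypothetical, and the rows are plain tuples of the three values shown above, not the full table_metadata schema.

```python
DATASET = "bigquery-public-data.idc_v20_clinical"


def rows_per_collection(table, description, collection_ids):
    """Option 1: one row per source collection that has patients in the table."""
    return [(cid, f"{DATASET}.{table}", description) for cid in collection_ids]


def rows_per_analysis(tables, analysis_id="bamf_aimi_annotations"):
    """Option 2: one row per table, with the analysis result ID as collection_id."""
    return [(analysis_id, f"{DATASET}.{table}", desc) for table, desc in tables.items()]


lung_ct_desc = "segmentation of Lung and Nodules (3mm-30mm) from CT scans"

option1 = rows_per_collection(
    "bamf_aimi_annotations_lung_ct_qa_results",
    lung_ct_desc,
    ["anti_pd_1_lung", "lung_pet_ct_dx"],
)

option2 = rows_per_analysis({
    "bamf_aimi_annotations_lung_ct_qa_results": lung_ct_desc,
    "bamf_aimi_annotations_lung_fdg_pet_ct_qa_results":
        "segmentation of Lungs and FDG-avid lesions in the lung from FDG PET/CT scans",
})
```

Option 1 repeats the same table once per collection, so a lookup by collection still finds it. Option 2 lists each table once but requires users to know the analysis result ID.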

@bcli4d
Member

bcli4d commented Jan 31, 2025

Please look at idc-dev-etl.idc_v21_clinical. There are now 11 BQ tables named like bamf_aimi_annotations_*_qa_results.
I implemented the first approach above. Specifically, the table_metadata BQ table does not include bamf_aimi_annotations in the collection_id column.

@bcli4d
Member

bcli4d commented Feb 1, 2025

Also, please look at my note above on conventions for original_collections_metadata and analysis_results_metadata.
