[clinical] Fix naming of the tables associated with analysis results #115

fedorov · 2025-01-02T17:09:58Z

See related discussion in https://nciimagingdat-m8a6349.slack.com/archives/C045VQUAZ0S/p1734563139760659.

The naming convention introduced in v20, <collection_id>_bamf_<cancer_location>_<modality>_segmentation, should be reverted.

I suggest that for analysis results we name the table <analysis_results_collection_id> (if no name is assigned to the individual table), and add an analysis_results_collection_id column to the columns_metadata table.
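A minimal sketch contrasting the two conventions, for illustration only. The function names are hypothetical, and the rule for tables that do have their own name (appending it to the ID as a suffix) is an assumption based on the later suggestion in this thread, not something stated in this comment.

```python
from typing import Optional


def v20_table_name(collection_id: str, cancer_location: str, modality: str) -> str:
    """Table name under the v20 convention that this issue proposes to revert."""
    return f"{collection_id}_bamf_{cancer_location}_{modality}_segmentation"


def proposed_table_name(analysis_results_collection_id: str,
                        table_suffix: Optional[str] = None) -> str:
    """Proposed: the analysis results collection ID alone when the individual
    table has no name; otherwise (assumption) the ID plus "_" plus the suffix."""
    if not table_suffix:
        return analysis_results_collection_id
    return f"{analysis_results_collection_id}_{table_suffix}"


# v20 name as it appears in the table_metadata row quoted in this thread
print(v20_table_name("anti_pd_1_lung", "lung", "ct"))
# anti_pd_1_lung_bamf_lung_ct_segmentation
print(proposed_table_name("BAMF_AIMI_Annotations", "lung_fdg_pet_ct_qa_results"))
# BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results
```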


bcli4d · 2025-01-02T19:19:12Z

One issue with your proposal is that the liver and lung cancer collections each have two CSV files. For example, for lung cancer there is one file for the CT segmentations and one for the FDG PET/CT segmentations. Similarly, for liver cancer there is one for CT and one for MR.

So we'd have to merge these files, and in the case of lung cancer the files have slightly different columns: the CT segmentation file has a SeriesInstanceUID column, while the FDG PET/CT file has PTSeriesInstanceUID and CTSeriesInstanceUID columns.

Also, we'd probably need to add a column to differentiate the modalities?

fedorov · 2025-01-02T20:09:03Z

As with any collection, we will need custom rules for how to assign the table suffix. I think those rules should follow the conventions established by the creators of those tables, not by us. This way it will be easier for users who look at the collection page pointed to by the DOI to reconcile the organization of the accompanying files with what we have in BQ.

As another general rule, we do not harmonize and do not modify the tables we ingest, other than making sure PatientID in the images matches dicom_patient_id field in the ingested tables.
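The one permitted check, that PatientID in the images matches dicom_patient_id in an ingested table, could be sketched as below. This only detects mismatches and does not modify the table; the function name and the sample IDs are hypothetical, and how the two ID lists are fetched from BQ is left out.

```python
from typing import Iterable, List


def unmatched_patient_ids(table_patient_ids: Iterable[str],
                          image_patient_ids: Iterable[str]) -> List[str]:
    """Return the dicom_patient_id values in an ingested table that have no
    matching DICOM PatientID among the images (sorted, without duplicates)."""
    image_ids = set(image_patient_ids)
    return sorted({pid for pid in table_patient_ids if pid not in image_ids})


# Hypothetical IDs, for illustration only
print(unmatched_patient_ids(["P1", "P2", "P2", "P3"], ["P1", "P3", "P4"]))  # ['P2']
```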

I hadn't looked at the specifics of this collection, but doing so now: https://zenodo.org/records/13244892 has a number of zip files, and (I assume; I did not look at every zip file) each zip file contains a file named qa-results.csv that we need to ingest. I suggest we use the pattern BAMF_AIMI_Annotations_<zip file prefix>_qa_results, such as BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results.
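A minimal sketch of the suggested pattern, assuming the "zip file prefix" is simply the zip file name without its .zip extension. The file name lung_fdg_pet_ct.zip is hypothetical, chosen only to reproduce the example table name; the actual Zenodo file names were not checked here.

```python
from pathlib import PurePosixPath

TABLE_PREFIX = "BAMF_AIMI_Annotations"


def qa_results_table_name(zip_file_name: str) -> str:
    """Build BAMF_AIMI_Annotations_<zip file prefix>_qa_results from a zip name."""
    path = PurePosixPath(zip_file_name)
    if path.suffix.lower() != ".zip":
        raise ValueError(f"expected a .zip file, got {zip_file_name!r}")
    # Assumption: the prefix is the file name minus its .zip extension
    return f"{TABLE_PREFIX}_{path.stem}_qa_results"


# Hypothetical zip file name matching the example table name
print(qa_results_table_name("lung_fdg_pet_ct.zip"))
# BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results
```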

This also reminds me that we should revisit rules for assigning IDs to the analysis results - it is quite confusing that collection_id and analysis_results_collection_id follow different conventions.

bcli4d · 2025-01-02T22:43:00Z

Which specific conventions are not consistent? There are potentially several.

bcli4d · 2025-01-02T23:48:09Z

The original_collections_metadata table has:

collection_name: Collection name as used externally by IDC webapp
collection_id: Collection ID as used internally by IDC webapp

and analysis_results_metadata has:

ID: Results ID
Title: Descriptive title

We could change these to:

collection_name: Collection name as used externally by IDC webapp
collection_id: Collection ID as used to identify collections in other BQ tables
collection_title: Descriptive title

and

analysis_name: Analysis name as used externally by IDC webapp
analysis_id: Analysis ID as used to identify analysis results in other BQ tables
analysis_title: Descriptive title

So, for BAMF:
analysis_name: BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results
analysis_id: bamf_aimi_annotations_lung_fdg_pet_ct_qa_results
analysis_title: Image segmentations produced by BAMF under the AIMI Annotations initiative

The table would then be named bamf_aimi_annotations_lung_fdg_pet_ct_qa_results instead of BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results.
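A sketch of the proposed analysis_results_metadata fields as a record type. Deriving analysis_id by lowercasing analysis_name is an assumption inferred from the BAMF example, not a stated rule; the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisResultsMetadata:
    analysis_name: str   # as used externally by the IDC webapp
    analysis_title: str  # descriptive title

    @property
    def analysis_id(self) -> str:
        # Assumption: the ID used in other BQ tables is the lowercased name
        return self.analysis_name.lower()


bamf = AnalysisResultsMetadata(
    analysis_name="BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results",
    analysis_title="Image segmentations produced by BAMF under the AIMI Annotations initiative",
)
print(bamf.analysis_id)  # bamf_aimi_annotations_lung_fdg_pet_ct_qa_results
```

Keeping analysis_id as a derived property, rather than a stored field, means the name and the ID cannot drift apart; if the two are meant to be independent, a plain field would be the better choice.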

bcli4d · 2025-01-29T18:16:56Z

I'm working on restructuring BAMF file names...
The table_metadata clinical BQ table has one or more rows per collection, where each row describes a table that contains data for patients in that collection. Should each bamf_aimi_annotations table be associated with every collection that has patients in that table? Or should bamf_aimi_annotations be in the collection_id column?
For example, table_metadata currently has this partial row:

anti_pd_1_lung	bigquery-public-data.idc_v20_clinical.anti_pd_1_lung_bamf_lung_ct_segmentation	segmentation of Lung and Nodules (3mm-30mm) from CT scans

Should this now be replaced by:

anti_pd_1_lung	bigquery-public-data.idc_v20_clinical.bamf_aimi_annotations_lung_ct_qa_results	segmentation of Lung and Nodules (3mm-30mm) from CT scans

with similar rows for the other collections (lung_pet_ct_dx, etc.) that have patients in bamf_aimi_annotations_lung_ct_qa_results? Or should those rows be deleted and replaced by one row for each bamf_aimi_annotations table:

bamf_aimi_annotations	bigquery-public-data.idc_v20_clinical.bamf_aimi_annotations_lung_ct_qa_results	segmentation of Lung and Nodules (3mm-30mm) from CT scans
bamf_aimi_annotations	bigquery-public-data.idc_v20_clinical.bamf_aimi_annotations_lung_fdg_pet_ct_qa_results	segmentation of Lungs and FDG-avid lesions in the lung from FDG PET/CT scans
...
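The two options can be sketched as row generators over (collection_id, table_name, table_description), mirroring the partial rows above. The function names are hypothetical, and the collection list is deliberately partial (the thread itself says "etc."), so this only illustrates the shape of each option.

```python
from typing import List, Tuple

# (collection_id, table_name, table_description), as in the partial rows above
Row = Tuple[str, str, str]
DATASET = "bigquery-public-data.idc_v20_clinical"


def rows_per_collection(table: str, description: str,
                        collection_ids: List[str]) -> List[Row]:
    """Option 1: one row for each collection that has patients in the table."""
    return [(cid, f"{DATASET}.{table}", description) for cid in collection_ids]


def rows_per_analysis(table: str, description: str,
                      analysis_id: str = "bamf_aimi_annotations") -> List[Row]:
    """Option 2: a single row with the analysis ID in the collection_id column."""
    return [(analysis_id, f"{DATASET}.{table}", description)]


desc = "segmentation of Lung and Nodules (3mm-30mm) from CT scans"
table = "bamf_aimi_annotations_lung_ct_qa_results"
# Partial collection list, for illustration only
for row in rows_per_collection(table, desc, ["anti_pd_1_lung", "lung_pet_ct_dx"]):
    print("\t".join(row))
for row in rows_per_analysis(table, desc):
    print("\t".join(row))
```

Option 1 keeps table_metadata keyed by real collection IDs, so a per-collection lookup finds the BAMF tables directly; option 2 keeps one row per table but requires a join through patient IDs to find which collections a table covers.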

bcli4d · 2025-01-31T21:04:04Z

Please look at idc-dev-etl.idc_v21_clinical. There are now 11 BQ tables named like bamf_aimi_annotations_*_qa_results.
I implemented the first approach above. Specifically, the table_metadata BQ table does not include bamf_aimi_annotations in the collection_id column.

bcli4d · 2025-02-01T00:17:34Z

Also, please look at my note above on conventions for original_collections_metadata and analysis_results_metadata

fedorov added the bug (Something isn't working) label on Jan 2, 2025

bcli4d assigned bcli4d and fedorov on Jan 29, 2025
