[clinical] Fix naming of the tables associated with analysis results #115
Comments
An issue with your proposal is that, for both the liver and lung cancer collections, there are two CSV files. For example, for lung cancer there is one file corresponding to CT segs and one corresponding to FDG PET/CT segs. Similarly, for liver cancer there are CT and MR files. So we'd have to merge these files, and, in the case of lung cancer, the files have slightly different columns: the CT seg file has a 'SeriesInstanceUID' column, while the FDG PET/CT file has PTSeriesInstanceUID and CTSeriesInstanceUID columns. Also, we'd probably need to add a column to differentiate modalities?
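To make the cost of that merge concrete, here is a minimal sketch of what combining the two lung files would involve. It uses only the Python standard library; the sample rows, the `PatientID` column, the `modality` column name, and the `merge_csvs` helper are all hypothetical, and only the three SeriesInstanceUID column names come from the comment above.

```python
import csv
import io

# Hypothetical, made-up rows. Only the *SeriesInstanceUID column names
# are taken from the discussion; PatientID is an assumption.
CT_CSV = "PatientID,SeriesInstanceUID\nP1,1.2.3\n"
PET_CT_CSV = "PatientID,PTSeriesInstanceUID,CTSeriesInstanceUID\nP2,4.5.6,7.8.9\n"


def merge_csvs(sources):
    """Concatenate CSV texts keyed by a modality label.

    The result uses the union of all columns plus a 'modality' column
    (an assumed name) that records which file each row came from.
    """
    rows = []
    columns = ["modality"]
    for modality, text in sources.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["modality"] = modality
            rows.append(row)
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Columns absent from a given file are left empty for its rows.
    return columns, [{c: r.get(c, "") for c in columns} for r in rows]


columns, merged = merge_csvs({"CT": CT_CSV, "FDG PET/CT": PET_CT_CSV})
```

Every merged row ends up with empty cells for the columns its source file lacks, which is the kind of harmonization the merge would force on the ingested tables.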
As with any collection, we will need custom rules for how to assign the suffix of the table. I think those rules should follow the conventions established by the creators of those tables, not by us. This way it will be easier for users who look at the collection on the page the DOI points to to reconcile the organization of the accompanying files with what we have in BQ. As another general rule, we do not harmonize and do not modify the tables we ingest, other than making sure
I didn't look at the specifics for this collection, but doing this now: at https://zenodo.org/records/13244892 there are a number of zip files, and (I assume, as I did not look at every zip file) each zip file contains a file named
This also reminds me that we should revisit the rules for assigning IDs to the analysis results; it is quite confusing that
Which specific conventions are not consistent? There are potentially several.
The original_collections_metadata table has:
We could go to:
So, for BAMF:
Then, instead of BAMF_AIMI_Annotations_lung_fdg_pet_ct_qa_results, we'd go to bamf_aimi_annotations_lung_fdg_pet_ct_qa_results
I'm working on restructuring BAMF file names...
Should this now be replaced by:
and with a similar row for the other collections, lung_pet_ct_dx, etc., that have patients in bamf_aimi_annotations_lung_ct_qa_results? Or should those rows be deleted, and instead have a row for each bamf_aimi_annotations table:
Please look at idc-dev-etl.idc_v21_clinical. There are now 11 BQ tables named like bamf_aimi_annotations_*_qa_results.
Also, please look at my note above on conventions for original_collections_metadata and analysis_results_metadata.
See related discussion in https://nciimagingdat-m8a6349.slack.com/archives/C045VQUAZ0S/p1734563139760659.
The convention introduced in v20 of assigning names as
<collection_id>_bamf_<cancer_location>_<modality>_segmentation
should be reverted. I suggest for analysis results we assign the name as
<analysis_results_collection_id>
(if there is no name assigned to the individual table), and
analysis_results_collection_id
column to the
columns_metadata
table.
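As a rough illustration of the proposed naming rule, the sketch below falls back to the analysis results collection ID when the individual table has no name of its own. The function name is hypothetical. What happens when a table does have a name is not spelled out above, so joining the two parts with an underscore is an assumption, as is the lowercasing, which follows the suggestion made in the comments rather than anything in this description. The collection ID `BAMF_AIMI_Annotations` is used only as an illustrative value.

```python
from typing import Optional


def analysis_results_table_name(
    analysis_results_collection_id: str,
    table_name: Optional[str] = None,
) -> str:
    """Build a BQ table name for an analysis results table (hypothetical helper).

    Falls back to the collection ID alone when the individual table
    has no name assigned.
    """
    if not table_name:
        return analysis_results_collection_id.lower()
    # Assumption: the issue does not state how a named table is handled;
    # joining with an underscore is a guess based on names in the thread.
    return f"{analysis_results_collection_id}_{table_name}".lower()


unnamed = analysis_results_table_name("BAMF_AIMI_Annotations")
named = analysis_results_table_name(
    "BAMF_AIMI_Annotations", "lung_fdg_pet_ct_qa_results"
)
```

Under these assumptions, `named` matches the lowercase table name discussed in the comments, and `unnamed` is just the lowercased collection ID.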