Additional EFO xref context from axioms #19
@dhimmel For now, I've only added the SPARQL queries:
Could you please review these queries? Let me know if there are any necessary changes.
Awesome work! I leave you mostly with hard questions and decisions (:
)
BIND( REPLACE( STR(?mapping_property_uri), "^http://purl\\.obolibrary\\.org/obo/mondo#(.+)$", "mondo:$1" ) AS ?mapping_property_id )
BIND( REPLACE( STR(?mapping_property_id), "^http://www\\.w3\\.org/2004/02/skos/core#(.+)$", "skos:$1" ) AS ?mapping_property_id )
}
Can you reply to this comment with the head of the output table?
efo_id | xref_id | mapping_property_id | efo_uri | xref_uri | mapping_property_uri |
---|---|---|---|---|---|
MONDO:0000044 | meddra:10060873 | mondo:closeMatch | http://purl.obolibrary.org/obo/MONDO_0000044 | http://identifiers.org/meddra/10060873 | http://purl.obolibrary.org/obo/mondo#closeMatch |
MONDO:0000050 | meddra:10035083 | mondo:closeMatch | http://purl.obolibrary.org/obo/MONDO_0000050 | http://identifiers.org/meddra/10035083 | http://purl.obolibrary.org/obo/mondo#closeMatch |
MONDO:0000088 | meddra:10044701 | mondo:closeMatch | http://purl.obolibrary.org/obo/MONDO_0000088 | http://identifiers.org/meddra/10044701 | http://purl.obolibrary.org/obo/mondo#closeMatch |
MONDO:0000088 | meddra:10058084 | mondo:closeMatch | http://purl.obolibrary.org/obo/MONDO_0000088 | http://identifiers.org/meddra/10058084 | http://purl.obolibrary.org/obo/mondo#closeMatch |
MONDO:0000127 | meddra:10063361 | mondo:closeMatch | http://purl.obolibrary.org/obo/MONDO_0000127 | http://identifiers.org/meddra/10063361 | http://purl.obolibrary.org/obo/mondo#closeMatch |
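For readers who want to check the mapping_property_id column outside SPARQL, here is a minimal Python sketch of the two chained REPLACE calls. It is not part of the PR: the function name `mapping_property_curie` is hypothetical, and Python's `re.sub` stands in for SPARQL's REPLACE.

```python
import re


def mapping_property_curie(uri: str) -> str:
    """Shorten a mapping-property URI using two chained regex replacements."""
    # First pass: MONDO mapping properties such as mondo#closeMatch.
    curie = re.sub(r"^http://purl\.obolibrary\.org/obo/mondo#(.+)$", r"mondo:\1", uri)
    # Second pass runs on the result of the first, so a URI that was
    # already shortened no longer matches and is left untouched.
    curie = re.sub(r"^http://www\.w3\.org/2004/02/skos/core#(.+)$", r"skos:\1", curie)
    return curie


print(mapping_property_curie("http://purl.obolibrary.org/obo/mondo#closeMatch"))
```

A URI that matches neither pattern is returned unchanged, which mirrors how REPLACE behaves when its pattern does not match.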
BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}

GROUP BY ?efo_id ?xref ?axiom_source
Can you reply to this comment with the head of the output table?
efo_id | xref | axiom_source |
---|---|---|
CHEBI:100241 | Beilstein:3568352 | Beilstein |
CHEBI:100241 | CAS:85721-33-1 | ChemIDplus |
CHEBI:100241 | CAS:85721-33-1 | KEGG COMPOUND |
CHEBI:100241 | Drug_Central:659 | DrugCentral |
CHEBI:100241 | PMID:10397494 | ChEMBL |
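The efo_id column in this table comes from the REPLACE pattern `^http.+/([^:]+)_(.+)$` used in the query. As a rough illustration only, the same pattern can be tried in Python; the helper name `uri_to_efo_id` is hypothetical, and the example URI assumes the usual OBO PURL form for CHEBI terms.

```python
import re

# Same pattern as in the query: everything up to the last "/" is dropped,
# then the local name is split at its last "_" into prefix and identifier.
EFO_ID_PATTERN = re.compile(r"^http.+/([^:]+)_(.+)$")


def uri_to_efo_id(uri: str) -> str:
    """Turn a term URI such as .../obo/CHEBI_100241 into a CURIE-like ID."""
    return EFO_ID_PATTERN.sub(r"\1:\2", uri)


print(uri_to_efo_id("http://purl.obolibrary.org/obo/CHEBI_100241"))
```

URIs without an underscore in the local name do not match and pass through unchanged, so they would still need separate handling.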
BIND( REPLACE( STR(?source), "^http.+/([^:]+)_(.+)$", "$1:$2" ) AS ?efo_id )
}

GROUP BY ?efo_id ?xref ?axiom_source
Might be nice to add ORDER BY here.
Ok
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?efo_id ?xref ?axiom_source
WHERE {
I wonder if there are cases of xrefs that don't have any axioms, i.e. things matched by nxontology-data/nxontology_data/efo/queries/xrefs.rq, lines 28 to 29 in c7b1429:
?source_efo_uri rdf:type owl:Class.
?source_efo_uri oboInOwl:hasDbXref ?xref_raw.
We could do this match first and then make the axiom match OPTIONAL. Or we could decide this query is only for getting xref sources and we don't care about anything without a source.
Questions:
- Do all oboInOwl:hasDbXref triples have corresponding axioms?
- Do all axioms with owl:annotatedProperty oboInOwl:hasDbXref have corresponding oboInOwl:hasDbXref triples?
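Once both result sets are exported as (class, xref) pairs, the two questions reduce to set differences. The sketch below uses made-up placeholder identifiers (EX:..., DB:...), not real EFO data, and the variable names are hypothetical.

```python
# Placeholder (class_id, xref_id) pairs from plain oboInOwl:hasDbXref triples.
xref_triples = {("EX:1", "DB:a"), ("EX:1", "DB:b"), ("EX:2", "DB:c")}

# Placeholder pairs recovered from axioms annotating oboInOwl:hasDbXref.
axiom_pairs = {("EX:1", "DB:a"), ("EX:3", "DB:d")}

# Question 1: xref triples that have no corresponding axiom.
triples_without_axiom = xref_triples - axiom_pairs

# Question 2: axioms that have no corresponding xref triple.
axioms_without_triple = axiom_pairs - xref_triples

print(sorted(triples_without_axiom))
print(sorted(axioms_without_triple))
```

If the first set is non-empty, an axiom-only query drops some xrefs; if the second is non-empty, the axioms describe xrefs that the plain triple match never returns.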
> I wonder if there are cases of xrefs that don't have any axioms, i.e. things matched by
There are cases like that. For example, MONDO:0004947 in EFO:0000094 and ICD10:O35 in EFO:0009682 don't have axioms.
> We could do this match first and then make the axiom match OPTIONAL. Or we could decide this query is only for getting xref sources and we don't care about anything without a source.
I think this query should only be for getting xref sources.
Let's rename to xref_sources.rq
Ok
IF( STRSTARTS( ?xref_id_dirty, "http://identifiers.org" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)/(.+)$", "$1:$2" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "http://linkedlifedata.com/resource/umls/id" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "UMLS:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "http://purl.bioontology.org/ontology/ICD10CM" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "ICD10CM:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "https://icd.who.int/browse10/2019/en#" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "ICD10:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "https://omim.org/entry" ), REPLACE( ?xref_id_dirty, "^http.*/(.+)$", "OMIM:$1" ), ?error ),
IF( STRSTARTS( ?xref_id_dirty, "https://omim.org/phenotypicSeries" ), REPLACE( ?xref_id_dirty, "^http.*/PS(.+)$", "OMIMPS:$1" ), ?error ),
All these special cases are a bit annoying to maintain, but great work figuring them out. Was it just an iterative process of figuring out which URIs are not handled?
One option would be to save the URI to CURIE conversion for post-processing in Python with bioregistry.curie_from_iri or curies.
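To illustrate what moving the conversion out of SPARQL could look like, here is a plain-Python sketch that replaces the nested IF/REPLACE chain with a lookup table. It deliberately does not use the bioregistry or curies APIs; the names `URI_PREFIXES` and `compress` are hypothetical, and the trailing slashes on the URI prefixes are an assumption about how the identifiers are appended.

```python
from typing import Optional

# (URI prefix, CURIE prefix) pairs modelled on some special cases in the query.
URI_PREFIXES = [
    ("http://linkedlifedata.com/resource/umls/id/", "UMLS"),
    ("http://purl.bioontology.org/ontology/ICD10CM/", "ICD10CM"),
    ("https://omim.org/entry/", "OMIM"),
    ("https://omim.org/phenotypicSeries/PS", "OMIMPS"),
]


def compress(uri: str) -> Optional[str]:
    """Return a CURIE for a known URI prefix, or None if nothing matches."""
    # Try longer prefixes first so the most specific one wins.
    for uri_prefix, curie_prefix in sorted(
        URI_PREFIXES, key=lambda pair: len(pair[0]), reverse=True
    ):
        if uri.startswith(uri_prefix):
            return f"{curie_prefix}:{uri[len(uri_prefix):]}"
    return None


print(compress("https://omim.org/phenotypicSeries/PS203655"))
```

Returning None for unknown URIs makes the unhandled cases easy to collect in one pass instead of discovering them iteratively.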
> Was it just an iterative process of figuring out which URIs are not handled?
Yes, this was an iterative process.
> One option would be to save the URI to CURIE conversion for post-processing in python with bioregistry.curie_from_iri or curies.
Thanks for the tip, this would be better if we can do it like that.
I used curies.get_bioregistry_converter to get the converter for URIs. However, there are some differences between how curies maps URIs and how I mapped them. Here are some examples:
- orphanet:99022 vs obo:orphanet_99022, URI: http://purl.obolibrary.org/obo/Orphanet_99022. Should we replace obo:orphanet_ with orphanet:?
- orphanet:98813 vs orphanet.ordo:98813, URI: http://www.orpha.net/ORDO/Orphanet_98813. I guess we can replace orphanet.ordo with orphanet, akin to nxontology-data/nxontology_data/utils.py, lines 84 to 88 in c7b1429:
    if collapse_orphanet and prefix.lower() == "orphanet.ordo":
        # In EFO, all orphanet.ordo terms existed in orphanet.
        # The consistency of using a single prefix will help with mapping.
        # https://github.com/biopragmatics/bioregistry/issues/187#issuecomment-1706308305
        prefix = "Orphanet"
- omimps:203655 vs omim.ps:203655, URI: https://omim.org/phenotypicSeries/PS203655. Should we replace omim.ps with omimps?
There is also a missing uri_prefix, http://purl.bioontology.org/ontology/ICD10CM/, for ICD10CM. The add_prefix method for adding prefixes lacks the merge option in the version we use. Would it be safe to update the curies version?
> Should we replace obo:orphanet_ with orphanet:?
Yes. It's nice that curies handles that.
For orphanet, we can replace orphanet.ordo after normalization.
omim.ps is the correct normalized prefix.
Ideally, you call normalize_parsed_curie on the output of curies.get_bioregistry_converter so we get consistently formatted CURIEs everywhere.
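The orphanet.ordo replacement after normalization could look roughly like the following. This is a sketch only: the function name `collapse_orphanet_prefix` is hypothetical, it works on already-normalized CURIE strings, and it mirrors the prefix check quoted from utils.py rather than reproducing that module.

```python
def collapse_orphanet_prefix(curie: str) -> str:
    """Rewrite orphanet.ordo CURIEs to the single Orphanet prefix."""
    # Split at the first colon only; identifiers may contain further colons.
    prefix, _, identifier = curie.partition(":")
    if prefix.lower() == "orphanet.ordo":
        prefix = "Orphanet"
    return f"{prefix}:{identifier}"


print(collapse_orphanet_prefix("orphanet.ordo:98813"))
print(collapse_orphanet_prefix("omim.ps:203655"))
```

Other prefixes, including omim.ps, pass through unchanged, so the step is safe to apply to every CURIE in the table.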
> Once the queries are done, what would be the next steps? How should this data be represented in the output?
We can save the tables as output and/or we can include in the nxontology. We'll have to decide how we want to represent this information as node data in networkx. Doing so is tricky because we have to decide to what extent users will want access to rawer forms versus a more consolidated but opinionated format.
@dhimmel How about saving the table in the output for now, so that users can access this data? We could create a follow up issue to discuss how we can represent this information as node data in networkx.
> How about saving the table in the output for now
Sounds good.
@dhimmel I've added saving the xref sources and mapping properties tables as output. I've also removed the mapping from URI to CURIE in |
@dhimmel I've added the curie normalization using |
Okay merged and exporting EFO in https://github.com/related-sciences/nxontology-data/actions/runs/6424757602! Nice work navigating this @bfoltyn |
#18