Resource identifier mapping

regetz edited this page Feb 5, 2025 · 1 revision

The purpose of this page is to flesh out how we (will) manage unique identifiers for various resources.

Overview

First, at the DB level, all tables have a serial integer primary key following the naming convention <tablename>_id. In the original VegBank, these are implicitly exposed in various URLs and in accession codes, but we neither expect nor want users to think about and use these table-level primary keys.

Second, at the application level, most resources have an accession code stored in an accessionCode field in the relevant database tables. See the appendix at the bottom of this page for the tables that have this field. Note: This field does not have a not-null constraint in the DB tables. Although application logic should always populate it, in practice every one of these tables contained records with no accession code as of the start of this project (late 2024).

Finally, for external linkages and identification, we could also support annotation of VegBank resources with relevant stable identifiers provided by users and/or minted using external services. These could notably include DOIs for references and records, ORCIDs for people, and RORs for research organizations, along with other custom user-supplied identifiers that link VegBank records to an external system of record (e.g., an agency's internal database).

Requirements (proposed)

  1. For historical fidelity, we need to retain the relationship between existing accession codes and their corresponding resources. More specifically, we need to ensure that VegBank's original citation URLs continue to work in the future, because they may be published and reported in various enduring artifacts.
  2. In order to use accession codes as the primary resource identifier in our API layer, we need to ensure this field exists in all relevant tables, is unique across the entire database, and is populated (not null) for all records.
  3. We should also have a way to store and maintain the relationship between resources and various external IDs (e.g. DOIs, RORs, ORCIDs, and other custom identifiers) that are either supplied by users or assigned by us for external linkage purposes. Such IDs may be optional and/or revocable depending on the use case, and are distinct from our authoritative internal resource identifiers.
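As a rough illustration of what requirement 2 implies, the sketch below audits accessionCode for missing values per table and for values that repeat anywhere across tables. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real database. The two toy tables, their rows, and the `code-a`-style values are placeholders, not real VegBank data or accession code formats, and `audit_accession_codes` is a hypothetical helper name.

```python
import sqlite3


def audit_accession_codes(conn, tables):
    """Return (null_counts, duplicates) for the accessionCode column.

    `tables` must be a trusted, hard-coded list, because table names
    cannot be passed as bound parameters and are formatted into the SQL.
    """
    null_counts = {}
    for table in tables:
        (count,) = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE accessionCode IS NULL"
        ).fetchone()
        null_counts[table] = count

    # Stack the non-null codes from every table, then look for repeats.
    union = " UNION ALL ".join(
        f"SELECT accessionCode FROM {table} WHERE accessionCode IS NOT NULL"
        for table in tables
    )
    duplicates = conn.execute(
        f"SELECT accessionCode, COUNT(*) FROM ({union}) AS codes "
        "GROUP BY accessionCode HAVING COUNT(*) > 1 "
        "ORDER BY accessionCode"
    ).fetchall()
    return null_counts, duplicates


def demo():
    """Build two toy tables (placeholder data) and audit them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE plot (plot_id INTEGER PRIMARY KEY, accessionCode TEXT)")
    conn.execute("CREATE TABLE party (party_id INTEGER PRIMARY KEY, accessionCode TEXT)")
    conn.executemany(
        "INSERT INTO plot (accessionCode) VALUES (?)",
        [("code-a",), (None,), ("code-b",)],
    )
    conn.executemany(
        "INSERT INTO party (accessionCode) VALUES (?)",
        [("code-b",), ("code-c",)],
    )
    return audit_accession_codes(conn, ["plot", "party"])
```

In this toy setup, `demo()` reports one missing code in plot and flags `code-b` as appearing in both tables. Both findings would need to be resolved before accession codes could serve as the primary identifier in the API layer.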

Design (WIP)

To meet these requirements, we probably need a new table to manage identifiers and map them to resources. Here's a simple proposed identifier table that links external identifiers to internal accession codes, classifies each identifier by type, and enforces uniqueness of each identifier value within its type. We may want to add other fields later, such as the creator, the dates created and modified, and a status (active or not), but these are left out for now.

CREATE TABLE identifier (
    identifier_id SERIAL,
    resource_accession_code VARCHAR NOT NULL,  -- References a resource's accessionCode
    identifier_type VARCHAR NOT NULL,          -- e.g., DOI, ORCID, ROR, LOCAL
    identifier_value VARCHAR NOT NULL,         -- e.g., "10.1234/abcd", "0000-0002-1825-0097"
    PRIMARY KEY (identifier_id),
    UNIQUE (identifier_type, identifier_value) -- Prevent duplicate mappings
);
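To show how the proposed table might behave in practice, here is a minimal sketch that exercises the uniqueness constraint and resolves an external identifier back to an accession code. It again uses Python's sqlite3 module, so the schema is adapted for SQLite: INTEGER PRIMARY KEY stands in for SERIAL and TEXT for VARCHAR. The helper names `add_identifier` and `resolve` are hypothetical and not part of any existing VegBank code.

```python
import sqlite3

# SQLite adaptation of the proposed table (assumption: same columns and
# constraints, with SQLite types in place of SERIAL and VARCHAR).
SCHEMA = """
CREATE TABLE identifier (
    identifier_id INTEGER PRIMARY KEY,
    resource_accession_code TEXT NOT NULL,
    identifier_type TEXT NOT NULL,
    identifier_value TEXT NOT NULL,
    UNIQUE (identifier_type, identifier_value)
);
"""


def add_identifier(conn, accession_code, id_type, id_value):
    """Insert a mapping; return False if it violates a constraint,
    for example when the (type, value) pair is already mapped."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO identifier "
                "(resource_accession_code, identifier_type, identifier_value) "
                "VALUES (?, ?, ?)",
                (accession_code, id_type, id_value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def resolve(conn, id_type, id_value):
    """Return the accession code mapped to an external identifier, or None."""
    row = conn.execute(
        "SELECT resource_accession_code FROM identifier "
        "WHERE identifier_type = ? AND identifier_value = ?",
        (id_type, id_value),
    ).fetchone()
    return row[0] if row else None
```

Under this scheme, a second attempt to register the same ORCID for a different resource is rejected by the database itself instead of relying on application logic. One resource can still carry several identifiers of different types, or several of the same type with different values.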

Appendix: Tables with accessionCode

(This is WIP -- need to complete/verify)

  • commClass
  • commConcept
  • commStatus
  • coverMethod
  • namedPlace
  • observation
  • observationSynonym
  • party
  • plantConcept
  • plantStatus
  • plot
  • project
  • reference
  • referenceParty
  • referenceJournal
  • soilTaxon
  • stratumMethod
  • taxonInterpretation
  • taxonObservation
  • userDefined
  • userDataset
  • userDatasetItem (field itemAccessionCode)
  • userQuery
  • aux_Role
  • graphic
  • note