API Design ‐ Upload workflows
We want to be able to support a workflow for users to bulk upload new records, where the uploaded data refers to multiple related VegBank entities. In the general case, some of these entities may already exist in the database, and some may not. For example, imagine a user wants to upload a batch of Plots, each of which has an associated Plot Observation that belongs to a specific Project. From the database perspective, plot observations cannot be inserted until the associated plots and projects have been inserted.
Open questions:
- Examples below all rely on accession codes as the API resource identifiers. Are we prepared to do this? Will this break down for any entities that we need to support, either because the relevant backend DB table does not have an `accession_code` field, or because the field contains nulls in the existing database?
- Examples below also rely on each entity type having some user-provided, non-null name/label/identifier (i.e., a string field) that is assumed to be unique within the payload, although not necessarily within the database. These are used in API responses to provide the user with meaningful record-specific information such as actions taken, validation status, and errors. Are we okay with this?
- In API responses that itemize created entities, should we simply return each entity's `accession_code` string, or return an object that contains an accession code key-value pair, potentially with other information (e.g., accession codes of related entities)?
Summary: User first uploads Plots and Projects (i.e., the independent entities, in some order), then uploads Plot Observations (i.e., the dependent entities). In the simplest form of this implementation, it is always assumed that uploaded records are new records to be inserted. In other words, the system is not responsible for determining whether an entity is already present in the database.
User uploads all Plots first. Response indicates the created Plots with their accession codes.
Request
POST /plots/bulk
Content-Type: application/json
{
"plots": [
{
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
"confidentiality_status": "10 km"
},
{
"author_plot_code": "Plot.B",
"real_latitude": 34.76277,
"real_longitude": -120.04167,
"latitude": 34.76277,
"longitude": -120.04167,
"confidentiality_status": "exact"
}
]
}
Response
{
"status": "success",
"created_plots": {
"Plot.A": "vb.plot.123",
"Plot.B": "vb.plot.124"
}
}
User uploads new Projects. Response indicates the created Projects with their accession codes.
Request
POST /projects/bulk
Content-Type: application/json
{
"projects": [
{
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
}
]
}
Response
{
"status": "success",
"created_projects": {
"Project.1": "vb.project.456"
}
}
User uploads Plot Observations referencing existing Plot and Project accession codes. In the example below, both Plot accession codes and one of the Project accession codes are those returned in the responses above; the other Project accession code is one that already exists in the database.
Request
POST /plot_observations/bulk
Content-Type: application/json
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot_accession_code": "vb.plot.123",
"project_accession_code": "vb.project.456",
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot_accession_code": "vb.plot.124",
"project_accession_code": "vb.project.444",
"stratum_method_accession_code": "vb.sm.333",
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
Response
{
"status": "success",
"created_plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"accession_code": "vb.plot.obs.789",
"plot_accession_code": "vb.plot.123",
"project_accession_code": "vb.project.456"
},
{
"author_obs_code": "Plot.B.Obs.1",
"accession_code": "vb.plot.obs.790",
"plot_accession_code": "vb.plot.124",
"project_accession_code": "vb.project.444"
"stratum_method_accession_code": "vb.sm.333"
}
]
}
Pros:
- Simple validation logic at each step.
- Clear separation of dependencies.
Cons:
- Requires multiple steps from the user.
- Users must look up the accession codes of referenced entities in advance, and/or extract accession codes from newly created upstream entities, for reference in subsequent uploads (see the client-side sketch below).
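To make that chaining burden concrete, here is a minimal client-side sketch of the three calls, assuming a Python client using the requests library; BASE_URL is a placeholder and the payloads are trimmed to a few fields for brevity.
# Minimal sketch of the multi-step workflow from the client's perspective,
# assuming the endpoints and response shapes shown above.
import requests

BASE_URL = "https://example.org/api"  # hypothetical base URL

# Step 1: upload Plots and keep the returned accession codes.
plots_resp = requests.post(f"{BASE_URL}/plots/bulk", json={"plots": [
    {"author_plot_code": "Plot.A", "latitude": 34.4, "longitude": -119.7},
]})
plot_codes = plots_resp.json()["created_plots"]            # {"Plot.A": "vb.plot.123"}

# Step 2: upload Projects and keep the returned accession codes.
projects_resp = requests.post(f"{BASE_URL}/projects/bulk", json={"projects": [
    {"project_name": "Project.1", "start_date": "2024-10-01"},
]})
project_codes = projects_resp.json()["created_projects"]   # {"Project.1": "vb.project.456"}

# Step 3: reference the new accession codes when uploading Plot Observations.
requests.post(f"{BASE_URL}/plot_observations/bulk", json={"plot_observations": [
    {"author_obs_code": "Plot.A.Obs.1",
     "plot_accession_code": plot_codes["Plot.A"],
     "project_accession_code": project_codes["Project.1"],
     "observation_start_date": "2024-10-03"},
]})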
This option enables one-step uploads while handling new and existing records dynamically. May be best if simplicity is key and user expertise varies?
- Users have full control and visibility over which records are intended as "new".
- System logic is simpler because it doesn’t need to infer whether a record is new -- it just processes the placeholders.
- Less risk of creating duplicate or incorrect records due to user error (e.g., typos in identifiers).
User's Responsibility:
- The user explicitly marks new records by using accession code placeholders (e.g., `new.vb.plot.1` or `new.vb.project.1`) in their upload data.
System's Responsibility:
- The system interprets these placeholders as instructions to create new records. These are then mapped to real accession codes for use in the subsequent processing.
Workflow:
- User uploads all data in a single file (e.g., for Plot Observations), including temporary identifiers for new Plots and Projects.
- The system processes the upload in stages (sketched below):
- Create new Plots and Projects using temporary accession codes.
- Map temporary accession codes to real database accession codes.
- Validate and create the Plot Observations using the mapped accession codes.
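A minimal sketch of this staged processing, assuming a Python server-side handler; the _create_* helpers are hypothetical stand-ins for the real database inserts and simply mint sequential accession codes. With the example request below, each "new." placeholder is created exactly once even if several observations reference it.
# Sketch of staged processing for "new." placeholder accession codes.
# The _create_* helpers stand in for real database inserts.
from itertools import count

_plot_seq, _project_seq, _obs_seq = count(200), count(500), count(800)

def _create_plot(plot: dict) -> str:
    return f"vb.plot.{next(_plot_seq)}"        # would INSERT a plot and return its new accession code

def _create_project(project: dict) -> str:
    return f"vb.project.{next(_project_seq)}"  # would INSERT a project and return its new accession code

def _create_observation(obs: dict, plot_code: str, project_code: str) -> str:
    return f"vb.plot.obs.{next(_obs_seq)}"     # would INSERT an observation using the resolved codes

def process_bulk_upload(plot_observations: list[dict]) -> dict:
    code_map: dict[str, str] = {}  # placeholder accession code -> real accession code

    # Stage 1: create new Plots and Projects for any "new." placeholders (deduplicated).
    for obs in plot_observations:
        for entity, create in ((obs["plot"], _create_plot), (obs["project"], _create_project)):
            code = entity["accession_code"]
            if code.startswith("new.") and code not in code_map:
                code_map[code] = create(entity)

    # Stage 2: map placeholders to real codes; Stage 3: create the Plot Observations.
    created = {}
    for obs in plot_observations:
        plot_code = code_map.get(obs["plot"]["accession_code"], obs["plot"]["accession_code"])
        project_code = code_map.get(obs["project"]["accession_code"], obs["project"]["accession_code"])
        created[obs["author_obs_code"]] = _create_observation(obs, plot_code, project_code)

    return {"placeholder_map": code_map, "created_plot_observations": created}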
Request
POST /plot_observations/bulk
Content-Type: application/json
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot": {
"accession_code": "new.vb.plot.123", // new plot
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
"confidentiality_status": "10 km"
},
"project": {
"accession_code": "new.vb.project.456" // new project
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
},
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot": {
"accession_code": "vb.plot.111" // existing plot
},
"project": {
"accession_code": "new.vb.project.457" // new project
"project_name": "Project.2",
"project_description": "Description of project",
"start_date": "2024-11-27"
},
"stratum_method": {
"accession_code": "vb.sm.333", // existing stratum method
},
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
System Behavior: If `plot_accession_code` or `project_accession_code` starts with `new.`, treat it as a new entry. Auto-create these records and return the final mappings in the response.
Response
{
"status": "success",
"new_records": {
"plots": { "Plot.A": "vb.plot.123" },
"projects": { "Project.1": "vb.project.456", "Project.2": "vb.project.457" },
"plot_observations": { "Plot.A.Obs.1": "vb.plot.obs.789", "Plot.B.Obs.1": "vb.plot.obs.790" }
}
}
Pros:
- Streamlines the workflow to a single step for the user.
- Minimizes user burden in tracking database IDs.
Cons:
- More complex server-side processing and validation logic.
- Requires careful error handling to ensure partial failures don’t create inconsistent states (one mitigation is sketched below).
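One common mitigation is to run the whole staged creation inside a single database transaction, so a failure part-way through rolls everything back rather than leaving half-created records. The sketch below uses the standard-library sqlite3 module purely for illustration (the real backend would differ), with a heavily simplified schema and payload.
# Illustration of atomic commit semantics: either the whole upload is created,
# or nothing is. Schema and values are simplified stand-ins.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plot (accession_code TEXT PRIMARY KEY, author_plot_code TEXT NOT NULL);
CREATE TABLE plot_observation (
    accession_code TEXT PRIMARY KEY,
    plot_accession_code TEXT NOT NULL,
    author_obs_code TEXT NOT NULL
);
""")

def commit_bulk_upload(plots, observations):
    # One transaction for the whole upload: if any insert fails, the implicit
    # rollback leaves the database unchanged instead of half-populated.
    with conn:  # the sqlite3 connection context manager commits or rolls back as a unit
        conn.executemany("INSERT INTO plot VALUES (?, ?)", plots)
        conn.executemany("INSERT INTO plot_observation VALUES (?, ?, ?)", observations)

commit_bulk_upload(
    plots=[("vb.plot.123", "Plot.A")],
    observations=[("vb.plot.obs.789", "vb.plot.123", "Plot.A.Obs.1")],
)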
This option gives users detailed control and review around handling existing vs. new references. May be best if validation and feedback are critical?
Validation phase:
- User uploads a file to the validation endpoint (e.g., `/plot_observations/validate`), including references to existing and/or new Plots and Projects.
- System validates the file and returns a report that:
- Gives visibility into how new data interacts with the existing system.
- Lists any unresolved references (e.g., plot_accession_code or project_accession_code not found) and potential issues before committing.
- Identifies data that will be auto-created (e.g., new Plots, Projects).
Confirmation phase:
- User confirms the upload via a `commit` endpoint (e.g., `/plot_observations/commit`); see the client-side sketch after this list.
- Allows users to finalize uploads with new record details explicitly defined.
- Reduces chance of unintended data creation or duplication.
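A minimal client-side sketch of the two-phase flow, assuming a Python client with the requests library; BASE_URL is a placeholder and the field lists are trimmed compared with the full examples below.
# Sketch of the validate-then-commit flow from the client's perspective.
import requests

BASE_URL = "https://example.org/api"  # hypothetical base URL

upload = {"plot_observations": [
    {"author_obs_code": "Plot.A.Obs.1",
     "plot_accession_code": "vb.plot.123",
     "project_accession_code": "vb.project.456",
     "observation_start_date": "2024-10-03"},
]}

# Phase 1: validation only; nothing is written to the database yet.
report = requests.post(f"{BASE_URL}/plot_observations/validate", json=upload).json()
print(report["results"]["unresolved_references"])  # the user reviews what would need to be created

# Phase 2: after review, resubmit with explicit details for the new records.
commit_payload = {
    "new_records": {
        "plots": [{"temp_accession_code": "tmp.vb.plot.123", "author_plot_code": "Plot.A"}],
        "projects": [{"temp_accession_code": "tmp.vb.project.456", "project_name": "Project.1"}],
    },
    "plot_observations": [
        {"author_obs_code": "Plot.A.Obs.1",
         "plot_accession_code": "tmp.vb.plot.123",
         "project_accession_code": "tmp.vb.project.456",
         "observation_start_date": "2024-10-03"},
    ],
}
result = requests.post(f"{BASE_URL}/plot_observations/commit", json=commit_payload).json()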
The user submits a data file to the `/plot_observations/validate` endpoint. This file contains references to both existing and potentially new records.
Request (Validation)
POST /plot_observations/validate
Content-Type: application/json
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot_accession_code": "vb.plot.123", // new plot
"project_accession_code": "vb.project.456", // new project
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot_accession_code": "vb.plot.111", // existing
"project_accession_code": "vb.project.457", // new project
"stratum_method_accession_code": "vb.sm.333", // existing stratum method
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
The system validates the data and returns a structured JSON response, showing:
- Resolved references.
- Unresolved references requiring user action.
Response (Validation)
{
"status": "validation_complete",
"results": {
"unresolved_references": {
"plots": [
{
"plot_accession_code": "vb.plot.123",
"details": "No matching plot found in the database."
}
],
"projects": [
{
"project_accession_code": "vb.project.456",
"details": "No matching project found in the database."
},
{
"project_accession_code": "vb.project.457",
"details": "No matching project found in the database."
}
]
},
"resolved_references": {
"plots": {
"vb.plot.123": "Plot.A"
},
"stratum_methods": {
"vb.sm.333": "Carolina Vegetation Survey"
}
}
},
"suggested_actions": {
"create_new": {
"plots": [
{
"temp_accession_code": "vb.plot.123",
"required_fields": ["author_plot_code", "real_latitude", "real_longitude", "latitude", "longitude", "confidentiality_status"]
}
],
"projects": [
{
"temp_accession_code": "vb.project.456",
"required_fields": ["project_name", "project_description", "start_date"]
},
{
"temp_accession_code": "vb.project.457",
"required_fields": ["project_name", "project_description", "start_date"]
}
]
}
}
}
User reviews the validation results and confirms the creation of missing Plots and Projects. The user adds the missing details for new records and submits the full dataset to the `/plot_observations/commit` endpoint.
Request (Confirmation)
POST /plot_observations/commit
Content-Type: application/json
{
"new_records": {
"plots": [
{
"temp_accession_code": "tmp.vb.plot.123",
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
"confidentiality_status": "10 km"
}
],
"projects": [
{
"temp_accession_code": "tmp.vb.project.456",
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
},
{
"temp_accession_code": "tmp.vb.project.457",
"project_name": "Project.2",
"project_description": "Description of project",
"start_date": "2024-11-01"
}
]
},
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot_accession_code": "tmp.vb.plot.123", // plot to be created
"project_accession_code": "tmp.vb.project.456", # // project to be created
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot_accession_code": "vb.plot.111",
"project_accession_code": "tmp.vb.project.457", # // project to be created
"stratum_method_accession_code": "vb.sm.333",
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
The system processes the data, creates new records where necessary, and finalizes the upload.
Response (Confirmation)
{
"status": "commit_success",
"new_records": {
"plots": {
"Plot.A": "tmp.vb.plot.123"
},
"projects": {
"Project.1": "tmp.vb.project.456",
"Project.2": "tmp.vb.project.457"
},
"plot_observations": {
"Plot.A.Obs.1": "vb.plot.obs.789",
"Plot.B.Obs.1": "vb.plot.obs.790"
}
}
}
Pros:
- Provides users with clear feedback before committing data.
- Supports hybrid scenarios where both existing and new records are referenced.
Cons:
- Two-step workflow can add friction for users.
This option fully automates the inference of new vs. existing records. May be best if API efficiency and automation are paramount?
Workflow:
- Users upload all data in a single step.
- The system dynamically infers whether references (e.g., `plot_accession_code`, `project_accession_code`) exist:
- Matches references to existing records.
- Creates new records for unmatched references.
Server-Side Logic:
- Query the database for the `plot.accession_code` and `project.accession_code` values in the upload file.
- Deduplicate references to avoid creating duplicates.
- Proceed to create new entries and associations (a sketch of this resolution logic follows below).
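A minimal sketch of this resolution logic in Python; existing_codes stands in for the result of the batched database lookup (in practice, e.g., one query per entity type covering all accession codes in the upload), and to_create is what the creation step would then insert.
# Sketch of inference over submitted references: match what already exists,
# deduplicate the rest, and queue it for creation.
def resolve_references(plot_observations: list[dict], existing_codes: set[str]) -> dict:
    resolved: dict[str, str] = {}    # accession code -> entity type of the matched existing record
    to_create: dict[str, dict] = {}  # deduplicated unmatched records, keyed by submitted code

    for obs in plot_observations:
        for key in ("plot", "project", "stratum_method"):
            entity = obs.get(key)
            if entity is None:
                continue
            code = entity["accession_code"]
            if code in existing_codes:
                resolved[code] = key        # reference matches an existing record
            elif code not in to_create:
                to_create[code] = entity    # unmatched -> create once, even if referenced repeatedly

    return {"resolved": resolved, "to_create": to_create}

# Example using the accession codes from the request below; existing_codes would
# normally come from the database lookup described above.
print(resolve_references(
    [{"plot": {"accession_code": "tmp.vb.plot.123"}, "project": {"accession_code": "tmp.vb.project.456"}},
     {"plot": {"accession_code": "vb.plot.111"}, "project": {"accession_code": "tmp.vb.project.457"},
      "stratum_method": {"accession_code": "vb.sm.333"}}],
    existing_codes={"vb.plot.111", "vb.sm.333"},
))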
The user sends a bulk upload request. The request includes:
- Plot Observations referencing both existing and potentially new plots and projects.
- Inline details for new plots and projects to be created if they don’t already exist.
Request
POST /plot_observations/bulk
Content-Type: application/json
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot": {
"accession_code": "tmp.vb.plot.123", // New plot
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
"confidentiality_status": "10 km"
},
"project": {
"accession_code": "tmp.vb.project.456" // New project
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
},
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot": {
"accession_code": "vb.plot.111" // Existing plot
},
"project": {
"accession_code": "tmp.vb.project.457" // New project
"project_name": "Project.2",
"project_description": "Description of project",
"start_date": "2024-11-27"
},
"stratum_method": {
"accession_code": "vb.sm.333"
},
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
The system processes the request, validates existing records, creates new records as needed, and links everything correctly.
Response
{
"status": "success",
"results": {
"resolved_references": {
"plots": {
"vb.plot.111": "Existing Plot Name"
},
"projects": {},
"stratum_methods": {
"vb.sm.333": "Carolina Vegetation Survey"
}
},
"new_records": {
"plots": [
{
"author_plot_code": "Plot.A",
"tmp_accession_code": "tmp.vb.plot.123",
"accession_code": "vb.plot.123"
}
],
"projects": [
{
"project_name": "Project.1",
"tmp_accession_code": "tmp.vb.project.456",
"accession_code": "vb.project.456"
},
{
"project_name": "Project.2",
"tmp_accession_code": "tmp.vb.project.457",
"accession_code": "vb.project.457"
}
],
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"accession_code": "vb.plot.obs.789",
"plot_accession_code": "vb.plot.123",
"project_accession_code": "vb.project.456"
},
{
"author_obs_code": "Plot.B.Obs.1",
"accession_code": "vb.plot.obs.790",
"plot_accession_code": "vb.plot.111",
"project_accession_code": "vb.project.457"
"stratum_method_accession_code": "vb.sm.333"
}
]
}
}
}
Note: If the upload contains errors (e.g., missing required fields or invalid references), the system returns an error response with actionable feedback.
Here's another example, this time with errors:
Request (with missing fields)
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot": {
"accession_code": "tmp.vb.plot.123", // New plot
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
// missing confidentiality_status
},
"project": {
"accession_code": "tmp.vb.project.456" // New project
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
},
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
}
]
}
Error Response
{
"status": "error",
"errors": [
{
"record_type": "plot",
"tmp_accession_code": "tmp.vb.plot.123",
"field": "confidentiality_status",
"message": "Confidentiality status is required for new plots."
}
],
"partial_results": {
"resolved_references": {
"plots": {
},
"projects": {
}
}
}
}
Pros:
- Fully automated workflow.
- Simplifies user experience.
Cons:
- Risk of unintended creation of duplicates if reference fields aren’t cleanly validated.
- Relies heavily on high-performance database queries to avoid delays.