API Design ‐ Upload workflows
We want to be able to support a workflow for users to bulk upload new records, where the uploaded data refers to multiple related VegBank entities. In the general case, some of these entities may already exist in the database, and some may not. For example, imagine a user wants to upload a batch of Plots, each of which has an associated Plot Observation that belongs to a specific Project. From the database perspective, plot observations cannot be inserted until the associated plots and projects have been inserted.
Open questions:
- Examples below all rely on accession codes as the API resource identifiers. Are we prepared to do this? Will this break down for any entities that we need to support, either because the relevant backend DB table does not have an `accession_code` field, or because the field contains nulls in the existing database?
- Examples below also rely on each entity type having some user-provided, non-null name/label/identifier (i.e., a string field) that is assumed to be unique within the payload, although not necessarily within the database. These are used in API responses to provide the user with meaningful record-specific information such as actions taken, validation status, and errors. Are we okay with this?
- In API responses that itemize created entities, should we simply return each entity's `accession_code` string, or return an object that contains an accession code key-value pair, potentially with other information (e.g., accession codes of related entities)?
Summary: User first uploads Plots and Projects (i.e., the independent entities, in some order), then uploads Plot Observations (i.e., the dependent entities). In the simplest form of this implementation, it is always assumed that uploaded records are new records to be inserted. In other words, the system is not responsible for determining whether an entity is already present in the database.
User uploads all Plots first. Response indicates the created Plots with their accession codes.
Request
POST /plots/bulk
Content-Type: application/json
{
"plots": [
{
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
"confidentiality_status": "10 km"
},
{
"author_plot_code": "Plot.B",
"real_latitude": 34.76277,
"real_longitude": -120.04167,
"latitude": 34.76277,
"longitude": -120.04167,
"confidentiality_status": "exact"
}
]
}
Response
{
"status": "success",
"created_plots": {
"Plot.A": "vb.plot.123",
"Plot.B": "vb.plot.124"
}
}
User uploads new Projects. Response indicates the created Projects with their accession codes.
Request
POST /projects/bulk
Content-Type: application/json
{
"projects": [
{
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
}
]
}
Response
{
"status": "success",
"created_projects": {
"Project.1": "vb.project.456"
}
}
User uploads Plot Observations referencing existing Plot and Project accession codes. In the example below, both Plot accession codes and one of the Project accession codes are those returned in the responses above; the other Project accession code is one that already exists in the database.
Request
POST /plot_observations/bulk
Content-Type: application/json
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot_accession_code": "vb.plot.123",
"project_accession_code": "vb.project.456",
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot_accession_code": "vb.plot.124",
"project_accession_code": "vb.project.444",
"stratum_method_accession_code": "vb.sm.333",
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
Response
{
"status": "success",
"created_plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"accession_code": "vb.plot.obs.789",
"plot_accession_code": "vb.plot.123",
"project_accession_code": "vb.project.456"
},
{
"author_obs_code": "Plot.B.Obs.1",
"accession_code": "vb.plot.obs.790",
"plot_accession_code": "vb.plot.124",
"project_accession_code": "vb.project.444"
"stratum_method_accession_code": "vb.sm.333"
}
]
}
Pros:
- Simple validation logic at each step.
- Clear separation of dependencies.
Cons:
- Requires multiple steps from the user.
- Users must look up the accession codes of referenced entities in advance, and/or extract accession codes from newly created upstream entities, for reference in subsequent uploads (see the client-side sketch below).
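To make that chaining burden concrete, here is a minimal client-side sketch of the three calls, assuming a Python client using the requests library; BASE_URL is a placeholder and the payloads are trimmed to a few fields for brevity.
# Minimal sketch of the multi-step workflow from the client's perspective,
# assuming the endpoints and response shapes shown above.
import requests

BASE_URL = "https://example.org/api"  # hypothetical base URL

# Step 1: upload Plots and keep the returned accession codes.
plots_resp = requests.post(f"{BASE_URL}/plots/bulk", json={"plots": [
    {"author_plot_code": "Plot.A", "latitude": 34.4, "longitude": -119.7},
]})
plot_codes = plots_resp.json()["created_plots"]            # {"Plot.A": "vb.plot.123"}

# Step 2: upload Projects and keep the returned accession codes.
projects_resp = requests.post(f"{BASE_URL}/projects/bulk", json={"projects": [
    {"project_name": "Project.1", "start_date": "2024-10-01"},
]})
project_codes = projects_resp.json()["created_projects"]   # {"Project.1": "vb.project.456"}

# Step 3: reference the new accession codes when uploading Plot Observations.
requests.post(f"{BASE_URL}/plot_observations/bulk", json={"plot_observations": [
    {"author_obs_code": "Plot.A.Obs.1",
     "plot_accession_code": plot_codes["Plot.A"],
     "project_accession_code": project_codes["Project.1"],
     "observation_start_date": "2024-10-03"},
]})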
This option enables one-step uploads while handling new and existing records dynamically. May be best if simplicity is key and user expertise varies?
- Users have full control and visibility over which records are intended as "new".
- System logic is simpler because it doesn’t need to infer whether a record is new -- it just processes the placeholders.
- Less risk of creating duplicate or incorrect records due to user error (e.g., typos in identifiers).
User's Responsibility:
- The user explicitly marks new records by using accession code placeholders (e.g., `new.vb.plot.1` or `new.vb.project.1`) in their upload data.
System's Responsibility:
- The system interprets these placeholders as instructions to create new records. These are then mapped to real accession codes for use in the subsequent processing.
Workflow:
- User uploads all data in a single file (e.g., for Plot Observations), including temporary identifiers for new Plots and Projects.
- The system processes the upload in stages (sketched below):
- Create new Plots and Projects using temporary accession codes.
- Map temporary accession codes to real database accession codes.
- Validate and create the Plot Observations using the mapped accession codes.
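A minimal sketch of this staged processing, assuming a Python server-side handler; the _create_* helpers are hypothetical stand-ins for the real database inserts and simply mint sequential accession codes. With the example request below, each "new." placeholder is created exactly once even if several observations reference it.
# Sketch of staged processing for "new." placeholder accession codes.
# The _create_* helpers stand in for real database inserts.
from itertools import count

_plot_seq, _project_seq, _obs_seq = count(200), count(500), count(800)

def _create_plot(plot: dict) -> str:
    return f"vb.plot.{next(_plot_seq)}"        # would INSERT a plot and return its new accession code

def _create_project(project: dict) -> str:
    return f"vb.project.{next(_project_seq)}"  # would INSERT a project and return its new accession code

def _create_observation(obs: dict, plot_code: str, project_code: str) -> str:
    return f"vb.plot.obs.{next(_obs_seq)}"     # would INSERT an observation using the resolved codes

def process_bulk_upload(plot_observations: list[dict]) -> dict:
    code_map: dict[str, str] = {}  # placeholder accession code -> real accession code

    # Stage 1: create new Plots and Projects for any "new." placeholders (deduplicated).
    for obs in plot_observations:
        for entity, create in ((obs["plot"], _create_plot), (obs["project"], _create_project)):
            code = entity["accession_code"]
            if code.startswith("new.") and code not in code_map:
                code_map[code] = create(entity)

    # Stage 2: map placeholders to real codes; Stage 3: create the Plot Observations.
    created = {}
    for obs in plot_observations:
        plot_code = code_map.get(obs["plot"]["accession_code"], obs["plot"]["accession_code"])
        project_code = code_map.get(obs["project"]["accession_code"], obs["project"]["accession_code"])
        created[obs["author_obs_code"]] = _create_observation(obs, plot_code, project_code)

    return {"placeholder_map": code_map, "created_plot_observations": created}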
Request
POST /plot_observations/bulk
Content-Type: application/json
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot": {
"accession_code": "new.vb.plot.123", // new plot
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
"confidentiality_status": "10 km"
},
"project": {
"accession_code": "new.vb.project.456" // new project
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
},
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot": {
"accession_code": "vb.plot.111" // existing plot
},
"project": {
"accession_code": "new.vb.project.457" // new project
"project_name": "Project.2",
"project_description": "Description of project",
"start_date": "2024-11-27"
},
"stratum_method": {
"accession_code": "vb.sm.333", // existing stratum method
},
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
System Behavior: If `plot_accession_code` or `project_accession_code` starts with `new.`, treat it as a new entry. Auto-create these records and return the final mappings in the response.
Response
{
"status": "success",
"new_records": {
"plots": { "Plot.A": "vb.plot.123" },
"projects": { "Project.1": "vb.project.456", "Project.2": "vb.project.457" },
"plot_observations": { "Plot.A.Obs.1": "vb.plot.obs.789", "Plot.B.Obs.1": "vb.plot.obs.790" }
}
}
Pros:
- Streamlines the workflow to a single step for the user.
- Minimizes user burden in tracking database IDs.
Cons:
- More complex server-side processing and validation logic.
- Requires careful error handling to ensure partial failures don’t create inconsistent states (one mitigation is sketched below).
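One common mitigation is to run the whole staged creation inside a single database transaction, so a failure part-way through rolls everything back rather than leaving half-created records. The sketch below uses the standard-library sqlite3 module purely for illustration (the real backend would differ), with a heavily simplified schema and payload.
# Illustration of atomic commit semantics: either the whole upload is created,
# or nothing is. Schema and values are simplified stand-ins.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE plot (accession_code TEXT PRIMARY KEY, author_plot_code TEXT NOT NULL);
CREATE TABLE plot_observation (
    accession_code TEXT PRIMARY KEY,
    plot_accession_code TEXT NOT NULL,
    author_obs_code TEXT NOT NULL
);
""")

def commit_bulk_upload(plots, observations):
    # One transaction for the whole upload: if any insert fails, the implicit
    # rollback leaves the database unchanged instead of half-populated.
    with conn:  # the sqlite3 connection context manager commits or rolls back as a unit
        conn.executemany("INSERT INTO plot VALUES (?, ?)", plots)
        conn.executemany("INSERT INTO plot_observation VALUES (?, ?, ?)", observations)

commit_bulk_upload(
    plots=[("vb.plot.123", "Plot.A")],
    observations=[("vb.plot.obs.789", "vb.plot.123", "Plot.A.Obs.1")],
)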
This option gives users detailed control and review around handling existing vs. new references. May be best if validation and feedback are critical?
Validation phase:
- User uploads a file to the validation endpoint (e.g., `/plot_observations/validate`), including references to existing and/or new Plots and Projects.
- System validates the file and returns a report that:
- Gives visibility into how new data interacts with the existing system.
- Lists any unresolved references (e.g., plot_accession_code or project_accession_code not found) and potential issues before committing.
- Identifies data that will be auto-created (e.g., new Plots, Projects).
Confirmation phase:
- User confirms the upload via a `commit` endpoint (e.g., `/plot_observations/commit`); see the client-side sketch after this list.
- Allows users to finalize uploads with new record details explicitly defined.
- Reduces chance of unintended data creation or duplication.
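A minimal client-side sketch of the two-phase flow, assuming a Python client with the requests library; BASE_URL is a placeholder and the field lists are trimmed compared with the full examples below.
# Sketch of the validate-then-commit flow from the client's perspective.
import requests

BASE_URL = "https://example.org/api"  # hypothetical base URL

upload = {"plot_observations": [
    {"author_obs_code": "Plot.A.Obs.1",
     "plot_accession_code": "vb.plot.123",
     "project_accession_code": "vb.project.456",
     "observation_start_date": "2024-10-03"},
]}

# Phase 1: validation only; nothing is written to the database yet.
report = requests.post(f"{BASE_URL}/plot_observations/validate", json=upload).json()
print(report["results"]["unresolved_references"])  # the user reviews what would need to be created

# Phase 2: after review, resubmit with explicit details for the new records.
commit_payload = {
    "new_records": {
        "plots": [{"temp_accession_code": "tmp.vb.plot.123", "author_plot_code": "Plot.A"}],
        "projects": [{"temp_accession_code": "tmp.vb.project.456", "project_name": "Project.1"}],
    },
    "plot_observations": [
        {"author_obs_code": "Plot.A.Obs.1",
         "plot_accession_code": "tmp.vb.plot.123",
         "project_accession_code": "tmp.vb.project.456",
         "observation_start_date": "2024-10-03"},
    ],
}
result = requests.post(f"{BASE_URL}/plot_observations/commit", json=commit_payload).json()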
The user submits a data file to the `/plot_observations/validate` endpoint. This file contains references to both existing and potentially new records.
Request (Validation)
POST /plot_observations/validate
Content-Type: application/json
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot_accession_code": "vb.plot.123", // new plot
"project_accession_code": "vb.project.456", // new project
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot_accession_code": "vb.plot.111", // existing
"project_accession_code": "vb.project.457", // new project
"stratum_method_accession_code": "vb.sm.333", // existing stratum method
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
The system validates the data and returns a structured JSON response, showing:
- Resolved references.
- Unresolved references requiring user action.
Response (Validation)
{
"status": "validation_complete",
"results": {
"unresolved_references": {
"plots": [
{
"plot_accession_code": "vb.plot.123",
"details": "No matching plot found in the database."
}
],
"projects": [
{
"project_accession_code": "vb.project.456",
"details": "No matching project found in the database."
},
{
"project_accession_code": "vb.project.457",
"details": "No matching project found in the database."
}
]
},
"resolved_references": {
"plots": {
"vb.plot.123": "Plot.A"
},
"stratum_methods": {
"vb.sm.333": "Carolina Vegetation Survey"
}
}
},
"suggested_actions": {
"create_new": {
"plots": [
{
"temp_accession_code": "vb.plot.123",
"required_fields": ["author_plot_code", "real_latitude", "real_longitude", "latitude", "longitude", "confidentiality_status"]
}
],
"projects": [
{
"temp_accession_code": "vb.project.456",
"required_fields": ["project_name", "project_description", "start_date"]
},
{
"temp_accession_code": "vb.project.457",
"required_fields": ["project_name", "project_description", "start_date"]
}
]
}
}
}
User reviews the validation results and confirms the creation of missing Plots and Projects. The user adds the missing details for new records and submits the full dataset to the `/plot_observations/commit` endpoint.
Request (Confirmation)
POST /plot_observations/commit
Content-Type: application/json
{
"new_records": {
"plots": [
{
"temp_accession_code": "tmp.vb.plot.123",
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
"confidentiality_status": "10 km"
}
],
"projects": [
{
"temp_accession_code": "tmp.vb.project.456",
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
},
{
"temp_accession_code": "tmp.vb.project.457",
"project_name": "Project.2",
"project_description": "Description of project",
"start_date": "2024-11-01"
}
]
},
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot_accession_code": "tmp.vb.plot.123", // plot to be created
"project_accession_code": "tmp.vb.project.456", # // project to be created
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot_accession_code": "vb.plot.111",
"project_accession_code": "tmp.vb.project.457", # // project to be created
"stratum_method_accession_code": "vb.sm.333",
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
The system processes the data, creates new records where necessary, and finalizes the upload.
Response (Confirmation)
{
"status": "commit_success",
"new_records": {
"plots": {
"Plot.A": "tmp.vb.plot.123"
},
"projects": {
"Project.1": "tmp.vb.project.456",
"Project.2": "tmp.vb.project.457"
},
"plot_observations": {
"Plot.A.Obs.1": "vb.plot.obs.789",
"Plot.B.Obs.1": "vb.plot.obs.790"
}
}
}
Pros:
- Provides users with clear feedback before committing data.
- Supports hybrid scenarios where both existing and new records are referenced.
Cons:
- Two-step workflow can add friction for users.
This option fully automates the inference of new vs. existing records. May be best if API efficiency and automation are paramount?
Workflow:
- Users upload all data in a single step.
- The system dynamically infers whether references (e.g., `plot_accession_code`, `project_accession_code`) exist:
- Matches references to existing records.
- Creates new records for unmatched references.
Server-Side Logic:
- Query the database for the `plot.accession_code` and `project.accession_code` values in the upload file.
- Deduplicate references to avoid creating duplicates.
- Proceed to create new entries and associations (a sketch of this resolution logic follows below).
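A minimal sketch of this resolution logic in Python; existing_codes stands in for the result of the batched database lookup (in practice, e.g., one query per entity type covering all accession codes in the upload), and to_create is what the creation step would then insert.
# Sketch of inference over submitted references: match what already exists,
# deduplicate the rest, and queue it for creation.
def resolve_references(plot_observations: list[dict], existing_codes: set[str]) -> dict:
    resolved: dict[str, str] = {}    # accession code -> entity type of the matched existing record
    to_create: dict[str, dict] = {}  # deduplicated unmatched records, keyed by submitted code

    for obs in plot_observations:
        for key in ("plot", "project", "stratum_method"):
            entity = obs.get(key)
            if entity is None:
                continue
            code = entity["accession_code"]
            if code in existing_codes:
                resolved[code] = key        # reference matches an existing record
            elif code not in to_create:
                to_create[code] = entity    # unmatched -> create once, even if referenced repeatedly

    return {"resolved": resolved, "to_create": to_create}

# Example using the accession codes from the request below; existing_codes would
# normally come from the database lookup described above.
print(resolve_references(
    [{"plot": {"accession_code": "tmp.vb.plot.123"}, "project": {"accession_code": "tmp.vb.project.456"}},
     {"plot": {"accession_code": "vb.plot.111"}, "project": {"accession_code": "tmp.vb.project.457"},
      "stratum_method": {"accession_code": "vb.sm.333"}}],
    existing_codes={"vb.plot.111", "vb.sm.333"},
))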
The user sends a bulk upload request. The request includes:
- Plot Observations referencing both existing and potentially new plots and projects.
- Inline details for new plots and projects to be created if they don’t already exist.
Request
POST /plot_observations/bulk
Content-Type: application/json
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot": {
"accession_code": "tmp.vb.plot.123", // New plot
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
"confidentiality_status": "10 km"
},
"project": {
"accession_code": "tmp.vb.project.456" // New project
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
},
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
},
{
"author_obs_code": "Plot.B.Obs.1",
"plot": {
"accession_code": "vb.plot.111" // Existing plot
},
"project": {
"accession_code": "tmp.vb.project.457" // New project
"project_name": "Project.2",
"project_description": "Description of project",
"start_date": "2024-11-27"
},
"stratum_method": {
"accession_code": "vb.sm.333"
},
"observation_start_date": "2024-11-27",
"topographic_position": "High slope"
}
]
}
The system processes the request, validates existing records, creates new records as needed, and links everything correctly.
Response
{
"status": "success",
"results": {
"resolved_references": {
"plots": {
"vb.plot.111": "Existing Plot Name"
},
"projects": {},
"stratum_methods": {
"vb.sm.333": "Carolina Vegetation Survey"
}
},
"new_records": {
"plots": [
{
"author_plot_code": "Plot.A",
"tmp_accession_code": "tmp.vb.plot.123",
"accession_code": "vb.plot.123"
}
],
"projects": [
{
"project_name": "Project.1",
"tmp_accession_code": "tmp.vb.project.456",
"accession_code": "vb.project.456"
},
{
"project_name": "Project.2",
"tmp_accession_code": "tmp.vb.project.457",
"accession_code": "vb.project.457"
}
],
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"accession_code": "vb.plot.obs.789",
"plot_accession_code": "vb.plot.123",
"project_accession_code": "vb.project.456"
},
{
"author_obs_code": "Plot.B.Obs.1",
"accession_code": "vb.plot.obs.790",
"plot_accession_code": "vb.plot.111",
"project_accession_code": "vb.project.457"
"stratum_method_accession_code": "vb.sm.333"
}
]
}
}
}
Note: If the upload contains errors (e.g., missing required fields or invalid references), the system returns an error response with actionable feedback.
Here's another example, this time with errors:
Request (with missing fields)
{
"plot_observations": [
{
"author_obs_code": "Plot.A.Obs.1",
"plot": {
"accession_code": "tmp.vb.plot.123", // New plot
"author_plot_code": "Plot.A",
"real_latitude": 34.4146,
"real_longitude": -119.6908,
"latitude": 34.4,
"longitude": -119.7,
// missing confidentiality_status
},
"project": {
"accession_code": "tmp.vb.project.456" // New project
"project_name": "Project.1",
"project_description": "Description of project",
"start_date": "2024-10-01"
},
"observation_start_date": "2024-10-03",
"topographic_position": "High level"
}
]
}
Error Response
{
"status": "error",
"errors": [
{
"record_type": "plot",
"tmp_accession_code": "tmp.vb.plot.123",
"field": "confidentiality_status",
"message": "Confidentiality status is required for new plots."
}
],
"partial_results": {
"resolved_references": {
"plots": {
},
"projects": {
}
}
}
}
Pros:
- Fully automated workflow.
- Simplifies user experience.
Cons:
- Risk of unintended creation of duplicates if reference fields aren’t cleanly validated.
- Relies heavily on high-performance database queries to avoid delays.