Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add primary key to schema_history #13611

Open
wonko opened this issue Sep 17, 2024 · 4 comments · May be fixed by #14103
Open

Add primary key to schema_history #13611

wonko opened this issue Sep 17, 2024 · 4 comments · May be fixed by #14103
Labels
area/offloading Node status offloading area/workflow-archive good first issue Good for newcomers solution/suggested A solution to the bug has been suggested. Someone needs to implement it. type/feature Feature request

Comments

@wonko
Copy link

wonko commented Sep 17, 2024

Summary

It would be good to have a primary key in the creation statement for the schema_history table.

Use Cases

DB replication mechanisms and integrity-checking mechanisms sometimes rely on the ability to identify a single row. Primary keys are made for this.

I bumped into this while using a percona-xtradb database deployed with the Percona Operator (which isn't too uncommon, I had it running in the cluster for another db already, so re-used it for the argo database), which will deny the creation of this table with the default configuration:

Failed to init db: Error 1105 (HY000): Percona-XtraDB-Cluster prohibits use of DML command on a table (argo.schema_history) without an explicit primary key with pxc_strict_mode = ENFORCING or MASTER

While it might feel like it adds no value to the argo-workflows project, this is a minor change, won't affect anything else, but will allow users to deploy argo easier. I might add a PR if this would be allowed.

The changes would go in migrate.go, where the creation would get a primary key(schema_version) and an additional migration line in the big set which would do an alter-add-primary-key statement to try and add the primary key to an already existing table.


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

@wonko wonko added the type/feature Feature request label Sep 17, 2024
@agilgur5 agilgur5 added area/workflow-archive area/offloading Node status offloading solution/suggested A solution to the bug has been suggested. Someone needs to implement it. labels Sep 17, 2024
@MasonM MasonM added the good first issue Good for newcomers label Dec 15, 2024
@MasonM
Copy link
Contributor

MasonM commented Dec 15, 2024

Notes for anyone wanting to work on this:

  1. This will involve adding a small migration here:
    // PostgreSQL only: convert argo_archived_workflows.workflow column to JSONB for performance and consistency with MySQL. #13779
    ternary(dbType == MySQL,
    noop{},
    ansiSQLChange(`alter table argo_archived_workflows alter column workflow set data type jsonb using workflow::jsonb`),
    ),
  2. Make sure to test with both PostgreSQL and MySQL, which you do with the go run ./hack/db migrate tool: https://argo-workflows.readthedocs.io/en/latest/running-locally/#database-tooling

@radusora
Copy link

I ran some tests with Percona XtraDB and noticed that adding the migration as suggested by @MasonM doesn’t fully address the issue. After the schema_history table is created, subsequent DML commands in the migrations will fail without a primary key.

Adding the primary key directly in the create table statement helps with new installations, but it creates a problem for migrations attempting to add the key for existing installations. Similarly, trying to remove and re-add the primary key in the migrations isn’t reliable because it may not exist in the first place.

Given these constraints, one possible solution is to introduce conditional logic immediately after the schema_history table creation. This logic would check whether the primary key exists and add it if it doesn’t – or safely handle any duplicate key errors if it already exists. While this approach isn’t perfectly elegant (as it requires running an extra query each time), it seems like a practical compromise for both new and existing installations.

If this approach sounds acceptable, I’d be happy to submit a PR to implement it.

@MasonM
Copy link
Contributor

MasonM commented Jan 18, 2025

@radusora Thanks for the details. That sounds reasonable. The query select 1 from information_schema.table_constraints where constraint_type = 'PRIMARY KEY' and table_name = 'schema_history' should allow you to quickly check if the primary key is already there with both MySQL and PostgreSQL (at least with the versions I tried).

radusora pushed a commit to radusora/argo-workflows that referenced this issue Jan 20, 2025
radusora pushed a commit to radusora/argo-workflows that referenced this issue Jan 20, 2025
radusora added a commit to radusora/argo-workflows that referenced this issue Jan 20, 2025
@radusora
Copy link

@MasonM Thank you for the suggestion! I realized I needed to add an additional filter for the database, as the DB user in use could theoretically have access to other databases. Without the filter, the query against information_schema might return irrelevant results. Submitted the PR now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/offloading Node status offloading area/workflow-archive good first issue Good for newcomers solution/suggested A solution to the bug has been suggested. Someone needs to implement it. type/feature Feature request
Projects
None yet
4 participants