Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Override BQ load job location when necessary #31986

Merged
merged 2 commits into from
Aug 8, 2024

Conversation

ahmedabu98
Copy link
Contributor

Found a case where if we retry a successful load job (e.g. due to bundle failure), we get a 409 ALREADY_EXISTS error along with a job reference that does not contain a location.

In finish bundle, we perform wait operations on job references, where we get and poll until the job is finished. If we attempt to get a job without a location, the API will default it to US multi region. However, this is problematic when we're writing to a different region because BQ won't be able to find the job and we encounter a 404 NOT_FOUND error.

In such a case where the job reference has null location, we should override it with the table's location

Copy link
Contributor

Assigning reviewers. If you would like to opt out of this review, comment assign to next reviewer:

R: @shunping for label python.
R: @chamikaramj for label io.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@shunping
Copy link
Contributor

shunping commented Jul 25, 2024

Looks like there are some presubmit errors caused by the code change.

apitools.base.protorpclite.messages.ValidationError: Expected type <class 'str'> for field location, found <Mock name='mock.tables.Get().location' id='133722614311760'> (type <class 'mock.mock.Mock'>) 

I think we need to mock location when creating the mocked object of bq_client.

@shunping
Copy link
Contributor

waiting on author

@ahmedabu98
Copy link
Contributor Author

@shunping can you PTAL?

Copy link
Contributor

@shunping shunping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!

@ahmedabu98 ahmedabu98 merged commit ea98212 into apache:master Aug 8, 2024
89 checks passed
@Abacn
Copy link
Contributor

Abacn commented Oct 29, 2024

There is a racing condition here such that the retry job could happen when the conflict job is running, and then get_table_location will fail on 404, thus failing the Dataflow job.

Abacn added a commit to Abacn/beam that referenced this pull request Oct 29, 2024
reeba212 pushed a commit to reeba212/beam that referenced this pull request Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants