Improve query for getting users for review #1891

Draft: wants to merge 2 commits into base: master

Kobzol (Contributor) commented Jan 24, 2025

The previous query did not return users who were missing from the review_prefs table. That is a problem because the review_prefs table can be incomplete for several reasons, and in general it should be possible to get users even if they have no entry in review_prefs.
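To illustrate the difference, here is a minimal sketch that uses Python's sqlite3 module as a stand-in for PostgreSQL. The table contents are made up, and it assumes the old query behaved like an inner join; neither is taken from the patch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE review_prefs (user_id INTEGER, max_assigned_prs INTEGER);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    -- Only alice has a review_prefs row; bob is missing (hypothetical data).
    INSERT INTO review_prefs VALUES (1, 5);
""")

def usernames(join_kind):
    """Return usernames produced by joining users to review_prefs."""
    rows = conn.execute(
        "SELECT u.username FROM users AS u "
        f"{join_kind} JOIN review_prefs AS r ON u.user_id = r.user_id "
        "ORDER BY u.username"
    ).fetchall()
    return [name for (name,) in rows]

inner_result = usernames("INNER")  # bob is dropped: no matching review_prefs row
left_result = usernames("LEFT")    # bob is kept, with NULLs for the r.* columns
print(inner_result)  # ['alice']
print(left_result)   # ['alice', 'bob']
```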

There is one problem with the new query, though: because there is no index on the users.username column, it will result in a sequential scan of the users table. It's probably still fast enough, but I wanted to mention it.
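As a rough illustration of the scan-versus-index point, the sketch below asks SQLite (not PostgreSQL) for its query plan before and after adding an index. The index name is hypothetical, the planner output is specific to SQLite, and PostgreSQL may well choose differently, especially on a small table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany(
    "INSERT INTO users (username) VALUES (?)",
    [("alice",), ("bob",), ("carol",)],
)

QUERY = "SELECT username FROM users WHERE username IN (?, ?)"

def plan(sql):
    """Return the human-readable steps of SQLite's query plan."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql, ("alice", "bob"))
    # The last column of each plan row is the textual description.
    return [row[-1] for row in cursor]

plan_without_index = plan(QUERY)
# Hypothetical index; the PR itself does not add one.
conn.execute("CREATE INDEX idx_users_username ON users (username)")
plan_with_index = plan(QUERY)

print(plan_without_index)  # a full-table SCAN step
print(plan_with_index)     # a SEARCH step that uses the index
```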

The LEAST clause was not required, I think. The added COALESCE is needed to treat the number of PRs for people missing from review_prefs as zero; otherwise they would be ignored again.
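A small sketch of why the COALESCE on the count matters, again using sqlite3 as a stand-in for PostgreSQL. It assumes the capacity check is applied as a WHERE filter, and it models CARDINALITY(assigned_prs) with a plain integer column named assigned_count; both are simplifications for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
    -- assigned_count stands in for CARDINALITY(assigned_prs).
    CREATE TABLE review_prefs (
        user_id INTEGER, assigned_count INTEGER, max_assigned_prs INTEGER
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO review_prefs VALUES (1, 2, 5);  -- bob has no row
""")

def usernames(count_expr):
    """Filter users by capacity, using count_expr as the assigned-PR count."""
    rows = conn.execute(
        "SELECT u.username FROM users AS u "
        "LEFT JOIN review_prefs AS r ON u.user_id = r.user_id "
        f"WHERE {count_expr} < COALESCE(r.max_assigned_prs, 100000) "
        "ORDER BY u.username"
    ).fetchall()
    return [name for (name,) in rows]

# For bob, r.assigned_count is NULL, and NULL < 100000 is NULL, so he is dropped.
without_coalesce = usernames("r.assigned_count")
# COALESCE turns the NULL into 0, and 0 < 100000 is true, so he is kept.
with_coalesce = usernames("COALESCE(r.assigned_count, 0)")
print(without_coalesce)  # ['alice']
print(with_coalesce)     # ['alice', 'bob']
```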

You can experiment with this in https://onecompiler.com/postgresql/43722tb3w.

apiraino (Contributor) commented:

> The LEAST clause was not required, I think.

So in my comment I was wrong about LEAST, right? 🤔

Kobzol (Contributor, Author) commented Jan 25, 2025

COALESCE returns its first non-NULL argument, so COALESCE(NULL, 100000) should be enough. I don't think you need LEAST: if the first argument to COALESCE is non-NULL, that argument is returned.
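This is easy to check in isolation. The sketch below runs the two cases through SQLite, whose COALESCE follows the same first-non-NULL rule; the value 5 is just an example limit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# First case: no limit stored (NULL). Second case: an explicit limit of 5.
missing_limit, explicit_limit = conn.execute(
    "SELECT COALESCE(NULL, 100000), COALESCE(5, 100000)"
).fetchone()
print(missing_limit, explicit_limit)  # 100000 5
```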

Comment on lines +832 to +837
SELECT u.username AS username
FROM users AS u
LEFT JOIN review_prefs AS r ON u.user_id = r.user_id
AND COALESCE(CARDINALITY(r.assigned_prs), 0) < COALESCE(r.max_assigned_prs, 100000)
WHERE u.username = ANY('{{ {} }}')
",
apiraino (Contributor) commented Jan 27, 2025

Just an observation: before your change, this function didn't return any record if no "joined" user was found. Changing the query as you suggest also changes the API: it now always returns something, so the caller must check whether the returned candidate actually has capacity.
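One way to see this behavior is to compare the capacity check placed in the ON clause (as in the snippet above) with the same check placed in a WHERE clause. The sketch uses sqlite3 as a stand-in for PostgreSQL, made-up data, and an integer assigned_count column in place of CARDINALITY(assigned_prs).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE review_prefs (
        user_id INTEGER, assigned_count INTEGER, max_assigned_prs INTEGER
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
    INSERT INTO review_prefs VALUES (1, 5, 5);  -- alice is at her limit
    INSERT INTO review_prefs VALUES (2, 1, 5);  -- bob has room; carol has no row
""")

CAPACITY = "COALESCE(r.assigned_count, 0) < COALESCE(r.max_assigned_prs, 100000)"

def usernames(keyword):
    """keyword is 'AND' (check inside ON) or 'WHERE' (check filters the result)."""
    rows = conn.execute(
        "SELECT u.username FROM users AS u "
        "LEFT JOIN review_prefs AS r ON u.user_id = r.user_id "
        f"{keyword} {CAPACITY} ORDER BY u.username"
    ).fetchall()
    return [name for (name,) in rows]

# Inside ON, a failed check only blanks out the r.* columns; the user row stays.
check_in_on = usernames("AND")
# Inside WHERE, a failed check removes the user from the result.
check_in_where = usernames("WHERE")
print(check_in_on)     # ['alice', 'bob', 'carol']
print(check_in_where)  # ['bob', 'carol']
```

With the check in the ON clause, every requested user comes back, which matches the observation that the caller then has to verify capacity itself.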

There is a TODO at the top of has_user_capacity() (which does more or less the same work) and that was implemented in #1893. I'll probably split that patch to make it clearer.

EDIT: in commit b93064a

Kobzol (Contributor, Author) replied:

Right, so my query assumes that the only way capacity can be affected is through the review_prefs table, which I assume should be true? The exception is the on-vacation list, which has to be checked elsewhere (until we get rid of it).

In other words, if you don't have an entry in review_prefs, then you have capacity.

Kobzol marked this pull request as draft on February 20, 2025, 12:59