Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(postgresql): set query result column nullable if selected column contains json operator or null value in case expression #3739

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

swallowstalker
Copy link
Contributor

@swallowstalker swallowstalker commented Dec 10, 2024

Trying to solve #3710

JSON traversal operator on selected columns

Based on postgresql docs, operator ->> and #>> should return string, and since this is JSON and it could have null value on its field, it should return pgtype.Text. Other JSON operator should return interface{} since it returns JSON object

Refactor AExpr and AConst on selected columns

Since it is used on CaseExpr, I refactored both of them to allow it to be reusable in different location

TypeCast null value checking

If there is any selected columns which has null value to be casted to other type, for example null::text then it should return pgtype.Text to allow nullable text

CaseExpr nullable + column type handling

Previously, sqlc determines column type from default case and uses hardcoded NotNull to true. Relying on default case can be bad since it might not be declared, if user decides not to use ELSE keyword.

This PR will determine column type + nullability from all sub-case, on list of THEN clause combined with ELSE clause considering data types involved on those multiple clause.

  1. If there is only one data type on any case expr clause, then it will be nullable of that data type
  2. If there are multiple data type, it will automatically generate interface{} type
  3. If all case expr clause have null result, then it will generate interface{} type
  4. Case expr without default clause, will be assumed to have null as its ELSE result

and if any of those column are nullable, then the resulting column will be set to nullable.

@swallowstalker swallowstalker changed the title feat: adding nullable text if selected column is json operator and casted as text feat: adding nullable text if selected column has json operator and has cast type Dec 10, 2024
@swallowstalker swallowstalker changed the title feat: adding nullable text if selected column has json operator and has cast type fix(postgresql): adding nullable text if selected column has json operator and has cast type Dec 10, 2024
@swallowstalker swallowstalker changed the title fix(postgresql): adding nullable text if selected column has json operator and has cast type fix(postgresql): set query result column nullable if selected column contains json operator or null value in case expression Dec 12, 2024
@veqryn
Copy link

veqryn commented Dec 12, 2024

Can we add this test case as well please?

-- name: GetNullable3B :many
SELECT CASE WHEN true THEN 'hello'::text ELSE null::text END 
FROM "mytable";

@veqryn
Copy link

veqryn commented Dec 12, 2024

Where in your tests are we setting the expectation of the generated results?
Like, I would have expected your GetNullable2 to return an any/interface{}, but I don't see the expectation set anywhere. Right now, the generated code appears to return a pgtype.Text, which is odd for something returning integers, floats, and strings.

Or is the committed code the expectation?

@swallowstalker
Copy link
Contributor Author

swallowstalker commented Dec 13, 2024

Can we add this test case as well please?

GetNullable3B added

Like, I would have expected your GetNullable2 to return an any/interface{}

In my PR description I proposed type precedence, for multiple type inside case expression. For example

  • 1st condition result is int, 2nd condition result is float, 3rd condition result is text, and default condition is null, then it should be nullable text, not an interface
  • 1st condition result is int, 2nd condition result is float, and default condition is int, then it should be non-nullable float, not an interface
  • 1st condition result is int, and default condition is null, then it should be nullable int, not an interface

and so on. but on the other side, I think that the user should be given freedom if the return value of case expr has several type variance (in short, just return any or interface{}). @kyleconroy maybe you have feedback on this particular case? 👀

@swallowstalker
Copy link
Contributor Author

Or is the committed code the expectation?

I have generated the result using code from this PR, and then make it as expectation (commited on repo), but to be honest I have a problem in github workflow tests, it seems that the binary that the workflow use is the previous version before I made these changes, so the generated go file would always be different than the one I commited on the repo. Especially on TestReplay endtoend tests. Maybe somebody can help giving me some pointers?

@veqryn
Copy link

veqryn commented Dec 13, 2024

So after thinking about it, I realized we really only care about what Postgres thinks the returned values are, not what we think they should be.
I tried various combinations of your GetNullable2, and found that postgres doesn't really care about the order of which case/when statement comes first, but instead has some kind of preference, which you mentioned.

For example, if you insert at least 4 rows into mytable, then running this query:

SELECT CASE
    WHEN id = 1 THEN '1'
    WHEN id = 2 THEN null
    WHEN id = 3 THEN id
    WHEN id = 4 THEN 4.5
    ELSE '10'
END
FROM "mytable";

Returns:

1
[NULL]
3
4.5
10
10

The strings for '1' and '10' got cast into numbers. id is still an integer/number. The decimal or float of 4.5 got to stay as is. I suppose the correct type for this would be a nullable decimal?

But if you change the query to this:

SELECT CASE
    WHEN id = 1 THEN '1'::text
    WHEN id = 2 THEN null
    WHEN id = 3 THEN id
    WHEN id = 4 THEN 4.5
    ELSE '10'
END
FROM "mytable";

You get an error: SQL Error [42804]: ERROR: CASE types text and bigint cannot be matched

or this:

SELECT CASE
    WHEN id = 1 THEN 'foobar'
    WHEN id = 2 THEN null
    WHEN id = 3 THEN id
    WHEN id = 4 THEN 4.5
    ELSE '10'
END
FROM "mytable";

You get a similar error: SQL Error [22P02]: ERROR: invalid input syntax for type numeric: "foobar"

So to sum up: I think you are on the right track, with forcing these types of queries into a type rather than an any/interface{}

I don't know how realistic these are in the real world. Perhaps we should try to find queries that hit an actual table of data, rather than case statements of literals.

@swallowstalker
Copy link
Contributor Author

I agree that it's not quite realistic, I better left it to sqlc user to decide which data type that will be used in case of multiple data type exist on case-when-then-else. In short, I'll make it interface then

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Dec 19, 2024
col = &Column{Name: name, DataType: "float", NotNull: true}
case *ast.Boolean:
col = &Column{Name: name, DataType: "bool", NotNull: true}
case *ast.Null:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is new addition to handle null const value. I need to distinguish between any and null, because null in CaseExpr would determine query column result nullability, but does not make it included in column type decision.

for ast.A_Const condition, i have converted DataType null back to any (line 136, outputColumns function) because we don't know its type and to preserve backward compatibility

}
} else {
cols = append(cols, &Column{Name: name, DataType: "any", NotNull: false})
defaultCol = &Column{Name: name, DataType: "null", NotNull: false}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing ELSE clause on CaseExpr is considered to return null value

@veqryn
Copy link

veqryn commented Dec 22, 2024

Based on the tests, I believe this fixes the issue for me.
How do we get this reviewed by sqlc devs?

@swallowstalker
Copy link
Contributor Author

from my previous PR, I mentioned @kyleconroy to ask for review. would you mind checking it @kyleconroy? sorry i still can't assign reviewer for this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:XL This PR changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants