fix(expr-common): Coerce to Decimal(20, 0) when combining UInt64 with signed integers #14223
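For context, Decimal128(20, 0) is the narrowest type that losslessly holds both operand domains: UInt64 needs 20 decimal digits (its maximum is 18446744073709551615), and every Int64 value also fits in 20 digits. Below is a minimal sqllogictest-style sketch of the intended coercion (an illustration written for this summary, not a test taken from the PR):

```
# Illustration: combining UInt64 with a signed integer now resolves to
# Decimal128(20, 0) instead of a lossy cast to Int64.
query T
select arrow_typeof(v) from (
    select arrow_cast(18446744073709551615, 'UInt64') as v
    union all
    select arrow_cast(-1, 'Int64') as v
) limit 1
----
Decimal128(20, 0)
```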
@@ -433,3 +433,30 @@ drop table test_column_defaults

 statement error DataFusion error: Error during planning: Column reference is not allowed in the DEFAULT expression : Schema error: No field named a.
 create table test_column_defaults(a int, b int default a+1)
+
+# test inserting UInt64 and signed integers into a bigint unsigned column
+statement ok
+create table unsigned_bigint_test (v bigint unsigned)
+
+query I
+insert into unsigned_bigint_test values (10000000000000000000), (18446744073709551615)
+----
+2
+
+query I
+insert into unsigned_bigint_test values (10000000000000000001), (1), (10000000000000000002)
+----
+3
+
+query I rowsort
+select * from unsigned_bigint_test
I agree the case of overflow on insert is unexpected. However, I would argue that if the user cares about avoiding overflows when doing integer arithmetic they should use `Decimal`.

Overflow is unexpected in this case, as all these values are valid unsigned bigint. The problem is that the mixed UInt64 and signed-integer values were coerced to Int64, which cannot represent them. This is very similar to the union case mentioned above.
+----
+1
+10000000000000000000
+10000000000000000001
+10000000000000000002
+18446744073709551615
+
+statement ok
+drop table unsigned_bigint_test
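For reference, a minimal sketch of why these inserts previously failed (hypothetical, since the pre-fix error text is not shown in this diff): the VALUES list used to coerce to Int64, whose maximum is 9223372036854775807 (19 digits), while UInt64 values run up to 18446744073709551615 (20 digits), which is exactly why Decimal128(20, 0) suffices.

```
# Hypothetical pre-fix behavior: the VALUES list coerces to Int64, so
# 18446744073709551615 (> Int64::MAX = 9223372036854775807) overflows
# even though it is a valid bigint unsigned value.
statement error
insert into unsigned_bigint_test values (1), (18446744073709551615)
```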
@@ -413,23 +413,23 @@ explain SELECT c1, c9 FROM aggregate_test_100 UNION ALL SELECT c1, c3 FROM aggregate_test_100
 logical_plan
 01)Sort: aggregate_test_100.c9 DESC NULLS FIRST, fetch=5
 02)--Union
-03)----Projection: aggregate_test_100.c1, CAST(aggregate_test_100.c9 AS Int64) AS c9
+03)----Projection: aggregate_test_100.c1, CAST(aggregate_test_100.c9 AS Decimal128(20, 0)) AS c9
 04)------TableScan: aggregate_test_100 projection=[c1, c9]
-05)----Projection: aggregate_test_100.c1, CAST(aggregate_test_100.c3 AS Int64) AS c9
+05)----Projection: aggregate_test_100.c1, CAST(aggregate_test_100.c3 AS Decimal128(20, 0)) AS c9
this seems like a regression to me (there is now 2x the space needed)

This query unions an unsigned bigint column with a signed integer column, so the common type has to hold values that neither Int64 nor UInt64 can cover on its own. For example:

create table t1(a smallint) as values(1);
create table t2(a bigint unsigned) as values(10000000000000000000);
select * from t1 union select * from t2;
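If it helps, here is roughly what that reproducer returns with this PR's coercion, as a sketch in the same sqllogictest style (the exact result formatting is assumed, not verified): the union resolves to Decimal128(20, 0), so the UInt64 value is preserved instead of overflowing Int64.

```
query R rowsort
select * from t1 union select * from t2
----
1
10000000000000000000
```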
 06)------TableScan: aggregate_test_100 projection=[c1, c3]
 physical_plan
 01)SortPreservingMergeExec: [c9@1 DESC], fetch=5
 02)--UnionExec
 03)----SortExec: TopK(fetch=5), expr=[c9@1 DESC], preserve_partitioning=[true]
-04)------ProjectionExec: expr=[c1@0 as c1, CAST(c9@1 AS Int64) as c9]
+04)------ProjectionExec: expr=[c1@0 as c1, CAST(c9@1 AS Decimal128(20, 0)) as c9]
 05)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
 06)----------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c9], has_header=true
 07)----SortExec: TopK(fetch=5), expr=[c9@1 DESC], preserve_partitioning=[true]
-08)------ProjectionExec: expr=[c1@0 as c1, CAST(c3@1 AS Int64) as c9]
+08)------ProjectionExec: expr=[c1@0 as c1, CAST(c3@1 AS Decimal128(20, 0)) as c9]
 09)--------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
 10)----------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c3], has_header=true
-query TI
+query TR
 SELECT c1, c9 FROM aggregate_test_100 UNION ALL SELECT c1, c3 FROM aggregate_test_100 ORDER BY c9 DESC LIMIT 5
 ----
 c 4268716378
I think this has potentially (large) performance implications. My understanding is that this means that Int64+Int64 will (always) result in a 128-bit value? So even though performing Int64+Int64 will never overflow, all queries will pay the price of 2x space (and some time) overhead?
No, Int64+Int64 is not affected; it uses `mathematics_numerical_coercion`. I did some validation on this PR branch.
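One way such validation might look, as a sketch in the repository's sqllogictest style (illustrative only; the actual checks run on the branch are not shown here): `arrow_typeof` confirms that pure signed arithmetic keeps its 64-bit type.

```
# Illustrative check: Int64 + Int64 still resolves to Int64 under this PR,
# so ordinary signed arithmetic pays no 128-bit overhead.
query T
select arrow_typeof(arrow_cast(1, 'Int64') + arrow_cast(1, 'Int64'))
----
Int64
```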
I really liked this way of verifying the behavior. I made a PR with this type of test and verified that the tests still pass with the changes in this PR.