Commit

feat: add first and last aggregate functions
Substrait already defines `first_value` and `last_value`,
but only as window functions.
First and last are commonly used as aggregate functions,
and defining them as aggregates also allows using them in windows
(but not the other way around); see
https://substrait.io/expressions/window_functions/#aggregate-functions-as-window-functions
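To illustrate why the direction matters, here is a minimal Python sketch (not Substrait or Spark code) that evaluates an arbitrary aggregate once per row over a `ROWS BETWEEN n PRECEDING AND m FOLLOWING` frame. The helper `aggregate_over_window` and its parameters are hypothetical. Anything that reduces a set of rows to one value can be plugged in, which is why an aggregate can also serve as a window function; a function such as `row_number` depends on the current row's position rather than only on the frame's contents, so it has no meaning as a plain aggregate.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def aggregate_over_window(
    agg: Callable[[Sequence[T]], R],
    rows: Sequence[T],
    preceding: int,
    following: int,
) -> List[R]:
    """Evaluate `agg` once per row over a ROWS BETWEEN frame (hypothetical helper).

    For row i the frame is rows[max(0, i - preceding) : i + following + 1],
    clipped to the input. The current row is always in its own frame,
    so the frame is never empty.
    """
    out: List[R] = []
    for i in range(len(rows)):
        lo = max(0, i - preceding)
        hi = min(len(rows), i + following + 1)
        # The aggregate only sees the frame's contents, not the row's position.
        out.append(agg(rows[lo:hi]))
    return out
```

For example, `aggregate_over_window(sum, [1, 2, 3, 4], 1, 0)` gives `[1, 3, 5, 7]`, and passing `lambda frame: frame[0]` as a naive first() yields the first value of each row's frame.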

Spark defines first and last as aggregate functions
(https://github.com/apache/spark/blob/a50b30d7a02bf45f1ddb8db0be6779b1441e1d4d/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L518),
and, for example, dropDuplicates is implemented as an aggregation using first()
(https://github.com/apache/spark/blob/a50b30d7a02bf45f1ddb8db0be6779b1441e1d4d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L2262).
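A minimal Python sketch of the dropDuplicates rewrite mentioned above: group by the key columns and take first() of every other column. It assumes rows are plain dicts and that first() accepts nulls, so every non-key column comes from the same first row of its group. `drop_duplicates` is a hypothetical helper, not Spark's API, and it relies on input order, whereas Spark documents first() as non-deterministic when row order depends on a shuffle.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Any]


def drop_duplicates(rows: Sequence[Row], keys: Sequence[str]) -> List[Row]:
    """Deduplicate rows by `keys`, modelled as GROUP BY keys + first(col).

    With nulls accepted, first(col) for every non-key column comes from
    the same row (the first one seen for that key), so keeping that row
    whole is equivalent. Dicts preserve insertion order, so groups are
    returned in first-seen order.
    """
    groups: Dict[Tuple[Hashable, ...], Row] = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        if key not in groups:
            # first(): the first row's values win; later duplicates are ignored.
            groups[key] = dict(row)
    return list(groups.values())
```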
Blizzara committed Jun 26, 2024
1 parent fe7eba4 commit 0d924e9
Showing 1 changed file with 37 additions and 0 deletions.
extensions/functions_arithmetic.yaml (37 additions, 0 deletions)
@@ -1563,6 +1563,43 @@ aggregate_functions:
            values: [ TIE_TO_EVEN, TIE_AWAY_FROM_ZERO, TRUNCATE, CEILING, FLOOR ]
        nullability: DECLARED_OUTPUT
        return: fp64?
  - name: "first"
    description: >-
      First value in a set of rows.
      The `null_handling` option determines whether null values are considered by the function.
      If `null_handling` is set to `IGNORE_NULLS`, the function returns the first non-null value.
      If set to `ACCEPT_NULLS`, the function returns the first value it sees, even if null.
      If all values are null or there are no values, the function returns null.
    impls:
      - args:
          - name: x
            value: any
        options:
          null_handling:
            values: [ IGNORE_NULLS, ACCEPT_NULLS ]
        nullability: DECLARED_OUTPUT
        decomposable: MANY
        intermediate: any?
        return: any?
  - name: "last"
    description: >-
      Last value in a set of rows.
      The `null_handling` option determines whether null values are considered by the function.
      If `null_handling` is set to `IGNORE_NULLS`, the function returns the last non-null value.
      If set to `ACCEPT_NULLS`, the function returns the last value it sees, even if null.
      If all values are null or there are no values, the function returns null.
    impls:
      - args:
          - name: x
            value: any
        options:
          null_handling:
            values: [ IGNORE_NULLS, ACCEPT_NULLS ]
        nullability: DECLARED_OUTPUT
        decomposable: MANY
        intermediate: any?
        return: any?
  - name: "quantile"
    description: >
      Calculates quantiles for a set of values.
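The null-handling rules in the `first` and `last` descriptions can be summarized with a small Python sketch. It assumes Python's `None` stands in for SQL null and that rows arrive in iteration order (the YAML does not say how that order is established). `NullHandling`, `first`, and `last` are illustrative names that only mirror the option values in the YAML; they are not a Substrait API.

```python
from enum import Enum
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


class NullHandling(Enum):
    """Mirrors the `null_handling` option values declared in the YAML."""
    IGNORE_NULLS = "IGNORE_NULLS"
    ACCEPT_NULLS = "ACCEPT_NULLS"


def first(values: Iterable[Optional[T]], null_handling: NullHandling) -> Optional[T]:
    for v in values:
        if v is None and null_handling is NullHandling.IGNORE_NULLS:
            continue  # skip nulls and keep looking
        return v  # first accepted value (may be None under ACCEPT_NULLS)
    return None  # no values, or only nulls under IGNORE_NULLS


def last(values: Iterable[Optional[T]], null_handling: NullHandling) -> Optional[T]:
    result: Optional[T] = None
    for v in values:
        if v is None and null_handling is NullHandling.IGNORE_NULLS:
            continue  # nulls never overwrite the current result
        result = v  # the most recent accepted value wins
    return result  # None if there were no accepted values
```

Under `ACCEPT_NULLS`, `first([None, 2], ...)` is null while `IGNORE_NULLS` gives `2`; both modes return null for an empty input, matching the last sentence of each description.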
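Both entries also declare `decomposable: MANY` with an `intermediate` type of `any?`, understood here to mean that partial results can be computed separately and merged repeatedly. A minimal Python sketch of a two-phase first(), assuming the input is split into contiguous chunks that are merged in their original order. `first_value` and `first_two_phase` are hypothetical helpers, and `None` stands in for SQL null.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def first_value(values: Sequence[Optional[T]], ignore_nulls: bool) -> Optional[T]:
    """Sequential first(); None stands in for SQL null."""
    for v in values:
        if v is None and ignore_nulls:
            continue
        return v
    return None


def first_two_phase(
    chunks: Sequence[Sequence[Optional[T]]], ignore_nulls: bool
) -> Optional[T]:
    """Partial first() per chunk, then merge the intermediates in chunk order.

    Empty chunks emit no intermediate, so a null intermediate always means
    "this chunk's first accepted value was null".
    """
    partials: List[Optional[T]] = [
        first_value(chunk, ignore_nulls) for chunk in chunks if len(chunk) > 0
    ]
    # The merge step is first() again over the intermediates.
    return first_value(partials, ignore_nulls)
```

Because the merge is the same operation as the partial step, intermediates can be merged any number of times as long as chunk order is preserved. The sketch drops empty chunks before merging: with `ACCEPT_NULLS`, a null intermediate from an empty chunk would otherwise be indistinguishable from a genuine leading null. last() decomposes the same way, taking the last intermediate instead of the first.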