[SPARK-48665][PYTHON][CONNECT] Support providing a dict in pyspark lit to create a map. #49318
base: master
Conversation
with self.sql_conf(
    {
        "spark.sql.ansi.enabled": False,
How does ANSI mode affect this feature?
Without ANSI the result is {"a": 1, "b": 2, "c": None}; with ANSI it is {"a": "1", "b": "2", "c": None}, i.e. all the ints become strings.
)

from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()
do we require an active session here? @HyukjinKwon
Let's use the default value of spark.sql.pyspark.inferNestedDictAsStruct.enabled when SparkSession.getActiveSession returns None. For Connect too.
spark = SparkSession.getActiveSession()
dict_as_struct = (
    spark.conf.get("spark.sql.pyspark.inferNestedDictAsStruct.enabled")
This triggers a Config RPC, and for nested cases it will be re-triggered multiple times. We should cache the config and make sure there is at most one invocation.
Shall I create a new cached function in utils?

@lru_cache()
def __get_conf_nested():
    spark = SparkSession.getActiveSession()
    dict_as_struct = (
        spark.conf.get("spark.sql.pyspark.inferNestedDictAsStruct.enabled")
        if spark
        else "true"
    )
    return dict_as_struct
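As a framework-free sketch of the caching idea under discussion: functools.lru_cache guarantees the wrapped lookup runs at most once, so nested dict literals would not re-trigger the Config RPC. The lookup function below is a hypothetical stand-in for that RPC (with an invocation counter so the caching is observable); only the config key and the "true" default come from the snippet above.

```python
from functools import lru_cache

# Hypothetical stand-in for the Config RPC; counts invocations so the
# caching behavior can be verified.
CALLS = {"count": 0}

def expensive_conf_lookup(key: str) -> str:
    CALLS["count"] += 1
    return "true"  # default used when no active session is available

@lru_cache(maxsize=1)
def get_conf_nested() -> str:
    # Cached wrapper: repeated calls (e.g. while converting a nested dict)
    # hit the cache instead of re-triggering the lookup.
    return expensive_conf_lookup(
        "spark.sql.pyspark.inferNestedDictAsStruct.enabled"
    )

get_conf_nested()
get_conf_nested()
print(CALLS["count"])  # 1: the second call was served from the cache
```

One trade-off of this approach, worth noting in review: once cached, a later change to the config (or a new session) would not be picked up unless the cache is cleared with get_conf_nested.cache_clear().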
What changes were proposed in this pull request?
Reopening the PR done by Ronserruya and adding small changes.
Added the option to provide a dict to pyspark.sql.functions.lit in order to create a map.

Why are the changes needed?
To make it easier to create a map in PySpark. Currently, it is only possible via create_map, which requires a sequence of key, value, key, value... Scala already supports such functionality using typedLit. A similar PR was done in the past to add similar functionality for creating an array from a list, so I tried to follow all the changes done there as well.
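For context, the existing create_map workaround forces callers to interleave keys and values by hand. A plain-Python sketch of that flattening (the helper name is hypothetical, not part of PySpark):

```python
def dict_to_create_map_args(d: dict) -> list:
    """Flatten a dict like {'a': 1, 'b': 2} into the alternating
    key, value, key, value... sequence that create_map expects."""
    args = []
    for key, value in d.items():
        args.append(key)
        args.append(value)
    return args

print(dict_to_create_map_args({"a": 1, "b": 2}))  # ['a', 1, 'b', 2]
```

With this PR, lit({"a": 1, "b": 2}) would build the map column directly, with no manual interleaving.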
Does this PR introduce any user-facing change?
Yes, the docstring of lit was edited, and new functionality was added.

Before:
After:
How was this patch tested?
Manual tests + unittest in CI
Was this patch authored or co-authored using generative AI tooling?
No