
[SPARK-48665][PYTHON][CONNECT] Support providing a dict in pyspark lit to create a map. #49318

Open

wants to merge 22 commits into base: master
Conversation

@skanderboudawara skanderboudawara commented Dec 27, 2024

What changes were proposed in this pull request?

Reopening the PR done by Ronserruya and adding small changes.

Added the option to provide a dict to pyspark.sql.functions.lit in order to create a map.

Why are the changes needed?

To make it easier to create a map in PySpark.
Currently, it is only possible via create_map, which requires a flat sequence of key, value, key, value, ... arguments.
Scala already supports this functionality via typedLit.

A similar PR was done in the past to add comparable functionality for creating an array from a list, so I tried to follow all the changes done there as well.
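For context, here is a minimal pure-Python sketch of the flattening step that create_map currently forces callers to do by hand (the helper name is illustrative, not part of PySpark):

```python
from itertools import chain

def dict_to_create_map_args(d):
    # Flatten {"a": 1, "b": 2} into the alternating key, value, key, value, ...
    # sequence that create_map expects, e.g. F.create_map(*map(F.lit, args)).
    return list(chain.from_iterable(d.items()))

print(dict_to_create_map_args({"a": 1, "b": 2}))  # ['a', 1, 'b', 2]
```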

Does this PR introduce any user-facing change?

Yes: the docstring of lit was updated, and new functionality was added.

Before:

from pyspark.sql import functions as F
F.lit({"a":1})
# pyspark.errors.exceptions.captured.SparkRuntimeException: [UNSUPPORTED_FEATURE.LITERAL_TYPE] The feature is not supported: Literal for '{asd=2}' of class java.util.HashMap.

After:

from pyspark.sql import functions as F
F.lit({"a":1, "b": 2})
# Column<'map(a, 1, b, 2)'>

How was this patch tested?

Manual tests + unittest in CI

Was this patch authored or co-authored using generative AI tooling?

No


with self.sql_conf(
    {
        "spark.sql.ansi.enabled": False,

Contributor commented:

How does ANSI mode affect this feature?

Author replied:

Without ANSI the result is {"a": 1, "b": 2, "c": None};
with ANSI it is {"a": "1", "b": "2", "c": None}: all the ints become strings.
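A plain-Python illustration of the coercion described above (this only mimics the observed behavior; it is not Spark's implementation):

```python
def coerce_like_ansi(values):
    # Under ANSI, the mixed-type map values resolve to a common string type,
    # so non-null ints are rendered as strings; None stays None.
    return {k: (str(v) if v is not None else None) for k, v in values.items()}

print(coerce_like_ansi({"a": 1, "b": 2, "c": None}))
```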

)
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()
Contributor commented:

Do we require an active session here? @HyukjinKwon

Member replied:
Let's use the default value of spark.sql.pyspark.inferNestedDictAsStruct.enabled when SparkSession.getActiveSession returns None. For Connect too.
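A sketch of that fallback (the helper name is illustrative, and the default value string is an assumption to be checked against the config's actual default):

```python
# Assumed default for spark.sql.pyspark.inferNestedDictAsStruct.enabled.
_DEFAULT_INFER_NESTED = "false"

def dict_as_struct_conf(spark):
    # Use the running session's value when one exists; otherwise fall back
    # to the config's default, as suggested above.
    if spark is None:
        return _DEFAULT_INFER_NESTED
    return spark.conf.get("spark.sql.pyspark.inferNestedDictAsStruct.enabled")

print(dict_as_struct_conf(None))  # 'false'
```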


spark = SparkSession.getActiveSession()
dict_as_struct = (
spark.conf.get("spark.sql.pyspark.inferNestedDictAsStruct.enabled")
Contributor commented:

This triggers a Config RPC, and in nested cases it will be re-triggered multiple times. We should cache the config and make sure there is at most one invocation.

Author replied:
Shall I create a new cached function in utils?

from functools import lru_cache

@lru_cache()
def __get_conf_nested():
    spark = SparkSession.getActiveSession()
    dict_as_struct = (
        spark.conf.get("spark.sql.pyspark.inferNestedDictAsStruct.enabled")
        if spark
        else "true"
    )
    return dict_as_struct
