Is the `schema` argument of `as_polars_df()` needed? #897
@etiennebacher What do you think about this? I was thinking of deprecating it, but we could probably just remove it, since very few people are likely to use it.
IIUC the point of …
I do not understand your position. This is like
I mistakenly thought that you were the one who suggested removing this: #896 (comment)
The problem there is that the behavior of
```r
library(polars)
options(polars.do_not_repeat_call = TRUE)
pl$DataFrame(a = 1:3, b = 4:6, schema = list(b = pl$String, y = pl$Int32))
#> Error: Execution halted with the following contexts
#> 0: In R: in $DataFrame():
#> 1: Some columns in `schema` are not in the DataFrame.
```

This is consistent with py-polars:

```python
import polars as pl

pl.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [4, 5, 6],
    },
    schema={"b": pl.String, "y": pl.Int32},
)
# ValueError: the given column-schema names do not match the data dictionary
```
```r
library(polars)
options(polars.do_not_repeat_call = TRUE)
as_polars_df(data.frame(a = 1:3, b = 4:6), schema = list(b = pl$String, y = pl$Int32))
#> shape: (3, 2)
#> ┌─────┬─────┐
#> │ b   ┆ y   │
#> │ --- ┆ --- │
#> │ str ┆ i32 │
#> ╞═════╪═════╡
#> │ 1   ┆ 4   │
#> │ 2   ┆ 5   │
#> │ 3   ┆ 6   │
#> └─────┴─────┘
```

IMO this is not good behavior; it should throw an error instead, like in …

So, supposing this behavior is fixed, I don't think there's a reason to remove …
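The difference between the two behaviors shown above can be modeled in a few lines of plain Python. This is a minimal sketch, not polars code: a data frame is assumed to be a dict of column name to values, dtypes are the names printed in the polars headers ("str", "i32"), and `apply_schema_positionally` and `apply_schema_checked` are hypothetical helpers. The first reproduces the `as_polars_df()` output above (columns renamed by position); the second rejects any mismatch between schema names and data names, as the py-polars examples in this thread do.

```python
# Hypothetical, library-free model of the two `schema` behaviors discussed
# in this thread. Not polars code: columns are plain lists, dtypes are names.
from typing import Callable, Dict, List

Frame = Dict[str, List[object]]
Schema = Dict[str, str]

# Assumed dtype names (as printed in the polars headers) mapped to the
# Python conversion used to "cast" each value.
CASTS: Dict[str, Callable[[object], object]] = {"str": str, "i32": int}


def apply_schema_positionally(data: Frame, schema: Schema) -> Frame:
    """Rename and cast columns by position, ignoring the data's names.

    Reproduces the surprising result above: `a` silently becomes `b`
    (cast to str) and `b` silently becomes `y` (cast to i32).
    """
    if len(schema) != len(data):
        raise ValueError("schema length does not match the number of columns")
    return {
        new_name: [CASTS[dtype](v) for v in values]
        for (new_name, dtype), values in zip(schema.items(), data.values())
    }


def apply_schema_checked(data: Frame, schema: Schema) -> Frame:
    """Cast columns by name; schema names must equal the data's names.

    Like the py-polars examples in this thread, any missing or extra
    name raises a ValueError instead of renaming columns.
    """
    if set(schema) != set(data):
        raise ValueError(
            "the given column-schema names do not match the data dictionary"
        )
    return {name: [CASTS[schema[name]](v) for v in data[name]] for name in data}


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
schema = {"b": "str", "y": "i32"}

print(apply_schema_positionally(data, schema))
# {'b': ['1', '2', '3'], 'y': [4, 5, 6]}

try:
    apply_schema_checked(data, schema)
except ValueError as err:
    print(f"ValueError: {err}")
```

Matching by name rather than by position is what makes a misspelled or missing column fail loudly instead of silently relabeling the data.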
By the way, the argument …
This should error since not all column names are specified in …:

```r
library(polars)
options(polars.do_not_repeat_call = TRUE)
pl$DataFrame(a = 1:3, b = 4:6, schema = list(b = pl$String))
#> shape: (3, 2)
#> ┌─────┬─────┐
#> │ a   ┆ b   │
#> │ --- ┆ --- │
#> │ i32 ┆ str │
#> ╞═════╪═════╡
#> │ 1   ┆ 4   │
#> │ 2   ┆ 5   │
#> │ 3   ┆ 6   │
#> └─────┴─────┘
```

```python
import polars as pl

pl.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [4, 5, 6],
    },
    schema={"b": pl.String},
)
# ValueError: the given column-schema names do not match the data dictionary
```
I feel that this is inconsistent with the behavior described in the documentation, which says:
In other words, …
I think this is already tracked here: pola-rs/polars#14386
Good point. But in any case, I have no idea when to use the …
In my mind …
which I think doesn't make sense. The current state is quite confusing, and that is also why I think we should wait for them to clarify it (hopefully soon).
That makes sense. If the current behavior of …
If so, yes. If the expected behavior is to only specify the column types, without renaming, then I don't think it should be removed.
In my opinion, there is no user demand to change column names in `as_polars_df()`. So simply removing the `schema` argument and leaving only `schema_override` would be sufficient.

In terms of type changes, the `schema` argument is more difficult to use than `schema_override`, in that all columns must be specified.

Originally posted by @eitsupi in #896 (comment)
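The argument for keeping only `schema_override` can be sketched the same way. Again this is a hypothetical, library-free model rather than polars code: `apply_overrides` casts only the columns it is given and keeps a (deliberately crude, assumed) inferred dtype for every other column, so a type change does not require listing all columns. Rejecting names that are not in the data is an assumption here, chosen to match the strictness argued for earlier in the thread.

```python
# Hypothetical model of override-style typing: only the named columns are
# cast; every other column keeps the dtype inferred from its data.
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, List[object]]

# Assumed dtype names mapped to the Python conversion used to "cast" values.
CASTS: Dict[str, Callable[[object], object]] = {"str": str, "i32": int}


def infer_dtype(values: List[object]) -> str:
    """Crude inference: all-int columns are i32, anything else is str."""
    return "i32" if all(isinstance(v, int) for v in values) else "str"


def apply_overrides(
    data: Frame, overrides: Dict[str, str]
) -> Dict[str, Tuple[str, List[object]]]:
    """Return {column: (dtype, values)}, casting only overridden columns.

    Names not present in the data raise a ValueError (an assumption).
    """
    unknown = set(overrides) - set(data)
    if unknown:
        raise ValueError(f"overrides name unknown columns: {sorted(unknown)}")
    result = {}
    for name, values in data.items():
        dtype = overrides.get(name, infer_dtype(values))
        result[name] = (dtype, [CASTS[dtype](v) for v in values])
    return result


data = {"a": [1, 2, 3], "b": [4, 5, 6]}
# Only `b` is mentioned; `a` keeps its inferred i32 type.
print(apply_overrides(data, {"b": "str"}))
# {'a': ('i32', [1, 2, 3]), 'b': ('str', ['4', '5', '6'])}
```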