-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend "unfold" operation and support it in the compiler plugin #742
base: master
Are you sure you want to change the base?
Conversation
…rt them as compilation errors. Add special constructor for errors that shouldn't be caught
Interpreters need an ability to pass arguments down to DSL, so introduce new "dsl" factory function
…ts failure messages
Generated sources will be updated after merging this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering, why did you decide to put unfold
after replace
?
Unfold itself, by definition, already replaces a column with a new column. I think we're making the API more complicated than it needs to be by making our users write df.replace { a }.unfold {}
instead of keeping it inside the unfold operation, like df.unfold { a }.by {}
. You can even allow df.unfold(maxDepth = 2) { a }
. It would keep "replace with" simple and "unfold" more powerful.
return when (kind()) { | ||
ColumnKind.Group, ColumnKind.Frame -> this | ||
else -> when { | ||
skipPrimitive && isPrimitive() -> this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was very confused, like how can you unfold a primitive? but it's an isPrimitive()
which can be a collection too... Can we rename isPrimitive()
to something like isPrimitiveOrListLike()
? unfold seems to be the only operation using it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't. Have a look unfold primitive
test. skipPrimitive = false is needed to make it work, and skipPrimitive = true is needed to avoid unpacking for example a column of String to ColumnGroup, size: Int
, the same as we do for toDataFrame with overloads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but can you take a look at the isPrimitive()
function? That function also returns true when you run in on a collection and an array. My suggestion was only to rename the isPrimitive()
function.
public inline fun <reified T> DataColumn<T>.unfold(noinline body: CreateDataFrameDsl<T>.() -> Unit): AnyCol = | ||
unfoldImpl(skipPrimitive = false, body) | ||
|
||
public inline fun <T, reified C> ReplaceClause<T, C>.unfold(vararg props: KProperty<*>, maxDepth: Int = 0): DataFrame<T> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the name unfolding
would read better, or byUnfolding
/withUnfolded
. replace {}.unfold {}
doesn't read as a sentence anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's possible, let's avoid motion or gravity to the native language, I believe, it's not a goal
What definition? |
I meant the definition of "unfolding", you're unfolding a column with its contents, so its type is bound to change, aka, the column is replaced. I can see where you're coming from, but it still may be hard for users to have two different ways to unfold, namely So I'd either:
wdyt? |
I'd say
Please write this API with needed overloads and a few examples of its usage then |
public inline fun <reified T> DataColumn<T>.unfold(noinline body: CreateDataFrameDsl<T>.() -> Unit): AnyCol = | ||
unfoldImpl(skipPrimitive = false, body) | ||
|
||
public inline fun <T, reified C> ReplaceClause<T, C>.unfold(vararg props: KProperty<*>, maxDepth: Int = 0): DataFrame<T> = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, we should also use KCallable
instead of KProperty
for java classes support :)
fun `unfold properties`() { | ||
val col by columnOf(A("1", 123, B(3.0))) | ||
val df1 = dataFrameOf(col) | ||
val conv = df1.replace { col }.unfold(maxDepth = 2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not specifying maxDepth now breaks, while it worked before.
Try running df1.replace { col }.with { it.unfold() }
before the PR and after it.
It works before, but now it gives: java.lang.UnsupportedOperationException: Can not get nested column 'd' from ValueColumn 'bb'
I made a little sample of |
UnfoldingDataFrame looks good, we can use it. Supporting it on the plugin side will require some changes, but it's ok. I'm only worried with return type being different than DataFrame people will be tempted to call |
@koperagen Yes, I couldn't find another way to have a notation with 2 selectors and the second one being optional while keeping the DataFrame DSL style :/ but indeed, it's something new. A bit like Actually, since We could also put it inside the class to make it even more discoverable. Alternatively, we could change the return-type of Imagine that XD
|
|
||
@Test | ||
fun `unfold properties`() { | ||
val col by columnOf(A("1", 123, B(3.0))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this case "More fine-grained toDataFrame. Instead of converting 20-30 properties to 2-3 level of nesting all at once user can choose to convert toDataFrame(maxDepth = 0) and unfold required properties to whatever level they need" covered here, in this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically, yes. I intend to have a more representative example as a part of compiler plugin demo. There's a tree of objects with many properties and potentially deep nesting from konsist library. It will be a good illustration. But here it merely unfolds one specific column up to 2 levels.
val a by columnOf("123") | ||
val df = dataFrameOf(a) | ||
|
||
val conv = df.replace { a }.unfold { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we use replace and unfold independently? If somehow yes, could you please add test for this, of only together, could be combined to one function?
Honetsly, I like the idea from the use-case "More fine-grained toDataFrame. Instead of converting 20-30 properties to 2-3 level of nesting all at once user can choose to convert toDataFrame(maxDepth = 0) and unfold required properties to whatever level they need", defining a level of our unfolding (that is a special situation of Also I found that we have a lack of docs/examples for this operation https://kotlin.github.io/dataframe/unfold.html Some crazy ideas
|
Since it's a WIP I made it a draft for now |
Just a thought :) We can actually have something like Something like: df.replace { data }.with {
it.unfold {
"b" from { it }
"c" from { DataRow.readJsonStr("""{"prop": 1}""") }
}
} |
It covers two interesting use cases:
add
for thatOn the compiler plugin side i will continue to support other overloads later in different PR