RFC: PySpark-like, DataFrame-only API #104
Merged
Changes from 25 commits
Commits
46 commits
e752e4e Deprecate RDD API (dfdx)
ed60744 Start building a completely new API mimicking the Python API (dfdx)
339ef63 Add Column, a few basic operations on Column and DataFrame, some tests (dfdx)
83953c0 Add Column API (dfdx)
7c1a3c2 Start implementation of julia2java compiler (dfdx)
999f0f3 Add complete UDF implementation (PoC). Introduce modules (dfdx)
39dbe44 Fix issues in CI/CD (dfdx)
22d809b Add methods for StructField and StructType (dfdx)
08307c4 Add spark.createDataFrame() (dfdx)
5d4723a Add select() (dfdx)
5624463 Add RuntimeConfig (dfdx)
8a284d6 Add withColumn, filter and others (dfdx)
1d8ef0c Fix empty tuple argument in jcall (for older version of JavaCall) (dfdx)
e7572f3 Fix more jcall's, rename GroupedData vars (dfdx)
eda3861 Add more ways to create a DataFrame (dfdx)
6bc47e5 Finish GroupedData (dfdx)
b07ba1a Add sql() and friends (dfdx)
1239298 Add join(), remove unused RDD stuff (dfdx)
19096ff Divide code into multiple files (dfdx)
ed9137d Add reader/writer + tests (dfdx)
576df2d Update versions in CI/CD (dfdx)
139cdbb Unset JULIA_COPY_STACKS on Appveyor (dfdx)
099e042 Add operations on Window (dfdx)
77b8f04 Another attempt to fix tests on Appveyor (dfdx)
dc2a760 Add Windows to the test config in Github Actions (dfdx)
db3b1dc Rework pom.xml, update versions of everything (dfdx)
75f8f17 Turn off foreach(::DataStreamWriter, ...), start reworking compilatio… (dfdx)
1ff7aad Fix compatibility issue (dfdx)
96ec648 Remove Windows from the test matrix for now (dfdx)
a5e8227 Switch to Oracle JDK in Github Actions (dfdx)
383af82 Try another distribution of OpenJDK (dfdx)
3abb2cc Try Oracle JDK 11.0.14 (dfdx)
afb383d Add a few stubs for docs (dfdx)
3f22e99 Try JDK 8 to fix UncaughtExceptionHandler in thread 'process reaper' (dfdx)
795dbe4 In preparation to the release (dfdx)
5826cdf Fix dependencies, rename doc files (dfdx)
4ff15f9 Add support for Date & DateTime (dfdx)
37f7dfb Conversion with primitive types (dfdx)
7b0f1c9 Add SQL docs (dfdx)
c0a5866 Add a few docstrings (dfdx)
49501b6 Add a few more docstrings (dfdx)
10e7f3f Add API reference (dfdx)
d8d4071 Make a better representation for a streaming DataFrame (dfdx)
8f43024 Add Structured Streaming docs (dfdx)
d2da3b3 Add docs workflow to GitHub Actions (dfdx)
0964d10 Update branch name in GitHub Actions workflows (dfdx)
@@ -2,6 +2,7 @@
 *.jl.mem
 *~
 .idea/
+.vscode/
 target/
 project/
 *.class
@@ -1,50 +1,5 @@
 module Spark
 
-export
-    SparkConf,
-    SparkContext,
-    RDD,
-    JuliaRDD,
-    JavaRDD,
-    text_file,
-    parallelize,
-    map,
-    map_pair,
-    map_partitions,
-    map_partitions_pair,
-    map_partitions_with_index,
-    reduce,
-    filter,
-    collect,
-    count,
-    id,
-    num_partitions,
-    close,
-    @attach,
-    share_variable,
-    @share,
-    flat_map,
-    flat_map_pair,
-    cartesian,
-    group_by_key,
-    reduce_by_key,
-    cache,
-    repartition,
-    coalesce,
-    pipe,
-    # SQL
-    SparkSession,
-    Dataset,
-    sql,
-    read_json,
-    write_json,
-    read_parquet,
-    write_parquet,
-    read_df,
-    write_df
-
-
-
 include("core.jl")
 
 end
This file was deleted.
@@ -0,0 +1,63 @@
+"""
+    DotChainer{O, Fn}
+
+See `@chainable` for details.
+"""
+struct DotChainer{O, Fn}
+    obj::O
+    fn::Fn
+end
+
+# DotChainer(obj, fn) = DotChainer{typeof(obj), typeof(fn)}(obj, fn)
+
+(c::DotChainer)(args...) = c.fn(c.obj, args...)
+
+
+"""
+    @chainable T
+
+Adds dot chaining syntax to the type, i.e. automatically translates:
+
+    foo.bar(a)
+
+into
+
+    bar(foo, a)
+
+For single-argument functions also supports implicit calls, e.g.:
+
+    foo.bar.baz(a, b)
+
+is treated the same as:
+
+    foo.bar().baz(a, b)
+
+Note that `@chainable` works by overloading `Base.getproperty()`,
+making it impossible to customize it for `T`. To have more control,
+one may use the underlying wrapper type - `DotChainer`.
+"""
+macro chainable(T)
+    return quote
+        function Base.getproperty(obj::$(esc(T)), prop::Symbol)
+            if hasfield(typeof(obj), prop)
+                return getfield(obj, prop)
+            elseif isdefined(@__MODULE__, prop)
+                fn = getfield(@__MODULE__, prop)
+                return DotChainer(obj, fn)
+            else
+                error("type $(typeof(obj)) has no field $prop")
+            end
+        end
+    end
+end
+
+
+function Base.getproperty(dc::DotChainer, prop::Symbol)
+    if hasfield(typeof(dc), prop)
+        return getfield(dc, prop)
+    else
+        # implicitly call function without arguments
+        # and propagate getproperty to the returned object
+        return getproperty(dc(), prop)
+    end
+end
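To make the dot-chaining behaviour described in the `@chainable` docstring concrete, here is a minimal, self-contained sketch. It repeats the `DotChainer` and `@chainable` definitions from the diff so that it runs on its own in a fresh session. The `Account` type and the `deposit`, `withdraw` and `snapshot` functions are hypothetical names invented for this illustration and are not part of the PR.

```julia
# Definitions copied from the diff so the sketch is self-contained.
struct DotChainer{O, Fn}
    obj::O
    fn::Fn
end

# Calling the wrapper passes the wrapped object as the first argument.
(c::DotChainer)(args...) = c.fn(c.obj, args...)

macro chainable(T)
    return quote
        function Base.getproperty(obj::$(esc(T)), prop::Symbol)
            if hasfield(typeof(obj), prop)
                # real fields take precedence over functions of the same name
                return getfield(obj, prop)
            elseif isdefined(@__MODULE__, prop)
                fn = getfield(@__MODULE__, prop)
                return DotChainer(obj, fn)
            else
                error("type $(typeof(obj)) has no field $prop")
            end
        end
    end
end

function Base.getproperty(dc::DotChainer, prop::Symbol)
    if hasfield(typeof(dc), prop)
        return getfield(dc, prop)
    else
        # call the wrapped function with no extra arguments,
        # then look the property up on its result
        return getproperty(dc(), prop)
    end
end

# Hypothetical type and functions, for illustration only.
struct Account
    balance::Int
end

deposit(a::Account, n) = Account(a.balance + n)
withdraw(a::Account, n) = Account(a.balance - n)
snapshot(a::Account) = a          # single-argument function

@chainable Account

acc = Account(100)

# acc.deposit(50) is rewritten to deposit(acc, 50), and so on down the chain.
result = acc.deposit(50).withdraw(30)      # balance is 120

# Implicit call: acc.snapshot.deposit(5) behaves like acc.snapshot().deposit(5).
implicit = acc.snapshot.deposit(5)         # balance is 105
```

Because the generated `getproperty` checks `hasfield` first, `acc.balance` still returns the field. Any other name is looked up among the bindings of the module in which `@chainable` was defined, which is why the functions here are defined in the same module as the macro.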
wow, good magic, I had no idea this was possible in Julia :)