feat(spark): add Apache Sedona #11527

Merged
merged 1 commit into from
Oct 1, 2023
1 change: 1 addition & 0 deletions .github/workflows/.static-type-check.yml
@@ -127,6 +127,7 @@ jobs:
poetry run poe static-type-check-python -- --package=hm-rasa
poetry run poe static-type-check-python -- --package=hm-ray.applications.greet
poetry run poe static-type-check-python -- --package=hm-serial
poetry run poe static-type-check-python -- --package=hm-spark.applications.analyze-coffee-customers
poetry run poe static-type-check-python -- --package=hm-spark.applications.find-retired-people-python
poetry run poe static-type-check-python -- --package=hm-spark.applications.find-taxi-top-routes
poetry run poe static-type-check-python -- --package=hm-spark.applications.find-taxi-top-routes-sql
1 change: 1 addition & 0 deletions Makefile
@@ -321,6 +321,7 @@ static-type-check-python:
poetry run poe static-type-check-python -- --package=hm-rasa
poetry run poe static-type-check-python -- --package=hm-ray.applications.greet
poetry run poe static-type-check-python -- --package=hm-serial
poetry run poe static-type-check-python -- --package=hm-spark.applications.analyze-coffee-customers
poetry run poe static-type-check-python -- --package=hm-spark.applications.find-retired-people-python
poetry run poe static-type-check-python -- --package=hm-spark.applications.find-taxi-top-routes
poetry run poe static-type-check-python -- --package=hm-spark.applications.find-taxi-top-routes-sql
3 changes: 2 additions & 1 deletion README.md
@@ -57,7 +57,7 @@ The diagram illustrates the repository's architecture, which is considered overl

(The diagram here may take a moment to load. Please wait patiently.)

-![Architecture](https://github.com/hongbo-miao/hongbomiao.com/assets/3375461/e25c3792-d29d-4831-906e-2906838322bc)
+![Architecture](https://github.com/hongbo-miao/hongbomiao.com/assets/3375461/765c280b-103b-4f65-85be-22b6f0e06b9c)

# 📦 Setup

@@ -269,6 +269,7 @@ make kubernetes-clean
- **flink-connector-twitter** - Flink Twitter connector
- **flink-connector-jdbc** - Flink JDBC Connector
- **flink-connector-redis** - Flink Redis connector
- **Apache Sedona** - Spatial data processing framework
- **Grafana** - Data visualization
- **Metabase** - Data visualization
- **Apache Superset** - Data visualization
2 changes: 2 additions & 0 deletions hm-spark/Makefile
@@ -1,3 +1,5 @@
openjdk-install:
	brew install openjdk@17
apache-spark-install:
	brew install apache-spark
sbt-install:
13 changes: 13 additions & 0 deletions hm-spark/applications/analyze-coffee-customers/Makefile
@@ -0,0 +1,13 @@
poetry-env-use:
	poetry env use 3.11
poetry-update-lock-file:
	poetry lock --no-update
poetry-install:
	poetry install --no-root
poetry-add:
	poetry add xxx
poetry-add-dev:
	poetry add xxx --group=dev

poetry-run-dev:
	poetry run poe dev
Empty file.
239 changes: 239 additions & 0 deletions hm-spark/applications/analyze-coffee-customers/poetry.lock

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions hm-spark/applications/analyze-coffee-customers/pyproject.toml
@@ -0,0 +1,19 @@
[tool.poetry]
name = "hm-spark-analyze-coffee-customers"
version = "1.0.0"
description = ""
authors = ["Hongbo Miao"]

[tool.poetry.dependencies]
python = "3.11.x"
apache-sedona = {version = "1.4.1", extras = ["spark"]}

[tool.poetry.group.dev.dependencies]
poethepoet = "0.23.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

[tool.poe.tasks]
dev = "python src/main.py"
53 changes: 53 additions & 0 deletions hm-spark/applications/analyze-coffee-customers/src/main.py
@@ -0,0 +1,53 @@
from sedona.spark import SedonaContext


def main() -> None:
    sedona_config = (
        SedonaContext.builder()
        .config(
            "spark.jars.packages",
            # https://mvnrepository.com/artifact/org.apache.sedona
            "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.4.1,"
            # https://mvnrepository.com/artifact/org.datasyslab/geotools-wrapper
            "org.datasyslab:geotools-wrapper:1.4.0-28.2",
        )
        .getOrCreate()
    )
    sedona = SedonaContext.create(sedona_config)

    (
        sedona.read.format("csv")
        .option("delimiter", ",")
        .option("header", "false")
        # https://github.com/apache/sedona/blob/master/binder/data/testpoint.csv
        .load("data/testpoint.csv")
    ).createOrReplaceTempView("points")

    sedona.sql(
        """
        select st_point(cast(points._c0 as double), cast(points._c1 as double)) as point
        from points
    """
    ).createOrReplaceTempView("points1")
    sedona.sql(
        """
        select st_point(cast(points._c0 as double), cast(points._c1 as double)) as point
        from points
    """
    ).createOrReplaceTempView("points2")

    df = sedona.sql(
        """
        select
            points1.point as point1,
            points2.point as point2,
            st_distance(points1.point, points2.point) as distance
        from points1, points2
        where st_distance(points1.point, points2.point) < 2
    """
    )
    df.show()


if __name__ == "__main__":
    main()
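
A minimal pure-Python sketch of what the final query in `main.py` appears to compute, assuming `st_distance` on two point geometries is the planar Euclidean distance. The sample rows and the helper names `load_points` and `pairs_within` are hypothetical and are not part of this PR; the sketch only mirrors the cast-to-double step and the cross join with the `< 2` filter, without Spark or Sedona.

```python
import csv
import io
import math
from itertools import product

# Hypothetical sample rows standing in for data/testpoint.csv
# (two numeric columns, no header row).
SAMPLE_CSV = "1.0,1.0\n2.0,2.0\n10.0,10.0\n"


def load_points(text: str) -> list[tuple[float, float]]:
    """Parse headerless CSV rows into (x, y) tuples, like casting _c0 and _c1 to double."""
    return [(float(row[0]), float(row[1])) for row in csv.reader(io.StringIO(text)) if row]


def pairs_within(
    points: list[tuple[float, float]], threshold: float
) -> list[tuple[tuple[float, float], tuple[float, float], float]]:
    """Cross join the points with themselves and keep pairs closer than `threshold`."""
    result = []
    for p1, p2 in product(points, points):
        distance = math.dist(p1, p2)  # planar Euclidean distance (assumption)
        if distance < threshold:
            result.append((p1, p2, distance))
    return result


if __name__ == "__main__":
    for p1, p2, distance in pairs_within(load_points(SAMPLE_CSV), 2):
        print(p1, p2, round(distance, 3))
```

Because `points1` and `points2` are built from the same `points` view, a join like this pairs every point with itself at distance 0 and returns each close pair in both orders. That may be intended for a demo; if not, an extra predicate on the pair would be needed.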
5 changes: 1 addition & 4 deletions hm-spark/applications/find-retired-people-python/Makefile
@@ -1,8 +1,5 @@
-openjdk-install:
-	brew install openjdk@17
-
 poetry-env-use:
-	poetry env use 3.10
+	poetry env use 3.11
 poetry-update-lock-file:
 	poetry lock --no-update
 poetry-install:
5 changes: 1 addition & 4 deletions hm-spark/applications/find-taxi-top-routes-sql/Makefile
@@ -1,8 +1,5 @@
-openjdk-install:
-	brew install openjdk@17
-
 poetry-env-use:
-	poetry env use 3.10
+	poetry env use 3.11
 poetry-update-lock-file:
 	poetry lock --no-update
 poetry-install:
5 changes: 1 addition & 4 deletions hm-spark/applications/find-taxi-top-routes/Makefile
@@ -1,8 +1,5 @@
-openjdk-install:
-	brew install openjdk@17
-
 poetry-env-use:
-	poetry env use 3.10
+	poetry env use 3.11
 poetry-update-lock-file:
 	poetry lock --no-update
 poetry-install:
5 changes: 1 addition & 4 deletions hm-spark/applications/recommend-movies/Makefile
@@ -1,8 +1,5 @@
-openjdk-install:
-	brew install openjdk@17
-
 poetry-env-use:
-	poetry env use 3.10
+	poetry env use 3.11
 poetry-update-lock-file:
 	poetry lock --no-update
 poetry-install: