feat(spark): add Apache Sedona (#11527)
hongbo-miao authored Oct 1, 2023
1 parent 0f4bf97 commit 1c4b314
Showing 13 changed files with 334 additions and 17 deletions.
1 change: 1 addition & 0 deletions .github/workflows/.static-type-check.yml
@@ -127,6 +127,7 @@ jobs:
 poetry run poe static-type-check-python -- --package=hm-rasa
 poetry run poe static-type-check-python -- --package=hm-ray.applications.greet
 poetry run poe static-type-check-python -- --package=hm-serial
+poetry run poe static-type-check-python -- --package=hm-spark.applications.analyze-coffee-customers
 poetry run poe static-type-check-python -- --package=hm-spark.applications.find-retired-people-python
 poetry run poe static-type-check-python -- --package=hm-spark.applications.find-taxi-top-routes
 poetry run poe static-type-check-python -- --package=hm-spark.applications.find-taxi-top-routes-sql
1 change: 1 addition & 0 deletions Makefile
@@ -321,6 +321,7 @@ static-type-check-python:
 	poetry run poe static-type-check-python -- --package=hm-rasa
 	poetry run poe static-type-check-python -- --package=hm-ray.applications.greet
 	poetry run poe static-type-check-python -- --package=hm-serial
+	poetry run poe static-type-check-python -- --package=hm-spark.applications.analyze-coffee-customers
 	poetry run poe static-type-check-python -- --package=hm-spark.applications.find-retired-people-python
 	poetry run poe static-type-check-python -- --package=hm-spark.applications.find-taxi-top-routes
 	poetry run poe static-type-check-python -- --package=hm-spark.applications.find-taxi-top-routes-sql
3 changes: 2 additions & 1 deletion README.md
@@ -57,7 +57,7 @@ The diagram illustrates the repository's architecture, which is considered overl
 
 (The diagram here may take a moment to load. Please wait patiently.)
 
-![Architecture](https://github.com/hongbo-miao/hongbomiao.com/assets/3375461/e25c3792-d29d-4831-906e-2906838322bc)
+![Architecture](https://github.com/hongbo-miao/hongbomiao.com/assets/3375461/765c280b-103b-4f65-85be-22b6f0e06b9c)
 
 # 📦 Setup
 
@@ -269,6 +269,7 @@ make kubernetes-clean
 - **flink-connector-twitter** - Flink Twitter connector
 - **flink-connector-jdbc** - Flink JDBC Connector
 - **flink-connector-redis** - Flink Redis connector
+- **Apache Sedona** - Spatial data processing framework
 - **Grafana** - Data visualization
 - **Metabase** - Data visualization
 - **Apache Superset** - Data visualization
2 changes: 2 additions & 0 deletions hm-spark/Makefile
@@ -1,3 +1,5 @@
+openjdk-install:
+	brew install openjdk@17
 apache-spark-install:
 	brew install apache-spark
 sbt-install:
13 changes: 13 additions & 0 deletions hm-spark/applications/analyze-coffee-customers/Makefile
@@ -0,0 +1,13 @@
+poetry-env-use:
+	poetry env use 3.11
+poetry-update-lock-file:
+	poetry lock --no-update
+poetry-install:
+	poetry install --no-root
+poetry-add:
+	poetry add xxx
+poetry-add-dev:
+	poetry add xxx --group=dev
+
+poetry-run-dev:
+	poetry run poe dev
Empty file.
239 changes: 239 additions & 0 deletions hm-spark/applications/analyze-coffee-customers/poetry.lock

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions hm-spark/applications/analyze-coffee-customers/pyproject.toml
@@ -0,0 +1,19 @@
+[tool.poetry]
+name = "hm-spark-analyze-coffee-customers"
+version = "1.0.0"
+description = ""
+authors = ["Hongbo Miao"]
+
+[tool.poetry.dependencies]
+python = "3.11.x"
+apache-sedona = {version = "1.4.1", extras = ["spark"]}
+
+[tool.poetry.group.dev.dependencies]
+poethepoet = "0.23.0"
+
+[build-system]
+requires = ["poetry-core"]
+build-backend = "poetry.core.masonry.api"
+
+[tool.poe.tasks]
+dev = "python src/main.py"
53 changes: 53 additions & 0 deletions hm-spark/applications/analyze-coffee-customers/src/main.py
@@ -0,0 +1,53 @@
+from sedona.spark import SedonaContext
+
+
+def main() -> None:
+    sedona_config = (
+        SedonaContext.builder()
+        .config(
+            "spark.jars.packages",
+            # https://mvnrepository.com/artifact/org.apache.sedona
+            "org.apache.sedona:sedona-spark-shaded-3.4_2.12:1.4.1,"
+            # https://mvnrepository.com/artifact/org.datasyslab/geotools-wrapper
+            "org.datasyslab:geotools-wrapper:1.4.0-28.2",
+        )
+        .getOrCreate()
+    )
+    sedona = SedonaContext.create(sedona_config)
+
+    (
+        sedona.read.format("csv")
+        .option("delimiter", ",")
+        .option("header", "false")
+        # https://github.com/apache/sedona/blob/master/binder/data/testpoint.csv
+        .load("data/testpoint.csv")
+    ).createOrReplaceTempView("points")
+
+    sedona.sql(
+        """
+        select st_point(cast(points._c0 as double), cast(points._c1 as double)) as point
+        from points
+        """
+    ).createOrReplaceTempView("points1")
+    sedona.sql(
+        """
+        select st_point(cast(points._c0 as double), cast(points._c1 as double)) as point
+        from points
+        """
+    ).createOrReplaceTempView("points2")
+
+    df = sedona.sql(
+        """
+        select
+            points1.point as point1,
+            points2.point as point2,
+            st_distance(points1.point, points2.point) as distance
+        from points1, points2
+        where st_distance(points1.point, points2.point) < 2
+        """
+    )
+    df.show()
+
+
+if __name__ == "__main__":
+    main()
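The query above joins the points view with itself and keeps every pair whose `st_distance` is below 2; for point geometries this is the planar Euclidean distance in coordinate units. The same logic in plain Python, as a sketch for small inputs only: `SAMPLE_CSV` is made-up data standing in for data/testpoint.csv, and `load_points` and `close_pairs` are hypothetical helper names. Like the SQL, it keeps self-pairs (distance 0) and both orderings of each pair.

```python
import csv
import io
import math
from itertools import product

Point = tuple[float, float]

# Hypothetical sample rows standing in for data/testpoint.csv (no header row).
SAMPLE_CSV = """1.0,1.0
2.0,2.0
5.0,5.0
"""


def load_points(text: str) -> list[Point]:
    """Read the first two CSV columns (Spark's _c0 and _c1) as x/y doubles."""
    return [(float(row[0]), float(row[1])) for row in csv.reader(io.StringIO(text)) if row]


def close_pairs(points: list[Point], max_distance: float) -> list[tuple[Point, Point, float]]:
    """Cross join points with themselves and keep pairs closer than max_distance."""
    pairs = []
    for p1, p2 in product(points, repeat=2):  # like "from points1, points2"
        distance = math.dist(p1, p2)  # planar Euclidean distance, like st_distance
        if distance < max_distance:  # like "where ... < 2"
            pairs.append((p1, p2, distance))
    return pairs


for point1, point2, distance in close_pairs(load_points(SAMPLE_CSV), 2):
    print(point1, point2, round(distance, 3))
```

The nested loop is quadratic in the number of points; it only makes the query's semantics concrete and is not a substitute for running the join in Sedona.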
5 changes: 1 addition & 4 deletions hm-spark/applications/find-retired-people-python/Makefile
@@ -1,8 +1,5 @@
-openjdk-install:
-	brew install openjdk@17
-
 poetry-env-use:
-	poetry env use 3.10
+	poetry env use 3.11
 poetry-update-lock-file:
 	poetry lock --no-update
 poetry-install:
5 changes: 1 addition & 4 deletions hm-spark/applications/find-taxi-top-routes-sql/Makefile
@@ -1,8 +1,5 @@
-openjdk-install:
-	brew install openjdk@17
-
 poetry-env-use:
-	poetry env use 3.10
+	poetry env use 3.11
 poetry-update-lock-file:
 	poetry lock --no-update
 poetry-install:
5 changes: 1 addition & 4 deletions hm-spark/applications/find-taxi-top-routes/Makefile
@@ -1,8 +1,5 @@
-openjdk-install:
-	brew install openjdk@17
-
 poetry-env-use:
-	poetry env use 3.10
+	poetry env use 3.11
 poetry-update-lock-file:
 	poetry lock --no-update
 poetry-install:
5 changes: 1 addition & 4 deletions hm-spark/applications/recommend-movies/Makefile
@@ -1,8 +1,5 @@
-openjdk-install:
-	brew install openjdk@17
-
 poetry-env-use:
-	poetry env use 3.10
+	poetry env use 3.11
 poetry-update-lock-file:
 	poetry lock --no-update
 poetry-install: