Commit

add example, update README [ci skip]
Preetam committed Oct 15, 2021
1 parent 4b0fca9 commit e391356
Showing 5 changed files with 3,737 additions and 2 deletions.
40 changes: 40 additions & 0 deletions README.md
@@ -1,3 +1,43 @@
# crossjoin [![Docker](https://github.com/crossjoin-io/crossjoin/actions/workflows/docker.yml/badge.svg)](https://github.com/crossjoin-io/crossjoin/actions/workflows/docker.yml) [![CLI](https://github.com/crossjoin-io/crossjoin/actions/workflows/go.yml/badge.svg)](https://github.com/crossjoin-io/crossjoin/actions/workflows/go.yml) [![Security scan](https://github.com/crossjoin-io/crossjoin/actions/workflows/shiftleft.yml/badge.svg)](https://github.com/crossjoin-io/crossjoin/blob/main/SECURITY.md)

Crossjoin joins together your data from anywhere.

- Supports PostgreSQL, Redshift, and CSV data sources
- Zero-dependency CLI, or a single Docker container

## Example

In the [example](https://github.com/crossjoin-io/crossjoin/tree/main/example) directory, there are two CSVs (adapted from
this [AWS blog post](https://aws.amazon.com/blogs/big-data/joining-across-data-sources-on-amazon-quicksight/)) representing
orders and returns data.

The config creates a combined data set in a `joined.db` SQLite3 file.

```yaml
data_sets:
  - name: joined
    data_source:
      name: orders
      type: csv
      path: ./orders.csv
    joins:
      - type: JOIN
        columns:
          - left_column: Order ID
            right_column: Order ID
        data_source:
          name: returns
          type: csv
          path: ./returns.csv
```

```
$ crossjoin --config ./config.yaml
2021/10/14 18:08:06 using config file path config.yaml
2021/10/14 18:08:06 starting crossjoin
2021/10/14 18:08:06 creating data set `joined`
2021/10/14 18:08:06 querying `orders`
2021/10/14 18:08:06 querying `returns`
2021/10/14 18:08:06 joining data
2021/10/14 18:08:06 finished crossjoin
```
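
To make the effect of this config concrete, the following is a minimal sketch of the SQL statement it appears to translate into. It reuses the format strings from `createDataset` in `cmd/crossjoin/main.go` as changed in this commit. The struct types and the `buildJoinQuery` and `exampleDataSet` helpers are hypothetical stand-ins written for illustration; they are not the actual `config` package.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical stand-ins for the config types referenced in the diff.
// Only the fields used by createDataset are modeled here.
type DataSource struct {
	Name string
	Type string
	Path string
}

type JoinColumn struct {
	LeftColumn  string
	RightColumn string
}

type Join struct {
	Type       string
	Columns    []JoinColumn
	DataSource DataSource
}

type DataSet struct {
	Name       string
	DataSource DataSource
	Joins      []Join
}

// buildJoinQuery assembles the CREATE TABLE ... AS SELECT statement using
// the same format strings that appear in createDataset in this commit.
func buildJoinQuery(ds DataSet) string {
	joinClauses := ""
	for _, join := range ds.Joins {
		joinColumns := []string{}
		for _, cols := range join.Columns {
			joinColumns = append(joinColumns, fmt.Sprintf(`%s."%s" = %s."%s"`,
				ds.DataSource.Name, cols.LeftColumn, join.DataSource.Name, cols.RightColumn))
		}
		joinClauses += fmt.Sprintf(" %s %s ON %s", join.Type, join.DataSource.Name,
			strings.Join(joinColumns, " AND "))
	}
	return fmt.Sprintf("CREATE TABLE %s AS SELECT * FROM %s %s",
		ds.Name, ds.DataSource.Name, joinClauses)
}

// exampleDataSet mirrors the values in example/config.yaml.
func exampleDataSet() DataSet {
	return DataSet{
		Name:       "joined",
		DataSource: DataSource{Name: "orders", Type: "csv", Path: "./orders.csv"},
		Joins: []Join{{
			Type:       "JOIN",
			Columns:    []JoinColumn{{LeftColumn: "Order ID", RightColumn: "Order ID"}},
			DataSource: DataSource{Name: "returns", Type: "csv", Path: "./returns.csv"},
		}},
	}
}

func main() {
	// Prints the statement that would create the `joined` table.
	fmt.Println(buildJoinQuery(exampleDataSet()))
}
```

The double quotes around `Order ID` are what allow a column name containing a space to be used in the `ON` clause.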
9 changes: 7 additions & 2 deletions cmd/crossjoin/main.go
@@ -23,6 +23,7 @@ import (
 	"io"
 	"log"
 	"os"
+	"strconv"
 	"strings"

 	"github.com/crossjoin-io/crossjoin/config"
@@ -114,13 +115,14 @@ func createDataset(dataset config.DataSet) error {
 	for _, join := range dataset.Joins {
 		joinColumns := []string{}
 		for _, cols := range join.Columns {
-			joinColumns = append(joinColumns, fmt.Sprintf("%s.%s = %s.%s", dataset.DataSource.Name, cols.LeftColumn, join.DataSource.Name, cols.RightColumn))
+			joinColumns = append(joinColumns, fmt.Sprintf(`%s."%s" = %s."%s"`, dataset.DataSource.Name, cols.LeftColumn, join.DataSource.Name, cols.RightColumn))
 		}
 		joinClauses += fmt.Sprintf(" %s %s ON %s", join.Type, join.DataSource.Name, strings.Join(joinColumns, " AND "))
 	}

 	log.Println("joining data")
-	_, err = db.Exec(fmt.Sprintf("CREATE TABLE %s AS SELECT * FROM %s %s", dataset.Name, dataset.DataSource.Name, joinClauses))
+	joinQuery := fmt.Sprintf("CREATE TABLE %s AS SELECT * FROM %s %s", dataset.Name, dataset.DataSource.Name, joinClauses)
+	_, err = db.Exec(joinQuery)
 	return err
 }

@@ -138,6 +140,9 @@ func fetchSingle(dest *sql.DB, dataSource *config.DataSource) error {
 		return err
 	}
 	columns := firstLine
+	for i := range columns {
+		columns[i] = strconv.Quote(columns[i])
+	}

 	_, err = dest.Exec(ngsastOK(fmt.Sprintf("CREATE TABLE %s (%s)", dataSource.Name, strings.Join(columns, ","))))
 	if err != nil {
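
A note on the quoting added in `fetchSingle`: `strconv.Quote` produces a Go string literal, so a double quote inside a column name is escaped with a backslash, whereas SQL escapes a double quote inside a quoted identifier by doubling it. For headers like `Order ID` the two give the same result, but they diverge for names that contain quotes. The sketch below shows one possible alternative. `quoteIdent` and `createTableSQL` are hypothetical helpers, not part of crossjoin, and the CSV header is made up for illustration.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// quoteIdent is a hypothetical helper: it wraps a name in double quotes
// and doubles any embedded double quotes, following SQL identifier rules.
func quoteIdent(name string) string {
	return `"` + strings.ReplaceAll(name, `"`, `""`) + `"`
}

// createTableSQL builds a CREATE TABLE statement from a CSV header row,
// quoting the table name and every column name.
func createTableSQL(table string, header []string) string {
	cols := make([]string, len(header))
	for i, h := range header {
		cols[i] = quoteIdent(h)
	}
	return fmt.Sprintf("CREATE TABLE %s (%s)", quoteIdent(table), strings.Join(cols, ","))
}

func main() {
	// Illustrative CSV content; the real example files may have other columns.
	r := csv.NewReader(strings.NewReader("Order ID,Ship Mode\n1,Standard\n"))
	header, err := r.Read()
	if err != nil {
		panic(err)
	}
	fmt.Println(createTableSQL("orders", header))

	// The two quoting styles differ once the name itself contains a quote.
	name := `Size "cm"`
	fmt.Println(strconv.Quote(name)) // Go-style escaping with a backslash
	fmt.Println(quoteIdent(name))    // SQL-style escaping by doubling the quote
}
```

The same helper could be applied to the column names in the `ON` clause of `createDataset`, which currently wraps them in quotes without escaping.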
15 changes: 15 additions & 0 deletions example/config.yaml
@@ -0,0 +1,15 @@
data_sets:
  - name: joined
    data_source:
      name: orders
      type: csv
      path: ./orders.csv
    joins:
      - type: JOIN
        columns:
          - left_column: Order ID
            right_column: Order ID
        data_source:
          name: returns
          type: csv
          path: ./returns.csv