extend the LOAD_CSV documentation #112

Merged
merged 4 commits into from
Jan 15, 2025
8 changes: 8 additions & 0 deletions configuration.md
@@ -84,6 +84,7 @@ The following table summarizes which configuration parameters can be set at modu
| [EFFECTS_THRESHOLD](#effects_threshold) | V | V |
| [CMD_INFO](#cmd_info) | V | V |
| [MAX_INFO_QUERIES](#max_info_queries) | V | V |
| [IMPORT_FOLDER](#import_folder) | V | X |

---

@@ -379,3 +380,10 @@ total execution time / number of changes: 5ms / 5 = 1ms.
If the average modification time is greater than `EFFECTS_THRESHOLD`, the query
will be replicated to both replicas and AOF as a graph effect; otherwise the original
query will be replicated.

---

### IMPORT_FOLDER

The `IMPORT_FOLDER` configuration specifies the absolute path of the folder from which
FalkorDB is allowed to load CSV files. Defaults to: `/var/lib/FalkorDB/import/`
84 changes: 82 additions & 2 deletions cypher/load_csv.md
@@ -13,7 +13,7 @@ LOAD CSV FROM 'file://actors.csv' AS row
MERGE (a:Actor {name: row[0]})
```

`LOAD CSV FROM` accepts a string path to a CSV file.
The file is parsed line by line, and the current line is accessible through the
variable specified by `AS`. Each parsed value is treated as a `string`; use
the appropriate conversion function, e.g. `toInteger`, to cast a value to its
@@ -25,7 +25,7 @@ Additional clauses can follow and accesses the `row` variable

### Importing local files

FalkorDB defines a data directory ([see configuration](../configuration#import_folder))
under which local CSV files should be stored. All `file://` URIs are resolved
relative to that directory.

@@ -112,3 +112,83 @@ MERGE (m:Movie {title: row['movie']})
MERGE (a)-[:ACTED_IN]->(m)
```

### Importing remote files

FalkorDB supports importing remote CSV files over HTTPS.

The following example loads the bigmac dataset from calmcode.io:

```
LOAD CSV WITH HEADERS FROM 'https://calmcode.io/static/data/bigmac.csv' AS row
RETURN row LIMIT 4

1) 1) "ROW"
2) 1) 1) "{date: 2002-04-01, currency_code: PHP, name: Philippines, local_price: 65.0, dollar_ex: 51.0, dollar_price: 1.27450980392157}"
2) 1) "{date: 2002-04-01, currency_code: PEN, name: Peru, local_price: 8.5, dollar_ex: 3.43, dollar_price: 2.47813411078717}"
3) 1) "{date: 2002-04-01, currency_code: NZD, name: New Zealand, local_price: 3.6, dollar_ex: 2.24, dollar_price: 1.60714285714286}"
4) 1) "{date: 2002-04-01, currency_code: NOK, name: Norway, local_price: 35.0, dollar_ex: 8.56, dollar_price: 4.088785046728971}"
```
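
Because every parsed value is a `string`, numeric columns such as `dollar_price` above come back as text. The following is a minimal sketch of casting a column while loading. It assumes the `toFloat` conversion function is available in your FalkorDB version; the aliases `country`, `currency` and `dollar_price` are illustrative and not part of the original example.

```
// Sketch only: toFloat availability is an assumption, aliases are illustrative.
LOAD CSV WITH HEADERS FROM 'https://calmcode.io/static/data/bigmac.csv' AS row
RETURN row['name'] AS country,
       row['currency_code'] AS currency,
       toFloat(row['dollar_price']) AS dollar_price
LIMIT 4
```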

### Dealing with a large number of columns or missing entries

It's likely that not all of the cells in a CSV file are present, which makes
loading the data a bit more complicated. Luckily, there's an easy way around it,
which is also useful for loading a large number of columns.

Assume this is the CSV file we're loading:


### missing_entries.csv

| name | birthyear |
| :--------------| :---------|
| Lee Pace | 1979 |
| Vin Diesel | |
| Chris Pratt | |
| Zoe Saldana | 1978 |

Note: both Vin Diesel and Chris Pratt are missing their birthyear entry.

When creating the Actor nodes, we don't need to specify each column explicitly as we did so far.
The following query creates an empty Actor node and assigns the current CSV row to it,
which in turn sets the node's attribute-set to the current row:

```
LOAD CSV WITH HEADERS FROM 'file://missing_entries.csv' AS row
CREATE (a:Actor)
SET a = row
RETURN a

1) 1) "a"
2) 1) 1) 1) 1) "id"
2) (integer) 0
2) 1) "labels"
2) 1) "Actor"
3) 1) "properties"
2) 1) 1) "name"
2) "Zoe Saldana"
2) 1) "birthyear"
2) "1978"
2) 1) 1) 1) "id"
2) (integer) 1
2) 1) "labels"
2) 1) "Actor"
3) 1) "properties"
2) 1) 1) "name"
2) "Chris Pratt"
3) 1) 1) 1) "id"
2) (integer) 2
2) 1) "labels"
2) 1) "Actor"
3) 1) "properties"
2) 1) 1) "name"
2) "Vin Diesel"
4) 1) 1) 1) "id"
2) (integer) 3
2) 1) "labels"
2) 1) "Actor"
3) 1) "properties"
2) 1) 1) "name"
2) "Lee Pace"
2) 1) "birthyear"
2) "1979"
```
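
Note that `birthyear` is stored as the string `"1978"` rather than as a number, since `LOAD CSV` treats every parsed value as a string. The following is a hedged follow-up sketch, assuming the four `Actor` nodes above already exist. It casts the property with `toInteger` and skips the nodes where it is missing.

```
// Assumes the Actor nodes created by the query above.
MATCH (a:Actor)
WHERE a.birthyear IS NOT NULL
SET a.birthyear = toInteger(a.birthyear)
RETURN a.name, a.birthyear
```

Running the cast as a separate step keeps `SET a = row` free of per-column handling, which is the point of this approach when a file has many columns.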