Load from csv #13

Open
peter-8640 opened this issue Dec 18, 2024 · 5 comments

Comments

@peter-8640

The dump file is no longer available on the ICIJ site (or at least not obviously); a zip of CSVs is provided instead.
If the CSVs are copied into the DBMS import folder, e.g.
C:\Users\<user>\.Neo4jDesktop\relate-data\dbmss\dbms-xxxx..\import

The graph can be built with:
bin/neo4j-admin database import full neo4j --delimiter="," --array-delimiter="U+007C" --quote="'" --nodes=Addresses=import/nodes-addresses.csv --nodes=Entities=import/nodes-entities.csv --nodes=Intermediaries=import/nodes-intermediaries.csv --nodes=Officers=import/nodes-officers.csv --nodes=Others=import/nodes-others.csv --relationships=import/relationships.csv

Also, change the headers of relationships.csv to:
:START_ID,:END_ID,:TYPE,link,status,start_date,end_date,sourceID
(neo4j 5.24)
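
The header change above can be scripted rather than done by hand in an editor, which helps when the file is large. This is a minimal sketch, assuming the file is UTF-8 and the existing header sits on a single line; `replace_header` is a hypothetical helper name, not something from this repo.

```python
import os
import tempfile

# Header taken from the comment above.
NEW_HEADER = ":START_ID,:END_ID,:TYPE,link,status,start_date,end_date,sourceID"


def replace_header(path, new_header=NEW_HEADER, encoding="utf-8"):
    """Replace the first line of a CSV file, streaming the remaining rows unchanged.

    Assumes the old header occupies exactly one physical line.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # newline="" keeps the original line endings of the data rows untouched.
    with open(path, "r", encoding=encoding, newline="") as src, \
            tempfile.NamedTemporaryFile("w", encoding=encoding, newline="",
                                        dir=directory, delete=False) as dst:
        src.readline()                # discard the old header
        dst.write(new_header + "\n")  # write the new one
        for line in src:
            dst.write(line)
        tmp_name = dst.name
    # Swap the rewritten file into place only after both files are closed.
    os.replace(tmp_name, path)


# Example (path is an assumption about where the CSV was copied):
# replace_header("import/relationships.csv")
```

The temporary file is written in the same directory so that the final `os.replace` is a rename rather than a copy.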

I still get this error:
org.neo4j.internal.batchimport.input.HeaderException: Group 'null' not found. Available groups are: []
at org.neo4j.internal.batchimport.input.Groups.get(Groups.java:80)

So something isn't working with the import.
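
One thing worth ruling out is the CSV headers themselves. `neo4j-admin database import` expects node files to declare an `:ID` field, and relationship files to declare `:START_ID`, `:END_ID` and `:TYPE` (the latter only when no type is given on the command line, as in the command above). The sketch below checks a header line for those markers; it is a diagnostic aid under that assumption and may not be the actual cause of the `Group 'null'` error. `missing_import_fields` and `field_type` are hypothetical helper names.

```python
import csv
import re

# Field types the importer needs to find in each kind of header.
REQUIRED = {
    "nodes": ["ID"],
    "relationships": ["START_ID", "END_ID", "TYPE"],
}


def field_type(field):
    """Return the type part of a 'name:TYPE(group)' header field, or '' if none."""
    _, sep, ftype = field.partition(":")
    if not sep:
        return ""
    # Drop an optional "(group)" or "{options}" suffix after the type.
    return re.split(r"[({]", ftype, maxsplit=1)[0].strip()


def missing_import_fields(header_line, kind, delimiter=","):
    """List the required ':TYPE' markers that a header line does not declare."""
    fields = next(csv.reader([header_line], delimiter=delimiter))
    present = {field_type(f) for f in fields}
    return [":" + t for t in REQUIRED[kind] if t not in present]


# The relationships header suggested above declares everything it needs,
# so this call returns an empty list:
# missing_import_fields(
#     ":START_ID,:END_ID,:TYPE,link,status,start_date,end_date,sourceID",
#     "relationships")
```

Running the same check on the first line of each `nodes-*.csv` file would show whether any of them lacks an `:ID` column.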

[There were some quirks with getting the correct version of Java, but that is not related to this repo.]

@peter-8640
Author

Importing a single file gives a different error:

bin/neo4j-admin database import full neo4j --delimiter="," --array-delimiter="U+007C" --quote="'" --nodes=Addresses=import/nodes-addresses.csv --overwrite-destination

@peter-8640
Author

This doesn't quite work:

bin/neo4j-admin database import full neo4j --delimiter="," --multiline-fields=true --overwrite-destination --nodes=Addresses=import/nodes-addresses.csv --nodes=Entities=import/nodes-entities.csv --nodes=Intermediaries=import/nodes-intermediaries.csv --nodes=Officers=import/nodes-officers.csv --nodes=Others=import/nodes-others.csv --relationships=import/relationships.csv

@miguelfg
Member

Hi @peter-8640 ,

Sorry for the unclear changes in this repo. We did stop hosting the downloadable data here, but both formats are still available; the links are on this page. There is one for the CSV format and two for Neo4j dumps (version 4 and version 5).

We apply some transformations to the CSV format, which is why importing the files into Neo4j is not straightforward. We will provide a script for that anyway. In the meantime, I strongly suggest downloading and importing one of the Neo4j dumps.

@peter-8640
Author

Hi @miguelfg ah perfect. I missed the link to the dumps. Up and running now.
Many thanks!

I guess this issue could be closed (or leave it open if you like until the csv topic is closed).

@miguelfg
Member

Hi, you can put the rules from the attached Makefile (remove the .txt extension) wherever is convenient for you, or modify the paths to the CSV files, and run these 3 commands.

make prepare-test-csv-import-neo4j
make fix-intermediaries
make test-csv-import-neo4j
Makefile.txt

I've noticed there are duplicates in the intermediary nodes; they should be removed.
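
For anyone who wants to drop those duplicates without the Makefile, here is a minimal sketch that keeps the first row seen for each ID and writes the rest to a new file. It is not necessarily what the `fix-intermediaries` rule does. The function name is hypothetical, the ID column name must be supplied by the caller, and the file is assumed to be UTF-8 with standard double-quote CSV quoting.

```python
import csv


def drop_duplicate_rows(src_path, dst_path, key_column, encoding="utf-8"):
    """Copy a CSV file, keeping only the first row for each value of key_column.

    Returns a (kept, dropped) tuple of row counts, header excluded.
    """
    seen = set()
    kept = dropped = 0
    with open(src_path, newline="", encoding=encoding) as src, \
            open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, lineterminator="\n")
        header = next(reader)
        key_index = header.index(key_column)  # raises ValueError if absent
        writer.writerow(header)
        for row in reader:
            key = row[key_index]
            if key in seen:
                dropped += 1  # later duplicate: skip it
                continue
            seen.add(key)
            writer.writerow(row)
            kept += 1
    return kept, dropped


# Example (both the paths and the "node_id" column name are assumptions):
# drop_duplicate_rows("import/nodes-intermediaries.csv",
#                     "import/nodes-intermediaries.dedup.csv", "node_id")
```

Keeping the first occurrence is an arbitrary choice; if the duplicate rows differ in other columns, it is worth inspecting them before deciding which one to keep.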
