michelsumbul/spark-hbase-export-import

A Spark job that efficiently exports/imports an HBase table. You can optionally compress the output of the export with bzip2.

The project was developed because the native HBase MapReduce export/import job can be quite slow on very large HBase tables, especially when the table has compression enabled with a high compression ratio.

The code has been tested on HDP 3.1 with Spark 2.3.2 and HBase 2.0.2.
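The job is driven through the exportHbase class shown in the spark-submit commands below. For context, here is a minimal sketch of how such an export can be written with Spark's Hadoop-format APIs; it is not the project's actual code, and the object name, output layout (a SequenceFile of rowkey/Result pairs with ResultSerialization registered) and the bzip2 configuration keys are assumptions based on how the native HBase Export job works.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{ResultSerialization, TableInputFormat}
import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
import org.apache.spark.sql.SparkSession

// Hypothetical sketch, not the repository's exportHbase implementation.
object ExportSketch {
  def main(args: Array[String]): Unit = {
    // Expected arguments: <table_name> <hdfs_destination_folder> <true|false>
    val Array(tableName, destination, compress) = args

    val spark = SparkSession.builder().appName("hbase-export-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Point TableInputFormat at the table to export; Spark creates one
    // partition per HBase region, so the scan runs in parallel.
    val hbaseConf: Configuration = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName)

    val rdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Result is not Writable, so register HBase's ResultSerialization
    // (the native Export job does the same) before writing a SequenceFile.
    val outConf = new Configuration(hbaseConf)
    outConf.setStrings("io.serializations",
      outConf.get("io.serializations"),
      classOf[ResultSerialization].getName)

    // Optionally block-compress the output with bzip2.
    if (compress.toBoolean) {
      outConf.set("mapreduce.output.fileoutputformat.compress", "true")
      outConf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
      outConf.set("mapreduce.output.fileoutputformat.compress.codec",
        classOf[BZip2Codec].getName)
    }

    rdd.saveAsNewAPIHadoopFile(
      destination,
      classOf[ImmutableBytesWritable],
      classOf[Result],
      classOf[SequenceFileOutputFormat[ImmutableBytesWritable, Result]],
      outConf)

    spark.stop()
  }
}
```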

To run an export, use the following command. Syntax: spark-submit --class sparkHBaseExportImport.sparkhbaseexportimport.exportHbase --num-executors <num_exec> --executor-memory <size_mem_exec> --master yarn --deploy-mode client sparkHBaseExportImport-1.0-jar-with-dependencies.jar <table_name> <hdfs_destination_folder> <boolean_compression true|false>

Example:

/usr/hdp/current/spark2-client/bin/spark-submit --class sparkHBaseExportImport.sparkhbaseexportimport.exportHbase --num-executors 6 --executor-memory 10G --master yarn --deploy-mode client sparkHBaseExportImport-1.0-jar-with-dependencies.jar test_table /tmp/test2/ true

You will have to choose the right number of executors and the amount of memory per executor depending on the size of the table, the resources available on the cluster, and how quickly you want the export to run.
