michelsumbul/spark-hbase-export-import

A Spark job that efficiently exports/imports an HBase table. You can optionally compress the output of the export with bzip2.

The project was developed because the native HBase MapReduce export/import job can be quite slow on very large HBase tables, especially when the table has compression enabled with a high compression ratio.

The code has been tested on HDP 3.1 with Spark 2.3.2 and HBase 2.0.2.
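The job is driven through the exportHbase class shown in the spark-submit commands below. For context, here is a minimal sketch of how such an export can be written with Spark's Hadoop-format APIs; it is not the project's actual code, and the object name, output layout (a SequenceFile of rowkey/Result pairs with ResultSerialization registered) and the bzip2 configuration keys are assumptions based on how the native HBase Export job works.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{ResultSerialization, TableInputFormat}
import org.apache.hadoop.io.compress.BZip2Codec
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
import org.apache.spark.sql.SparkSession

// Hypothetical sketch, not the repository's exportHbase implementation.
object ExportSketch {
  def main(args: Array[String]): Unit = {
    // Expected arguments: <table_name> <hdfs_destination_folder> <true|false>
    val Array(tableName, destination, compress) = args

    val spark = SparkSession.builder().appName("hbase-export-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Point TableInputFormat at the table to export; Spark creates one
    // partition per HBase region, so the scan runs in parallel.
    val hbaseConf: Configuration = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, tableName)

    val rdd = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Result is not Writable, so register HBase's ResultSerialization
    // (the native Export job does the same) before writing a SequenceFile.
    val outConf = new Configuration(hbaseConf)
    outConf.setStrings("io.serializations",
      outConf.get("io.serializations"),
      classOf[ResultSerialization].getName)

    // Optionally block-compress the output with bzip2.
    if (compress.toBoolean) {
      outConf.set("mapreduce.output.fileoutputformat.compress", "true")
      outConf.set("mapreduce.output.fileoutputformat.compress.type", "BLOCK")
      outConf.set("mapreduce.output.fileoutputformat.compress.codec",
        classOf[BZip2Codec].getName)
    }

    rdd.saveAsNewAPIHadoopFile(
      destination,
      classOf[ImmutableBytesWritable],
      classOf[Result],
      classOf[SequenceFileOutputFormat[ImmutableBytesWritable, Result]],
      outConf)

    spark.stop()
  }
}
```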

To run an export, use the following command. Syntax: spark-submit --class sparkHBaseExportImport.sparkhbaseexportimport.exportHbase --num-executors <num_exec> --executor-memory <size_mem_exec> --master yarn --deploy-mode client sparkHBaseExportImport-1.0-jar-with-dependencies.jar <table_name> <hdfs_destination_folder> <boolean_compression true|false>

Example:

/usr/hdp/current/spark2-client/bin/spark-submit --class sparkHBaseExportImport.sparkhbaseexportimport.exportHbase --num-executors 6 --executor-memory 10G --master yarn --deploy-mode client sparkHBaseExportImport-1.0-jar-with-dependencies.jar test_table /tmp/test2/ true

You will have to choose the right number of executors and the amount of memory per executor depending on the size of the table, the resources available on the cluster, and how quickly you want the export to run.
