
[SUPPORT]I can see the Hudi table and data in spark-shell, but in Hive, the table is visible, yet no data can be queried. #12673

Open
TigerTORA opened this issue Jan 19, 2025 · 5 comments
Labels
hive Issues related to hive priority:critical production down; pipelines stalled; Need help asap. release-1.0.0

Comments

TigerTORA commented Jan 19, 2025

Tips before filing an issue

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at [email protected].

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

I can see the Hudi table and data in spark-shell, but in Hive, the table is visible, yet no data can be queried.

To Reproduce

Steps to reproduce the behavior:

  1. I can see the Hudi table and its data that I created in the spark-shell.
scala> spark.sql("SELECT *  FROM  my_db.hudi_cow").show()
+-------------------+--------------------+------------------+----------------------+--------------------+-----+-------+----+---------+--------------+---------+---------+-----------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|rowId|preComb|name|versionId|toBeDeletedStr|intToLong|longToInt|partitionId|
+-------------------+--------------------+------------------+----------------------+--------------------+-----+-------+----+---------+--------------+---------+---------+-----------+
|  20250106212648252|20250106212648252...|             row_1|  partitionId=2021/...|95e9a53d-a167-4d3...|row_1|      0| bob|      v_0|      toBeDel0|        0|  1000000| 2021/01/01|
|  20250106212648252|20250106212648252...|             row_2|  partitionId=2021/...|95e9a53d-a167-4d3...|row_2|      0|john|      v_0|      toBeDel0|        0|  1000000| 2021/01/01|
|  20250106212648252|20250106212648252...|             row_3|  partitionId=2021/...|e7e7da47-bd42-497...|row_3|      0| tom|      v_0|      toBeDel0|        0|  1000000| 2021/01/02|
+-------------------+--------------------+------------------+----------------------+--------------------+-----+-------+----+---------+--------------+---------+---------+-----------+
  2. I can also see the table in Hive, but I'm unable to query any data.
0: jdbc:hive2://cdp73-1.test.com:2181,cdp73-2> select *from my_db.hudi_cow;
WARN  : WARNING! Query command could not be redacted.java.lang.IllegalStateException: Error loading from /home/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/bin/../lib/hive/conf/redaction-rules.json: java.io.FileNotFoundException: /home/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/bin/../lib/hive/conf/redaction-rules.json (No such file or directory)
INFO  : Compiling command(queryId=hive_20250118211448_449370b5-f63f-46dd-93b3-98bf8b4be147): select *from my_db.hudi_cow
DEBUG : Shutting down query select *from my_db.hudi_cow
+-------------------------------+--------------------------------+------------------------------+----------------------------------+-----------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
| hudi_cow._hoodie_commit_time  | hudi_cow._hoodie_commit_seqno  | hudi_cow._hoodie_record_key  | hudi_cow._hoodie_partition_path  | hudi_cow._hoodie_file_name  | hudi_cow.rowid  | hudi_cow.precomb  | hudi_cow.name  | hudi_cow.versionid  | hudi_cow.tobedeletedstr  | hudi_cow.inttolong  | hudi_cow.longtoint  | hudi_cow.partitionid  |
+-------------------------------+--------------------------------+------------------------------+----------------------------------+-----------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
+-------------------------------+--------------------------------+------------------------------+----------------------------------+-----------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
No rows selected (0.747 seconds)
0: jdbc:hive2://cdp73-1.test.com:2181,cdp73-2>
  3. Here is the output of SHOW CREATE TABLE in Hive:
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `my_db`.`hudi_cow`(          |
|   `_hoodie_commit_time` string COMMENT '',         |
|   `_hoodie_commit_seqno` string COMMENT '',        |
|   `_hoodie_record_key` string COMMENT '',          |
|   `_hoodie_partition_path` string COMMENT '',      |
|   `_hoodie_file_name` string COMMENT '',           |
|   `rowid` string COMMENT '',                       |
|   `precomb` bigint COMMENT '',                     |
|   `name` string COMMENT '',                        |
|   `versionid` string COMMENT '',                   |
|   `tobedeletedstr` string COMMENT '',              |
|   `inttolong` int COMMENT '',                      |
|   `longtoint` bigint COMMENT '')                   |
| PARTITIONED BY (                                   |
|   `partitionid` string COMMENT '')                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
| WITH SERDEPROPERTIES (                             |
|   'hoodie.query.as.ro.table'='false',              |
|   'path'='/user/hive/warehouse/hudi_cow')          |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hudi.hadoop.HoodieParquetInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION                                           |
|   'hdfs://nameservice1/user/hive/warehouse/hudi_cow' |
| TBLPROPERTIES (                                    |
|   'last_commit_completion_time_sync'='20250106212709780',  |
|   'last_commit_time_sync'='20250106212648252',     |
|   'spark.sql.create.version'='3.4.1.7.3.1.0-197',  |
|   'spark.sql.sources.provider'='hudi',             |
|   'spark.sql.sources.schema.numPartCols'='1',      |
|   'spark.sql.sources.schema.numParts'='1',         |
|   'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"rowId","type":"string","nullable":true,"metadata":{}},{"name":"preComb","type":"long","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"versionId","type":"string","nullable":true,"metadata":{}},{"name":"toBeDeletedStr","type":"string","nullable":true,"metadata":{}},{"name":"intToLong","type":"integer","nullable":true,"metadata":{}},{"name":"longToInt","type":"long","nullable":true,"metadata":{}},{"name":"partitionId","type":"string","nullable":true,"metadata":{}}]}',  |
|   'spark.sql.sources.schema.partCol.0'='partitionId',  |
|   'transient_lastDdlTime'='1736216830')            |
+----------------------------------------------------+
  4. Here is the code I used to create the table in spark-shell:
// spark-shell
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
 
 
val databaseName = "my_db"
val tableName = "hudi_cow"
val basePath = "/user/hive/warehouse/hudi_cow"
 
val schema = StructType(Array(
  StructField("rowId", StringType, true),
  StructField("partitionId", StringType, true),
  StructField("preComb", LongType, true),
  StructField("name", StringType, true),
  StructField("versionId", StringType, true),
  StructField("toBeDeletedStr", StringType, true),
  StructField("intToLong", IntegerType, true),
  StructField("longToInt", LongType, true)
))

val data0 = Seq(
  Row("row_1", "2021/01/01", 0L, "bob", "v_0", "toBeDel0", 0, 1000000L),
  Row("row_2", "2021/01/01", 0L, "john", "v_0", "toBeDel0", 0, 1000000L),
  Row("row_3", "2021/01/02", 0L, "tom", "v_0", "toBeDel0", 0, 1000000L))

val dfFromData0 = spark.createDataFrame(data0, schema)

dfFromData0.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option("hoodie.datasource.write.precombine.field", "preComb").
  option("hoodie.datasource.write.recordkey.field", "rowId").
  option("hoodie.datasource.write.partitionpath.field", "partitionId").
  option("hoodie.database.name", databaseName).
  option("hoodie.table.name", tableName).
  option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
  option("hoodie.datasource.write.operation", "upsert").
  option("hoodie.datasource.write.hive_style_partitioning","true").
  option("hoodie.datasource.meta.sync.enable", "true").
  option("hoodie.datasource.hive_sync.mode", "hms").
  option("hoodie.embed.timeline.server", "false").
  mode(Overwrite).
  save(basePath)

Expected behavior

Environment Description

  • Hudi version :1.0.0

  • Spark version :3.4.1.7.3.1.0-197

  • Hive version :3.1.3000.7.3.1.0-197

  • Hadoop version :3.1.1.7.3.1.0-197

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) :no


@TigerTORA TigerTORA changed the title [SUPPORT] [SUPPORT]I can see the Hudi table and data in spark-shell, but in Hive, the table is visible, yet no data can be queried. Jan 19, 2025
@ad1happy2go (Collaborator) commented:

@TigerTORA Did you add the Hive bundle for version 1.0.0 to your Hive libs? Alternatively, you can run ADD JAR in the Hive CLI.
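If the bundle is not on HiveServer2's classpath permanently, it can also be registered per session. A minimal sketch of the second option (the jar path is an example; point it at wherever the bundle actually lives on your cluster):

```sql
-- Run in Beeline / Hive CLI before querying the Hudi table.
-- The jar path below is an example; adjust it to your installation.
ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-hadoop-mr-bundle-1.0.0.jar;

-- Confirm the jar is registered for this session.
LIST JARS;
```

Note that a session-level ADD JAR only affects the current connection; a permanent fix still requires the bundle in the Hive lib directory of every HiveServer2 host.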

@TigerTORA (Author) commented Jan 20, 2025:

> @TigerTORA Did you add hive bundle for version 1.0.0 to your hive libs, or you can do add jar under hive cli too.

Yes, I have added the Hive bundle for version 1.0.0 to my Hive libs:

[root@cdp73-1 ~]# ls -al /opt/cloudera/parcels/CDH/lib/hive/lib/*hudi*
lrwxrwxrwx. 1 root root       46 Dec  3 21:55 /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-common-0.5.0-incubating.jar -> ../../../jars/hudi-common-0.5.0-incubating.jar
lrwxrwxrwx. 1 root root       49 Dec  3 21:55 /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-hadoop-mr-0.5.0-incubating.jar -> ../../../jars/hudi-hadoop-mr-0.5.0-incubating.jar
-rw-r--r--. 1 root root 44000261 Jan  5 02:48 /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-hadoop-mr-bundle-1.0.0.jar
-rw-r--r--. 1 root root 48602464 Jan  4 01:08 /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-hive-sync-bundle-1.0.0.jar
[root@cdp73-1 ~]#

@ad1happy2go ad1happy2go added priority:critical production down; pipelines stalled; Need help asap. hive Issues related to hive release-1.0.0 labels Jan 20, 2025
@rangareddy commented:

Hi @TigerTORA

I have successfully read data from Hive using your code. Could you please review the Hive query logs to identify any potential issues?

0: jdbc:hive2://hive-server:10000/default> use my_db;
0: jdbc:hive2://hive-server:10000/default> show tables;
+-----------+
| tab_name  |
+-----------+
| hudi_cow  |
+-----------+
0: jdbc:hive2://hive-server:10000/default> select * from hudi_cow;
INFO  : Compiling command(queryId=hive_20250120110910_bacc7a57-aabb-467f-93b7-da01cf279020): select * from hudi_cow
INFO  : No Stats for my_db@hudi_cow, Columns: _hoodie_commit_time, inttolong, longtoint, _hoodie_partition_path, versionid, precomb, _hoodie_record_key, name, tobedeletedstr, _hoodie_commit_seqno, _hoodie_file_name, rowid
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:hudi_cow._hoodie_commit_time, type:string, comment:null), FieldSchema(name:hudi_cow._hoodie_commit_seqno, type:string, comment:null), FieldSchema(name:hudi_cow._hoodie_record_key, type:string, comment:null), FieldSchema(name:hudi_cow._hoodie_partition_path, type:string, comment:null), FieldSchema(name:hudi_cow._hoodie_file_name, type:string, comment:null), FieldSchema(name:hudi_cow.rowid, type:string, comment:null), FieldSchema(name:hudi_cow.precomb, type:bigint, comment:null), FieldSchema(name:hudi_cow.name, type:string, comment:null), FieldSchema(name:hudi_cow.versionid, type:string, comment:null), FieldSchema(name:hudi_cow.tobedeletedstr, type:string, comment:null), FieldSchema(name:hudi_cow.inttolong, type:int, comment:null), FieldSchema(name:hudi_cow.longtoint, type:bigint, comment:null), FieldSchema(name:hudi_cow.partitionid, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20250120110910_bacc7a57-aabb-467f-93b7-da01cf279020); Time taken: 0.31 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20250120110910_bacc7a57-aabb-467f-93b7-da01cf279020): select * from hudi_cow
INFO  : Completed executing command(queryId=hive_20250120110910_bacc7a57-aabb-467f-93b7-da01cf279020); Time taken: 0.001 seconds
+-------------------------------+--------------------------------+------------------------------+----------------------------------+----------------------------------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
| hudi_cow._hoodie_commit_time  | hudi_cow._hoodie_commit_seqno  | hudi_cow._hoodie_record_key  | hudi_cow._hoodie_partition_path  |             hudi_cow._hoodie_file_name             | hudi_cow.rowid  | hudi_cow.precomb  | hudi_cow.name  | hudi_cow.versionid  | hudi_cow.tobedeletedstr  | hudi_cow.inttolong  | hudi_cow.longtoint  | hudi_cow.partitionid  |
+-------------------------------+--------------------------------+------------------------------+----------------------------------+----------------------------------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
| 20250120110832477             | 20250120110832477_1_0          | row_1                        | partitionId=2021/01/01           | 8273822c-3db4-4f41-9369-52a48b289ec9-0_1-68-121_20250120110832477.parquet | row_1           | 0                 | bob            | v_0                 | toBeDel0                 | 0                   | 1000000             | 2021/01/01            |
| 20250120110832477             | 20250120110832477_1_1          | row_2                        | partitionId=2021/01/01           | 8273822c-3db4-4f41-9369-52a48b289ec9-0_1-68-121_20250120110832477.parquet | row_2           | 0                 | john           | v_0                 | toBeDel0                 | 0                   | 1000000             | 2021/01/01            |
| 20250120110832477             | 20250120110832477_0_0          | row_3                        | partitionId=2021/01/02           | 328f6651-ed04-49db-be45-9c3671bcda8d-0_0-68-120_20250120110832477.parquet | row_3           | 0                 | tom            | v_0                 | toBeDel0                 | 0                   | 1000000             | 2021/01/02            |
+-------------------------------+--------------------------------+------------------------------+----------------------------------+----------------------------------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
3 rows selected (1.282 seconds)
0: jdbc:hive2://hive-server:10000/default> 

I have placed hudi-hadoop-mr-bundle-1.0.0-rc1.jar in the Hive lib directory.

hive@9b304fb95236:/opt/hive$ ls lib/*hudi*
lib/hudi-hadoop-mr-bundle-1.0.0-rc1.jar

@TigerTORA (Author) commented:

> I have successfully read data from Hive using your code. Could you please review the Hive query logs to identify any potential issues? […]

First of all, I would like to sincerely thank everyone for their help. Here is my query log:

hiveserver2.tar.gz

@rangareddy commented:

Hi @TigerTORA

Could you please check the following:

  1. Are you able to read the Hudi table created with an older version, such as Hudi 0.14 or 0.15?
  2. Have you correctly verified the Ranger permissions?
  3. If possible, could you fix this error: java.lang.IllegalStateException: Error loading from /home/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/bin/../lib/hive/conf/redaction-rules.json?
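On point 3, the IllegalStateException only says the redaction rules file is missing; a quick way to silence it is to drop a placeholder rules file into the Hive conf directory. A hedged sketch follows — the local directory is a stand-in for /home/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/hive/conf, and the JSON layout with an empty "rules" list is an assumption; verify the exact schema against your Cloudera version's log/query redaction documentation before using it on a real cluster.

```shell
# Create a placeholder redaction-rules.json with an empty rule set.
# CONF_DIR is a local stand-in for the Hive conf directory on the cluster;
# the {"version": 1, "rules": []} layout is an assumption -- check your
# distribution's documentation for the schema it actually expects.
CONF_DIR="${CONF_DIR:-./hive-conf-demo}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/redaction-rules.json" <<'EOF'
{
  "version": 1,
  "rules": []
}
EOF
cat "$CONF_DIR/redaction-rules.json"
```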

@ad1happy2go ad1happy2go moved this from ⏳ Awaiting Triage to 👤 User Action in Hudi Issue Support Jan 27, 2025