
[SUPPORT]I can see the Hudi table and data in spark-shell, but in Hive, the table is visible, yet no data can be queried. #12673

Open
TigerTORA opened this issue Jan 19, 2025 · 5 comments
Labels
hive Issues related to hive priority:critical production down; pipelines stalled; Need help asap. release-1.0.0

Comments

TigerTORA commented Jan 19, 2025

Tips before filing an issue

  • Have you gone through our FAQs?

  • Join the mailing list to engage in conversations and get faster support at [email protected].

  • If you have triaged this as a bug, then file an issue directly.

Describe the problem you faced

I can see the Hudi table and data in spark-shell, but in Hive, the table is visible, yet no data can be queried.

To Reproduce

Steps to reproduce the behavior:

  1. I can see the Hudi table and its data that I created in the spark-shell.
scala> spark.sql("SELECT *  FROM  my_db.hudi_cow").show()
+-------------------+--------------------+------------------+----------------------+--------------------+-----+-------+----+---------+--------------+---------+---------+-----------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|rowId|preComb|name|versionId|toBeDeletedStr|intToLong|longToInt|partitionId|
+-------------------+--------------------+------------------+----------------------+--------------------+-----+-------+----+---------+--------------+---------+---------+-----------+
|  20250106212648252|20250106212648252...|             row_1|  partitionId=2021/...|95e9a53d-a167-4d3...|row_1|      0| bob|      v_0|      toBeDel0|        0|  1000000| 2021/01/01|
|  20250106212648252|20250106212648252...|             row_2|  partitionId=2021/...|95e9a53d-a167-4d3...|row_2|      0|john|      v_0|      toBeDel0|        0|  1000000| 2021/01/01|
|  20250106212648252|20250106212648252...|             row_3|  partitionId=2021/...|e7e7da47-bd42-497...|row_3|      0| tom|      v_0|      toBeDel0|        0|  1000000| 2021/01/02|
+-------------------+--------------------+------------------+----------------------+--------------------+-----+-------+----+---------+--------------+---------+---------+-----------+
  2. I can also see the table in Hive, but I'm unable to query any data.
0: jdbc:hive2://cdp73-1.test.com:2181,cdp73-2> select *from my_db.hudi_cow;
WARN  : WARNING! Query command could not be redacted.java.lang.IllegalStateException: Error loading from /home/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/bin/../lib/hive/conf/redaction-rules.json: java.io.FileNotFoundException: /home/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/bin/../lib/hive/conf/redaction-rules.json (No such file or directory)
INFO  : Compiling command(queryId=hive_20250118211448_449370b5-f63f-46dd-93b3-98bf8b4be147): select *from my_db.hudi_cow
DEBUG : Shutting down query select *from my_db.hudi_cow
+-------------------------------+--------------------------------+------------------------------+----------------------------------+-----------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
| hudi_cow._hoodie_commit_time  | hudi_cow._hoodie_commit_seqno  | hudi_cow._hoodie_record_key  | hudi_cow._hoodie_partition_path  | hudi_cow._hoodie_file_name  | hudi_cow.rowid  | hudi_cow.precomb  | hudi_cow.name  | hudi_cow.versionid  | hudi_cow.tobedeletedstr  | hudi_cow.inttolong  | hudi_cow.longtoint  | hudi_cow.partitionid  |
+-------------------------------+--------------------------------+------------------------------+----------------------------------+-----------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
+-------------------------------+--------------------------------+------------------------------+----------------------------------+-----------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
No rows selected (0.747 seconds)
0: jdbc:hive2://cdp73-1.test.com:2181,cdp73-2>
  3. Here is the output of SHOW CREATE TABLE in Hive:
+----------------------------------------------------+
|                   createtab_stmt                   |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `my_db`.`hudi_cow`(          |
|   `_hoodie_commit_time` string COMMENT '',         |
|   `_hoodie_commit_seqno` string COMMENT '',        |
|   `_hoodie_record_key` string COMMENT '',          |
|   `_hoodie_partition_path` string COMMENT '',      |
|   `_hoodie_file_name` string COMMENT '',           |
|   `rowid` string COMMENT '',                       |
|   `precomb` bigint COMMENT '',                     |
|   `name` string COMMENT '',                        |
|   `versionid` string COMMENT '',                   |
|   `tobedeletedstr` string COMMENT '',              |
|   `inttolong` int COMMENT '',                      |
|   `longtoint` bigint COMMENT '')                   |
| PARTITIONED BY (                                   |
|   `partitionid` string COMMENT '')                 |
| ROW FORMAT SERDE                                   |
|   'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'  |
| WITH SERDEPROPERTIES (                             |
|   'hoodie.query.as.ro.table'='false',              |
|   'path'='/user/hive/warehouse/hudi_cow')          |
| STORED AS INPUTFORMAT                              |
|   'org.apache.hudi.hadoop.HoodieParquetInputFormat'  |
| OUTPUTFORMAT                                       |
|   'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION                                           |
|   'hdfs://nameservice1/user/hive/warehouse/hudi_cow' |
| TBLPROPERTIES (                                    |
|   'last_commit_completion_time_sync'='20250106212709780',  |
|   'last_commit_time_sync'='20250106212648252',     |
|   'spark.sql.create.version'='3.4.1.7.3.1.0-197',  |
|   'spark.sql.sources.provider'='hudi',             |
|   'spark.sql.sources.schema.numPartCols'='1',      |
|   'spark.sql.sources.schema.numParts'='1',         |
|   'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"_hoodie_commit_time","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_commit_seqno","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_record_key","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_partition_path","type":"string","nullable":true,"metadata":{}},{"name":"_hoodie_file_name","type":"string","nullable":true,"metadata":{}},{"name":"rowId","type":"string","nullable":true,"metadata":{}},{"name":"preComb","type":"long","nullable":true,"metadata":{}},{"name":"name","type":"string","nullable":true,"metadata":{}},{"name":"versionId","type":"string","nullable":true,"metadata":{}},{"name":"toBeDeletedStr","type":"string","nullable":true,"metadata":{}},{"name":"intToLong","type":"integer","nullable":true,"metadata":{}},{"name":"longToInt","type":"long","nullable":true,"metadata":{}},{"name":"partitionId","type":"string","nullable":true,"metadata":{}}]}',  |
|   'spark.sql.sources.schema.partCol.0'='partitionId',  |
|   'transient_lastDdlTime'='1736216830')            |
+----------------------------------------------------+
  4. Here is the code I used to create the table in spark-shell:
// spark-shell
import org.apache.hudi.QuickstartUtils._
import scala.collection.JavaConversions._
import org.apache.spark.sql.SaveMode._
import org.apache.hudi.DataSourceReadOptions._
import org.apache.hudi.DataSourceWriteOptions._
import org.apache.hudi.config.HoodieWriteConfig._
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
 
 
val databaseName = "my_db"
val tableName = "hudi_cow"
val basePath = "/user/hive/warehouse/hudi_cow"
 
val schema = StructType(Array(
  StructField("rowId", StringType, true),
  StructField("partitionId", StringType, true),
  StructField("preComb", LongType, true),
  StructField("name", StringType, true),
  StructField("versionId", StringType, true),
  StructField("toBeDeletedStr", StringType, true),
  StructField("intToLong", IntegerType, true),
  StructField("longToInt", LongType, true)
))

val data0 = Seq(
  Row("row_1", "2021/01/01", 0L, "bob", "v_0", "toBeDel0", 0, 1000000L),
  Row("row_2", "2021/01/01", 0L, "john", "v_0", "toBeDel0", 0, 1000000L),
  Row("row_3", "2021/01/02", 0L, "tom", "v_0", "toBeDel0", 0, 1000000L))

val dfFromData0 = spark.createDataFrame(data0, schema)

dfFromData0.write.format("hudi").
  options(getQuickstartWriteConfigs).
  option("hoodie.datasource.write.precombine.field", "preComb").
  option("hoodie.datasource.write.recordkey.field", "rowId").
  option("hoodie.datasource.write.partitionpath.field", "partitionId").
  option("hoodie.database.name", databaseName).
  option("hoodie.table.name", tableName).
  option("hoodie.datasource.write.table.type", "COPY_ON_WRITE").
  option("hoodie.datasource.write.operation", "upsert").
  option("hoodie.datasource.write.hive_style_partitioning","true").
  option("hoodie.datasource.meta.sync.enable", "true").
  option("hoodie.datasource.hive_sync.mode", "hms").
  option("hoodie.embed.timeline.server", "false").
  mode(Overwrite).
  save(basePath)

Expected behavior

Environment Description

  • Hudi version :1.0.0

  • Spark version :3.4.1.7.3.1.0-197

  • Hive version :3.1.3000.7.3.1.0-197

  • Hadoop version :3.1.1.7.3.1.0-197

  • Storage (HDFS/S3/GCS..) : HDFS

  • Running on Docker? (yes/no) :no


@TigerTORA TigerTORA changed the title [SUPPORT] [SUPPORT]I can see the Hudi table and data in spark-shell, but in Hive, the table is visible, yet no data can be queried. Jan 19, 2025
@ad1happy2go (Collaborator) commented:

@TigerTORA Did you add the Hive bundle for version 1.0.0 to your Hive libs? Alternatively, you can run ADD JAR in the Hive CLI.
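If the bundle is not on HiveServer2's classpath permanently, it can also be registered per session. A minimal sketch of the second option (the jar path is an example; point it at wherever the bundle actually lives on your cluster):

```sql
-- Run in Beeline / Hive CLI before querying the Hudi table.
-- The jar path below is an example; adjust it to your installation.
ADD JAR /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-hadoop-mr-bundle-1.0.0.jar;

-- Confirm the jar is registered for this session.
LIST JARS;
```

Note that a session-level ADD JAR only affects the current connection; a permanent fix still requires the bundle in the Hive lib directory of every HiveServer2 host.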

@TigerTORA (Author) commented Jan 20, 2025:

> @TigerTORA Did you add hive bundle for version 1.0.0 to your hive libs, or you can do add jar under hive cli too.

Yes, I have added the Hive bundle for version 1.0.0 to my Hive libs:

[root@cdp73-1 ~]# ls -al /opt/cloudera/parcels/CDH/lib/hive/lib/*hudi*
lrwxrwxrwx. 1 root root       46 Dec  3 21:55 /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-common-0.5.0-incubating.jar -> ../../../jars/hudi-common-0.5.0-incubating.jar
lrwxrwxrwx. 1 root root       49 Dec  3 21:55 /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-hadoop-mr-0.5.0-incubating.jar -> ../../../jars/hudi-hadoop-mr-0.5.0-incubating.jar
-rw-r--r--. 1 root root 44000261 Jan  5 02:48 /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-hadoop-mr-bundle-1.0.0.jar
-rw-r--r--. 1 root root 48602464 Jan  4 01:08 /opt/cloudera/parcels/CDH/lib/hive/lib/hudi-hive-sync-bundle-1.0.0.jar
[root@cdp73-1 ~]#

@ad1happy2go ad1happy2go added priority:critical production down; pipelines stalled; Need help asap. hive Issues related to hive release-1.0.0 labels Jan 20, 2025
@rangareddy commented:

Hi @TigerTORA

I have successfully read data from Hive using your code. Could you please review the Hive query logs to identify any potential issues?

0: jdbc:hive2://hive-server:10000/default> use my_db;
0: jdbc:hive2://hive-server:10000/default> show tables;
+-----------+
| tab_name  |
+-----------+
| hudi_cow  |
+-----------+
0: jdbc:hive2://hive-server:10000/default> select * from hudi_cow;
INFO  : Compiling command(queryId=hive_20250120110910_bacc7a57-aabb-467f-93b7-da01cf279020): select * from hudi_cow
INFO  : No Stats for my_db@hudi_cow, Columns: _hoodie_commit_time, inttolong, longtoint, _hoodie_partition_path, versionid, precomb, _hoodie_record_key, name, tobedeletedstr, _hoodie_commit_seqno, _hoodie_file_name, rowid
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:hudi_cow._hoodie_commit_time, type:string, comment:null), FieldSchema(name:hudi_cow._hoodie_commit_seqno, type:string, comment:null), FieldSchema(name:hudi_cow._hoodie_record_key, type:string, comment:null), FieldSchema(name:hudi_cow._hoodie_partition_path, type:string, comment:null), FieldSchema(name:hudi_cow._hoodie_file_name, type:string, comment:null), FieldSchema(name:hudi_cow.rowid, type:string, comment:null), FieldSchema(name:hudi_cow.precomb, type:bigint, comment:null), FieldSchema(name:hudi_cow.name, type:string, comment:null), FieldSchema(name:hudi_cow.versionid, type:string, comment:null), FieldSchema(name:hudi_cow.tobedeletedstr, type:string, comment:null), FieldSchema(name:hudi_cow.inttolong, type:int, comment:null), FieldSchema(name:hudi_cow.longtoint, type:bigint, comment:null), FieldSchema(name:hudi_cow.partitionid, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20250120110910_bacc7a57-aabb-467f-93b7-da01cf279020); Time taken: 0.31 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=hive_20250120110910_bacc7a57-aabb-467f-93b7-da01cf279020): select * from hudi_cow
INFO  : Completed executing command(queryId=hive_20250120110910_bacc7a57-aabb-467f-93b7-da01cf279020); Time taken: 0.001 seconds
+-------------------------------+--------------------------------+------------------------------+----------------------------------+----------------------------------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
| hudi_cow._hoodie_commit_time  | hudi_cow._hoodie_commit_seqno  | hudi_cow._hoodie_record_key  | hudi_cow._hoodie_partition_path  |             hudi_cow._hoodie_file_name             | hudi_cow.rowid  | hudi_cow.precomb  | hudi_cow.name  | hudi_cow.versionid  | hudi_cow.tobedeletedstr  | hudi_cow.inttolong  | hudi_cow.longtoint  | hudi_cow.partitionid  |
+-------------------------------+--------------------------------+------------------------------+----------------------------------+----------------------------------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
| 20250120110832477             | 20250120110832477_1_0          | row_1                        | partitionId=2021/01/01           | 8273822c-3db4-4f41-9369-52a48b289ec9-0_1-68-121_20250120110832477.parquet | row_1           | 0                 | bob            | v_0                 | toBeDel0                 | 0                   | 1000000             | 2021/01/01            |
| 20250120110832477             | 20250120110832477_1_1          | row_2                        | partitionId=2021/01/01           | 8273822c-3db4-4f41-9369-52a48b289ec9-0_1-68-121_20250120110832477.parquet | row_2           | 0                 | john           | v_0                 | toBeDel0                 | 0                   | 1000000             | 2021/01/01            |
| 20250120110832477             | 20250120110832477_0_0          | row_3                        | partitionId=2021/01/02           | 328f6651-ed04-49db-be45-9c3671bcda8d-0_0-68-120_20250120110832477.parquet | row_3           | 0                 | tom            | v_0                 | toBeDel0                 | 0                   | 1000000             | 2021/01/02            |
+-------------------------------+--------------------------------+------------------------------+----------------------------------+----------------------------------------------------+-----------------+-------------------+----------------+---------------------+--------------------------+---------------------+---------------------+-----------------------+
3 rows selected (1.282 seconds)
0: jdbc:hive2://hive-server:10000/default> 

I have placed hudi-hadoop-mr-bundle-1.0.0-rc1.jar in the Hive lib directory.

hive@9b304fb95236:/opt/hive$ ls lib/*hudi*
lib/hudi-hadoop-mr-bundle-1.0.0-rc1.jar

@TigerTORA (Author) commented:

> I have successfully read data from Hive using your code. Could you please review the Hive query logs to identify any potential issues? […]

First of all, I would like to sincerely thank everyone for their help. Here is my query log:

hiveserver2.tar.gz

@rangareddy commented:

Hi @TigerTORA

Could you please check the following:

  1. Are you able to read the Hudi table created with an older version, such as Hudi 0.14 or 0.15?
  2. Have you correctly verified the Ranger permissions?
  3. If possible, could you fix this error: java.lang.IllegalStateException: Error loading from /home/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/bin/../lib/hive/conf/redaction-rules.json?
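On point 3, the IllegalStateException only says the redaction rules file is missing; a quick way to silence it is to drop a placeholder rules file into the Hive conf directory. A hedged sketch follows — the local directory is a stand-in for /home/opt/cloudera/parcels/CDH-7.3.1-1.cdh7.3.1.p0.60371244/lib/hive/conf, and the JSON layout with an empty "rules" list is an assumption; verify the exact schema against your Cloudera version's log/query redaction documentation before using it on a real cluster.

```shell
# Create a placeholder redaction-rules.json with an empty rule set.
# CONF_DIR is a local stand-in for the Hive conf directory on the cluster;
# the {"version": 1, "rules": []} layout is an assumption -- check your
# distribution's documentation for the schema it actually expects.
CONF_DIR="${CONF_DIR:-./hive-conf-demo}"
mkdir -p "$CONF_DIR"
cat > "$CONF_DIR/redaction-rules.json" <<'EOF'
{
  "version": 1,
  "rules": []
}
EOF
cat "$CONF_DIR/redaction-rules.json"
```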

@ad1happy2go ad1happy2go moved this from ⏳ Awaiting Triage to 👤 User Action in Hudi Issue Support Jan 27, 2025