There are files with the suffix `.compact` in the sink folder of the Spark checkpoint location. These files accumulate the entire history of `EsSinkStatus` log entries since the first index action.

The cause is that `org.elasticsearch.spark.sql.streaming.EsSinkMetadataLog#compactLogs()` returns all of the input logs rather than dropping the expired ones.

I think the following change resolves the issue:

```scala
override def compactLogs(logs: Seq[EsSinkStatus]): Seq[EsSinkStatus] = {
  val minMillis = System.currentTimeMillis() - fileCleanupDelayMs
  logs.filter(_.execTimeMillis >= minMillis)
}
```
For reference, the current implementation simply returns its input unchanged:

```scala
override def compactLogs(logs: Seq[EsSinkStatus]): Seq[EsSinkStatus] = logs
```

It would be better for this method to filter the logs and drop the expired entries.
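To illustrate the intended behavior, here is a minimal, self-contained sketch of the proposed filter. The `EsSinkStatus` case class and the `fileCleanupDelayMs` value below are simplified stand-ins for illustration, not the real elasticsearch-hadoop definitions:

```scala
// Stand-in for the real EsSinkStatus; only the field the filter needs.
case class EsSinkStatus(execTimeMillis: Long)

// Assumed retention window (the real value comes from Spark's
// spark.sql.streaming.fileSink.log.cleanupDelay-style configuration).
val fileCleanupDelayMs: Long = 10 * 60 * 1000

// Proposed compaction: keep only entries newer than the cutoff.
def compactLogs(logs: Seq[EsSinkStatus]): Seq[EsSinkStatus] = {
  val minMillis = System.currentTimeMillis() - fileCleanupDelayMs
  logs.filter(_.execTimeMillis >= minMillis)
}

val now = System.currentTimeMillis()
val logs = Seq(
  EsSinkStatus(now - 60 * 60 * 1000), // one hour old: expired, dropped
  EsSinkStatus(now - 1000)            // one second old: retained
)
assert(compactLogs(logs) == Seq(EsSinkStatus(now - 1000)))
```

With this change, each `.compact` file would only retain entries from the cleanup-delay window instead of the full history since the first index action.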