Description
Add a self-optimizing.partition-filter table property that restricts Amoro's optimization to a specified range of partitions, similar to the where parameter of Spark's rewrite_data_files procedure.
Use case/motivation
Currently, when Amoro optimizes an Iceberg table, it optimizes data in all partitions by default. In practice this causes the following problems:
High cost of optimizing historical data: Some tables' historical data may not conform to Amoro's optimization rules, and optimizing all of it wastes resources and degrades performance.
Conflict between concurrent writing and optimization: Some historical partitions may undergo data repair operations that involve deletions and can run for a long time. In such cases it is preferable for Amoro to skip these partitions to avoid commit conflicts.
Describe the solution
Add the self-optimizing.partition-filter parameter.
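To make the intended behavior concrete, here is a minimal Python sketch of how such a filter could narrow the set of partitions considered for optimization. It is only an illustration, not Amoro's implementation: the helper names (parse_partition_filter, select_partitions), the single-comparison filter syntax, and the dt partition column are all assumptions made for this example, and the actual expression syntax of self-optimizing.partition-filter is not defined in this issue.

```python
import operator
import re

# Comparison operators supported by this sketch (hypothetical subset).
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

# Matches: column op 'literal'. Two-character operators are listed first
# in the alternation so ">=" is not read as ">" followed by "=".
_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|=|>|<)\s*'([^']*)'\s*$")


def parse_partition_filter(expr):
    """Parse one `column op 'literal'` comparison into a predicate function."""
    match = _PATTERN.match(expr)
    if match is None:
        raise ValueError(f"unsupported filter expression: {expr!r}")
    column, op, literal = match.groups()
    compare = _OPS[op]
    # Values are compared as strings, which is enough for ISO-formatted dates.
    return lambda partition: compare(partition[column], literal)


def select_partitions(partitions, filter_expr=None):
    """Return the partitions to optimize; without a filter, all are returned."""
    if not filter_expr:
        return list(partitions)
    predicate = parse_partition_filter(filter_expr)
    return [p for p in partitions if predicate(p)]


# Hypothetical table partitioned by a string date column named dt.
partitions = [
    {"dt": "2023-12-30"},
    {"dt": "2023-12-31"},
    {"dt": "2024-01-01"},
    {"dt": "2024-01-02"},
]
selected = select_partitions(partitions, "dt >= '2024-01-01'")
```

With the filter dt >= '2024-01-01', only the two 2024 partitions are selected, so the historical partitions are skipped; with no filter, the behavior matches today's default of optimizing every partition.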
Subtasks
Add the self-optimizing.partition-filter parameter. @lintingbin
Related issues
No response
Are you willing to submit a PR?
Code of Conduct