Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle timestamp_ntz in delta conversion target #647

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -66,9 +66,9 @@
import org.apache.xtable.spi.sync.ConversionTarget;

public class DeltaConversionTarget implements ConversionTarget {
private static final String MIN_READER_VERSION = String.valueOf(1);
private static final String MIN_READER_VERSION = String.valueOf(3);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Handle this gracefully by calling the function which upgrades version in delta codebase, this was the pending comment from the previous PR that wasn't addressed.

#428

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to timestamp_ntz? Increasing the min_reader version could break certain consumers using old libraries. If it's unrelated, could we create a separate issue for this and perhaps make it configurable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delta does the automatic version upgrade today based on the schema of the table and other properties, if we call the same function when initializing the delta table it shouldn't break anything.

https://docs.delta.io/2.0.0/versioning.html
When creating a table, Delta Lake chooses the minimum required protocol version based on table characteristics such as the schema or table properties

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the pointers.
However, I believe this change will force an upgrade to the min_reader version, even if none of the table features require it. I support updating the version, but lets not mix it in this PR unless it is needed for timezone handling.

// gets access to generated columns.
private static final String MIN_WRITER_VERSION = String.valueOf(4);
private static final String MIN_WRITER_VERSION = String.valueOf(7);

private DeltaLog deltaLog;
private DeltaSchemaExtractor schemaExtractor;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,11 @@ public class DeltaSchemaExtractor {
private static final String DELTA_COLUMN_MAPPING_ID = "delta.columnMapping.id";
private static final String COMMENT = "comment";
private static final DeltaSchemaExtractor INSTANCE = new DeltaSchemaExtractor();
// Timestamps in Delta are microsecond precision by default
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does XTable need to handle nanoseconds precision?

private static final Map<InternalSchema.MetadataKey, Object>
DEFAULT_TIMESTAMP_PRECISION_METADATA =
Collections.singletonMap(
InternalSchema.MetadataKey.TIMESTAMP_PRECISION, InternalSchema.MetadataValue.MICROS);

public static DeltaSchemaExtractor getInstance() {
return INSTANCE;
Expand Down Expand Up @@ -88,7 +93,6 @@ private DataType convertFieldType(InternalField field) {
case INT:
return DataTypes.IntegerType;
case LONG:
case TIMESTAMP_NTZ:
return DataTypes.LongType;
case BYTES:
case FIXED:
Expand All @@ -102,6 +106,8 @@ private DataType convertFieldType(InternalField field) {
return DataTypes.DateType;
case TIMESTAMP:
return DataTypes.TimestampType;
case TIMESTAMP_NTZ:
return DataTypes.TimestampNTZType;
case DOUBLE:
return DataTypes.DoubleType;
case DECIMAL:
Expand Down Expand Up @@ -206,11 +212,11 @@ private InternalSchema toInternalSchema(
break;
case "timestamp":
type = InternalType.TIMESTAMP;
// Timestamps in Delta are microsecond precision by default
metadata =
Collections.singletonMap(
InternalSchema.MetadataKey.TIMESTAMP_PRECISION,
InternalSchema.MetadataValue.MICROS);
metadata = DEFAULT_TIMESTAMP_PRECISION_METADATA;
break;
case "timestamp_ntz":
type = InternalType.TIMESTAMP_NTZ;
metadata = DEFAULT_TIMESTAMP_PRECISION_METADATA;
break;
case "struct":
StructType structType = (StructType) dataType;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -395,8 +395,8 @@ public void testTimestamps() {

StructType structRepresentationTimestampNtz =
new StructType()
.add("requiredTimestampNtz", DataTypes.LongType, false)
.add("optionalTimestampNtz", DataTypes.LongType, true);
.add("requiredTimestampNtz", DataTypes.TimestampNTZType, false)
.add("optionalTimestampNtz", DataTypes.TimestampNTZType, true);

Assertions.assertEquals(
structRepresentationTimestamp,
Expand Down
Loading