Commit 4cc50bc

[Spark][Kernel][Protocol] Merge InCommitTimestamp RFC, and remove the -preview suffix from feature name and configs. (#3416)
#### Which Delta project/connector is this regarding?

- [X] Spark
- [ ] Standalone
- [ ] Flink
- [X] Kernel
- [X] Other (Protocol)

## Description

1. Merges the InCommitTimestamp RFC
2. Removes the -preview suffix from the feature name and properties.

## How was this patch tested?

Existing tests should cover this change.

## Does this PR introduce _any_ user-facing changes?

No
1 parent 3cebe54 commit 4cc50bc

10 files changed

+70
-33
lines changed

PROTOCOL.md

Lines changed: 38 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -521,7 +521,7 @@ Specifically, to read the row-level changes made in a version, the following str
521521
Field Name | Data Type | Description
522522
-|-|-
523523
_commit_version|`Long`| The table version containing the change. This can be derived from the name of the Delta log file that contains actions.
524-
_commit_timestamp|`Timestamp`| The timestamp associated when the commit was created. This can be derived from the file modification time of the Delta log file that contains actions.
524+
_commit_timestamp|`Timestamp`| The timestamp associated when the commit was created. Depending on whether [In-Commit Timestamps](#in-commit-timestamps) are enabled, this is derived from either the `inCommitTimestamp` field of the `commitInfo` action of the version's Delta log file, or from the Delta log file's modification time.
525525

526526
##### Note for non-change data readers
527527

@@ -620,6 +620,8 @@ A delta file can optionally contain additional provenance information about what
620620

621621
Implementations are free to store any valid JSON-formatted data via the `commitInfo` action.
622622

623+
When [In-Commit Timestamps](#in-commit-timestamps) are enabled, writers are required to include a `commitInfo` action with every commit, which must include the `inCommitTimestamp` field. Also, the `commitInfo` action must be the first action in the commit.
624+
623625
An example of storing provenance information related to an `INSERT` operation:
624626
```json
625627
{
@@ -1255,6 +1257,41 @@ The example above converts `configuration` field into JSON format, including esc
12551257
}
12561258
```
12571259

1260+
# In-Commit Timestamps
1261+
1262+
The In-Commit Timestamps writer feature strongly associates a monotonically increasing timestamp with each commit by storing it in the commit's metadata.
1263+
1264+
Enablement:
1265+
- The table must be on Writer Version 7.
1266+
- The feature `inCommitTimestamps` must exist in the table `protocol`'s `writerFeatures`.
1267+
- The table property `delta.enableInCommitTimestamps` must be set to `true`.
1268+
1269+
## Writer Requirements for In-Commit Timestamps
1270+
1271+
When In-Commit Timestamps is enabled, then:
1272+
1. Writers must write the `commitInfo` (see [Commit Provenance Information](#commit-provenance-information)) action in the commit.
1273+
2. The `commitInfo` action must be the first action in the commit.
1274+
3. The `commitInfo` action must include a field named `inCommitTimestamp`, of type `long` (see [Primitive Types](#primitive-types)), which represents the time (in milliseconds since the Unix epoch) when the commit is considered to have succeeded. It is the larger of two values:
1275+
- The time, in milliseconds since the Unix epoch, at which the writer attempted the commit
1276+
- One millisecond later than the previous commit's `inCommitTimestamp`
1277+
4. If the table has commits from a period when this feature was not enabled, provenance information around when this feature was enabled must be tracked in table properties:
1278+
- The property `delta.inCommitTimestampEnablementVersion` must be used to track the version of the table when this feature was enabled.
1279+
- The property `delta.inCommitTimestampEnablementTimestamp` must be the same as the `inCommitTimestamp` of the commit when this feature was enabled.
1280+
5. The `inCommitTimestamp` of the commit that enables this feature must be greater than the file modification time of the immediately preceding commit.
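
As an illustration of writer requirement 3 (an editorial sketch, not code from this change), the timestamp selection could look roughly like the following. The class and method names are hypothetical, and the sketch assumes the previous commit's `inCommitTimestamp` is already known to the writer.

```java
// Hypothetical helper; not part of Delta Kernel or Delta Spark.
public final class InCommitTimestampSketch {
    private InCommitTimestampSketch() {}

    /**
     * Picks the inCommitTimestamp for a new commit: the larger of the writer's
     * attempt time and one millisecond past the previous commit's timestamp.
     * Both arguments are milliseconds since the Unix epoch.
     */
    public static long nextInCommitTimestamp(
            long attemptTimeMillis, long previousInCommitTimestamp) {
        return Math.max(attemptTimeMillis, previousInCommitTimestamp + 1);
    }
}
```

Taking the maximum keeps timestamps strictly increasing even when the writer's clock lags behind the previous commit.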
1281+
1282+
## Recommendations for Readers of Tables with In-Commit Timestamps
1283+
1284+
For tables with In-Commit timestamps enabled, readers should use the `inCommitTimestamp` as the commit timestamp for operations like time travel and [`DESCRIBE HISTORY`](https://docs.delta.io/latest/delta-utility.html#retrieve-delta-table-history).
1285+
If a table has commits from a period before In-Commit timestamps were enabled, the table properties `delta.inCommitTimestampEnablementVersion` and `delta.inCommitTimestampEnablementTimestamp` would be set and can be used to identify commits that don't have `inCommitTimestamp`.
1286+
To correctly determine the commit timestamp for these tables, readers can use the following rules:
1287+
1. For commits with version >= `delta.inCommitTimestampEnablementVersion`, readers should use the `inCommitTimestamp` field of the `commitInfo` action.
1288+
2. For commits with version < `delta.inCommitTimestampEnablementVersion`, readers should use the file modification timestamp.
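
The two rules above can be sketched as follows (an editorial illustration, not code from this change). The names are hypothetical, and the sketch assumes the table currently has In-Commit Timestamps enabled, so an absent `delta.inCommitTimestampEnablementVersion` is taken to mean that every commit carries `inCommitTimestamp`.

```java
import java.util.OptionalLong;

// Hypothetical helper; not part of Delta Kernel or Delta Spark.
public final class CommitTimestampSketch {
    private CommitTimestampSketch() {}

    /**
     * Resolves the timestamp of a commit on a table with In-Commit Timestamps enabled.
     *
     * @param version commit version being resolved
     * @param enablementVersion value of delta.inCommitTimestampEnablementVersion, if set
     * @param inCommitTimestamp the commitInfo.inCommitTimestamp of that commit, if present
     * @param fileModificationTimeMillis modification time of the commit's Delta log file
     */
    public static long resolve(
            long version,
            OptionalLong enablementVersion,
            OptionalLong inCommitTimestamp,
            long fileModificationTimeMillis) {
        boolean predatesEnablement =
                enablementVersion.isPresent() && version < enablementVersion.getAsLong();
        if (predatesEnablement) {
            // Rule 2: commits before enablement fall back to the file modification time.
            return fileModificationTimeMillis;
        }
        // Rule 1: commits at or after enablement must carry inCommitTimestamp.
        return inCommitTimestamp.orElseThrow(
                () -> new IllegalStateException(
                        "inCommitTimestamp missing from commit version " + version));
    }
}
```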
1289+
1290+
Furthermore, when attempting timestamp-based time travel where table state must be fetched as of `timestamp X`, readers should use the following rules:
1291+
1. If `timestamp X` >= `delta.inCommitTimestampEnablementTimestamp`, only table versions >= `delta.inCommitTimestampEnablementVersion` should be considered for the query.
1292+
2. Otherwise, only table versions less than `delta.inCommitTimestampEnablementVersion` should be considered for the query.
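
The time-travel rules above can be sketched as a version-range selection (an editorial illustration, not code from this change). All names are hypothetical; `earliestVersion` and `latestVersion` are assumed to be supplied by the caller, and the sketch only narrows the candidate range without searching inside it.

```java
// Hypothetical helper; not part of Delta Kernel or Delta Spark.
public final class TimeTravelRangeSketch {
    private TimeTravelRangeSketch() {}

    /**
     * Returns the inclusive {first, last} range of versions to consider when
     * resolving table state as of timestampX. The range is empty (first > last)
     * when no version qualifies.
     */
    public static long[] candidateVersions(
            long timestampX,
            long enablementVersion,
            long enablementTimestamp,
            long earliestVersion,
            long latestVersion) {
        if (timestampX >= enablementTimestamp) {
            // Rule 1: only versions written with In-Commit Timestamps are candidates.
            return new long[] {enablementVersion, latestVersion};
        }
        // Rule 2: only versions from before enablement are candidates.
        return new long[] {earliestVersion, enablementVersion - 1};
    }
}
```

Splitting on the enablement timestamp keeps a search from comparing file modification times against in-commit timestamps, which are not guaranteed to be mutually ordered in general.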
1293+
1294+
12581295
# Requirements for Writers
12591296
This section documents additional requirements that writers must follow in order to preserve some of the higher level guarantees that Delta provides.
12601297

kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableConfig.java

Lines changed: 3 additions & 3 deletions
@@ -85,7 +85,7 @@ public class TableConfig<T> {
8585
*/
8686
public static final TableConfig<Boolean> IN_COMMIT_TIMESTAMPS_ENABLED =
8787
new TableConfig<>(
88-
"delta.enableInCommitTimestamps-preview",
88+
"delta.enableInCommitTimestamps",
8989
"false", /* default values */
9090
(engineOpt, v) -> Boolean.valueOf(v),
9191
value -> true,
@@ -97,7 +97,7 @@ public class TableConfig<T> {
9797
*/
9898
public static final TableConfig<Optional<Long>> IN_COMMIT_TIMESTAMP_ENABLEMENT_VERSION =
9999
new TableConfig<>(
100-
"delta.inCommitTimestampEnablementVersion-preview",
100+
"delta.inCommitTimestampEnablementVersion",
101101
null, /* default values */
102102
(engineOpt, v) -> Optional.ofNullable(v).map(Long::valueOf),
103103
value -> true,
@@ -110,7 +110,7 @@ public class TableConfig<T> {
110110
*/
111111
public static final TableConfig<Optional<Long>> IN_COMMIT_TIMESTAMP_ENABLEMENT_TIMESTAMP =
112112
new TableConfig<>(
113-
"delta.inCommitTimestampEnablementTimestamp-preview",
113+
"delta.inCommitTimestampEnablementTimestamp",
114114
null, /* default values */
115115
(engineOpt, v) -> Optional.ofNullable(v).map(Long::valueOf),
116116
value -> true,

kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableFeatures.java

Lines changed: 10 additions & 10 deletions
@@ -38,7 +38,7 @@ public class TableFeatures {
3838
new HashSet<String>() {
3939
{
4040
add("appendOnly");
41-
add("inCommitTimestamp-preview");
41+
add("inCommitTimestamp");
4242
add("columnMapping");
4343
}
4444
});
@@ -84,8 +84,8 @@ public static void validateReadSupportedTable(
8484
* <ul>
8585
* <li>protocol writer version 1.
8686
* <li>protocol writer version 2 only with appendOnly feature enabled.
87-
* <li>protocol writer version 7 with {@code appendOnly}, {@code inCommitTimestamp-preview},
88-
* {@code columnMapping} feature enabled.
87+
* <li>protocol writer version 7 with {@code appendOnly}, {@code inCommitTimestamp}, {@code
88+
* columnMapping} feature enabled.
8989
* </ul>
9090
*
9191
* @param protocol Table protocol
@@ -121,7 +121,7 @@ public static void validateWriteSupportedTable(
121121
// Only supported writer features as of today in Kernel
122122
case "appendOnly":
123123
break;
124-
case "inCommitTimestamp-preview":
124+
case "inCommitTimestamp":
125125
break;
126126
case "columnMapping":
127127
break;
@@ -158,9 +158,9 @@ public static Tuple2<Integer, Integer> minProtocolVersionFromAutomaticallyEnable
158158

159159
/**
160160
* Extract the writer features that should be enabled automatically based on the metadata which
161-
* are not already enabled. For example, the {@code inCommitTimestamp-preview} feature should be
162-
* enabled when the delta property name (delta.enableInCommitTimestamps-preview) is set to true in
163-
* the metadata if it is not already enabled.
161+
* are not already enabled. For example, the {@code inCommitTimestamp} feature should be enabled
162+
* when the delta property name (delta.enableInCommitTimestamps) is set to true in the metadata if
163+
* it is not already enabled.
164164
*
165165
* @param engine the engine to use for IO operations
166166
* @param metadata the metadata of the table
@@ -184,7 +184,7 @@ public static Set<String> extractAutomaticallyEnabledWriterFeatures(
184184
*/
185185
private static int getMinReaderVersion(String feature) {
186186
switch (feature) {
187-
case "inCommitTimestamp-preview":
187+
case "inCommitTimestamp":
188188
return 3;
189189
default:
190190
return 1;
@@ -199,7 +199,7 @@ private static int getMinReaderVersion(String feature) {
199199
*/
200200
private static int getMinWriterVersion(String feature) {
201201
switch (feature) {
202-
case "inCommitTimestamp-preview":
202+
case "inCommitTimestamp":
203203
return 7;
204204
default:
205205
return 2;
@@ -218,7 +218,7 @@ private static int getMinWriterVersion(String feature) {
218218
private static boolean metadataRequiresWriterFeatureToBeEnabled(
219219
Engine engine, Metadata metadata, String feature) {
220220
switch (feature) {
221-
case "inCommitTimestamp-preview":
221+
case "inCommitTimestamp":
222222
return TableConfig.isICTEnabled(engine, metadata);
223223
default:
224224
return false;

kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/CommitInfo.java

Lines changed: 2 additions & 2 deletions
@@ -177,7 +177,7 @@ public static long getRequiredInCommitTimestamp(
177177
new InvalidTableException(
178178
dataPath.toString(),
179179
String.format(
180-
"This table has the feature inCommitTimestamp-preview "
180+
"This table has the feature inCommitTimestamp "
181181
+ "enabled which requires the presence of the CommitInfo action "
182182
+ "in every commit. However, the CommitInfo action is "
183183
+ "missing from commit version %s.",
@@ -187,7 +187,7 @@ public static long getRequiredInCommitTimestamp(
187187
new InvalidTableException(
188188
dataPath.toString(),
189189
String.format(
190-
"This table has the feature inCommitTimestamp-preview "
190+
"This table has the feature inCommitTimestamp "
191191
+ "enabled which requires the presence of inCommitTimestamp in the "
192192
+ "CommitInfo action. However, this field has not "
193193
+ "been set in commit version %s.",

kernel/kernel-api/src/test/scala/io/delta/kernel/internal/TableFeaturesSuite.scala

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ class TableFeaturesSuite extends AnyFunSuite {
6868
checkSupported(createTestProtocol(minWriterVersion = 7))
6969
}
7070

71-
Seq("appendOnly", "inCommitTimestamp-preview", "columnMapping")
71+
Seq("appendOnly", "inCommitTimestamp", "columnMapping")
7272
.foreach { supportedWriterFeature =>
7373
test(s"validateWriteSupported: protocol 7 with $supportedWriterFeature") {
7474
checkSupported(createTestProtocol(minWriterVersion = 7, supportedWriterFeature))

kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/InCommitTimestampSuite.scala

Lines changed: 11 additions & 11 deletions
@@ -77,7 +77,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
7777
assert(ver0Snapshot.getTimestamp(engine) === beforeCommitAttemptStartTime + 1)
7878
assert(
7979
getInCommitTimestamp(engine, table, version = 0).get === ver0Snapshot.getTimestamp(engine))
80-
assertHasWriterFeature(ver0Snapshot, "inCommitTimestamp-preview")
80+
assertHasWriterFeature(ver0Snapshot, "inCommitTimestamp")
8181
}
8282
}
8383

@@ -94,7 +94,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
9494

9595
val ver0Snapshot = table.getLatestSnapshot(engine).asInstanceOf[SnapshotImpl]
9696
assertMetadataProp(engine, ver0Snapshot, IN_COMMIT_TIMESTAMPS_ENABLED, false)
97-
assertHasNoWriterFeature(ver0Snapshot, "inCommitTimestamp-preview")
97+
assertHasNoWriterFeature(ver0Snapshot, "inCommitTimestamp")
9898
assert(getInCommitTimestamp(engine, table, version = 0).isEmpty)
9999

100100
setTablePropAndVerify(
@@ -106,7 +106,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
106106
expectedValue = true)
107107

108108
val ver1Snapshot = table.getLatestSnapshot(engine).asInstanceOf[SnapshotImpl]
109-
assertHasWriterFeature(ver1Snapshot, "inCommitTimestamp-preview")
109+
assertHasWriterFeature(ver1Snapshot, "inCommitTimestamp")
110110
assert(ver1Snapshot.getTimestamp(engine) > ver0Snapshot.getTimestamp(engine))
111111
assert(
112112
getInCommitTimestamp(engine, table, version = 1).get === ver1Snapshot.getTimestamp(engine))
@@ -168,7 +168,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
168168
assert(ex.getMessage.contains(String.format(
169169
"This table has the feature %s enabled which requires the presence of the " +
170170
"CommitInfo action in every commit. However, the CommitInfo action is " +
171-
"missing from commit version %s.", "inCommitTimestamp-preview", "0")))
171+
"missing from commit version %s.", "inCommitTimestamp", "0")))
172172
}
173173
}
174174

@@ -214,7 +214,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
214214
assert(ex.getMessage.contains(String.format(
215215
"This table has the feature %s enabled which requires the presence of " +
216216
"inCommitTimestamp in the CommitInfo action. However, this field has not " +
217-
"been set in commit version %s.", "inCommitTimestamp-preview", "0")))
217+
"been set in commit version %s.", "inCommitTimestamp", "0")))
218218
}
219219
}
220220

@@ -299,7 +299,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
299299
expectedValue = true)
300300
val protocol = getProtocolActionFromCommit(engine, table, 0)
301301
assert(protocol.isDefined)
302-
assert(VectorUtils.toJavaList(protocol.get.getArray(3)).contains("inCommitTimestamp-preview"))
302+
assert(VectorUtils.toJavaList(protocol.get.getArray(3)).contains("inCommitTimestamp"))
303303

304304
setTablePropAndVerify(
305305
engine = engine,
@@ -349,9 +349,9 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
349349
" \"name\" : \"id\",\n \"type\" : \"integer\",\n \"nullable\" : true, \n" +
350350
" \"metadata\" : {}\n} ]\n}', " +
351351
"partitionColumns=List(), createdTime=Optional[%s], " +
352-
"configuration={delta.enableInCommitTimestamps-preview=true, " +
353-
"delta.inCommitTimestampEnablementVersion-preview=1, " +
354-
"delta.inCommitTimestampEnablementTimestamp-preview=%s}}",
352+
"configuration={delta.inCommitTimestampEnablementTimestamp=%s, " +
353+
"delta.enableInCommitTimestamps=true, " +
354+
"delta.inCommitTimestampEnablementVersion=1}}",
355355
metadata.getId,
356356
metadata.getCreatedTime.get,
357357
inCommitTimestamp.toString))
@@ -397,7 +397,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
397397
verifyWrittenContent(tablePath, testSchema, expData)
398398
verifyTableProperties(tablePath,
399399
ListMap(IN_COMMIT_TIMESTAMPS_ENABLED.getKey -> true,
400-
"delta.feature.inCommitTimestamp-preview" -> "supported",
400+
"delta.feature.inCommitTimestamp" -> "supported",
401401
IN_COMMIT_TIMESTAMP_ENABLEMENT_TIMESTAMP.getKey
402402
-> getInCommitTimestamp(engine, table, version = 1).get,
403403
IN_COMMIT_TIMESTAMP_ENABLEMENT_VERSION.getKey -> 1L),
@@ -542,7 +542,7 @@ class InCommitTimestampSuite extends DeltaTableWriteSuiteBase {
542542
assert(ex.getMessage.contains(String.format(
543543
"This table has the feature %s enabled which requires the presence of the " +
544544
"CommitInfo action in every commit. However, the CommitInfo action is " +
545-
"missing from commit version %s.", "inCommitTimestamp-preview", "2")))
545+
"missing from commit version %s.", "inCommitTimestamp", "2")))
546546
}
547547
}
548548

protocol_rfcs/README.md

Lines changed: 1 addition & 1 deletion
@@ -18,7 +18,6 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,
1818

1919
| Date proposed | RFC file | Github issue | RFC title |
2020
|:--------------|:---------------------------------------------------------------------------------------------------------------------------------|:----------------------------------------------|:---------------------------------------|
21-
| 2023-02-02 | [in-commit-timestamps.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/in-commit-timestamps.md) | https://github.com/delta-io/delta/issues/2532 | In-Commit Timestamps |
2221
| 2023-02-09 | [type-widening.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/type-widening.md) | https://github.com/delta-io/delta/issues/2623 | Type Widening |
2322
| 2023-02-14 | [managed-commits.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/managed-commits.md) | https://github.com/delta-io/delta/issues/2598 | Managed Commits |
2423
| 2023-02-26 | [column-mapping-usage.tracking.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/column-mapping-usage-tracking.md) | https://github.com/delta-io/delta/issues/2682 | Column Mapping Usage Tracking |
@@ -30,6 +29,7 @@ Here is the history of all the RFCs propose/accepted/rejected since Feb 6, 2024,
3029
| Date proposed | Date accepted | RFC file | Github issue | RFC title |
3130
|:-|:-|:-|:-|:-|
3231
| 2023-02-28 | 2023-03-26 |[vacuum-protocol-check.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/vacuum-protocol-check.md)| https://github.com/delta-io/delta/issues/2630 | Enforce Vacuum Protocol Check |
32+
| 2023-02-02 | 2023-07-24 |[in-commit-timestamps.md](https://github.com/delta-io/delta/blob/master/protocol_rfcs/in-commit-timestamps.md) | https://github.com/delta-io/delta/issues/2532 | In-Commit Timestamps |
3333

3434
### Rejected RFCs
3535

File renamed without changes.

spark/src/main/scala/org/apache/spark/sql/delta/DeltaConfig.scala

Lines changed: 3 additions & 3 deletions
@@ -766,7 +766,7 @@ trait DeltaConfigsBase extends DeltaLogging {
766766
" commit-coordinator.")
767767

768768
val IN_COMMIT_TIMESTAMPS_ENABLED = buildConfig[Boolean](
769-
"enableInCommitTimestamps-preview",
769+
"enableInCommitTimestamps",
770770
false.toString,
771771
_.toBoolean,
772772
validationFunction = _ => true,
@@ -778,7 +778,7 @@ trait DeltaConfigsBase extends DeltaLogging {
778778
* inCommitTimestamps were enabled.
779779
*/
780780
val IN_COMMIT_TIMESTAMP_ENABLEMENT_VERSION = buildConfig[Option[Long]](
781-
"inCommitTimestampEnablementVersion-preview",
781+
"inCommitTimestampEnablementVersion",
782782
null,
783783
v => Option(v).map(_.toLong),
784784
validationFunction = _ => true,
@@ -791,7 +791,7 @@ trait DeltaConfigsBase extends DeltaLogging {
791791
* the version specified in [[IN_COMMIT_TIMESTAMP_ENABLEMENT_VERSION]].
792792
*/
793793
val IN_COMMIT_TIMESTAMP_ENABLEMENT_TIMESTAMP = buildConfig[Option[Long]](
794-
"inCommitTimestampEnablementTimestamp-preview",
794+
"inCommitTimestampEnablementTimestamp",
795795
null,
796796
v => Option(v).map(_.toLong),
797797
validationFunction = _ => true,

spark/src/main/scala/org/apache/spark/sql/delta/TableFeature.scala

Lines changed: 1 addition & 1 deletion
@@ -803,7 +803,7 @@ object TypeWideningTableFeature
803803
* every writer write a monotonically increasing timestamp inside the commit file.
804804
*/
805805
object InCommitTimestampTableFeature
806-
extends WriterFeature(name = "inCommitTimestamp-preview")
806+
extends WriterFeature(name = "inCommitTimestamp")
807807
with FeatureAutomaticallyEnabledByMetadata
808808
with RemovableFeature {
809809
