[Protocol Change Request] Improving Time Travel using In-Commit Timestamps #2532

@dhruvarya-db

Feature request

Overview

This feature request is about changing Delta commit timestamps to improve time travel.

Motivation

Delta currently relies on the file modification time to identify the timestamp of a commit. This timestamp is used for time travel queries, log cleanup, and staleness checks. However, file modification time is not a reliable source for a timestamp: it can easily change when the files are copied or moved to another directory (e.g. for disaster recovery purposes) or when any manual fixes are performed on the Delta log. In such cases, time travel on the Delta table currently breaks. The possibility of non-monotonic file timestamps also adds a lot of code complexity in Delta, since we try to handle it heuristically as best we can.

Further details

We propose a new Writer feature that will require clients to generate a timestamp just before performing a commit and store it in the commit itself.

Compliant writers will ensure that the timestamp stored in Commit X+1 is always greater than the timestamp stored in Commit X. To ensure this, the client will need to perform conflict detection for these timestamps.
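
As a rough illustration of the monotonicity requirement, the sketch below picks a timestamp that is strictly greater than the previous commit's and re-derives it when the writer loses a race for a version. Everything here is hypothetical: the in-memory `log` dict stands in for the Delta log, the `version not in log` check stands in for an atomic put-if-absent on the commit file, and the function names are not part of any Delta API.

```python
import time


def choose_commit_timestamp(previous_ts_ms, now_ms=None):
    """Return a timestamp (ms since epoch) strictly greater than the previous commit's."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    if previous_ts_ms is None:
        # First commit in the table: nothing to compare against.
        return now_ms
    # If the local clock is behind the previous commit, bump past it.
    return max(now_ms, previous_ts_ms + 1)


def commit_with_retry(log, read_version, now_ms=None):
    """Try to commit on top of read_version; on conflict, rebase and recompute.

    log is a hypothetical in-memory map of version -> in-commit timestamp.
    """
    version = read_version + 1
    while True:
        prev_ts = log.get(version - 1)
        ts = choose_commit_timestamp(prev_ts, now_ms)
        if version not in log:  # stands in for an atomic put-if-absent
            log[version] = ts
            return version, ts
        # Another writer won this version: retry on top of it, which
        # re-reads the winner's timestamp on the next iteration.
        version += 1
```

The point of the retry loop is that the timestamp cannot be fixed once up front: if another writer commits first, the timestamp has to be re-validated against that winning commit before the next attempt.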

  1. The writer will write this timestamp in the CommitInfo action. Furthermore, the writer will always write CommitInfo as the first action in a commit.
  2. Clients that understand these new timestamps will now read the commit file to get the actual timestamp. These timestamps will now be used for time travel queries and by other operations that use timestamps.
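
The two steps above could look roughly like the following. This is a minimal sketch under stated assumptions: commits are modeled as newline-delimited JSON actions, the field name `inCommitTimestamp` is an assumption (this issue does not name the field), and `build_commit`, `read_commit_timestamp`, and `version_at` are hypothetical helpers rather than Delta APIs.

```python
import json
from bisect import bisect_right


def build_commit(timestamp_ms, actions):
    """Serialize a commit with CommitInfo as the first action.

    `actions` is assumed to contain only non-CommitInfo actions.
    """
    commit_info = {"commitInfo": {"inCommitTimestamp": timestamp_ms}}  # assumed field name
    return "\n".join(json.dumps(a) for a in [commit_info] + list(actions)) + "\n"


def read_commit_timestamp(commit_text):
    """Read the timestamp from the first action instead of the file's mtime."""
    first_line = commit_text.split("\n", 1)[0]
    return json.loads(first_line)["commitInfo"]["inCommitTimestamp"]


def version_at(timestamps, target_ms):
    """Return the latest version whose timestamp is <= target_ms, or None.

    timestamps[i] is the in-commit timestamp of version i; binary search
    is valid because compliant writers keep them strictly increasing.
    """
    idx = bisect_right(timestamps, target_ms) - 1
    return idx if idx >= 0 else None
```

Because CommitInfo is always the first action, a reader only needs to parse the first line of a commit file to recover its timestamp, and copying or moving the file no longer changes the result.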

The detailed proposal and the required protocol changes are sketched out in this doc.

Willingness to contribute

The Delta Lake Community encourages new feature contributions. Would you or another member of your organization be willing to contribute an implementation of this feature?

  • Yes. I can contribute this feature independently.
  • Yes. I would be willing to contribute this feature with guidance from the Delta Lake community.
  • No. I cannot contribute this feature at this time.

Metadata

Labels: enhancement (New feature or request)

Status: Done