Commit bba0e94
[Spark] Skip collecting commit stats to prevent computing Snapshot State (#2718)
## Description
Before this PR, Delta computes a
[SnapshotState](https://github.com/delta-io/delta/blob/v3.1.0/spark/src/main/scala/org/apache/spark/sql/delta/SnapshotState.scala#L46-L58)
during every commit. Computing a SnapshotState is fairly slow and
expensive, because it involves reading the entirety of a checkpoint,
sidecars, and log segment.
For many types of commit, it should be unnecessary to compute the
SnapshotState.
After this PR, a transaction can avoid computing the SnapshotState of a
newly created snapshot. Skipping the computation is enabled by setting the
Spark configuration option `spark.databricks.delta.commitStats.collect=false`.
This change can significantly improve performance when writing into a Delta
table, especially when the table comprises a large number of underlying
data files.
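To illustrate why skipping stats collection can avoid the expensive computation, here is a minimal, self-contained sketch. It assumes the snapshot state is computed lazily and that collecting commit stats is what forces it. `State`, `SketchSnapshot`, and `CommitStatsSketch` are hypothetical stand-ins, not Delta's actual classes.

```scala
// Hypothetical stand-ins for illustration only; these are not Delta's real classes.
final case class State(numFiles: Long, sizeInBytes: Long)

final class SketchSnapshot(loadState: () => State) {
  // Computed at most once, and only if something actually reads it.
  lazy val state: State = loadState()
}

object CommitStatsSketch {
  // Reads the snapshot state only when stats collection is requested.
  def commit(snapshot: SketchSnapshot, collectStats: Boolean): String =
    if (collectStats) s"committed, numFiles=${snapshot.state.numFiles}"
    else "committed, stats skipped"

  def main(args: Array[String]): Unit = {
    var loads = 0
    // Stand-in for reading the checkpoint, sidecars, and log segment.
    def expensiveLoad(): State = { loads += 1; State(numFiles = 3L, sizeInBytes = 1024L) }

    println(commit(new SketchSnapshot(() => expensiveLoad()), collectStats = true))
    println(s"state loads so far: $loads")
    println(commit(new SketchSnapshot(() => expensiveLoad()), collectStats = false))
    println(s"state loads so far: $loads")
  }
}
```

In this sketch the load counter increases only for the commit that collects stats; the commit with `collectStats = false` never touches the lazy field.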
## How was this patch tested?
- Locally built delta-spark
- Ran a small spark job to insert rows into a delta table
- Inspected log4j output to see if snapshot state was computed
- Repeated the run, this time setting
`spark.databricks.delta.commitStats.collect=false`
Simple demo job that triggers computing SnapshotState, before this PR:
```scala
import org.apache.spark.sql.SparkSession

// Local session with the Delta SQL extension and catalog enabled.
val spark = SparkSession
  .builder
  .appName("myapp")
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "./warehouse")
  .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
  .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
  .getOrCreate()

spark.sql("""CREATE TABLE test_delta(id string) USING DELTA""")
// The insert commits to the table; before this PR, the commit computes SnapshotState.
spark.sql("""INSERT INTO test_delta (id) VALUES (42)""")
spark.close()
```
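For the repeated run with the option disabled, the job might look like the sketch below. It assumes a delta-spark build that includes this PR is on the classpath. The object name `SkipCommitStatsDemo` and the `IF NOT EXISTS` clause are additions for illustration; the config key is the one described above.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object; only the extra .config line differs from the demo job.
object SkipCommitStatsDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("myapp")
      .master("local[*]")
      .config("spark.sql.warehouse.dir", "./warehouse")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      // Option added by this PR: skip collecting commit stats.
      .config("spark.databricks.delta.commitStats.collect", "false")
      .getOrCreate()

    // IF NOT EXISTS lets the job be re-run against the same warehouse directory.
    spark.sql("CREATE TABLE IF NOT EXISTS test_delta(id string) USING DELTA")
    spark.sql("INSERT INTO test_delta (id) VALUES (42)")
    spark.stop()
  }
}
```

Comparing the log4j output of this run with the run above should show whether the snapshot state was computed during the commit.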
## Does this PR introduce _any_ user-facing changes?
Yes. After this PR, the user can set the Spark config option
`spark.databricks.delta.commitStats.collect=false` to avoid computing
SnapshotState after a commit.

1 parent 1b210c2 commit bba0e94
File tree
6 files changed (+51, -3)
- spark/src
- main/scala/org/apache/spark/sql/delta
- sources
- test/scala/org/apache/spark/sql/delta

Lines changed: 9 additions & 2 deletions
Lines changed: 5 additions & 1 deletion
Lines changed: 11 additions & 0 deletions
Lines changed: 4 additions & 0 deletions
Lines changed: 18 additions & 0 deletions
Lines changed: 4 additions & 0 deletions