Commit 669dca9
[Spark] Improve Delta Protocol Transitions (#2848)
#### Which Delta project/connector is this regarding?
- [x] Spark
- [ ] Standalone
- [ ] Flink
- [ ] Kernel
- [ ] Other (fill in here)
## Description
Currently, protocol transitions can be hard to manage. A few examples:
- It is hard to predict the output of certain operations.
- Once a legacy protocol transitions to a Table Features protocol it is
quite hard to transition back to a legacy protocol.
- Adding a feature to a protocol and then removing it might not lead back
to the original protocol.
- Adding an explicit feature to a legacy protocol always leads to a
table features protocol, even when a legacy protocol would suffice.
- Dropping features from legacy protocols is not supported. As a result,
the order in which features are dropped matters.
- Default protocol versions are ignored in some cases.
- Enabling table features by default results in feature loss in legacy
protocols.
- CREATE TABLE ignores any legacy versions set if there is also a table
feature in the definition.
This PR proposes several improvements to protocol transitions in order to
simplify user journeys. The high-level proposal is the following:
Two protocol representations with a single set of operational semantics.
That is, a protocol can be represented in two ways: a) the legacy
representation and b) the table features representation. The latter is
more expressive than the former, i.e., the table features representation
can represent every legacy protocol, but not vice versa. Both
representations are governed by three simple rules:
1. All operations should be supported on both protocol representations
and should yield equivalent results.
2. The result should always be represented with the weaker form when
possible.
3. Conversely, if the result of an operation on a legacy protocol cannot
be represented with the legacy representation, use the Table Features
representation.
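The two representations and rules 2 and 3 can be illustrated with a small Scala model. This is a hypothetical sketch, not Delta's implementation: the `ProtocolRepresentations` object, the `Protocol` case class and the helper names are invented here, reader and writer features are collapsed into a single set, and a table features protocol is assumed to need reader version 3 only when it contains a reader-writer feature. The feature sets implied by each legacy version follow the Delta protocol specification.

```scala
// Hypothetical model of the two protocol representations; names and types
// are illustrative only and are not Delta's actual `Protocol` API.
object ProtocolRepresentations {
  // Simplification: one feature set instead of separate reader/writer lists.
  // Writer version 7 marks the table features representation.
  final case class Protocol(reader: Int, writer: Int, features: Set[String] = Set.empty)

  // Assumed simplification: a table features protocol needs reader version 3
  // only if it contains a reader-writer feature.
  val readerWriterFeatures: Set[String] = Set("columnMapping", "deletionVectors")

  // Legacy protocols and the features each one implicitly supports, weakest first.
  val legacy: Seq[(Protocol, Set[String])] = {
    val w2 = Set("appendOnly", "invariants")
    val w3 = w2 + "checkConstraints"
    val w4 = w3 ++ Set("changeDataFeed", "generatedColumns")
    val w5 = w4 + "columnMapping"
    val w6 = w5 + "identityColumns"
    Seq(Protocol(1, 1) -> Set.empty[String], Protocol(1, 2) -> w2, Protocol(1, 3) -> w3,
      Protocol(1, 4) -> w4, Protocol(2, 5) -> w5, Protocol(2, 6) -> w6)
  }

  // The table features form can express every legacy protocol...
  def toTableFeatures(p: Protocol): Protocol = {
    val fs = legacy.collectFirst { case (lp, lfs) if lp == p => lfs }.getOrElse(
      throw new IllegalArgumentException(s"not a legacy protocol: $p"))
    Protocol(if (fs.exists(readerWriterFeatures)) 3 else 1, 7, fs)
  }

  // ...but only some feature sets have a legacy form.
  def toLegacy(features: Set[String]): Option[Protocol] =
    legacy.collectFirst { case (lp, lfs) if lfs == features => lp }

  // Rules 2 and 3: use the weaker legacy form when it exists, else table features.
  def represent(features: Set[String]): Protocol =
    toLegacy(features).getOrElse(
      Protocol(if (features.exists(readerWriterFeatures)) 3 else 1, 7, features))
}
```

For example, `represent(Set("appendOnly", "invariants"))` returns the legacy `Protocol(1, 2)`, whereas `represent(Set("appendOnly"))` has no legacy equivalent and returns `Protocol(1, 7, Set("appendOnly"))`.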
**The PR introduces the following behavioural changes:**
1. Now all protocol operations are followed by denormalisation and then
normalisation. Previously, normalisation was only performed after
dropping a feature.
2. Legacy features can now be dropped directly from a legacy protocol.
The result is represented with table features if it cannot be
represented as a legacy protocol.
3. Operations on table features protocols now take into account the
default protocol versions. For example, enabling deletion vectors on a
table results in protocol `(3, 7, AppendOnly, Invariants, DeletionVectors)`,
which retains the features implied by the default versions `(1, 2)`.
4. Operations on table features protocols now take into account any
protocol versions set on the table. For example, creating a table with
protocol `(1, 3)` and deletion vectors results in protocol `(3, 7,
AppendOnly, Invariants, CheckConstraints, DeletionVectors)`.
5. A table features protocol without any table features is no longer
possible. For example, creating a table with `(3, 7)` and no table
features now normalises to `(1, 1)`.
6. Column Mapping can now be enabled automatically on legacy protocols
when the mode is changed explicitly.
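The behavioural changes above share one pipeline: denormalise the current protocol into its feature set, apply the operation, then normalise the result. The Scala sketch below is a hypothetical illustration, not the code added by this PR: `ProtocolTransitionsSketch`, `transition`, `addFeature`, `dropFeature` and `createTable` are invented names, reader and writer features are collapsed into one set, only two reader-writer features are modelled, and the default protocol for new tables is taken to be `(1, 2)`.

```scala
// Hypothetical sketch of "denormalise, apply the operation, normalise".
// Not Delta's implementation; all names here are illustrative.
object ProtocolTransitionsSketch {
  // Simplification: one feature set; writer version 7 means table features.
  final case class Protocol(reader: Int, writer: Int, features: Set[String] = Set.empty)

  // Assumption: only these features force reader version 3 in table features form.
  val readerWriterFeatures: Set[String] = Set("columnMapping", "deletionVectors")

  // Legacy protocols and their implicitly supported features, weakest first.
  val legacy: Seq[(Protocol, Set[String])] = {
    val w2 = Set("appendOnly", "invariants")
    val w3 = w2 + "checkConstraints"
    val w4 = w3 ++ Set("changeDataFeed", "generatedColumns")
    val w5 = w4 + "columnMapping"
    val w6 = w5 + "identityColumns"
    Seq(Protocol(1, 1) -> Set.empty[String], Protocol(1, 2) -> w2, Protocol(1, 3) -> w3,
      Protocol(1, 4) -> w4, Protocol(2, 5) -> w5, Protocol(2, 6) -> w6)
  }

  // Assumed default protocol for new tables.
  val defaultProtocol: Protocol = Protocol(1, 2)

  // Denormalise: express any protocol as its explicit feature set.
  def denormalize(p: Protocol): Set[String] =
    if (p.writer >= 7) p.features
    else legacy.collectFirst { case (lp, fs) if lp == p => fs }.getOrElse(
      throw new IllegalArgumentException(s"unsupported legacy protocol: $p"))

  // Normalise: prefer the weaker legacy form, else use table features.
  def normalize(fs: Set[String]): Protocol =
    legacy.collectFirst { case (lp, lfs) if lfs == fs => lp }.getOrElse(
      Protocol(if (fs.exists(readerWriterFeatures)) 3 else 1, 7, fs))

  // Every operation goes through the same pipeline, whatever the input form.
  def transition(p: Protocol)(op: Set[String] => Set[String]): Protocol =
    normalize(op(denormalize(p)))

  def addFeature(p: Protocol, f: String): Protocol = transition(p)(_ + f)
  def dropFeature(p: Protocol, f: String): Protocol = transition(p)(_ - f)

  // CREATE TABLE: start from the explicit versions (or the defaults) and
  // add any explicitly requested features.
  def createTable(versions: Option[Protocol], features: Set[String]): Protocol =
    transition(versions.getOrElse(defaultProtocol))(_ ++ features)
}
```

Under these assumptions, `addFeature(defaultProtocol, "deletionVectors")` yields `Protocol(3, 7, Set("appendOnly", "invariants", "deletionVectors"))`; `createTable(Some(Protocol(1, 3)), Set("deletionVectors"))` also keeps `checkConstraints`; `createTable(Some(Protocol(3, 7)), Set.empty)` normalises to `Protocol(1, 1)`; and `dropFeature(Protocol(1, 2), "invariants")` has no legacy equivalent, so it yields `Protocol(1, 7, Set("appendOnly"))`. Adding a feature the protocol does not yet support and then dropping it returns the starting protocol (in its normalised form), because both steps go through the same normalisation.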
## How was this patch tested?
Added `DeltaProtocolTransitionsSuite`. Also modified existing tests in
`DeltaProtocolVersionSuite`.
## Does this PR introduce _any_ user-facing changes?
Yes.
File tree (22 files changed, +1142 / -515 lines):
- spark/src
  - main
    - resources/error
    - scala/org/apache/spark/sql/delta
      - actions
      - commands
      - sources
  - test
    - scala-spark-master/org/apache/spark/sql/delta
    - scala
      - io/delta/tables
      - org/apache/spark/sql/delta
        - columnmapping
        - schema
(Per-file diff view not preserved in this copy; only line-number residue remained. Surviving per-file summaries: 3 additions & 30 deletions; 5 additions & 28 deletions; 21 additions & 4 deletions; 14 additions & 5 deletions; 61 additions & 54 deletions.)