Commit 7d41fb7
Use inventory reservoir as source for all files and dirs
- Users often have large tables with daily or hourly partitions spanning many years. Of these partitions, only the recent ones are subject to change, due to job reruns, corrections, and late-arriving events.
- When VACUUM runs on such a table, files are listed across all partitions, which can take several hours or even days. The duration grows with the table, and vacuum becomes a major overhead for customers, especially those with hundreds or thousands of such Delta tables. For large tables, the file system scan takes most of the time in a vacuum operation, mostly because of the limited parallelism achievable and API throttling on the object stores.
- This change lets users pass a reservoir of files generated externally (e.g. from inventory reports of cloud stores), either as a Delta table or as a Spark SQL query with a predefined schema. When given such a reservoir DataFrame, the vacuum operation skips the listing step and uses the reservoir as the source of all files in storage.
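The idea in the last bullet can be sketched in a few lines. This is a minimal illustration in plain Python, not the Delta implementation: the record fields (`path`, `length`, `is_dir`, `modification_time`), the function name `files_to_delete`, and the sample paths are all assumptions made for the example, since the commit message only says the inventory has a predefined schema. The sketch also assumes the usual vacuum rule that a file is removable when the table no longer references it and it is older than the retention cutoff, and it ignores directory cleanup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    # Hypothetical inventory record; the field names are assumptions.
    path: str
    length: int
    is_dir: bool
    modification_time: int  # milliseconds since the epoch


def files_to_delete(inventory, referenced_paths, now_ms, retain_hours):
    """Pick deletion candidates from an inventory instead of listing storage."""
    cutoff_ms = now_ms - retain_hours * 60 * 60 * 1000
    referenced = set(referenced_paths)
    return sorted(
        entry.path
        for entry in inventory
        if not entry.is_dir                      # directories are skipped in this sketch
        and entry.path not in referenced         # still used by the table: keep
        and entry.modification_time < cutoff_ms  # newer than the cutoff: keep
    )


HOUR_MS = 60 * 60 * 1000
now = 1_000 * HOUR_MS
inventory = [
    InventoryEntry("date=2020-01-01", 0, True, now - 900 * HOUR_MS),
    InventoryEntry("date=2020-01-01/old.parquet", 10, False, now - 900 * HOUR_MS),
    InventoryEntry("date=2020-01-01/live.parquet", 10, False, now - 900 * HOUR_MS),
    InventoryEntry("date=2023-11-01/recent.parquet", 10, False, now - 2 * HOUR_MS),
]
candidates = files_to_delete(
    inventory, {"date=2020-01-01/live.parquet"}, now_ms=now, retain_hours=168
)
print(candidates)  # ['date=2020-01-01/old.parquet']
```

The point of the design is that no storage listing call appears anywhere: the inventory rows stand in for the file system scan, so the cost depends on the size of the inventory rather than on object store listing throughput.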
Resolves #1691.
- Unit testing (`build/sbt 'testOnly org.apache.spark.sql.delta.DeltaVacuumSuite'`)
Yes, this change is user-facing: `VACUUM` accepts an optional clause for passing an inventory.
`VACUUM table_name [USING INVENTORY <reservoir-delta-table>] [RETAIN num HOURS] [DRY RUN]`
`VACUUM table_name [USING INVENTORY <reservoir-query>] [RETAIN num HOURS] [DRY RUN]`
e.g. `VACUUM test_db.table USING INVENTORY select * from reservoir_table RETAIN 168 HOURS DRY RUN`
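To make the optional clauses concrete, here is a small sketch that assembles a statement in the shape shown above. The helper `build_vacuum_statement` and its parameters are hypothetical and exist only for illustration; the inventory argument (a reservoir table name or query text) is passed through verbatim, and no quoting or validation is attempted.

```python
def build_vacuum_statement(table, inventory=None, retain_hours=None, dry_run=False):
    """Compose a VACUUM statement from the optional clauses in the syntax above."""
    parts = ["VACUUM", table]
    if inventory is not None:
        # Reservoir table name or query text, inserted as given.
        parts += ["USING INVENTORY", inventory]
    if retain_hours is not None:
        parts += ["RETAIN", str(retain_hours), "HOURS"]
    if dry_run:
        parts.append("DRY RUN")
    return " ".join(parts)


statement = build_vacuum_statement(
    "test_db.table", inventory="reservoir_table", retain_hours=168, dry_run=True
)
print(statement)
# VACUUM test_db.table USING INVENTORY reservoir_table RETAIN 168 HOURS DRY RUN
```

In a PySpark session with Delta configured, such a string would typically be run with `spark.sql(statement)`; starting with `dry_run=True` is a cautious way to inspect what would be deleted before running the real vacuum.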
Closes #2257
Co-authored-by: Arun Ravi M V <[email protected]>
Signed-off-by: Bart Samwel <[email protected]>
GitOrigin-RevId: 2bc824e524c677dd5f3a7ed787762df60c3b6d86
1 parent 4c8a442
File tree
8 files changed, +310 -32 lines changed
- spark/src
  - main
    - antlr4/io/delta/sql/parser
    - resources/error
    - scala
      - io/delta
        - sql/parser
        - tables/execution
      - org/apache/spark/sql/delta
        - commands
  - test/scala
    - io/delta/sql/parser
    - org/apache/spark/sql/delta