Skip to content

[SPARK-41636][SQL] Make sure selectFilters returns predicates in deterministic order #42265

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

Hisoka-X
Copy link
Member

@Hisoka-X Hisoka-X commented Aug 1, 2023

What changes were proposed in this pull request?

Method DataSourceStrategy#selectFilters, which is used to determine "pushdown-able" filters, does not preserve the order of the input Seq[Expression] nor does it return the same order across the same plans. This is resulting in CodeGenerator cache misses even when the exact same LogicalPlan is executed.

This PR to make sure selectFilters returns predicates in deterministic order.

Why are the changes needed?

Make sure selectFilters returns predicates in deterministic order, to reduce the probability of codegen cache misses.

Does this PR introduce any user-facing change?

No

How was this patch tested?

add new test.

@github-actions github-actions bot added the SQL label Aug 1, 2023
Copy link
Member

@HyukjinKwon HyukjinKwon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making sense to me

@Hisoka-X
Copy link
Member Author

Hisoka-X commented Aug 6, 2023

cc @cloud-fan @viirya

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

@HyukjinKwon
Copy link
Member

Merged to master and branch-3.5.

@HyukjinKwon HyukjinKwon closed this Aug 7, 2023
@HyukjinKwon HyukjinKwon reopened this Aug 7, 2023
HyukjinKwon pushed a commit that referenced this pull request Aug 7, 2023
…terministic order

### What changes were proposed in this pull request?
Method `DataSourceStrategy#selectFilters`, which is used to determine "pushdown-able" filters, does not preserve the order of the input Seq[Expression] nor does it return the same order across the same plans. This is resulting in CodeGenerator cache misses even when the exact same LogicalPlan is executed.

This PR to make sure `selectFilters` returns predicates in deterministic order.

### Why are the changes needed?
Make sure `selectFilters` returns predicates in deterministic order, to reduce the probability of codegen cache misses.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add new test.

Closes #42265 from Hisoka-X/SPARK-41636_selectfilters_order.

Authored-by: Jia Fan <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
(cherry picked from commit 9462dcd)
Signed-off-by: Hyukjin Kwon <[email protected]>
@Hisoka-X
Copy link
Member Author

Hisoka-X commented Aug 7, 2023

Thanks @HyukjinKwon @viirya

@cloud-fan
Copy link
Contributor

late LGTM

@Hisoka-X Hisoka-X deleted the SPARK-41636_selectfilters_order branch August 11, 2023 00:48
valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
…terministic order

### What changes were proposed in this pull request?
Method `DataSourceStrategy#selectFilters`, which is used to determine "pushdown-able" filters, does not preserve the order of the input Seq[Expression] nor does it return the same order across the same plans. This is resulting in CodeGenerator cache misses even when the exact same LogicalPlan is executed.

This PR to make sure `selectFilters` returns predicates in deterministic order.

### Why are the changes needed?
Make sure `selectFilters` returns predicates in deterministic order, to reduce the probability of codegen cache misses.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add new test.

Closes apache#42265 from Hisoka-X/SPARK-41636_selectfilters_order.

Authored-by: Jia Fan <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
ragnarok56 pushed a commit to ragnarok56/spark that referenced this pull request Mar 2, 2024
…terministic order

### What changes were proposed in this pull request?
Method `DataSourceStrategy#selectFilters`, which is used to determine "pushdown-able" filters, does not preserve the order of the input Seq[Expression] nor does it return the same order across the same plans. This is resulting in CodeGenerator cache misses even when the exact same LogicalPlan is executed.

This PR to make sure `selectFilters` returns predicates in deterministic order.

### Why are the changes needed?
Make sure `selectFilters` returns predicates in deterministic order, to reduce the probability of codegen cache misses.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
add new test.

Closes apache#42265 from Hisoka-X/SPARK-41636_selectfilters_order.

Authored-by: Jia Fan <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants