test: Add E2E tests for array functions#26937
jkhaliqi wants to merge 1 commit into prestodb:master
Reviewer's Guide
Adds new native-worker end-to-end tests for array_max_by, array_min_by, and array_top_n to cover normal, edge, and NULL-handling cases in Presto's array function suite.
Hey - I've found 1 issue, and left some high level feedback:
- The empty array literals (`ARRAY[]`) in the new tests will have ambiguous element types in Presto; consider explicitly typing them (e.g., `ARRAY[]::integer[]` or `CAST(ARRAY[] AS ARRAY(INTEGER))`) to avoid type resolution issues.
- For `array_top_n`, you may want to add at least one test where `n` is negative or otherwise invalid to validate the function's behavior on bad input in the native engine.
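The two suggestions above could be turned into concrete queries along the following lines. This is only a sketch that builds query strings; `EdgeCaseQueries`, `typedEmptyArray`, `emptyArrayQuery`, and `topNQuery` are hypothetical names, not part of the PR. It uses the `CAST` form for the typed empty array, and it makes no claim about how the native engine responds to a negative `n`.

```java
// Hypothetical helper class for illustration only; not part of the Presto code base.
final class EdgeCaseQueries
{
    private EdgeCaseQueries() {}

    // Gives an empty array literal an explicit element type via CAST,
    // so the element type does not have to be inferred from ARRAY[].
    public static String typedEmptyArray(String elementType)
    {
        return "CAST(ARRAY[] AS ARRAY(" + elementType + "))";
    }

    // Empty-array case for a lambda-taking function such as array_max_by or array_min_by.
    public static String emptyArrayQuery(String functionName, String elementType)
    {
        return "SELECT " + functionName + "(" + typedEmptyArray(elementType) + ", x -> x)";
    }

    // array_top_n with a caller-supplied n; pass a negative n to probe invalid input.
    public static String topNQuery(String arrayLiteral, int n)
    {
        return "SELECT array_top_n(" + arrayLiteral + ", " + n + ")";
    }

    public static void main(String[] args)
    {
        System.out.println(emptyArrayQuery("array_max_by", "INTEGER"));
        System.out.println(topNQuery("ARRAY[1, 2, 3]", -1));
    }
}
```

In the test class, the first string would presumably go to `assertQuery`, while the negative-`n` query would need whatever failure assertion the test base class offers, since the expected error is not stated in this thread.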
## Individual Comments
### Comment 1
<location> `presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeArrayFunctionQueries.java:146-147` </location>
<code_context>
assertQuery("SELECT array_union(ARRAY[ARRAY[linenumber], ARRAY[suppkey, orderkey]], ARRAY[ARRAY[linenumber], ARRAY[partkey, partkey, null, orderkey], ARRAY[suppkey, orderkey]]) FROM lineitem");
}
+
+ @Test
+ public void testArrayMaxBy()
+ {
+ assertQuery("SELECT array_max_by(ARRAY['a', 'bbb', 'cc'], x -> length(x))");
</code_context>
<issue_to_address>
**suggestion (testing):** Consider adding cases where the lambda returns NULL (or partially NULL) and where the input array itself is NULL to cover additional edge conditions for array_max_by/array_min_by.
You already cover empty arrays, NULL elements, and mixed sign values. To round out coverage, please add 1–2 assertions where:
- The lambda returns NULL for some/all elements (e.g., `array_max_by(ARRAY[1,2,3], x -> IF(x = 2, NULL, x))`) and similarly for `array_min_by`, to validate behavior when the *key* is NULL.
- The input array itself is NULL (if the function contract supports this), e.g. `array_max_by(CAST(NULL AS array(integer)), x -> x)`, to ensure native vs. reference behavior matches.
These will better exercise nullability semantics for both the lambda and input array.
Suggested implementation:
```java
    @Test
    public void testArrayMaxBy()
    {
        assertQuery("SELECT array_max_by(ARRAY['a', 'bbb', 'cc'], x -> length(x))");
        assertQuery("SELECT array_max_by(ARRAY['aa', 'bb', 'c'], x -> length(x))");
        assertQuery("SELECT array_max_by(ARRAY['a', NULL, 'bbb'], x -> length(x))");
        assertQuery("SELECT array_max_by(ARRAY[NULL, NULL], x -> length(x))");
        assertQuery("SELECT array_max_by(ARRAY[], x -> x)");
        assertQuery("SELECT array_max_by(ARRAY[-10, 5, 7], x -> abs(x))");
        // Lambda returns NULL for some elements (key nullability)
        assertQuery("SELECT array_max_by(ARRAY[1, 2, 3], x -> IF(x = 2, NULL, x))");
        // Input array itself is NULL
        assertQuery("SELECT array_max_by(CAST(NULL AS array(integer)), x -> x)");
    }

    @Test
    public void testArrayMinBy()
    {
```
To fully implement your suggestion for `array_min_by` as well, you should add analogous assertions in `testArrayMinBy`, for example:
- A case where the lambda returns NULL for some elements, e.g.:
- `assertQuery("SELECT array_min_by(ARRAY[1, 2, 3], x -> IF(x = 2, NULL, x))");`
- A case where the input array itself is NULL, e.g.:
- `assertQuery("SELECT array_min_by(CAST(NULL AS array(integer)), x -> x)");`
Place these inside the `testArrayMinBy` method body alongside whatever existing assertions you add for the basic behavior, mirroring the coverage pattern used in `testArrayMaxBy`.
</issue_to_address>
Thanks for the release note! You can delete everything after the `Add E2E tests for array_max_by,` line, including the NO RELEASE NOTES line and its instructions (but keep the formatting line of three backticks after `Add E2E tests for array_max_by,`).
Force-pushed from 69ef1bd to f68b769
czentgr left a comment
Thanks,
we don't need release notes for adding tests.
Can we look at presto-native-tests and see what the test situation is like there?
    @Test
    public void testArrayMaxBy()
    {
        assertQuery("SELECT array_max_by(ARRAY['a', 'bbb', 'cc'], x -> length(x))");
All your tests use constant arrays. Can you check whether they get constant-folded on the coordinator? If so, we should write tests that actually execute on the worker, so change the tests to use a table with an array column instead.
The tests are not being constant-folded, and I have confirmed that they run on the worker for all test cases. I believe it is because of the lambda transform parameter.
Perhaps we can be deliberate for all of these queries and use a VALUES clause to provide the values? For example:
assertQuery("SELECT array_max_by(a, x -> length(x)) from (values (ARRAY['a', 'bbb', 'cc'])) as t(a)");
(I didn't try this, but the syntax looks correct.) Can we do this for all queries?
Sounds good, updated!
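Applying the VALUES pattern from the suggestion above to every query could be done mechanically. The following is a minimal sketch that only builds query strings; `ValuesQueries` and `overValues` are hypothetical names, and the column alias `a` and table alias `t` follow the example in the thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class for illustration only; not part of the Presto code base.
final class ValuesQueries
{
    private ValuesQueries() {}

    // Wraps one array literal in a VALUES clause so the function under test
    // receives a column reference (a) rather than a constant expression.
    public static String overValues(String expressionOnA, String arrayLiteral)
    {
        return "SELECT " + expressionOnA + " FROM (VALUES (" + arrayLiteral + ")) AS t(a)";
    }

    // Builds one query per array literal for the same expression.
    public static List<String> overValues(String expressionOnA, List<String> arrayLiterals)
    {
        return arrayLiterals.stream()
                .map(literal -> overValues(expressionOnA, literal))
                .collect(Collectors.toList());
    }

    public static void main(String[] args)
    {
        List<String> queries = overValues(
                "array_max_by(a, x -> length(x))",
                Arrays.asList("ARRAY['a', 'bbb', 'cc']", "ARRAY['a', NULL, 'bbb']"));
        queries.forEach(System.out::println);
    }
}
```

Each generated string could then be passed to `assertQuery` in the test class, in the same way as the hand-written example in the thread.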
    QueryRunner queryRunner = PrestoNativeQueryRunnerUtils.nativeHiveQueryRunnerBuilder()
            .setAddStorageFormatToPath(true)
            .build();
    queryRunner.installPlugin(new SqlInvokedFunctionsPlugin());
It seems that not all tests need SqlInvokedFunctionsPlugin, so adding it by default to the builders would be unnecessary overhead for tests that don't use SQL-invoked functions.
@aditi-pandit
No need to add it by default; it's only required for functions defined under SqlInvokedFunctionsPlugin. We separated the SQL-invoked functions out into a plugin in #25818.
    QueryRunner queryRunner = PrestoNativeQueryRunnerUtils.nativeHiveQueryRunnerBuilder()
            .setAddStorageFormatToPath(true)
            .build();
    queryRunner.installPlugin(new SqlInvokedFunctionsPlugin());
@jkhaliqi We should not add SqlInvokedFunctionsPlugin on a native cluster. There's another plugin for that, NativeSqlInvokedFunctionsPlugin. Can you try that?
    protected QueryRunner createQueryRunner()
            throws Exception
    {
        QueryRunner queryRunner = PrestoNativeQueryRunnerUtils.nativeHiveQueryRunnerBuilder()
Also, can we test this with sidecarEnabled too? Please refer to the other test files.
        return "RETURN IF(n < 0, fail('n must be greater than or equal to 0'), IF(COALESCE(CARDINALITY(REMOVE_NULLS(input)), 0) = 0, NULL, TRANSFORM(SLICE(ARRAY_SORT(TRANSFORM(MAP_ENTRIES(ARRAY_FREQUENCY(REMOVE_NULLS(input))), x -> ROW(x[2], x[1]))), 1, n), x -> x[2])))";
    }

    @SqlInvokedScalarFunction(value = "array_max_by", deterministic = true, calledOnNullInput = true)
We add them here because without the sidecar we won't have support for these functions otherwise.
Do we know why they weren't defined already?
It seems they weren't defined here because defining them here causes them to be registered twice, which fails the existing test testOverriddenInlinedSqlInvokedFunctions. Will look into it some more.
We shouldn't need them here; they are pulled from the sidecar.
I chatted offline with @jkhaliqi, and it seems we weren't loading the correct plugin, which led to this issue.
Potential missing coverage
Description
Add E2E tests for array_max_by, array_min_by, and array_top_n
Motivation and Context
closes #26934