test: Add E2E tests for array functions #26937

Open
jkhaliqi wants to merge 1 commit into prestodb:master from jkhaliqi:array_e2e

Contributor

@jkhaliqi jkhaliqi commented Jan 9, 2026

Description

Add E2E tests for array_max_by, array_min_by, and array_top_n

Motivation and Context

closes #26934

Impact

Test Plan

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with its default value), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== NO RELEASE NOTE ==

@jkhaliqi jkhaliqi requested review from a team as code owners January 9, 2026 23:32
@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Jan 9, 2026
@prestodb-ci prestodb-ci requested review from a team and removed request for a team January 9, 2026 23:32
Contributor

sourcery-ai bot commented Jan 9, 2026

Reviewer's Guide

Adds new native-worker end-to-end tests for array_max_by, array_min_by, and array_top_n to cover normal, edge, and NULL-handling cases in Presto's array function suite.

File-Level Changes

Change: Add E2E coverage for array_max_by, array_min_by, and array_top_n in the native worker array function test suite.
Details:
  • Introduce testArrayMaxBy() with cases covering varying element lengths, NULL elements, empty arrays, and numeric arrays with abs() as the key function.
  • Introduce testArrayMinBy() mirroring the max-by scenarios, asserting behavior with different string lengths, NULLs, empty arrays, and numeric key functions.
  • Introduce testArrayTopN() with scenarios for basic top-N extraction, N larger than the array size, duplicate values, NULL handling, empty arrays, and N = 0.
Files: presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeArrayFunctionQueries.java
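The max-by and min-by scenarios above can be sanity-checked against a small plain-Java model before comparing native and Java engine output. This is a hypothetical sketch, not Presto or Velox code: `MaxMinByModel` is an illustrative name, and it assumes that `array_max_by`/`array_min_by` return the element with the largest/smallest key and that an empty input yields NULL (modelled as Java `null`). NULL keys and ties are deliberately left out, because `assertQuery` compares the native result with the Java engine and that comparison is the real source of truth for those cases.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical reference model (not Presto/Velox code) for the non-NULL cases of
// array_max_by / array_min_by. Assumptions: the element with the largest (smallest)
// key wins, and an empty list yields null. NULL keys and ties are not modelled.
final class MaxMinByModel
{
    private MaxMinByModel() {}

    static <T, K extends Comparable<K>> T maxBy(List<T> input, Function<T, K> key)
    {
        return pick(input, key, true);
    }

    static <T, K extends Comparable<K>> T minBy(List<T> input, Function<T, K> key)
    {
        return pick(input, key, false);
    }

    private static <T, K extends Comparable<K>> T pick(List<T> input, Function<T, K> key, boolean max)
    {
        T best = null;
        K bestKey = null;
        for (T element : input) {
            // NULL keys are outside this model, so fail loudly instead of guessing semantics.
            K k = Objects.requireNonNull(key.apply(element), "NULL keys are not modelled");
            // Strict comparison: on a tie the earlier element is kept (a choice of this model only).
            boolean better = bestKey == null
                    || (max ? k.compareTo(bestKey) > 0 : k.compareTo(bestKey) < 0);
            if (better) {
                best = element;
                bestKey = k;
            }
        }
        return best;
    }

    public static void main(String[] args)
    {
        // Mirrors array_max_by(ARRAY['a', 'bbb', 'cc'], x -> length(x))
        System.out.println(maxBy(Arrays.asList("a", "bbb", "cc"), String::length)); // bbb
        // Mirrors array_min_by(ARRAY['a', 'bbb', 'cc'], x -> length(x))
        System.out.println(minBy(Arrays.asList("a", "bbb", "cc"), String::length)); // a
        // Mirrors array_max_by(ARRAY[-10, 5, 7], x -> abs(x))
        System.out.println(maxBy(Arrays.asList(-10, 5, 7), x -> Math.abs(x))); // -10
    }
}
```

The abs() case is the interesting one: the winning element is -10 even though 7 is the largest value, which is exactly what a key-function test is meant to catch.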

Assessment against linked issues

Issue #26934 objectives:
  • Add end-to-end tests for the array_max_by function.
  • Add end-to-end tests for the array_min_by function.
  • Add end-to-end tests for the array_top_n function.

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • The empty array literals (ARRAY[]) in the new tests will have ambiguous element types in Presto; consider explicitly typing them (e.g., CAST(ARRAY[] AS ARRAY(INTEGER))) to avoid type resolution issues.
  • For array_top_n, you may want to add at least one test where n is negative or otherwise invalid to validate the function’s behavior on bad input in the native engine.
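To reason about which array_top_n results the tests should expect, including the negative-n case suggested above, a plain-Java reference model can help. This is a hypothetical sketch: `TopNModel` is an illustrative name, and it assumes NULL-free input, natural descending order, duplicates kept, the whole array returned when n exceeds its size, and an error for negative n. The actual error text raised by the native engine is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical reference model (not Presto/Velox code) for array_top_n on NULL-free input.
// Assumptions: descending natural order, duplicates kept, n > size returns every element,
// and negative n is rejected. The native engine's real error message is not modelled.
final class TopNModel
{
    private TopNModel() {}

    static <T extends Comparable<? super T>> List<T> topN(List<T> input, int n)
    {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative, got " + n);
        }
        List<T> sorted = new ArrayList<>(input);
        sorted.sort(Collections.reverseOrder());
        // Copy the prefix so callers do not keep a view into the temporary list.
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args)
    {
        System.out.println(topN(Arrays.asList(1, 100, 2, 5, 3), 3)); // [100, 5, 3]
        System.out.println(topN(Arrays.asList(1, 100), 5));          // [100, 1] (n larger than size)
        System.out.println(topN(Arrays.asList(3, 1, 3), 2));         // [3, 3] (duplicates kept)
        System.out.println(topN(Arrays.asList(4, 2), 0));            // [] (n = 0)
        try {
            topN(Arrays.asList(1, 2), -1);
        }
        catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In the test class itself, the negative-n case would more naturally be an assertQueryFails(sql, expectedMessageRegExp) call; this PR does not show the message the native engine produces, so any regex would need to be taken from an actual run.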

## Individual Comments

### Comment 1
<location> `presto-native-execution/src/test/java/com/facebook/presto/nativeworker/AbstractTestNativeArrayFunctionQueries.java:146-147` </location>
<code_context>
         assertQuery("SELECT array_union(ARRAY[ARRAY[linenumber], ARRAY[suppkey, orderkey]], ARRAY[ARRAY[linenumber], ARRAY[partkey, partkey, null, orderkey], ARRAY[suppkey, orderkey]]) FROM lineitem");
     }
+
+    @Test
+    public void testArrayMaxBy()
+    {
+        assertQuery("SELECT array_max_by(ARRAY['a', 'bbb', 'cc'], x -> length(x))");
</code_context>

<issue_to_address>
**suggestion (testing):** Consider adding cases where the lambda returns NULL (or partially NULL) and where the input array itself is NULL to cover additional edge conditions for array_max_by/array_min_by.

You already cover empty arrays, NULL elements, and mixed sign values. To round out coverage, please add 1–2 assertions where:

- The lambda returns NULL for some/all elements (e.g., `array_max_by(ARRAY[1,2,3], x -> IF(x = 2, NULL, x))`) and similarly for `array_min_by`, to validate behavior when the *key* is NULL.
- The input array itself is NULL (if the function contract supports this), e.g. `array_max_by(CAST(NULL AS array(integer)), x -> x)`, to ensure native vs. reference behavior matches.

These will better exercise nullability semantics for both the lambda and input array.

Suggested implementation:

```java
    @Test
    public void testArrayMaxBy()
    {
        assertQuery("SELECT array_max_by(ARRAY['a', 'bbb', 'cc'], x -> length(x))");
        assertQuery("SELECT array_max_by(ARRAY['aa', 'bb', 'c'], x -> length(x))");
        assertQuery("SELECT array_max_by(ARRAY['a', NULL, 'bbb'], x -> length(x))");
        assertQuery("SELECT array_max_by(ARRAY[NULL, NULL], x -> length(x))");
        assertQuery("SELECT array_max_by(ARRAY[], x -> x)");
        assertQuery("SELECT array_max_by(ARRAY[-10, 5, 7], x -> abs(x))");

        // Lambda returns NULL for some elements (key nullability)
        assertQuery("SELECT array_max_by(ARRAY[1, 2, 3], x -> IF(x = 2, NULL, x))");

        // Input array itself is NULL
        assertQuery("SELECT array_max_by(CAST(NULL AS array(integer)), x -> x)");
    }

    @Test
    public void testArrayMinBy()
    {

```

To fully implement your suggestion for `array_min_by` as well, you should add analogous assertions in `testArrayMinBy`, for example:

- A case where the lambda returns NULL for some elements, e.g.:
  - `assertQuery("SELECT array_min_by(ARRAY[1, 2, 3], x -> IF(x = 2, NULL, x))");`
- A case where the input array itself is NULL, e.g.:
  - `assertQuery("SELECT array_min_by(CAST(NULL AS array(integer)), x -> x)");`

Place these inside the `testArrayMinBy` method body alongside whatever existing assertions you add for the basic behavior, mirroring the coverage pattern used in `testArrayMaxBy`.
</issue_to_address>


@steveburnett
Contributor

Thanks for the release note! You can delete everything after the "Add E2E tests for array_max_by," line, including the == NO RELEASE NOTE == line and its instructions (but keep the formatting line of three backticks after "Add E2E tests for array_max_by,").

@jkhaliqi jkhaliqi force-pushed the array_e2e branch 2 times, most recently from 69ef1bd to f68b769 February 4, 2026 22:00
Contributor

@czentgr czentgr left a comment


Thanks, we don't need release notes for adding tests.

Can we look at presto-native-tests and see what the test situation is like there?

Contributor

@aditi-pandit aditi-pandit left a comment


Thanks @jkhaliqi

@Test
public void testArrayMaxBy()
{
assertQuery("SELECT array_max_by(ARRAY['a', 'bbb', 'cc'], x -> length(x))");
Contributor


All your tests use constant arrays. Can you check whether they get constant-folded on the coordinator? If they do, we should write tests that actually execute on the worker, so change the tests to use a table with an array column instead.

Contributor Author


The tests are not being constant-folded, and I have confirmed that they use the worker for all test cases. I believe this is because of the lambda transform parameter.

Contributor


Perhaps we can be deliberate for all of these queries and use a values clause to provide the values?
For example,

assertQuery("SELECT array_max_by(a, x -> length(x)) from (values (ARRAY['a', 'bbb', 'cc'])) as t(a)");

(I didn't try this, but the syntax looks correct.) Can we do this for all queries?
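The VALUES rewrite suggested above is mechanical, so it can be captured in a small string helper. This is a hypothetical sketch: `ValuesQueries.fromValues` is an illustrative name, not an existing Presto test utility, and it assumes the function call refers to the input column as `a`. Whether the VALUES form keeps evaluation on the worker is the reviewer's intent; this code only builds the query text.

```java
import java.util.StringJoiner;

// Hypothetical helper (not part of Presto's test framework) that feeds array inputs through a
// VALUES clause, following the review suggestion, so each array reaches the function as row data
// rather than as a literal argument. The call must refer to the input column as "a".
final class ValuesQueries
{
    private ValuesQueries() {}

    static String fromValues(String call, String... arrays)
    {
        if (arrays.length == 0) {
            throw new IllegalArgumentException("at least one array row is required");
        }
        StringJoiner rows = new StringJoiner(", ");
        for (String array : arrays) {
            rows.add("(" + array + ")"); // one single-column row per array
        }
        return "SELECT " + call + " FROM (VALUES " + rows + ") AS t(a)";
    }

    public static void main(String[] args)
    {
        // Prints: SELECT array_max_by(a, x -> length(x)) FROM (VALUES (ARRAY['a', 'bbb', 'cc'])) AS t(a)
        System.out.println(fromValues("array_max_by(a, x -> length(x))", "ARRAY['a', 'bbb', 'cc']"));
        // Several inputs in one query; the CAST gives the empty array a concrete element type.
        System.out.println(fromValues("array_top_n(a, 2)", "ARRAY[5, 1, 3]", "CAST(ARRAY[] AS ARRAY(INTEGER))"));
    }
}
```

Each generated string would then be passed to assertQuery in the test class. Packing several arrays into one VALUES list means fewer queries, at the cost of less precise failure messages when one row misbehaves.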

Contributor Author


Sounds good, updated!

aditi-pandit previously approved these changes Feb 27, 2026
Contributor

@aditi-pandit aditi-pandit left a comment


Thanks @jkhaliqi

QueryRunner queryRunner = PrestoNativeQueryRunnerUtils.nativeHiveQueryRunnerBuilder()
.setAddStorageFormatToPath(true)
.build();
queryRunner.installPlugin(new SqlInvokedFunctionsPlugin());
Contributor


@jkhaliqi: Should we be adding SqlInvokedFunctionsPlugin by default in PrestoNativeQueryRunnerUtils.nativeHiveQueryRunnerBuilder().build() (and its Java counterpart)?

We could do that in a follow up PR.
@czentgr

Contributor Author


It seems not all tests need SqlInvokedFunctionsPlugin, so adding it by default to the builders would add unnecessary overhead for tests that don't use SQL-invoked functions.

Contributor


@jkhaliqi: I'm coming at this from an interface point of view.

@pdabre12: Why do we have SqlInvokedFunctionsPlugin? I don't remember having it in the past.

Contributor

@pdabre12 pdabre12 Mar 12, 2026


@aditi-pandit
No need to add it by default; it's only required for functions defined under SqlInvokedFunctionsPlugin. We separated the SQL-invoked functions out into a plugin here: #25818.

allenshen13 previously approved these changes Mar 6, 2026
QueryRunner queryRunner = PrestoNativeQueryRunnerUtils.nativeHiveQueryRunnerBuilder()
.setAddStorageFormatToPath(true)
.build();
queryRunner.installPlugin(new SqlInvokedFunctionsPlugin());
Contributor


@jkhaliqi We should not add SqlInvokedFunctionsPlugin on a native cluster; there's a separate plugin for that, NativeSqlInvokedFunctionsPlugin. Can you try that?

protected QueryRunner createQueryRunner()
throws Exception
{
QueryRunner queryRunner = PrestoNativeQueryRunnerUtils.nativeHiveQueryRunnerBuilder()
Contributor


Also, can we test this with sidecarEnabled too? Please refer to the other test files.

return "RETURN IF(n < 0, fail('n must be greater than or equal to 0'), IF(COALESCE(CARDINALITY(REMOVE_NULLS(input)), 0) = 0, NULL, TRANSFORM(SLICE(ARRAY_SORT(TRANSFORM(MAP_ENTRIES(ARRAY_FREQUENCY(REMOVE_NULLS(input))), x -> ROW(x[2], x[1]))), 1, n), x -> x[2])))";
}

@SqlInvokedScalarFunction(value = "array_max_by", deterministic = true, calledOnNullInput = true)
Contributor


We add them here because without the sidecar we won't have support for these functions otherwise.
Do we know why they weren't defined already?

Contributor Author

@jkhaliqi jkhaliqi Mar 18, 2026


It seems they weren't defined because defining them here causes them to be registered twice, which makes the existing test testOverriddenInlinedSqlInvokedFunctions fail. Will look into it some more.

Contributor


We shouldn't need them here; these are getting pulled from the sidecar.
I chatted offline with @jkhaliqi, and it seems we weren't loading the correct plugin, which led to this issue.

@allenshen13
Member

Potential missing coverage
- NULL array input (CAST(NULL AS ARRAY(INTEGER))) for all three functions
- Negative N for array_top_n (n < 0)
- Empty array for array_top_n
- No table-sourced queries (only inline VALUES)
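The gaps listed above translate fairly directly into query strings. A hypothetical sketch follows: `CoverageGapQueries` is an illustrative name, the NULL and empty inputs go through VALUES as suggested earlier in the review, and the table-sourced cases build arrays from lineitem columns the same way the existing array_union test does. None of these queries were run as part of this PR.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of concrete SQL for the coverage gaps listed in the review. The query texts
// are illustrative and have not been run against this PR; the lineitem columns mirror the ones
// already used by the existing array_union test in AbstractTestNativeArrayFunctionQueries.
final class CoverageGapQueries
{
    private static final String NULL_ARRAY = "(VALUES (CAST(NULL AS ARRAY(INTEGER)))) AS t(a)";

    private CoverageGapQueries() {}

    // Queries whose native result should match the Java engine (candidates for assertQuery).
    static List<String> expectedToMatch()
    {
        return Arrays.asList(
                // NULL array input for all three functions
                "SELECT array_max_by(a, x -> x) FROM " + NULL_ARRAY,
                "SELECT array_min_by(a, x -> x) FROM " + NULL_ARRAY,
                "SELECT array_top_n(a, 2) FROM " + NULL_ARRAY,
                // Empty array for array_top_n, with an explicit element type
                "SELECT array_top_n(a, 2) FROM (VALUES (CAST(ARRAY[] AS ARRAY(INTEGER)))) AS t(a)",
                // Table-sourced inputs; x -> -x is injective, so equal keys imply equal elements
                "SELECT array_max_by(ARRAY[orderkey, partkey, suppkey], x -> -x) FROM lineitem",
                "SELECT array_min_by(ARRAY[orderkey, partkey, suppkey], x -> -x) FROM lineitem",
                "SELECT array_top_n(ARRAY[orderkey, partkey, suppkey], 2) FROM lineitem");
    }

    // Queries expected to raise an error (candidates for assertQueryFails).
    static List<String> expectedToFail()
    {
        return Arrays.asList("SELECT array_top_n(a, -1) FROM (VALUES (ARRAY[1, 2, 3])) AS t(a)");
    }

    public static void main(String[] args)
    {
        expectedToMatch().forEach(System.out::println);
        expectedToFail().forEach(q -> System.out.println("expected failure: " + q));
    }
}
```

Using an injective key such as x -> -x for the table-sourced max-by/min-by cases is a deliberate choice: it keeps the expected result unambiguous even if the native and Java engines break ties differently.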


Labels

from:IBM PR from IBM

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add end to end tests for array functions

7 participants