Support multi node integ tests #1320

Open: owaiskazi19 wants to merge 10 commits into main from multi-node-tests

owaiskazi19
Member

@owaiskazi19 owaiskazi19 commented May 12, 2025

Description

Adds support for multi-node integ tests.
CI improvements in this PR:

  1. The integ tests were running twice, once in precommit and once in check (example 1, example 2). Removed the duplicate integTest run from precommit.
  2. The jacocoTestReport task depends on test and integTest. Since gradle check already runs both, moved the codeCov CI under gradle check.
  3. Added a CI job for multi-node integration tests.

Related Issues

Resolves #1307

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@owaiskazi19 owaiskazi19 force-pushed the multi-node-tests branch 3 times, most recently from c4578a3 to e4cb9cf Compare May 13, 2025 08:44
@owaiskazi19 owaiskazi19 marked this pull request as ready for review May 13, 2025 18:05
@owaiskazi19 owaiskazi19 changed the title [DRAFT] Support multi node integ tests Support multi node integ tests May 13, 2025
@vibrantvarun
Member

Gradle check is failing.

@yuye-aws
Member

@owaiskazi19 Can you update the change log?

@owaiskazi19
Member Author

An unrelated test is failing on main as well:

REPRODUCE WITH: ./gradlew ':test' --tests "org.opensearch.neuralsearch.search.query.HybridCollectorManagerTests.testRescoreWithConcurrentSegmentSearch_whenMatchedDocsAndRescore_thenSuccessful" -Dtests.seed=31B75FEED86AF287 -Dtests.security.manager=false -Dtests.locale=fi-FI -Dtests.timezone=Europe/Brussels -Druntime.java=23

HybridCollectorManagerTests > testRescoreWithConcurrentSegmentSearch_whenMatchedDocsAndRescore_thenSuccessful FAILED
    java.lang.AssertionError: expected:<3> but was:<1>
        at __randomizedtesting.SeedInfo.seed([31B75FEED86AF287:BB4DF1B8A1D4DC19]:0)
        at org.junit.Assert.fail(Assert.java:89)
        at org.junit.Assert.failNotEquals(Assert.java:835)
        at org.junit.Assert.assertEquals(Assert.java:647)
        at org.junit.Assert.assertEquals(Assert.java:633)
        at org.opensearch.neuralsearch.search.query.HybridCollectorManagerTests.testRescoreWithConcurrentSegmentSearch_whenMatchedDocsAndRescore_thenSuccessful(HybridCollectorManagerTests.java:1019)

@owaiskazi19 owaiskazi19 requested a review from yuye-aws May 20, 2025 05:26
Member

@yuye-aws yuye-aws left a comment

Looks good to me! Thanks for the PR @owaiskazi19 !

@yuye-aws
Member

I don't have much context on the current flaky tests. Can @vibrantvarun help verify?

Member

@junqiu-lei junqiu-lei left a comment

LGTM, thank you! @owaiskazi19


codecov bot commented Jun 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.08%. Comparing base (42b1c3e) to head (7eaa3e4).

Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1320       +/-   ##
===========================================
+ Coverage        0   80.08%   +80.08%     
- Complexity      0     2185     +2185     
===========================================
  Files           0      159      +159     
  Lines           0     8326     +8326     
  Branches        0     1346     +1346     
===========================================
+ Hits            0     6668     +6668     
- Misses          0     1139     +1139     
- Partials        0      519      +519     

@heemin32
Collaborator

Thanks for the PR. I have one more request—could you set the default number of shards to 3 for the entire test? This will ensure the test properly exercises the multi-node scenario.
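
A minimal sketch of what such a default could look like, assuming the tests create indices from a JSON settings body. The class `IndexSettingsSketch`, the method `defaultIndexSettings`, and the system property `tests.default.shards` are hypothetical names for illustration and are not part of this PR; only `number_of_shards` and `number_of_replicas` are real index settings.

```java
public class IndexSettingsSketch {
    // Hypothetical default: 3 primary shards, so data can spread across several nodes.
    static final int DEFAULT_SHARDS = 3;

    // Builds the JSON body for a create-index request.
    // Plain concatenation keeps the digits independent of the JVM locale,
    // which matters when tests run with a randomized -Dtests.locale.
    static String defaultIndexSettings(int shards, int replicas) {
        if (shards < 1 || replicas < 0) {
            throw new IllegalArgumentException("shards must be >= 1 and replicas >= 0");
        }
        return "{\"settings\":{\"index\":{\"number_of_shards\":" + shards
            + ",\"number_of_replicas\":" + replicas + "}}}";
    }

    public static void main(String[] args) {
        // Hypothetical override, e.g. -Dtests.default.shards=1 for a single-shard run.
        int shards = Integer.getInteger("tests.default.shards", DEFAULT_SHARDS);
        System.out.println(defaultIndexSettings(shards, 0));
    }
}
```

Keeping the shard count behind one helper would let the same test cases run unchanged on single-node and multi-node clusters, with a single place to adjust the default.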

@@ -52,3 +52,39 @@ jobs:
        run: |
          chown -R 1000:1000 `pwd`
          su `id -un 1000` -c "whoami && java -version && ./gradlew integTest -Dsecurity.enabled=true"

  multi-node-integ-test-with-security-linux:
Member

Why is this added only for the secure cluster? I think we need it for the general non-secure mode in the first place.

Member Author

Member

We also run integ tests in a bunch of other CI actions: example 1, example 2.

Member Author

Made a couple of improvements to the CI overall. Please see the description #1320 (comment)

@martin-gaievski
Member

> Thanks for the PR. I have one more request: could you set the default number of shards to 3 for the entire test? This will ensure the test properly exercises the multi-node scenario.

I'm not sure whether you mean to move or copy the entire test suite for a multi-node setup. I think we still need single-node tests: we have had multiple cases in the past where an issue appeared only in a single shard/node setup. I suggest we have both single-node and multi-node setups as part of CI.

@heemin32
Collaborator

> > Thanks for the PR. I have one more request: could you set the default number of shards to 3 for the entire test? This will ensure the test properly exercises the multi-node scenario.
>
> I'm not sure whether you mean to move or copy the entire test suite for a multi-node setup. I think we still need single-node tests: we have had multiple cases in the past where an issue appeared only in a single shard/node setup. I suggest we have both single-node and multi-node setups as part of CI.

I thought we were sharing test cases between single node and multi node? In that case, setting the shard number to 3 even for a single node won't do any harm?

@martin-gaievski
Member

> > > Thanks for the PR. I have one more request: could you set the default number of shards to 3 for the entire test? This will ensure the test properly exercises the multi-node scenario.
> >
> > I'm not sure whether you mean to move or copy the entire test suite for a multi-node setup. I think we still need single-node tests: we have had multiple cases in the past where an issue appeared only in a single shard/node setup. I suggest we have both single-node and multi-node setups as part of CI.
>
> I thought we were sharing test cases between single node and multi node? In that case, setting the shard number to 3 even for a single node won't do any harm?

We can only change the number of nodes from outside the test logic.

If we have the whole test suite running on both a single node and multiple nodes, that's fine; we get more coverage. Where I do have concerns is if we remove the tests that run on a single node and run everything on multiple nodes. At first glance this should lead to a yellow cluster state if we have only one shard without replicas.

@heemin32
Collaborator

> At first glance this should lead to a yellow cluster state if we have only one shard without replicas.

This is true for a single node as well. Cluster health will be yellow if there is no replica, even with a single node.

@owaiskazi19 owaiskazi19 force-pushed the multi-node-tests branch 8 times, most recently from b512144 to 7a5fda6 Compare June 26, 2025 15:15
@martin-gaievski
Member

> > At first glance this should lead to a yellow cluster state if we have only one shard without replicas.
>
> This is true for a single node as well. Cluster health will be yellow if there is no replica, even with a single node.

Yes, you're correct, this will be the same for a single-node cluster, so it should be fine to run all tests on a multi-node cluster.
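
For reference, cluster health follows from shard assignment: red when a primary is unassigned, yellow when all primaries are assigned but at least one replica is not, green otherwise. A replica is never allocated to the node that holds its primary. The sketch below is a simplified model of that rule only, not OpenSearch's allocation logic; `ClusterHealthSketch` and `health` are hypothetical names, and the model ignores disk thresholds, allocation filters, and node failures.

```java
public class ClusterHealthSketch {
    enum Health { GREEN, YELLOW, RED }

    // Simplified model: each shard has one primary plus `replicas` copies,
    // and no two copies of the same shard may live on the same node.
    static Health health(int nodes, int replicas) {
        if (nodes < 1) {
            return Health.RED; // no node available to host the primary
        }
        int copiesNeeded = 1 + replicas;
        // With fewer nodes than copies, at least one replica stays unassigned.
        return nodes >= copiesNeeded ? Health.GREEN : Health.YELLOW;
    }

    public static void main(String[] args) {
        System.out.println(health(1, 0)); // single node, zero replicas configured: GREEN
        System.out.println(health(1, 1)); // single node, one replica configured: YELLOW
        System.out.println(health(3, 1)); // three nodes, one replica configured: GREEN
    }
}
```

In this model the number of primary shards does not affect health; only the configured replica count relative to the node count does.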

@owaiskazi19 owaiskazi19 force-pushed the multi-node-tests branch 3 times, most recently from 527203d to 7949a8f Compare June 26, 2025 18:03
Signed-off-by: Owais <[email protected]>

6 participants