Implemented computation of segment replication stats at shard level #17055

Merged: 34 commits into opensearch-project:main on Feb 27, 2025

vinaykpud commented Jan 19, 2025

Description

The method implemented here computes the segment replication stats at the shard level, instead of relying on the primary shard to compute stats based on reports from its replicas.

The method implemented in this PR serves the segment replication stats for the following core APIs:

  1. Nodes Stats API (/_nodes/stats)
  2. Cluster Stats API (/_cluster/stats)
  3. Indices Stats API (/_stats or /{index}/_stats)
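
To make the idea of shard-level computation concrete, here is a minimal sketch of how a replica shard might track its own replication lag from the checkpoints it receives, without any report going through the primary. The class name `ShardReplicationLagTracker` and all of its methods are hypothetical and are not taken from this PR; the only detail borrowed from the PR's commit list is the use of `System.nanoTime()` as the clock for lag calculation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical replica-side tracker (an illustration, not the PR's code).
 * It remembers when each checkpoint was first seen by this shard and derives
 * lag from the oldest checkpoint that has not been applied yet.
 */
final class ShardReplicationLagTracker {
    // checkpoint version -> monotonic timestamp (nanos) at which it was received
    private final ConcurrentSkipListMap<Long, Long> receivedAtNanos = new ConcurrentSkipListMap<>();

    /** Record a checkpoint published to this shard; the first timestamp wins. */
    void onCheckpointReceived(long version, long nowNanos) {
        receivedAtNanos.putIfAbsent(version, nowNanos);
    }

    /** Drop every checkpoint up to and including the one the shard has caught up to. */
    void onCheckpointApplied(long version) {
        receivedAtNanos.headMap(version, true).clear();
    }

    /** Number of received checkpoints that are still waiting to be applied. */
    int checkpointsBehind() {
        return receivedAtNanos.size();
    }

    /** Milliseconds since the oldest unapplied checkpoint arrived, or 0 if caught up. */
    long lagMillis(long nowNanos) {
        Map.Entry<Long, Long> oldest = receivedAtNanos.firstEntry();
        return oldest == null ? 0L : TimeUnit.NANOSECONDS.toMillis(nowNanos - oldest.getValue());
    }
}
```

In real use the caller would pass `System.nanoTime()` as `nowNanos`; taking the clock as a parameter only keeps the sketch easy to test. A monotonic clock is preferable to `System.currentTimeMillis()` here because elapsed-time differences are unaffected by wall-clock adjustments.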

Related Issues

Resolves #16801
Related to #15306

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

❌ Gradle check result for 04ba008: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

✅ Gradle check result for 3d030d5: SUCCESS

mch2 left a comment

@vinaykpud Thanks for pushing on this, I think this is really close.

❌ Gradle check result for 25fd006: FAILURE

❌ Gradle check result for d8585f7: FAILURE

❕ Gradle check result for d8585f7: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

mch2 added the backport 2.x (Backport to 2.x branch) label on Feb 27, 2025
mch2 merged commit ee7fbbd into opensearch-project:main on Feb 27, 2025
33 of 34 checks passed
@opensearch-trigger-bot

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-17055-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 ee7fbbd226b2be81128eaafe19aad0a39244368c
# Push it to GitHub
git push --set-upstream origin backport/backport-17055-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-17055-to-2.x.

mch2 commented Feb 27, 2025

Ignoring the backport here; this is a 3.0-only change.

mch2 removed the backport 2.x (Backport to 2.x branch) and backport-failed labels on Feb 27, 2025
vinaykpud added a commit to vinaykpud/OpenSearch that referenced this pull request Mar 18, 2025
…pensearch-project#17055)

* Implemented computation of segment replication stats at shard level

The method implemented here computes the segment replication stats at the shard level,
instead of relying on the primary shard to compute stats based on reports from its replicas.

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Updated style checks in the test

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Updated changelog

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* fixed style issues

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Fix the failing integration test

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Fix stylecheck

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Fixed the comments for the initial revision

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Updated to use System.nanoTime() for lag calculation

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Fixed the integration test for node stats

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Modified the version in the ReplicationCheckpoint for backward compatibility

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added precomputation logic for the stats calculation

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Removed unwanted lines

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Clean up the maps when index closed

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added a null check for the indexshard checkpoint

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* fix style checks

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Updated version and added bwc for RemoteSegmentMetadata

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Updated the javadoc comments

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Address PR comments

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Removed the latestReceivedCheckpoint map from SegmentReplicationTargetService

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Added granular locks for the concurrency of stats methods

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Style check fixes

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* Changes to maintain atomicity

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* spotlessApply

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* removed querying the remotestore when replication is in progress

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

* spotlessApply

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>

---------

Signed-off-by: Vinay Krishna Pudyodu <[email protected]>
Labels: enhancement, Search:Performance
Development

Successfully merging this pull request may close these issues.

[Feature Request] Redefine the computation of segment replication metrics in Node Stats
3 participants