Skip to content

Fix for wrong connection count on remote nodes #37

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
Feb 6, 2024

Conversation

HarshDaryani896
Copy link

@HarshDaryani896 HarshDaryani896 commented Jan 24, 2024

Description
Issue: Even after setting CONNECTION_POOL_REMOTE_SIZE to zero, connections were getting created to remote nodes. Number of connections on REMOTE nodes were equal to Number of connections on LOCAL nodes.

On some investigation we found out that the private localDc variable of YugabyteDefaultLoadBalancingPolicy class [child class] was hiding the localDc variable of the BasicLoadBalancingPolicy class [parent class], due to which the driver was not properly identifying Local AND Remote nodes. It was considering all Up nodes as Local.

Fix
Removed the private localDc from YugabyteDefaultLoadBalancingPolicy and made the localDc variable in BasicLoadBalancingPolicy protected.
Also did this change for another private member variable of YugabyteDefaultLoadBalancingPolicy class: distanceReporter

Note
This PR also includes some formatting changes in PartitionAwarePolicy.java

Testing

  1. Tested locally using a sample app to verify remote and local node connections.
  2. nosqlbench test runs with different consistency level and local datacenter specifications has been done, and load routing behaviour is as expected.
  3. Tests in ycql-4.x-tests repo: 7 Tests, 5 Passed, 2 Errors in TestDMLRoutingLocality(these occur with 4.15.0-yb-1 also).
  4. Tests in Yugabyte-db repo: 13 Tests, all Passed.

@ashetkar
Copy link

Also, please run the formatting command and commit the changes

mvn com.coveo:fmt-maven-plugin:format

Copy link

@kneeraj kneeraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should get a clean run of nosqlbench with correct routing behaviour before merging.

@HarshDaryani896
Copy link
Author

Should get a clean run of nosqlbench with correct routing behaviour before merging.

@kneeraj nosqlbench run with different consistency level and local datacenter specifications has been done, and load routing behaviour is as expected. cc: @ashetkar

@ashetkar
Copy link

ashetkar commented Feb 5, 2024

@HarshDaryani896 Let's get this PR cleaned of all the debugging logs and merge it, if we have successfully completed all the testing.
We can work on debugging logs on another branch.
Also, please mention the test case which was failing earlier but is fixed by your change, for reference.

Copy link

@ashetkar ashetkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please add a log at debug/info 1) when metadata is refreshed and 2) when leader is not found for a particular tablet.

Copy link

@kneeraj kneeraj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@HarshDaryani896 HarshDaryani896 merged commit 24fc207 into 4.15.x Feb 6, 2024
@HarshDaryani896 HarshDaryani896 deleted the 4.15-localDc-fix branch February 6, 2024 12:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants