Skip to content

Adding integration of derived source feature across diff paths #18054

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

tanik98
Copy link
Contributor

@tanik98 tanik98 commented Apr 23, 2025

Description

  • Integrating derive source feature in get and search path
  • Tested this integration for various cases.
  • Added integration tests for all relevant paths like create/update/get/search/reindex/recover.
  • For recover flow have added IntegrationTests around cases like node restart, index replication etc.

Considerations:

  • For the scenarios where we read from the translog, we have created the in-memory lucene index, which will ensure that we derive source from lucene docValues and stored fields only instead of reading _source directly from translog. This will help in making final resulted source consistent in terms of various fields(like Date, GeoPoint(we always return in lat/lon pair, input source can have different format like geohash)) where display format can vary as compared to suppkied format passed during ingestion.

This PR is a continuation of #17759

Related Issues

Resolves #17073

Part of feature #9568

Check List

  • Functionality includes testing.
  • API changes companion pull request created, if applicable.
  • Public documentation issue has been created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions github-actions bot added enhancement Enhancement or improvement to existing feature or request Indexing:Performance labels Apr 23, 2025
Copy link
Contributor

❌ Gradle check result for 116f32e: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

❌ Gradle check result for 1d5e1aa: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@tanik98 tanik98 force-pushed the tanik-derived-source-feature-integration branch from 1d5e1aa to 2cf97a1 Compare April 23, 2025 20:25
Copy link
Contributor

❕ Gradle check result for 2cf97a1: UNSTABLE

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Copy link

codecov bot commented Apr 23, 2025

Codecov Report

Attention: Patch coverage is 68.90244% with 51 lines in your changes missing coverage. Please review.

Project coverage is 72.72%. Comparing base (3b018ad) to head (77f80d9).
Report is 39 commits behind head on main.

Files with missing lines Patch % Lines
...rg/opensearch/index/engine/TranslogLeafReader.java 43.10% 27 Missing and 6 partials ⚠️
...java/org/opensearch/index/mapper/ObjectMapper.java 33.33% 7 Missing and 1 partial ⚠️
...va/org/opensearch/index/engine/InternalEngine.java 68.75% 3 Missing and 2 partials ⚠️
.../main/java/org/opensearch/index/mapper/Mapper.java 0.00% 2 Missing ⚠️
.../org/opensearch/index/mapper/RootObjectMapper.java 80.00% 2 Missing ⚠️
...in/java/org/opensearch/index/shard/IndexShard.java 83.33% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #18054      +/-   ##
============================================
+ Coverage     72.52%   72.72%   +0.20%     
- Complexity    67509    67731     +222     
============================================
  Files          5496     5499       +3     
  Lines        311467   311781     +314     
  Branches      45253    45257       +4     
============================================
+ Hits         225891   226755     +864     
+ Misses        67180    66581     -599     
- Partials      18396    18445      +49     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

github-actions bot commented Jun 9, 2025

❌ Gradle check result for 98bb859: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

Copy link
Contributor

github-actions bot commented Jun 9, 2025

✅ Gradle check result for 77f80d9: SUCCESS

@tanik98 tanik98 requested a review from mgodwan June 9, 2025 20:02
@shwetathareja shwetathareja merged commit 190ea02 into opensearch-project:main Jun 10, 2025
30 of 31 checks passed
randomWriter.commit();

DirectoryReader randomDirectoryReader = DirectoryReader.open(randomWriter);
LeafReader randomLeafReader = randomDirectoryReader.leaves().get(0).reader();
Copy link
Contributor

@msfroh msfroh Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tanik98 This part of the test only works if you can guarantee that you're merged down to a single segment.

Since you have a random IndexWriterConfig, sometimes it will flush in the middle of adding docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test failure:

Suite: Test class org.opensearch.common.lucene.index.DerivedSourceLeafReaderTests
  1> [2025-06-10T16:49:21,747][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testGetSequentialStoredFieldsReaderWithInvalidReader] before test
  1> [2025-06-10T16:49:21,748][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testGetSequentialStoredFieldsReaderWithInvalidReader] after test
  1> [2025-06-10T16:49:21,758][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testWithRandomDocuments] before test
  1> [2025-06-10T16:49:21,761][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testWithRandomDocuments] after test
  2> REPRODUCE WITH: ./gradlew ':server:test' --tests "org.opensearch.common.lucene.index.DerivedSourceLeafReaderTests.testWithRandomDocuments" -Dtests.seed=7EFEA17C239307B5 -Dtests.security.manager=true -Dtests.jvm.argline="-XX:TieredStopAtLevel=1 -XX:ReservedCodeCacheSize=64m" -Dtests.locale=ff-Adlm-SN -Dtests.timezone=America/Cayenne -Druntime.java=21
  2> java.lang.IndexOutOfBoundsException: Index 8 out of bounds for length 8
        at __randomizedtesting.SeedInfo.seed([7EFEA17C239307B5:480F885B0D6542F4]:0)
        at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:100)
        at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:106)
        at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:302)
        at java.base/java.util.Objects.checkIndex(Objects.java:385)
        at org.apache.lucene.index.CodecReader$1.document(CodecReader.java:100)
        at org.opensearch.common.lucene.index.DerivedSourceStoredFieldsReader$DerivedSourceStoredFields.document(DerivedSourceStoredFieldsReader.java:108)
        at org.opensearch.common.lucene.index.DerivedSourceLeafReaderTests.testWithRandomDocuments(DerivedSourceLeafReaderTests.java:184)
  1> [2025-06-10T16:49:21,775][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testGetCoreAndReaderCacheHelper] before test
  1> [2025-06-10T16:49:21,775][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testGetCoreAndReaderCacheHelper] after test
  1> [2025-06-10T16:49:21,779][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testStoredFields] before test
  1> [2025-06-10T16:49:21,780][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testStoredFields] after test
  1> [2025-06-10T16:49:21,783][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testGetSequentialStoredFieldsReaderWithSequentialReader] before test
  1> [2025-06-10T16:49:21,784][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testGetSequentialStoredFieldsReaderWithSequentialReader] after test
  1> [2025-06-10T16:49:21,787][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testGetSequentialStoredFieldsReaderWithCodecReader] before test
  1> [2025-06-10T16:49:21,788][INFO ][o.o.c.l.i.DerivedSourceLeafReaderTests] [testGetSequentialStoredFieldsReaderWithCodecReader] after test

Copy link
Contributor

@rishabhmaurya rishabhmaurya Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @msfroh for pointers.
I think we could have a bit more careful merging the changes when second last build failed with these flaky tests - #18054 (comment)
created #18485

Copy link
Member

@andrross andrross Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msfroh @rishabhmaurya Any reason not to revert this commit in the meantime while the test failure is investigated?

Edit: looks like a fix is ready: #18493

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrross looks like we have PR to address it #18493
we can merge and see if it helps, it not, its probably wiser to revert this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should help, we can merge and try it out. Please help with the review on #18493.

andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 11, 2025
abhita pushed a commit to abhita/OpenSearch that referenced this pull request Jun 17, 2025
…earch-project#18054)

* Adding integration of derived source feature across diff paths
* Using index setting instead of mapping field in _source
* Modifying IT to include all the supported field type for derived source

Signed-off-by: Tanik Pansuriya <[email protected]>
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 17, 2025
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 17, 2025
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 17, 2025
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 17, 2025
andrross added a commit that referenced this pull request Jun 17, 2025
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 17, 2025
andrross added a commit to andrross/OpenSearch that referenced this pull request Jun 17, 2025
andrross added a commit that referenced this pull request Jun 18, 2025
tanik98 added a commit to tanik98/OpenSearch that referenced this pull request Jun 20, 2025
…earch-project#18054)

* Adding integration of derived source feature across diff paths
* Using index setting instead of mapping field in _source
* Modifying IT to include all the supported field type for derived source

Signed-off-by: Tanik Pansuriya <[email protected]>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
…earch-project#18054)

* Adding integration of derived source feature across diff paths
* Using index setting instead of mapping field in _source
* Modifying IT to include all the supported field type for derived source

Signed-off-by: Tanik Pansuriya <[email protected]>


Signed-off-by: TJ Neuenfeldt <[email protected]>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
…earch-project#18054)

* Adding integration of derived source feature across diff paths
* Using index setting instead of mapping field in _source
* Modifying IT to include all the supported field type for derived source

Signed-off-by: Tanik Pansuriya <[email protected]>
neuenfeldttj pushed a commit to neuenfeldttj/OpenSearch that referenced this pull request Jun 26, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhancement or improvement to existing feature or request Indexing:Performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Derived Source] Add support for deriving source field in FieldMapper
7 participants