
Add Lucene improvements for HNSW merging #129046


Conversation

carlosdelest
Member

@carlosdelest carlosdelest commented Jun 6, 2025

apache/lucene#14527 improved the heap usage for HNSW merging.

As this change hasn't made it into a Lucene release yet, we'd like to incorporate it into Elasticsearch directly.

We have copied the Lucene changes into the Elasticsearch codebase and created a new Elasticsearch codec that uses the copy of Lucene's Lucene99HnswVectorsWriter, renamed to ES910HnswReducedHeapVectorsWriter.

Changes done:

  • Rename Elasticsearch900Lucene101Codec to Elasticsearch910Lucene102Codec, to signal that the codec is for ES 9.1.0 and Lucene 10.2
  • Create a new ES910HnswVectorsFormat that will provide the entry point for the writer vector changes.
    • The new format replaces the Lucene99HnswVectorsFormat usages, as it is compatible with it (same file format)
  • Create a new ES910HnswVectorsWriter that uses the classes copied over from Lucene. The HNSW improvements are on the merge side, so we only focus on the writer aspect of the format.
  • Three packages are created to hold the copied code from Lucene:
    • org.elasticsearch.index.codec.vectors.es910.hnsw
    • org.elasticsearch.index.codec.vectors.es910.internal.hppc
    • org.elasticsearch.index.codec.vectors.es910.util

We can remove these changes once Lucene 10.3 is released and merged into Elasticsearch.
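A minimal, self-contained sketch of the delegation pattern described above: a format that keeps a file-compatible identity while routing writes through the copied writer. All class shapes below are hypothetical stand-ins, not the actual Lucene/Elasticsearch APIs, and whether the real format reuses the Lucene99 format name is an assumption.

```java
// Hypothetical stand-ins for the format/reader/writer abstractions (not the real Lucene API).
interface VectorsWriter { String describe(); }
interface VectorsReader { String describe(); }

abstract class VectorsFormat {
    private final String name;
    VectorsFormat(String name) { this.name = name; }
    // The name recorded in segment files; keeping it identical to the old format's
    // name is what would make the new format file-compatible (assumption).
    final String getName() { return name; }
    abstract VectorsWriter fieldsWriter();
    abstract VectorsReader fieldsReader();
}

// Routes writes through the copied, heap-reduced writer while leaving reads unchanged.
final class ES910HnswVectorsFormatSketch extends VectorsFormat {
    ES910HnswVectorsFormatSketch() { super("Lucene99HnswVectorsFormat"); }
    @Override VectorsWriter fieldsWriter() { return () -> "ES910HnswReducedHeapVectorsWriter"; }
    @Override VectorsReader fieldsReader() { return () -> "Lucene99HnswVectorsReader"; }
}
```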

@carlosdelest carlosdelest added the Team:Search Relevance, v9.1.0, :Search Relevance/Vectors, :Search Relevance/Search, and >enhancement labels Jun 6, 2025
@elasticsearchmachine
Collaborator

Hi @carlosdelest, I've created a changelog YAML for you.

carlosdelest and others added 13 commits June 6, 2025 13:43
…reduce-heap' into feature/dense-vector-hnsw-reduce-heap
…reduce-heap' into feature/dense-vector-hnsw-reduce-heap
* additional candidates is predicated on the original candidate's filtered percentage.
* </ul>
*/
public class FilteredHnswGraphSearcher extends HnswGraphSearcher {
Member

I don't think we need this? Only used on read, never during graph building.

Member Author

I see, removed in 378111f. Thanks!

* thread-safe. The search method optionally takes a set of "accepted nodes", which can be used to
* exclude deleted documents.
*/
public abstract class HnswGraph {
Member

if these are public in Lucene, can we just rely on them or are they not exposed in the module?

Comment on lines 121 to 129
if (numMergeWorkers == 1 && mergeExec != null) {
throw new IllegalArgumentException("No executor service is needed as we'll use single thread to merge");
}
this.numMergeWorkers = numMergeWorkers;
if (mergeExec != null) {
this.mergeExec = new TaskExecutor(mergeExec);
} else {
this.mergeExec = null;
}
Member

We don't need any of this merge worker stuff; we don't use it.

Member Author

Right - removed in a6db01a

if (parallelMergeTaskExecutor != null && numParallelMergeWorkers > 1) {
return new ConcurrentHnswMerger(fieldInfo, scorerSupplier, M, beamWidth, parallelMergeTaskExecutor, numParallelMergeWorkers);
}
return new IncrementalHnswGraphMerger(fieldInfo, scorerSupplier, M, beamWidth);
Member

We only need IncrementalHnswGraphMerger; we never use the task executor, nor do we do concurrent merges.

Member Author

Removed as part of a6db01a

import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;

/** This merger merges graph in a concurrent manner, by using {@link HnswConcurrentMergeBuilder} */
public class ConcurrentHnswMerger extends IncrementalHnswGraphMerger {
Member

don't need this one.

Member Author

Removed in a6db01a

*
* @lucene.experimental
*/
final class SeededHnswGraphSearcher extends AbstractHnswGraphSearcher {
Member

don't need this

*
* @lucene.internal
*/
class HashContainers {
Member

I wonder if we can just extract what we want and place it directly into the fixed-size array things?


public class ArrayUtil {

public static float[] growInRange(float[] array, int minLength, int maxLength) {
Member

I think ES has access to growInRange; I am not sure whether any of the changes to growInRange are important.
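For reference, Lucene's ArrayUtil.growInRange contract is to return an array of length at least minLength while never over-allocating past maxLength. A minimal self-contained sketch of that contract (the growth factor here is illustrative, not Lucene's exact oversize policy):

```java
final class GrowInRangeSketch {
    // Grows 'array' to at least minLength, over-allocating to amortize repeated
    // growth, but never past maxLength. The 1.5x growth factor is illustrative.
    static float[] growInRange(float[] array, int minLength, int maxLength) {
        if (minLength < 0) {
            throw new IllegalArgumentException("minLength must not be negative");
        }
        if (minLength > maxLength) {
            throw new IllegalArgumentException("minLength must not exceed maxLength");
        }
        if (array.length >= minLength) {
            return array; // already large enough
        }
        int oversized = Math.max(minLength, array.length + (array.length >> 1));
        return java.util.Arrays.copyOf(array, Math.min(oversized, maxLength));
    }
}
```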

* #insertWithOverflow(int, float)} and {@link #add(int, float)}, and provides MIN and MAX heap
* subclasses.
*/
public class NeighborQueue {
Member

Not sure we need to copy this? I thought this was available outside of Lucene since it's a public class. Maybe Lucene doesn't export it from the module?

private HnswUtil() {}

// Finds orphaned components on the graph level.
static List<Component> components(HnswGraph hnsw, int level, FixedBitSet notFullyConnected, int maxConn) throws IOException {
Member

I don't think any of this is used now, maybe we can remove it.

Member Author

That's correct - removed in 457e9a4


@Override
public long ramBytesUsed() {
return BASE_RAM_BYTES_USED + nodes.ramBytesUsed() + scores.ramBytesUsed();
Member

@benwtrent benwtrent Jun 9, 2025

We should update the way Lucene does this (and consequently this copy) if we merge this. I think this has a real performance impact: as it's done now, the builder iterates every array (i.e. every node) and does a calculation on each call. That makes no sense IMO.
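The alternative the comment suggests, accounting bytes once when a node is added rather than re-walking every node's arrays on each ramBytesUsed() call, might look like this sketch (class name and overhead constants are hypothetical, not actual Lucene accounting values):

```java
import java.util.ArrayList;
import java.util.List;

final class IncrementalRamAccounting {
    private static final long BASE_RAM_BYTES_USED = 16; // illustrative base overhead
    private static final long PER_ARRAY_OVERHEAD = 16;  // illustrative array header cost

    private final List<int[]> nodes = new ArrayList<>(); // retained for graph access
    private long cachedBytes = BASE_RAM_BYTES_USED;

    void addNode(int[] neighbors) {
        nodes.add(neighbors);
        // Account for this node exactly once, at insertion time.
        cachedBytes += PER_ARRAY_OVERHEAD + (long) neighbors.length * Integer.BYTES;
    }

    long ramBytesUsed() {
        return cachedBytes; // O(1), instead of iterating every node's arrays
    }
}
```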

…hnsw-reduce-heap
@carlosdelest
Member Author

@benwtrent @ChrisHegarty I've removed the non-needed classes and tried to prune the changes down to the minimum, including the removal of a new codec name.

Do you think there's anything else we can do to reduce the size of this change?

It's still a 3K LOC change. Is this something we should add to ES 9.1/8.19?

@carlosdelest
Member Author

Closing as this is too big a change - let's wait for a Lucene release that contains the code
