Add Lucene improvements for HNSW merging #129046
Conversation
Hi @carlosdelest, I've created a changelog YAML for you.
 * additional candidates is predicated on the original candidate's filtered percentage.
 * </ul>
 */
public class FilteredHnswGraphSearcher extends HnswGraphSearcher {
Review comment: I don't think we need this? It's only used on read, never during graph building.

Reply: I see, removed in 378111f. Thanks!
 * thread-safe. The search method optionally takes a set of "accepted nodes", which can be used to
 * exclude deleted documents.
 */
public abstract class HnswGraph {
Review comment: If these are public in Lucene, can we just rely on them, or are they not exposed in the module?
if (numMergeWorkers == 1 && mergeExec != null) {
    throw new IllegalArgumentException("No executor service is needed as we'll use single thread to merge");
}
this.numMergeWorkers = numMergeWorkers;
if (mergeExec != null) {
    this.mergeExec = new TaskExecutor(mergeExec);
} else {
    this.mergeExec = null;
}
Review comment: We don't need any of this merge worker stuff; we don't use it.

Reply: Right, removed in a6db01a.
if (parallelMergeTaskExecutor != null && numParallelMergeWorkers > 1) {
    return new ConcurrentHnswMerger(fieldInfo, scorerSupplier, M, beamWidth, parallelMergeTaskExecutor, numParallelMergeWorkers);
}
return new IncrementalHnswGraphMerger(fieldInfo, scorerSupplier, M, beamWidth);
Review comment: We only need IncrementalHnswGraphMerger; we never use the task executor stuff, nor do concurrent merges.

Reply: Removed as part of a6db01a.
import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;

/** This merger merges graph in a concurrent manner, by using {@link HnswConcurrentMergeBuilder} */
public class ConcurrentHnswMerger extends IncrementalHnswGraphMerger {
Review comment: We don't need this one.

Reply: Removed in a6db01a.
 *
 * @lucene.experimental
 */
final class SeededHnswGraphSearcher extends AbstractHnswGraphSearcher {
Review comment: We don't need this.
 *
 * @lucene.internal
 */
class HashContainers {
Review comment: I wonder if we can just extract what we want and place it directly into the fixed-size array things?
public class ArrayUtil {

    public static float[] growInRange(float[] array, int minLength, int maxLength) {
Review comment: I think ES already has access to growInRange; I'm not sure whether any of the changes to growInRange are important.
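For reference, a minimal standalone sketch of the growInRange contract being discussed (this is an illustrative reimplementation, not Lucene's actual ArrayUtil code): grow the array so it holds at least minLength floats, never allocate past maxLength, and avoid copying when the array is already large enough.

```java
import java.util.Arrays;

// Illustrative sketch of the growInRange semantics (assumed behavior,
// not copied from Lucene's ArrayUtil implementation).
public final class GrowInRangeSketch {
    public static float[] growInRange(float[] array, int minLength, int maxLength) {
        if (minLength > maxLength) {
            throw new IllegalArgumentException("minLength must be <= maxLength");
        }
        if (array.length >= minLength) {
            return array; // already large enough: return the same array, no copy
        }
        // Oversize a bit to amortize repeated growth, then clamp to maxLength.
        int newLength = Math.min(maxLength, Math.max(minLength, array.length + (array.length >> 3)));
        return Arrays.copyOf(array, newLength);
    }
}
```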
 * #insertWithOverflow(int, float)} and {@link #add(int, float)}, and provides MIN and MAX heap
 * subclasses.
 */
public class NeighborQueue {
Review comment: Not sure we need to copy this? I thought it was available outside of Lucene, since it's a public class. Maybe Lucene doesn't export the module?
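To illustrate what a NeighborQueue-style structure does during graph search, here is a minimal self-contained sketch (assumed simplification: Lucene's class packs node and score into a long-backed heap; this version uses a plain PriorityQueue): a bounded queue where insertWithOverflow keeps only the k best-scoring neighbors, evicting the current worst when full.

```java
import java.util.PriorityQueue;

// Illustrative sketch of the bounded-neighbor-queue idea (not Lucene's
// actual NeighborQueue implementation).
public final class NeighborQueueSketch {
    private final int maxSize;
    // Min-heap on score: the head is the worst of the kept neighbors.
    private final PriorityQueue<float[]> heap =
        new PriorityQueue<>((x, y) -> Float.compare(x[1], y[1]));

    public NeighborQueueSketch(int maxSize) {
        this.maxSize = maxSize;
    }

    /** Returns true if the (node, score) pair was kept. */
    public boolean insertWithOverflow(int node, float score) {
        if (heap.size() < maxSize) {
            heap.add(new float[] { node, score });
            return true;
        }
        if (score > heap.peek()[1]) { // better than the current worst: evict it
            heap.poll();
            heap.add(new float[] { node, score });
            return true;
        }
        return false;
    }

    public int size() {
        return heap.size();
    }

    public float topScore() {
        return heap.peek()[1]; // score of the worst kept neighbor
    }
}
```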
private HnswUtil() {}

// Finds orphaned components on the graph level.
static List<Component> components(HnswGraph hnsw, int level, FixedBitSet notFullyConnected, int maxConn) throws IOException {
Review comment: I don't think any of this is used now; maybe we can remove it.

Reply: That's correct, removed in 457e9a4.
@Override
public long ramBytesUsed() {
    return BASE_RAM_BYTES_USED + nodes.ramBytesUsed() + scores.ramBytesUsed();
Review comment: If we merge this, we should update the way Lucene does this (and consequently this copy). I think it has a real performance impact: the way it's done now in the builder, it iterates every array (i.e., every node) and does a calculation on each call, which makes no sense IMO.
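The alternative hinted at in the comment above could look like the following self-contained sketch (assumed names and layout, not the actual Lucene or Elasticsearch classes): instead of walking every node's arrays each time ramBytesUsed() is called, maintain a running total that is updated whenever a backing array is (re)allocated, making the accessor O(1).

```java
// Illustrative sketch of incremental RAM accounting (hypothetical class,
// not the real builder code).
public final class RamAccountingSketch {
    private static final long BASE_RAM_BYTES_USED = 64; // assumed fixed object overhead
    private long arrayBytes;                            // running total of array memory

    // Called whenever a backing array is (re)allocated; oldBytes is 0 on first allocation.
    public void onArrayResized(long oldBytes, long newBytes) {
        arrayBytes += newBytes - oldBytes;
    }

    // O(1): returns the maintained counter instead of iterating every node.
    public long ramBytesUsed() {
        return BASE_RAM_BYTES_USED + arrayBytes;
    }
}
```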
@benwtrent @ChrisHegarty I've removed the unneeded classes and tried to prune the changes down to the minimum, including removing the new codec name. Do you think there's anything else we can do to reduce the size of this change? It's still a ~3K LOC change. Is this something we should add to ES 9.1/8.19?
Closing as this is too big a change; let's wait for a Lucene release that contains the code.
apache/lucene#14527 improved the heap usage for HNSW merging.

As this change hasn't made it into any Lucene release yet, we'd like to incorporate it into Elasticsearch. We have copied the Lucene changes into Elasticsearch code and created a new Elasticsearch codec that uses the copy of the Lucene Lucene99HnswVectorsWriter, renamed to ES910HnswReducedHeapVectorsWriter.

Changes done:
- Renamed Elasticsearch900Lucene101Codec to Elasticsearch910Lucene102Codec, to signal that the codec is for ES 9.1.0 and Lucene 10.2.
- Added ES910HnswVectorsFormat, which provides the entry point for the vector writer changes. It replaces Lucene99HnswVectorsFormat usages, as it is compatible with it (same file formats).
- Added ES910HnswVectorsWriter, which uses the classes copied over from Lucene. The HNSW improvements are on the merge side, so we're only focused on the writer aspect of the format.
- Copied the needed Lucene classes into the following packages:
  - org.elasticsearch.index.codec.vectors.es910.hnsw
  - org.elasticsearch.index.codec.vectors.es910.internal.hppc
  - org.elasticsearch.index.codec.vectors.es910.util

We can remove these changes when Lucene 10.3 is released and merged into Elasticsearch.
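The delegation pattern the description outlines (keep the Lucene99 on-disk format and reader, swap in only the copied writer) can be sketched as follows. All interfaces and class names here are hypothetical stand-ins for illustration, not the real Lucene KnnVectorsFormat API or the actual ES classes:

```java
// Self-contained sketch of "same file format, different writer" delegation.
// All types below are hypothetical stand-ins, not real Lucene/ES APIs.
public final class FormatDelegationSketch {
    interface VectorsWriter { String name(); }
    interface VectorsReader { String name(); }
    interface VectorsFormat {
        VectorsWriter fieldsWriter();
        VectorsReader fieldsReader();
    }

    // Stand-in for the unchanged Lucene99 reader: file format stays compatible.
    static final class Lucene99Reader implements VectorsReader {
        public String name() { return "Lucene99HnswVectorsReader"; }
    }

    // Stand-in for the copied, reduced-heap writer.
    static final class ReducedHeapWriter implements VectorsWriter {
        public String name() { return "ES910HnswReducedHeapVectorsWriter"; }
    }

    // The new format: only the writer differs from the stock Lucene99 format.
    static final class ES910FormatSketch implements VectorsFormat {
        public VectorsWriter fieldsWriter() { return new ReducedHeapWriter(); }
        public VectorsReader fieldsReader() { return new Lucene99Reader(); }
    }
}
```

Because reads still go through the unchanged reader, segments written by either format remain mutually readable, which is why the PR can replace Lucene99HnswVectorsFormat usages without a file-format bump.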