Description
In reference to the experiments done in #1049, it was found that when we flip to exact search and use the doc values to do the exact search, we spend roughly 60-70% of the time de-serializing vectors from the binary format to float arrays. A few solutions were proposed as part of that discussion, and they are captured in this issue. Experiment details:
Experiment 1 (Baseline: latency of distance calculation on datasets of different sizes and dimensions)
Below are the steps that are performed to run this experiment:
- For every dimension and every run, generate a dataset with the required number of vectors in it. We chose vector counts of 1k, 2k, 5k, 8k, 10k, and 15k.
- Then a query vector is generated.
- Distance calculation is then done on the dataset for every run and every dimension. The time taken for the distance calculation is stored, and this is what is shown in the tables below. Lucene distance calculation functions are used to do the distance calculation (code ref); a sketch follows this list.
- The experiment then collects all the latencies and formats them in a table.
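A minimal sketch of the baseline measurement, assuming Lucene's `VectorUtil.squareDistance` as the distance function and a simple random dataset generator (the actual benchmark harness in the linked code ref may differ):

```java
import java.util.Random;
import org.apache.lucene.util.VectorUtil;

public class BaselineDistanceBenchmark {
    public static void main(String[] args) {
        int numVectors = 5_000;   // dataset size, e.g. 1k/2k/5k/8k/10k/15k
        int dimension = 512;      // vector dimension under test
        Random random = new Random(42);

        // Generate the dataset and a single query vector.
        float[][] dataset = new float[numVectors][dimension];
        for (float[] vector : dataset) {
            for (int d = 0; d < dimension; d++) {
                vector[d] = random.nextFloat();
            }
        }
        float[] query = new float[dimension];
        for (int d = 0; d < dimension; d++) {
            query[d] = random.nextFloat();
        }

        // Time only the distance calculation over the whole dataset.
        long start = System.nanoTime();
        float checksum = 0f;
        for (float[] vector : dataset) {
            checksum += VectorUtil.squareDistance(query, vector);
        }
        long elapsedNanos = System.nanoTime() - start;
        System.out.printf("checksum=%f latency=%.5f ms%n", checksum, elapsedNanos / 1_000_000.0);
    }
}
```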
dimensions | 1000_latency(ms) | 2000_latency(ms) | 5000_latency(ms) | 8000_latency(ms) | 10000_latency(ms) | 15000_latency(ms) |
---|---|---|---|---|---|---|
64 | 0.0751 | 0.07755 | 0.18681 | 0.33033 | 0.40572 | 0.60473 |
128 | 0.11913 | 0.14973 | 0.38251 | 0.64307 | 0.92643 | 1.17756 |
256 | 0.21269 | 0.30679 | 0.78219 | 1.26894 | 1.8096 | 2.38399 |
512 | 0.38937 | 0.5992 | 1.50937 | 2.5473 | 3.08621 | 4.54336 |
768 | 0.60041 | 0.88308 | 2.35232 | 3.64441 | 4.63974 | 7.03523 |
968 | 0.5958 | 1.11113 | 2.90708 | 4.61883 | 5.92536 | 8.58447 |
1024 | 0.60106 | 1.20861 | 3.12053 | 4.83357 | 6.24293 | 9.18919 |
2048 | 1.16614 | 2.45611 | 6.04847 | 9.83145 | 12.46421 | 18.96537 |
4096 | 2.39995 | 4.86545 | 11.93488 | 19.87645 | 25.46671 | 40.70765 |
Experiment 2 (De-serializing the vectors and then doing the distance calculation)
Below are the steps that are performed to run this experiment:
- For every dimension and every run, generate a dataset with the required number of vectors in it. We chose vector counts of 1k, 2k, 5k, 8k, 10k, and 15k. The vectors in this dataset are serialized using the same code we use in k-NN (Code ref).
- Then a query vector is generated. We don't need to serialize the query, as the query vector is not serialized during exact search either.
- To do the distance calculation, we first de-serialize the vectors in the dataset using the same code used by k-NN (Code ref) and then perform the distance calculation using the Lucene-provided functions (code ref). This is done for each dimension and dataset size. The time taken for this full step, de-serialization plus distance calculation, is what is reported in the table below; a sketch follows this list.
- The experiment then collects all the latencies and formats them in a table.
So, the query latency includes:
- Time taken to de-serialize the vectors to floats.
- Distance calculation between the query vector and indexed vector.
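A minimal sketch of the de-serialize-then-compute path, assuming a plain DataOutputStream/DataInputStream float encoding similar in spirit to the serializer referenced above (the actual k-NN serializer in the Code ref may differ in its details):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import org.apache.lucene.util.VectorUtil;

public class SerializedDistanceBenchmark {

    // Serialize a float vector to bytes, similar in spirit to how vectors end up
    // stored in binary doc values (assumed encoding, not the exact k-NN one).
    static byte[] serialize(float[] vector) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (float value : vector) {
            out.writeFloat(value);
        }
        return bytes.toByteArray();
    }

    // De-serialize bytes back into a float vector; this is the step that
    // dominates the exact-search latency in the experiments above.
    static float[] deserialize(byte[] bytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
        float[] vector = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = in.readFloat();
        }
        return vector;
    }

    // Query latency per this experiment: de-serialization + distance calculation
    // over the whole serialized dataset.
    static float scoreAll(float[] query, byte[][] serializedDataset) throws IOException {
        float checksum = 0f;
        for (byte[] serialized : serializedDataset) {
            float[] vector = deserialize(serialized);
            checksum += VectorUtil.squareDistance(query, vector);
        }
        return checksum;
    }
}
```

Timing `scoreAll` against the pure distance loop from Experiment 1 is what produces the `latency_serialized` versus `latency` columns compared below.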
dimensions | 1000_latency_serialized(ms) | 2000_latency_serialized(ms) | 5000_latency_serialized(ms) | 8000_latency_serialized(ms) | 10000_latency_serialized(ms) | 15000_latency_serialized(ms) |
---|---|---|---|---|---|---|
64 | 0.5361 | 0.30428 | 0.77805 | 1.2804 | 1.8098 | 2.36014 |
128 | 0.43123 | 0.54735 | 1.24873 | 1.9306 | 2.48106 | 3.5845 |
256 | 0.58271 | 1.06349 | 2.07025 | 3.69847 | 4.90092 | 8.71385 |
512 | 0.79718 | 1.65931 | 4.18262 | 6.67985 | 8.53127 | 15.2144 |
768 | 1.15486 | 2.20225 | 5.68341 | 9.35401 | 12.03496 | 23.44284 |
968 | 1.4213 | 2.85394 | 7.74272 | 14.27054 | 14.97346 | 33.13775 |
1024 | 1.46418 | 3.15568 | 7.50078 | 14.56883 | 17.00156 | 29.74787 |
2048 | 2.97711 | 6.85896 | 16.52853 | 24.79757 | 30.17373 | 78.9689 |
4096 | 5.90317 | 11.74842 | 31.05969 | 64.13292 | 83.25422 | 122.25665 |
Combining the Results of Experiment 1 and Experiment 2
dimensions | 1000_latency(ms) | 1000_latency_serialized(ms) | 2000_latency(ms) | 2000_latency_serialized(ms) | 5000_latency(ms) | 5000_latency_serialized(ms) | 8000_latency(ms) | 8000_latency_serialized(ms) | 10000_latency(ms) | 10000_latency_serialized(ms) | 15000_latency(ms) | 15000_latency_serialized(ms) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
64 | 0.0751 | 0.5361 | 0.07755 | 0.30428 | 0.18681 | 0.77805 | 0.33033 | 1.2804 | 0.40572 | 1.8098 | 0.60473 | 2.36014 |
128 | 0.11913 | 0.43123 | 0.14973 | 0.54735 | 0.38251 | 1.24873 | 0.64307 | 1.9306 | 0.92643 | 2.48106 | 1.17756 | 3.5845 |
256 | 0.21269 | 0.58271 | 0.30679 | 1.06349 | 0.78219 | 2.07025 | 1.26894 | 3.69847 | 1.8096 | 4.90092 | 2.38399 | 8.71385 |
512 | 0.38937 | 0.79718 | 0.5992 | 1.65931 | 1.50937 | 4.18262 | 2.5473 | 6.67985 | 3.08621 | 8.53127 | 4.54336 | 15.2144 |
768 | 0.60041 | 1.15486 | 0.88308 | 2.20225 | 2.35232 | 5.68341 | 3.64441 | 9.35401 | 4.63974 | 12.03496 | 7.03523 | 23.44284 |
968 | 0.5958 | 1.4213 | 1.11113 | 2.85394 | 2.90708 | 7.74272 | 4.61883 | 14.27054 | 5.92536 | 14.97346 | 8.58447 | 33.13775 |
1024 | 0.60106 | 1.46418 | 1.20861 | 3.15568 | 3.12053 | 7.50078 | 4.83357 | 14.56883 | 6.24293 | 17.00156 | 9.18919 | 29.74787 |
2048 | 1.16614 | 2.97711 | 2.45611 | 6.85896 | 6.04847 | 16.52853 | 9.83145 | 24.79757 | 12.46421 | 30.17373 | 18.96537 | 78.9689 |
4096 | 2.39995 | 5.90317 | 4.86545 | 11.74842 | 11.93488 | 31.05969 | 19.87645 | 64.13292 | 25.46671 | 83.25422 | 40.70765 | 122.25665 |
Time Spent in De-Serialization
--
dimensions | 1000_latency_diff(ms) | 2000_latency_diff(ms) | 5000_latency_diff(ms) | 8000_latency_diff(ms) | 10000_latency_diff(ms) | 15000_latency_diff(ms) |
---|---|---|---|---|---|---|
64 | 0.461 | 0.22673 | 0.59125 | 0.95007 | 1.40408 | 1.75541 |
128 | 0.3121 | 0.39762 | 0.86622 | 1.28753 | 1.55463 | 2.40694 |
256 | 0.37002 | 0.7567 | 1.28807 | 2.42953 | 3.09132 | 6.32987 |
512 | 0.40781 | 1.06011 | 2.67325 | 4.13255 | 5.44505 | 10.67103 |
768 | 0.55445 | 1.31917 | 3.33109 | 5.7096 | 7.39523 | 16.40761 |
968 | 0.8255 | 1.7428 | 4.83564 | 9.65171 | 9.04811 | 24.55328 |
1024 | 0.86312 | 1.94707 | 4.38025 | 9.73526 | 10.75863 | 20.55868 |
2048 | 1.81097 | 4.40285 | 10.48007 | 14.96612 | 17.70952 | 60.00352 |
4096 | 3.50322 | 6.88298 | 19.12481 | 44.25647 | 57.78751 | 81.549 |
Experiment Observations
- De-serialization is the most expensive operation during the search. With 512 dimensions and 5K vectors, de-serialization takes around 2.6 ms, which is around 63% of the total time (4.18 ms for de-serialization plus distance computation), and around 69% for lower dimensions like 128.
Time Spent in De-Serialization (%)
--
dimensions | 1000_latency_diff(%) | 2000_latency_diff(%) | 5000_latency_diff(%) | 8000_latency_diff(%) | 10000_latency_diff(%) | 15000_latency_diff(%) |
---|---|---|---|---|---|---|
64 | 85.99 | 74.51 | 75.99 | 74.20 | 77.58 | 74.38 |
128 | 72.38 | 72.64 | 69.37 | 66.69 | 62.66 | 67.15 |
256 | 63.50 | 71.15 | 62.22 | 65.69 | 63.08 | 72.64 |
512 | 51.16 | 63.89 | 63.91 | 61.87 | 63.82 | 70.14 |
768 | 48.01 | 59.90 | 58.61 | 61.04 | 61.45 | 69.99 |
968 | 58.08 | 61.07 | 62.45 | 67.63 | 60.43 | 74.09 |
1024 | 58.95 | 61.70 | 58.40 | 66.82 | 63.28 | 69.11 |
2048 | 60.83 | 64.19 | 63.41 | 60.35 | 58.69 | 75.98 |
4096 | 59.34 | 58.59 | 61.57 | 69.01 | 69.41 | 66.70 |
Experiment Conclusion
- The above experiments showed us that it is the de-serialization that takes up most of the time during the exact search.
Possible solutions:
- Short term: To improve the exact search time, we should add a caching layer for the de-serialized vectors so that we can improve the latencies for exact search (see the sketch after this list). Along with this, using SIMD instruction sets will help reduce the overall time.
- Figure out a better format to store the vectors, for example, whether we can use the .vec file that the Lucene engine already uses.
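As a rough illustration of the short-term idea, a cache keyed by doc ID could avoid de-serializing the same vector repeatedly; the class and method names below are hypothetical and not an existing k-NN API:

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a de-serialized vector cache keyed by doc ID.
// Names (DeserializedVectorCache, getVector) are illustrative only.
public class DeserializedVectorCache {

    private final Map<Integer, float[]> cache = new HashMap<>();

    // Returns the de-serialized vector for a doc, de-serializing at most once.
    public float[] getVector(int docId, byte[] serializedVector) {
        return cache.computeIfAbsent(docId, id -> deserialize(serializedVector));
    }

    private float[] deserialize(byte[] bytes) {
        // Same float-by-float decoding as in the sketch above; in practice this
        // would reuse the plugin's existing serializer code.
        float[] vector = new float[bytes.length / Float.BYTES];
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        for (int i = 0; i < vector.length; i++) {
            vector[i] = buffer.getFloat();
        }
        return vector;
    }
}
```

The cache scope (per query, per segment, or node level) and its eviction policy would need to be decided based on memory constraints; this sketch only shows the de-duplication of the de-serialization work.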
Benefits
This feature will not only boost the filtered search but will also help reduce the latency of the k-NN script score query, where we read these vectors to perform the exact search.
It will also provide some gains during segment merge operations, as we read the binary doc values there too.