add stats for text embedding processors with flags #1332

Conversation

@will-hwang will-hwang commented May 19, 2025

Description

Enhance stats for the text embedding processor, including stats for when the skip_existing option is enabled.

Updated Response

{
	"_nodes": {
		"total": 1,
		"successful": 1,
		"failed": 0
	},
	"cluster_name": "integTest",
	"info": {
		"cluster_version": "3.1.0",
		"processors": {
			"ingest": {
				"text_chunking_delimiter_processors": 0,
				"text_chunking_fixed_length_processors": 0,
				"text_embedding_processors_in_pipelines": 1,
				"text_embedding_skip_existing_processors": 1,
				"text_chunking_processors": 0
			}
		}
	},
	"all_nodes": {
		"processors": {
			"ingest": {
				"text_chunking_executions": 0,
				"text_embedding_executions": 2,
				"text_embedding_skip_existing_executions": 2,
				"text_chunking_fixed_length_executions": 0,
				"text_chunking_delimiter_executions": 0
			}
		}
	},
	"nodes": {
		"rMyVPGp2SsWL-sLQ3HSjCQ": {
			"processors": {
				"ingest": {
					"text_chunking_executions": 0,
					"text_embedding_executions": 2,
					"text_embedding_skip_existing_executions": 2,
					"text_chunking_fixed_length_executions": 0,
					"text_chunking_delimiter_executions": 0
				}
			}
		}
	}
}
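For orientation, the "all_nodes" section in the response above is the per-node event stats aggregated across the cluster (there is a single node here, so the numbers match). A minimal sketch of that aggregation, with illustrative names (`aggregate`, the stat keys) that are not the plugin's actual API:

```java
import java.util.HashMap;
import java.util.Map;

public class AllNodesAggregationSketch {
    // Sums each event stat across the per-node maps, the way the
    // "all_nodes" section mirrors the per-node "nodes" section.
    static Map<String, Long> aggregate(Map<String, Map<String, Long>> perNode) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> nodeStats : perNode.values()) {
            nodeStats.forEach((stat, value) -> total.merge(stat, value, Long::sum));
        }
        return total;
    }
}
```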

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@heemin32 (Collaborator)

Could you update the PR description with updated response of the stats api?

@q-andy (Contributor) left a comment


lgtm, minor nits

increment(stats, InfoStatName.TEXT_EMBEDDING_PROCESSORS);
Object skipExisting = processorConfig.get(TextEmbeddingProcessor.SKIP_EXISTING);
if (Objects.nonNull(skipExisting) && skipExisting.equals(Boolean.TRUE)) {
    increment(stats, InfoStatName.TEXT_EMBEDDING_SKIP_EXISTING_PROCESSORS);
}
@will-hwang (Author):
It reads the value directly as a boolean, not as a map. I think it's cleaner this way.
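The null check and the boolean comparison in the snippet above can also be collapsed into one step; here is a self-contained sketch (the counter map and `countProcessor` are illustrative stand-ins, not the plugin's API) using the null-safe `Boolean.TRUE.equals(...)`:

```java
import java.util.HashMap;
import java.util.Map;

public class SkipExistingStatSketch {
    // Illustrative stand-in for the plugin's info-stat counters.
    static final Map<String, Long> STATS = new HashMap<>();

    static void increment(String statName) {
        STATS.merge(statName, 1L, Long::sum);
    }

    // Counts every text embedding processor, and additionally counts it
    // as skip_existing-enabled when the config flag is Boolean.TRUE.
    // Boolean.TRUE.equals(null) is false, so no explicit null check is needed.
    static void countProcessor(Map<String, Object> processorConfig) {
        increment("text_embedding_processors_in_pipelines");
        if (Boolean.TRUE.equals(processorConfig.get("skip_existing"))) {
            increment("text_embedding_skip_existing_processors");
        }
    }
}
```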

@@ -72,6 +72,7 @@ public void doExecute(
generateAndSetInference(ingestDocument, processMap, inferenceList, handler);
return;
}
EventStatsManager.increment(EventStatName.TEXT_EMBEDDING_PROCESSOR_SKIP_EXISTING_EXECUTIONS);

Not about this change, but I think I noticed a bug in my initial PR: we should be incrementing the stat even when we run batch execute, but currently we only increment during single execute, so it might not be counted when pipelines run in batch. Do you think you could include that change in this PR as well? If not, I can try to catch it in the next one.

@will-hwang (Author):

Sure, I can add it to this PR.

q-andy commented May 20, 2025

Currently we increment the stat whenever we have an execution that has the option enabled. Should we also have a stat to record when we have a "cache hit" and successfully skip an inference, or do you think that would be redundant?

cc: @heemin32

@will-hwang (Author)

Currently we increment the stat whenever we have an execution that has the option enabled. Should we also have a stat to record when we have a "cache hit" and successfully skip an inference, or do you think that would be redundant?

cc: @heemin32

@q-andy It really comes down to what insight we want to gain from these stats.

  1. Do we want to know how many domains are using certain features?
  2. Do we also want to know how the features perform once enabled?

I think we should start with 1 (which this PR addresses), and scope out how, and to what extent we should support 2.

@heemin32 (Collaborator)

For cache_hit stats, we can wait until there is an ask from users with the feature.

@heemin32 (Collaborator)

I'm not sure if the current stat name text_embedding_skip_existing_processors is appropriate, since it's not a standalone processor. Would it make more sense to track how many processors have the skip option enabled, regardless of type?

Also, do we really need a separate execution stat like text_embedding_skip_existing_executions? Is it providing distinct value?

@will-hwang (Author)

@heemin32 The naming convention follows the one for text chunking, which has different available algorithm options (link). As for execution stats, they would capture how many times a processor with the skip_existing flag was executed, which is different from how many processors have the flag enabled. If we're okay with just having the latter, I'm okay with that too. But it seems like EventStat and InfoStat share the same pattern for other processors like text chunking and normalization.

@junqiu-lei (Member)

@will-hwang You might need to rebase onto the main branch to pass the CI.

@will-hwang will-hwang force-pushed the optimzed_embedding_processor_stats branch from e9ac623 to 2e3afdd Compare May 22, 2025 21:31
@heemin32 (Collaborator)

Regarding the stats APIs in the neural plugin — do we really need to track processor executions metrics? Wouldn’t the number of processors alone be sufficient to measure adoption?

My main concern is scalability. The stats APIs have limitations in that area, and once we add a metric, it's difficult to remove it later. That’s why I’d prefer to keep the metrics as minimal as possible.

@will-hwang (Author) commented May 28, 2025

Regarding the stats APIs in the neural plugin — do we really need to track processor executions metrics? Wouldn’t the number of processors alone be sufficient to measure adoption?

My main concern is scalability. The stats APIs have limitations in that area, and once we add a metric, it's difficult to remove it later. That’s why I’d prefer to keep the metrics as minimal as possible.

@heemin32 If we are okay with simply tracking adoption, I think we can remove the EventStats metrics and keep only the one for InfoStats. The change would probably need to be made for the other processors too. If we do make this change, for what cases should event execution metrics be emitted?

@martin-gaievski (Member)

Regarding the stats APIs in the neural plugin — do we really need to track processor executions metrics? Wouldn’t the number of processors alone be sufficient to measure adoption?

My main concern is scalability. The stats APIs have limitations in that area, and once we add a metric, it's difficult to remove it later. That’s why I’d prefer to keep the metrics as minimal as possible.

Detailed metrics would be useful too; we could see which search configuration is used most and invest our efforts there to improve relevance or other aspects of search. If the number of metrics is critical for the infra team, then we can keep only the number of processors; that is P0.

@heemin32 (Collaborator)

Regarding the stats APIs in the neural plugin — do we really need to track processor executions metrics? Wouldn’t the number of processors alone be sufficient to measure adoption?
My main concern is scalability. The stats APIs have limitations in that area, and once we add a metric, it's difficult to remove it later. That’s why I’d prefer to keep the metrics as minimal as possible.

@heemin32 if we are okay with simply tracking adoption, i think we can remove the eventStats metrics and only keep one for infoStats. The change will probably need to be made for other processors too then. If we do make this change, for what cases should event execution metrics be emitted?

For event execution, we need to add metrics that cannot be retrieved from info stats. For example, the number of neural query executions, whose data is not available from cluster info.

@will-hwang (Author)

Regarding the stats APIs in the neural plugin — do we really need to track processor executions metrics? Wouldn’t the number of processors alone be sufficient to measure adoption?
My main concern is scalability. The stats APIs have limitations in that area, and once we add a metric, it's difficult to remove it later. That’s why I’d prefer to keep the metrics as minimal as possible.

@heemin32 if we are okay with simply tracking adoption, i think we can remove the eventStats metrics and only keep one for infoStats. The change will probably need to be made for other processors too then. If we do make this change, for what cases should event execution metrics be emitted?

For event execution, we need to add metrics which cannot be retrieved from eventStats. For example, number of neural query execution of which data is not available from cluster info.

sounds reasonable to me. What do others think?
@q-andy @martin-gaievski

@heemin32 (Collaborator)

Regarding the stats APIs in the neural plugin — do we really need to track processor executions metrics? Wouldn’t the number of processors alone be sufficient to measure adoption?
My main concern is scalability. The stats APIs have limitations in that area, and once we add a metric, it's difficult to remove it later. That’s why I’d prefer to keep the metrics as minimal as possible.

detailed metrics would be useful too, we can see which search configuration is most used one and invest there our efforts to improve relevance or other aspects of search. If number of metrics is critical for infra team then we can have only number of processors, that is P0.

I agree that more data can be better, but we also have to consider that adding a metric isn't without cost. Also, there's a chance that a single user could generate a large number of calls to a specific processor, which might skew the data and not accurately reflect its true popularity.
Perhaps tracking the number of processors used would be a more reliable metric for measuring adoption and it alone might be sufficient?

@@ -109,6 +110,7 @@ public void doBatchExecute(List<String> inferenceList, Consumer<List<?>> handler
@Override
public void subBatchExecute(List<IngestDocumentWrapper> ingestDocumentWrappers, Consumer<List<IngestDocumentWrapper>> handler) {
try {
EventStatsManager.increment(EventStatName.TEXT_EMBEDDING_PROCESSOR_EXECUTIONS);
@bzhangam (Collaborator) commented Jun 2, 2025

Should we count this by the number of docs we are processing? I feel we may want to know how many docs are processed by the processor.

Or we may want to use another event name for the batch processing use case; otherwise it can be confusing if we rely on this event to tell how many docs we have processed.

@will-hwang (Author):

I think we're more concerned with the number of executions than with how many docs are processed per execution, in general for other processors as well.
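The two alternatives discussed in this thread can be sketched side by side; assuming a hypothetical batch entry point (`subBatchExecute` here is illustrative, not the plugin code), an execution counter grows by one per call while a per-document counter grows with the batch size:

```java
import java.util.List;

public class BatchStatSketch {
    // Illustrative counters for the two alternatives in this thread.
    static long executions = 0; // what the PR increments: one per execution
    static long documents = 0;  // the per-document alternative suggested above

    static void subBatchExecute(List<String> ingestDocuments) {
        executions++;                        // one event per batch call
        documents += ingestDocuments.size(); // grows with the batch size
    }
}
```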


q-andy commented Jun 2, 2025

My main concern is scalability. The stats APIs have limitations in that area, and once we add a metric, it's difficult to remove it later. That’s why I’d prefer to keep the metrics as minimal as possible.

Had a chat with the infra team; the primary concern is the high memory consumption seen on large clusters when calling APIs like _node/stats and _node/state, due to large response payloads. For production clusters they mitigate this by calling multiple times and filtering for specific stats. I opened #1360 to give an option to reduce the size of the payload, and opened an issue to add more filtering options in #1363.

Based on this, for 3.1 it should be okay to add granular stats; the caller side can filter them as needed if we run into scalability concerns.
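Caller-side filtering of a large stats payload can be as simple as selecting keys by prefix. A sketch under the assumption that the flattened stats arrive as a name-to-count map (`filterByPrefix` is a hypothetical helper, not part of the plugin):

```java
import java.util.Map;
import java.util.TreeMap;

public class StatsFilterSketch {
    // Keeps only the stats whose names start with the given prefix,
    // e.g. "text_embedding_", trimming the payload on the caller side.
    static Map<String, Long> filterByPrefix(Map<String, Long> stats, String prefix) {
        Map<String, Long> filtered = new TreeMap<>();
        stats.forEach((name, count) -> {
            if (name.startsWith(prefix)) {
                filtered.put(name, count);
            }
        });
        return filtered;
    }
}
```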

@heemin32 heemin32 merged commit c0faee8 into opensearch-project:main Jun 2, 2025
47 of 50 checks passed

codecov bot commented Jun 2, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (979a9fc) to head (2e3afdd).
Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #1332       +/-   ##
============================================
- Coverage     82.62%       0   -82.63%     
============================================
  Files           149       0      -149     
  Lines          7257       0     -7257     
  Branches       1164       0     -1164     
============================================
- Hits           5996       0     -5996     
+ Misses          811       0      -811     
+ Partials        450       0      -450     
