[V1][Metrics] Allow V1 AsyncLLM to use custom logger #14661

Merged
merged 4 commits into from
Apr 26, 2025

Conversation

liuzijing2014
Collaborator

@liuzijing2014 liuzijing2014 commented Mar 12, 2025

Content

Update V1 AsyncLLM to accept custom loggers that extend StatLoggerBase. This brings V1 to functional parity with V0 in its support for customized logging.

Follow V0 style on customizing loggers:

  • Take in optional stat_loggers from init.
  • Add add_logger and remove_logger functions.
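
As a rough illustration of the two bullets above, here is a self-contained sketch of the pattern. `StatLoggerBase`, `SchedulerStats`, and `IterationStats` below are simplified stand-ins written for this example (their fields and the `record()` signature are assumptions, not the definitions from this PR), and `ToyEngine` is a hypothetical class, not AsyncLLM.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class SchedulerStats:
    # Stand-in with a single illustrative field.
    num_running_reqs: int = 0


@dataclass
class IterationStats:
    # Stand-in with a single illustrative field.
    num_generation_tokens: int = 0


class StatLoggerBase(ABC):
    """Stand-in interface; the signature here is an assumption."""

    @abstractmethod
    def record(self, scheduler_stats: SchedulerStats,
               iteration_stats: Optional[IterationStats]) -> None:
        ...


class InMemoryStatLogger(StatLoggerBase):
    """Custom logger that accumulates stats in memory."""

    def __init__(self) -> None:
        self.total_tokens = 0
        self.last_running = 0

    def record(self, scheduler_stats: SchedulerStats,
               iteration_stats: Optional[IterationStats]) -> None:
        self.last_running = scheduler_stats.num_running_reqs
        if iteration_stats is not None:
            self.total_tokens += iteration_stats.num_generation_tokens


class ToyEngine:
    """Hypothetical engine mirroring the V0-style customization hooks."""

    def __init__(self,
                 stat_loggers: Optional[dict[str, StatLoggerBase]] = None):
        # Copy so later add/remove calls do not mutate the caller's dict.
        self.stat_loggers: dict[str, StatLoggerBase] = dict(stat_loggers or {})

    def add_logger(self, name: str, stat_logger: StatLoggerBase) -> None:
        if name in self.stat_loggers:
            raise KeyError(f"Logger with name {name} already exists.")
        self.stat_loggers[name] = stat_logger

    def remove_logger(self, name: str) -> None:
        if name not in self.stat_loggers:
            raise KeyError(f"Logger with name {name} does not exist.")
        del self.stat_loggers[name]

    def step(self, running: int, new_tokens: int) -> None:
        # Fan the same stats out to every registered logger.
        for stat_logger in self.stat_loggers.values():
            stat_logger.record(SchedulerStats(running),
                               IterationStats(new_tokens))
```

A caller would pass `{"memory": InMemoryStatLogger()}` at construction time, or register the logger later with `add_logger`.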

Test Plan

pytest tests/entrypoints/openai/test_metrics.py 
...
======================== 13 passed, 1 skipped in 207.14s (0:03:27) ========================


pytest tests/v1/engine/test_async_llm.py
...
======================= 10 passed, 10 warnings in 160.87s (0:02:40) =======================

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label Mar 12, 2025
@robertgshaw2-redhat
Collaborator

@markmc

Member

@markmc markmc left a comment

Ok, this quite closely matches the V0 interface, but not quite - e.g. the record() method is new, and we're using new classes like IterationStats, and SchedulerStats

Personally, I'm not a fan of having all of this be part of the public API in such an ad-hoc way ... the logger behaviors and the stats classes etc. are all liable to change in future IMO, breaking users of the API

However, if this is something we want to retain from V0, then this PR looks reasonable

import os
import string
import sys
import threading
Member

Remove this change from the PR

Collaborator Author

This is the auto format for imports ordering from isort. Is this formatting not desired?

Member

@liuzijing2014 the formatting change is to an unrelated file so shouldn't be included in the same PR.

Member

No, it is not desired

This code is copied from nvidia-ml-py and as per #12977:

If there are bugfixes in nvidia-ml-py in the future, we can periodically sync the code.

So, we would prefer not to modify this code from the original

We do use isort, but only using pre-commit and in .pre-commit-config.yaml we exclude this directory:

exclude: 'vllm/third_party/.*'

So, for example, if I checkout your branch and do:

$ pre-commit run --all-files isort

this file is not modified because of the exclude

Hope that helps!
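
To make the effect of that `exclude` setting concrete, here is a minimal sketch. It assumes the pattern is a Python regular expression applied to each file path with `re.search` semantics, and the file paths are made up for illustration.

```python
import re

# The pattern quoted from .pre-commit-config.yaml above.
EXCLUDE = re.compile(r"vllm/third_party/.*")


def is_excluded(path: str) -> bool:
    """Return True if a hook would skip this path under the exclude pattern."""
    return EXCLUDE.search(path) is not None


# Hypothetical paths, used only to exercise the pattern.
print(is_excluded("vllm/third_party/example.py"))
print(is_excluded("vllm/v1/metrics/loggers.py"))
```

Only the first path falls under `vllm/third_party/`, so only it is skipped.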

self.stat_loggers[STANDARD_LOGGING_LOGGER_NAME] = (
    LoggingStatLogger())
self.stat_loggers[PROMETHEUS_LOGGING_LOGGER_NAME] = (
    PrometheusStatLogger(vllm_config))
Member

How about we move all of this into the loggers module so that we just do e.g.

    self.stat_loggers = loggers.setup(vllm_config, stat_loggers)

@njhill
Member

njhill commented Mar 12, 2025

Agree with @markmc these interfaces / data structures may still be in flux so we may want to be careful exposing as part of an external API at this stage. If we do it should be clearly documented as unstable.

@liuzijing2014
Collaborator Author

liuzijing2014 commented Mar 12, 2025

Ok, this quite closely matches the V0 interface, but not quite - e.g. the record() method is new, and we're using new classes like IterationStats, and SchedulerStats

Personally, I'm not a fan of having all of this be part of the public API in such an ad-hoc way ... the logger behaviors and the stats classes etc. are all liable to change in future IMO, breaking users of the API

However, if this is something we want to retain from V0, then this PR looks reasonable

@markmc The main motivation is to give users basic logging parity between V0 and V1 (see the V1 feedback thread). We have a customized logging backend that we need to log stats to, and switching to V1 breaks stats logging support for us.

@njhill Proper documentation sounds reasonable. Besides inline code comments, is there any other place that you think we need to call this part out?

mergify bot commented Mar 12, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @liuzijing2014.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added needs-rebase documentation Improvements or additions to documentation ci/build frontend multi-modality Related to multi-modality (#4194) and removed needs-rebase labels Mar 12, 2025
@liuzijing2014 liuzijing2014 force-pushed the v1-custom-logger branch 2 times, most recently from 38bf2d6 to c9b95af Compare March 12, 2025 22:30
@liuzijing2014 liuzijing2014 marked this pull request as ready for review March 12, 2025 22:39
@liuzijing2014 liuzijing2014 requested a review from markmc March 12, 2025 22:39
@markmc
Member

markmc commented Mar 13, 2025

See liuzijing2014#1 for some changes to finish this PR up. Hope that helps!

Commits at https://github.com/markmc/vllm/commits/5caa8fbfb00e3a23f4d461fac2f26e059c5efd7e

@markmc
Member

markmc commented Mar 13, 2025

Agree with @markmc these interfaces / data structures may still be in flux so we may want to be careful exposing as part of an external API at this stage. If we do it should be clearly documented as unstable.

In liuzijing2014#1 I added:

class BuiltinLoggerName(str, Enum):
    """Internal names for built-in metric loggers

    These values are intended for internal use only. Accessing to
    these built-in loggers is not considered to be part of the
    public API.
    """
    LOGGING = "logging"
    PROMETHEUS = "prometheus"


class StatLoggerBase(ABC):
    """Interface for logging metrics.

    API users may define custom loggers that implement this interface.
    However, note that the `SchedulerStats` and `IterationStats` classes
    are not considered stable interfaces and may change in future versions.
    """

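For readers unfamiliar with the `(str, Enum)` mixin used above, the following sketch shows how such members behave as dictionary keys. The enum is re-declared here so the snippet is self-contained, and the `registry` dict and its values are hypothetical.

```python
from enum import Enum


class BuiltinLoggerName(str, Enum):
    LOGGING = "logging"
    PROMETHEUS = "prometheus"


# Hypothetical registry keyed by the enum members.
registry = {
    BuiltinLoggerName.LOGGING: "logging-logger",
    BuiltinLoggerName.PROMETHEUS: "prometheus-logger",
}

# Because of the str mixin, a member compares and hashes like its value,
# so a plain string finds the same entry.
same_entry = registry["logging"] == registry[BuiltinLoggerName.LOGGING]
is_string = isinstance(BuiltinLoggerName.PROMETHEUS, str)
```

This is one reason a user-supplied name such as `"logging"` would collide with a built-in key, which is presumably why the docstring marks the names as internal.
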
Member

@njhill njhill left a comment

@njhill Proper documentation sounds reasonable. Besides inline code comments, is there any other place that you think we need to call this part out?

I don't think so. The main place is in the docstring for the new stat_loggers arg to AsyncLLM (and also the add/remove methods but I would prefer to just omit those for now).

Comment on lines 59 to 66
self.stat_loggers: dict[str, StatLoggerBase] = dict()
if self.log_stats:
if logger.isEnabledFor(logging.INFO):
self.stat_loggers.append(LoggingStatLogger())
self.stat_loggers.append(PrometheusStatLogger(vllm_config))
if stat_loggers is not None:
self.stat_loggers = stat_loggers
else:
setup_default_loggers(vllm_config,
logger.isEnabledFor(logging.INFO),
self.stat_loggers)
Member

Suggest moving all of this logic into the setup function:

self.stat_loggers = setup_default_loggers(vllm_config, log_stats, stat_loggers)

Also:

  • No need to pass in logger.isEnabledFor(logging.INFO), that can be done inside the function
  • It would probably be cleaner to eliminate self.log_stats. Empty self.stat_loggers can serve that purpose (or possibly None)
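
The suggestion above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was merged: the two logger classes are empty stand-ins, the `config` argument is an opaque placeholder for `vllm_config`, and the module-level `logger` name is hypothetical.

```python
import logging
from typing import Optional

logger = logging.getLogger("stat_setup_sketch")  # hypothetical logger name


class LoggingStatLogger:
    """Empty stand-in for the built-in logging stat logger."""


class PrometheusStatLogger:
    """Empty stand-in for the built-in Prometheus stat logger."""

    def __init__(self, config: object) -> None:
        self.config = config


def setup_default_loggers(config: object,
                          log_stats: bool,
                          stat_loggers: Optional[dict] = None) -> dict:
    """Return the loggers to use; an empty dict means stats are disabled."""
    if not log_stats:
        return {}
    if stat_loggers is not None:
        # Caller-supplied loggers replace the defaults.
        return dict(stat_loggers)
    loggers: dict = {"prometheus": PrometheusStatLogger(config)}
    # The INFO-level check lives inside the function, as suggested.
    if logger.isEnabledFor(logging.INFO):
        loggers["logging"] = LoggingStatLogger()
    return loggers
```

With this shape the caller reduces to a single assignment, and an empty result can stand in for a separate `log_stats` flag.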

PrometheusStatLogger(vllm_config))

if enable_stats_log:
stat_loggers[STANDARD_LOGGING_LOGGER_NAME] = (LoggingStatLogger())
Member

Suggested change
- stat_loggers[STANDARD_LOGGING_LOGGER_NAME] = (LoggingStatLogger())
+ stat_loggers[STANDARD_LOGGING_LOGGER_NAME] = LoggingStatLogger()

Comment on lines 20 to 21
STANDARD_LOGGING_LOGGER_NAME = "logging"
PROMETHEUS_LOGGING_LOGGER_NAME = "prometheus"
Member

nit: unnecessarily long names

Suggested change
- STANDARD_LOGGING_LOGGER_NAME = "logging"
- PROMETHEUS_LOGGING_LOGGER_NAME = "prometheus"
+ STANDARD_LOGGER_NAME = "logging"
+ PROMETHEUS_LOGGER_NAME = "prometheus"

Comment on lines 362 to 464
def add_logger(self, logger_name: str, logger: StatLoggerBase) -> None:
    if not self.log_stats:
        raise RuntimeError(
            "Stat logging is disabled. Set `disable_log_stats=False` "
            "argument to enable.")
    if logger_name in self.stat_loggers:
        raise KeyError(f"Logger with name {logger_name} already exists.")
    self.stat_loggers[logger_name] = logger

def remove_logger(self, logger_name: str) -> None:
    if not self.log_stats:
        raise RuntimeError(
            "Stat logging is disabled. Set `disable_log_stats=False` "
            "argument to enable.")
    if logger_name not in self.stat_loggers:
        raise KeyError(f"Logger with name {logger_name} does not exist.")
    del self.stat_loggers[logger_name]

Member

Are these methods really necessary if you can pass in the dict?

Collaborator Author

I added them here to keep the interface at parity with V0. Either way could work.

Member

My preference would be to exclude them for now, since the API is different to V0 anyhow. We're trying to avoid as much unnecessary code in V1 as possible.

Member

@njhill njhill left a comment

Thanks @liuzijing2014 it looks good now, just have a couple more minor comments.

self.stat_loggers.append(LoggingStatLogger())
self.stat_loggers.append(PrometheusStatLogger(vllm_config))
self.stat_loggers = setup_default_loggers(
vllm_config, self.log_stats, logger.isEnabledFor(logging.INFO),
Member

Move the logger.isEnabledFor(logging.INFO) check inside the method?

Member

@liuzijing2014 WDYT about this suggestion?

@liuzijing2014 liuzijing2014 requested a review from mgoin as a code owner March 26, 2025 01:10
@markmc markmc removed ci/build multi-modality Related to multi-modality (#4194) labels Apr 23, 2025
@markmc
Member

markmc commented Apr 24, 2025

@njhill FYI

Error: vllm/v1/metrics/loggers.py:510: error: Name "factories" already defined on line 508  [no-redef]

mergify bot commented Apr 24, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @liuzijing2014.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Apr 24, 2025
@njhill
Member

njhill commented Apr 24, 2025

@njhill FYI

Error: vllm/v1/metrics/loggers.py:510: error: Name "factories" already defined on line 508  [no-redef]

@markmc oops, now fixed
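
The code at those lines is not shown in this conversation, but mypy typically reports `no-redef` when the same name is annotated more than once in a scope, for example in both branches of an `if`/`else`. The sketch below is a hypothetical reconstruction of that pattern and one common fix. The `Factory` alias and `resolve_factories` function are invented for illustration.

```python
from typing import Callable, Optional

Factory = Callable[[], str]  # hypothetical factory type


def resolve_factories(custom: Optional[list[Factory]]) -> list[Factory]:
    # Annotating the name in both branches would make mypy complain:
    #
    #     if custom is not None:
    #         factories: list[Factory] = custom
    #     else:
    #         factories: list[Factory] = []   # error: already defined
    #
    # Declaring the type once and assigning without annotations avoids it.
    factories: list[Factory]
    if custom is not None:
        factories = list(custom)
    else:
        factories = []
    return factories
```

The duplicated annotation is harmless at runtime, which is why this kind of error tends to surface only in the type-checking step of CI.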

@mergify mergify bot removed the needs-rebase label Apr 24, 2025
Member

@njhill njhill left a comment

@markmc @liuzijing2014 I think this is ready now?

@njhill njhill added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 25, 2025
@liuzijing2014
Collaborator Author

LGTM, thanks for all the help! @njhill @markmc

@liuzijing2014 liuzijing2014 force-pushed the v1-custom-logger branch 2 times, most recently from 86a1452 to 9f08bd5 Compare April 25, 2025 21:49
liuzijing2014 and others added 4 commits April 25, 2025 17:45
Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Zijing Liu <[email protected]>
More general than passing sub-class types, requiring sub-class
constructors to conform to a protocol.
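
The trade-off described in that commit message can be illustrated with a small sketch. The factory signature used in the PR is not shown in this conversation, so `LoggerFactory`, `Config`, `FileStatLogger`, and `build_loggers` below are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from functools import partial
from typing import Callable


@dataclass
class Config:
    """Hypothetical stand-in for the engine configuration."""
    model: str


class FileStatLogger:
    """Hypothetical logger whose constructor needs an extra argument."""

    def __init__(self, config: Config, path: str) -> None:
        self.model = config.model
        self.path = path


# A factory is any callable that turns a config into a logger instance.
LoggerFactory = Callable[[Config], object]


def build_loggers(config: Config,
                  factories: list[LoggerFactory]) -> list[object]:
    # The engine only needs to call each factory with its own config.
    return [factory(config) for factory in factories]


# partial() binds the extra argument, so the class itself does not have
# to expose a constructor of the exact shape the engine expects.
loggers = build_loggers(
    Config(model="demo"),
    [partial(FileStatLogger, path="/tmp/stats.log")],
)
```

Passing the class itself would still work when its constructor already takes just the config, since a class is also a callable.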

Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Signed-off-by: Zijing Liu <[email protected]>
@njhill njhill merged commit 53e8cf5 into vllm-project:main Apr 26, 2025
43 checks passed
@liuzijing2014 liuzijing2014 deleted the v1-custom-logger branch April 28, 2025 23:53
jikunshang pushed a commit to jikunshang/vllm that referenced this pull request Apr 29, 2025
)

Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
)

Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
adobrzyn pushed a commit to HabanaAI/vllm-fork that referenced this pull request Apr 30, 2025
)

Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Agata Dobrzyniewicz <[email protected]>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
)

Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Mu Huai <[email protected]>
zzzyq pushed a commit to zzzyq/vllm that referenced this pull request May 24, 2025
)

Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: Yuqi Zhang <[email protected]>
minpeter pushed a commit to minpeter/vllm that referenced this pull request Jun 24, 2025
)

Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Mark McLoughlin <[email protected]>
Signed-off-by: Nick Hill <[email protected]>
Co-authored-by: Mark McLoughlin <[email protected]>
Co-authored-by: Nick Hill <[email protected]>
Signed-off-by: minpeter <[email protected]>
Labels
frontend ready ONLY add when PR is ready to merge/full CI is needed v1
5 participants