-
-
Notifications
You must be signed in to change notification settings - Fork 8.5k
[V1][Metrics] Allow V1 AsyncLLM to use custom logger #14661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this quite closely matches the V0 interface, but not quite - e.g. the record()
method is new, and we're using new classes like IterationStats
, and SchedulerStats
Personally, I'm not a fan of having all of this be part of the public API in such an ad-hoc way ... the logger behaviors and the stats classes etc. are all liable to change in future IMO, breaking users of the API
However, if this is something we want to retain from V0, then this PR looks reasonable
vllm/third_party/pynvml.py
Outdated
import os | ||
import string | ||
import sys | ||
import threading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this change from the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the auto format for imports ordering from isort
. Is this formatting not desired?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liuzijing2014 the formatting change is to an unrelated file so shouldn't be included in the same PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is not desired
This code is copied from nvidia-ml-py and as per #12977:
If there are bugfixes in nvidia-ml-py in the future, we can periodically sync the code.
So, we would prefer not to modify this code from the original
We do use isort
, but only using pre-commit
and in .pre-commit-config.yaml
we exclude this directory:
exclude: 'vllm/third_party/.*'
So, for example, if I checkout your branch and do:
$ pre-commit run --all-files isort
this file is not modified because of the exclude
Hope that helps!
vllm/v1/engine/async_llm.py
Outdated
self.stat_loggers[STANDARD_LOGGING_LOGGER_NAME] = ( | ||
LoggingStatLogger()) | ||
self.stat_loggers[PROMETHEUS_LOGGING_LOGGER_NAME] = ( | ||
PrometheusStatLogger(vllm_config)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about we move all of this into the loggers module so that we just do e.g.
self.stat_loggers = loggers.setup(vllm_config, stat_loggers)
Agree with @markmc these interfaces / data structures may still be in flux so we may want to be careful exposing as part of an external API at this stage. If we do it should be clearly documented as unstable. |
@markmc The main motivation is to allow users to have basic logging parity between V0 and V1 (v1 feedback thread ). We have some customized logging backend that we need to log stats to, and switching to V1 breaks the stats logging support for us. @njhill Proper documentation sounds reasonable. Besides inline code comments, is there any other place that you think we need to call this part out? |
This pull request has merge conflicts that must be resolved before it can be |
38bf2d6
to
c9b95af
Compare
See liuzijing2014#1 for some changes to finish this PR up. Hope that helps! Commits at https://github.com/markmc/vllm/commits/5caa8fbfb00e3a23f4d461fac2f26e059c5efd7e |
In liuzijing2014#1 I added:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@njhill Proper documentation sounds reasonable. Besides inline code comments, is there any other place that you think we need to call this part out?
I don't think so. The main place is in the docstring for the new stat_loggers
arg to AsyncLLM
(and also the add/remove methods but I would prefer to just omit those for now).
vllm/third_party/pynvml.py
Outdated
import os | ||
import string | ||
import sys | ||
import threading |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liuzijing2014 the formatting change is to an unrelated file so shouldn't be included in the same PR.
vllm/v1/engine/async_llm.py
Outdated
self.stat_loggers: dict[str, StatLoggerBase] = dict() | ||
if self.log_stats: | ||
if logger.isEnabledFor(logging.INFO): | ||
self.stat_loggers.append(LoggingStatLogger()) | ||
self.stat_loggers.append(PrometheusStatLogger(vllm_config)) | ||
if stat_loggers is not None: | ||
self.stat_loggers = stat_loggers | ||
else: | ||
setup_default_loggers(vllm_config, | ||
logger.isEnabledFor(logging.INFO), | ||
self.stat_loggers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest moving all of this logic into the setup function:
self.stat_loggers = setup_default_loggers(vllm_config, log_stats, stat_loggers)
Also:
- No need to pass in
logger.isEnabledFor(logging.INFO)
, that can be done inside the function - It would probably be cleaner to eliminate
self.log_stats
. Emptyself.stat_loggers
can serve that purpose (or possiblyNone
)
vllm/v1/metrics/loggers.py
Outdated
PrometheusStatLogger(vllm_config)) | ||
|
||
if enable_stats_log: | ||
stat_loggers[STANDARD_LOGGING_LOGGER_NAME] = (LoggingStatLogger()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stat_loggers[STANDARD_LOGGING_LOGGER_NAME] = (LoggingStatLogger()) | |
stat_loggers[STANDARD_LOGGING_LOGGER_NAME] = LoggingStatLogger() |
vllm/v1/metrics/loggers.py
Outdated
STANDARD_LOGGING_LOGGER_NAME = "logging" | ||
PROMETHEUS_LOGGING_LOGGER_NAME = "prometheus" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: unnecessarily long names
STANDARD_LOGGING_LOGGER_NAME = "logging" | |
PROMETHEUS_LOGGING_LOGGER_NAME = "prometheus" | |
STANDARD_LOGGER_NAME = "logging" | |
PROMETHEUS_LOGGER_NAME = "prometheus" |
vllm/v1/engine/async_llm.py
Outdated
def add_logger(self, logger_name: str, logger: StatLoggerBase) -> None: | ||
if not self.log_stats: | ||
raise RuntimeError( | ||
"Stat logging is disabled. Set `disable_log_stats=False` " | ||
"argument to enable.") | ||
if logger_name in self.stat_loggers: | ||
raise KeyError(f"Logger with name {logger_name} already exists.") | ||
self.stat_loggers[logger_name] = logger | ||
|
||
def remove_logger(self, logger_name: str) -> None: | ||
if not self.log_stats: | ||
raise RuntimeError( | ||
"Stat logging is disabled. Set `disable_log_stats=False` " | ||
"argument to enable.") | ||
if logger_name not in self.stat_loggers: | ||
raise KeyError(f"Logger with name {logger_name} does not exist.") | ||
del self.stat_loggers[logger_name] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these methods really necessary if you can pass in the dict?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I add them here to keep the interface on parity as V0. Either way could work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would be to exclude them for now, since the API is different to V0 anyhow. We're trying to avoid as much unnecessary code in V1 as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @liuzijing2014 it looks good now, just have a couple more minor comments.
vllm/v1/engine/async_llm.py
Outdated
self.stat_loggers.append(LoggingStatLogger()) | ||
self.stat_loggers.append(PrometheusStatLogger(vllm_config)) | ||
self.stat_loggers = setup_default_loggers( | ||
vllm_config, self.log_stats, logger.isEnabledFor(logging.INFO), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the logger.isEnabledFor(logging.INFO)
check inside the method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@liuzijing2014 WDYT about this suggestion?
@njhill FYI
|
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@markmc @liuzijing2014 I think this is ready now?
86a1452
to
9f08bd5
Compare
Signed-off-by: Zijing Liu <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: Zijing Liu <[email protected]>
More general than passing sub-class types, requiring sub-class constructors to conform to a protocol. Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Zijing Liu <[email protected]>
Signed-off-by: Nick Hill <[email protected]> Signed-off-by: Zijing Liu <[email protected]>
9f08bd5
to
303e7ef
Compare
) Signed-off-by: Zijing Liu <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Co-authored-by: Nick Hill <[email protected]>
) Signed-off-by: Zijing Liu <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Co-authored-by: Nick Hill <[email protected]>
) Signed-off-by: Zijing Liu <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Co-authored-by: Nick Hill <[email protected]> Signed-off-by: Agata Dobrzyniewicz <[email protected]>
) Signed-off-by: Zijing Liu <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Co-authored-by: Nick Hill <[email protected]> Signed-off-by: Mu Huai <[email protected]>
) Signed-off-by: Zijing Liu <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Co-authored-by: Nick Hill <[email protected]> Signed-off-by: Yuqi Zhang <[email protected]>
) Signed-off-by: Zijing Liu <[email protected]> Signed-off-by: Mark McLoughlin <[email protected]> Signed-off-by: Nick Hill <[email protected]> Co-authored-by: Mark McLoughlin <[email protected]> Co-authored-by: Nick Hill <[email protected]> Signed-off-by: minpeter <[email protected]>
Content
Update V1 AsynLLM to take in custom logger that extends
StatLoggerBase
. This would help to achieve function on parity as V0 regarding customization support for logging.Follow V0 style on customizing loggers:
stat_loggers
frominit
.add_logger
andremove_logger
functions.Test Plan