server: Enable mtmd in llama-server /completion endpoint #14016


Open
wants to merge 6 commits into master

Conversation

@92MING commented on Jun 4, 2025

ref: #13872

Currently, passing media (image/audio) to mtmd is only supported via /chat/completions in llama-server.
Supporting mtmd in the /completion endpoint is still necessary, since /completion allows more freedom in modifying the prompt template (e.g. adding a prefix), or even using no template at all for long-article completion.

This PR adds an extra field medias (list[object]) to /completion, and each media object should contain two fields:

  • type: audio or image (image_url as an alias); other, unknown types are ignored
  • data: a URL or base64 string of the image/audio

Then, make sure your prompt contains the same number of <__media__> tags to mark the positions of the media.
Here is an example:

import requests

data = {
    'stream': False,
    'top_p': 0.95,
    'temperature': 0.8,
    'top_k': 40,
    'prompt': '<start_of_turn>user\nAnalyze the image and provide a short description\n<__media__><end_of_turn>\n<start_of_turn>model\n',
    'medias': [
        {
            'type': 'image',
            'data': <your img_b64 or url>,  # base64 string or URL
        }
    ],
}
response = requests.post('http://localhost:8080/completion', json=data)
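Since the server matches medias entries to <__media__> markers by position, a quick client-side sanity check can catch mismatches before sending a request. This helper is a sketch for illustration, not part of the PR; the function names are made up:

```python
def count_media_markers(prompt: str) -> int:
    # Count occurrences of the <__media__> placeholder in the prompt.
    return prompt.count('<__media__>')

def check_medias(prompt: str, medias: list) -> None:
    # Fail fast if markers and media entries do not line up one-to-one
    # (hypothetical client-side check, not enforced by this snippet of the PR).
    n_markers = count_media_markers(prompt)
    if n_markers != len(medias):
        raise ValueError(
            f'prompt has {n_markers} <__media__> marker(s) '
            f'but {len(medias)} media object(s) were supplied'
        )

# One marker, one media object: passes silently.
check_medias('user: <__media__> describe this',
             [{'type': 'image', 'data': '...'}])
```

Running a check like this before the requests.post call avoids a round trip to the server when the prompt and medias list disagree.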

Comment on lines +4341 to +4359
if (medias.is_array()) {
    for (auto & m : medias) {
        std::string type = json_value(m, "type", std::string());
        std::string data = json_value(m, "data", std::string());
        if (type.empty() || data.empty()) {
            continue;
        }
        if (type == "image_url" || type == "image" || type == "img") {
            if (!opt.allow_image) {
                throw std::runtime_error("image input is not supported - hint: if this is unexpected, you may need to provide the mmproj");
            }
            if (string_starts_with(data, "http")) {
                // download remote image
                common_remote_params params;
                params.headers.push_back("User-Agent: llama.cpp/" + build_info);
                params.max_size = 1024 * 1024 * 10; // 10MB
                params.timeout = 10; // seconds
                SRV_INF("downloading image from '%s'\n", data.c_str());
                auto res = common_remote_get_content(data, params);
Collaborator
Instead of duplicating this whole code block, extract it into a general function and reuse it in both /chat/completions and /completion (DRY principle).
