server: Enable mtmd in llama-server /completion endpoint #14016


Open
wants to merge 6 commits into master

Conversation

@92MING commented on Jun 4, 2025

ref: #13872

Currently, passing media (image/audio) to mtmd is only supported via /chat/completions in llama-server.
Supporting mtmd in the /completion endpoint is still necessary, since /completion allows more freedom in modifying the prompt template (e.g. adding a prefix), or even using no template at all for long-article completion.

This PR adds an extra field medias (list[object]) to /completion, and each media object should contain two fields:

  • type: audio or image (image_url as an alias); other, unknown types are ignored
  • data: a URL or base64 string of the image/audio

Then, make sure your prompt contains the same number of <__media__> tags to mark the positions of the media.
Here is an example:

import requests

data = {
    'stream': False,
    'top_p': 0.95,
    'temperature': 0.8,
    'top_k': 40,
    'prompt': '<start_of_turn>user\nAnalyze the image and provide a short description\n<__media__><end_of_turn>\n<start_of_turn>model\n',
    'medias': [
        {
            'type': 'image',
            'data': <your img_b64 or url>,  # base64 string or URL
        }
    ],
}
response = requests.post('http://localhost:8080/completion', json=data)
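Since the server matches medias entries to <__media__> markers by position, a quick client-side sanity check can catch mismatches before sending a request. This helper is a sketch for illustration, not part of the PR; the function names are made up:

```python
def count_media_markers(prompt: str) -> int:
    # Count occurrences of the <__media__> placeholder in the prompt.
    return prompt.count('<__media__>')

def check_medias(prompt: str, medias: list) -> None:
    # Fail fast if markers and media entries do not line up one-to-one
    # (hypothetical client-side check, not enforced by this snippet of the PR).
    n_markers = count_media_markers(prompt)
    if n_markers != len(medias):
        raise ValueError(
            f'prompt has {n_markers} <__media__> marker(s) '
            f'but {len(medias)} media object(s) were supplied'
        )

# One marker, one media object: passes silently.
check_medias('user: <__media__> describe this',
             [{'type': 'image', 'data': '...'}])
```

Running a check like this before the requests.post call avoids a round trip to the server when the prompt and medias list disagree.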

Comment on lines +4341 to +4359
if (medias.is_array()) {
    for (auto & m : medias) {
        std::string type = json_value(m, "type", std::string());
        std::string data = json_value(m, "data", std::string());
        if (type.empty() || data.empty()) {
            continue;
        }
        if (type == "image_url" || type == "image" || type == "img") {
            if (!opt.allow_image) {
                throw std::runtime_error("image input is not supported - hint: if this is unexpected, you may need to provide the mmproj");
            }
            if (string_starts_with(data, "http")) {
                // download remote image
                common_remote_params params;
                params.headers.push_back("User-Agent: llama.cpp/" + build_info);
                params.max_size = 1024 * 1024 * 10; // 10MB
                params.timeout = 10; // seconds
                SRV_INF("downloading image from '%s'\n", data.c_str());
                auto res = common_remote_get_content(data, params);
Collaborator
Instead of duplicating this whole code block, extract it into a general function and reuse it in both /chat/completions and /completion (DRY principle).
