Description
Summary
The goal of this API is to expose a scalable, distributed video generation service for xDiT using models such as CogVideoX, ConsisID, and Latte. The API allows clients to submit text prompts with optional generation parameters and receive video outputs either as a file path or base64-encoded data.
Motivation
The API is designed to enhance AIBrix's ability to generate images and videos, using xDiT as an example engine:
- Support both disk-based and memory-based outputs.
- Scale across multiple GPUs via Ray remote workers.
- Provide a simple OpenAI-style endpoint for common video generation.
- Allow future extensibility for multi-stage pipelines or DAG-based workflows.
Proposed Change
POST /generatevideo
Description: Generates a video from a text prompt.
Content-Type: application/json
Authentication: TBD (future optional API key support)
Request Model
```json
{
  "prompt": "A fox running through a forest",
  "num_inference_steps": 50,
  "num_frames": 17,
  "seed": 42,
  "cfg": 7.5,
  "save_disk_path": "output/",
  "height": 1024,
  "width": 1024,
  "fps": 8
}
```
curl request example with disk output

```shell
curl -X POST http://127.0.0.1:6000/generatevideo \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A fox running through a forest",
    "num_frames": 17,
    "fps": 12,
    "save_disk_path": "output/"
  }'
```
curl request with base64 output (omit `save_disk_path` so the video is returned base64-encoded)

```shell
curl -X POST http://127.0.0.1:6000/generatevideo \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "A fox running through a forest",
    "num_frames": 17,
    "fps": 12
  }'
```
| Field | Type | Default | Description |
|---|---|---|---|
| prompt | string | N/A | Text prompt describing the video content. Required. |
| num_inference_steps | int | 50 | Number of denoising steps in the diffusion process. |
| num_frames | int | 17 | Number of frames in the video. |
| seed | int | 42 | RNG seed for reproducibility. |
| cfg | float | 7.5 | Guidance scale for the diffusion model. |
| save_disk_path | string | None | If provided, video will be saved to path; otherwise, returned as base64. |
| height | int | 1024 | Video frame height in pixels. |
| width | int | 1024 | Video frame width in pixels. |
| fps | int | 8 | Frames per second of output video. |
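The defaults in the table above can be mirrored on the client side. A minimal sketch of a hypothetical request-building helper (`build_request` and `DEFAULTS` are illustrative names, not part of the API):

```python
import json

# Defaults mirror the request-model table above; build_request is a
# hypothetical client-side helper, not part of the service itself.
DEFAULTS = {
    "num_inference_steps": 50,
    "num_frames": 17,
    "seed": 42,
    "cfg": 7.5,
    "height": 1024,
    "width": 1024,
    "fps": 8,
}

def build_request(prompt: str, **overrides) -> dict:
    """Return a /generatevideo payload; 'prompt' is the only required field."""
    if not prompt or not prompt.strip():
        raise ValueError("Prompt cannot be empty")
    payload = {"prompt": prompt, **DEFAULTS}
    payload.update(overrides)
    return payload

req = build_request("A fox running through a forest", fps=12, save_disk_path="output/")
print(json.dumps(req, indent=2))
```

Omitting `save_disk_path` from the overrides keeps it out of the payload entirely, which selects the base64 response mode.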
Response
- Synchronous Response
  - Response with path provided

    ```json
    { "message": "Video generated successfully", "elapsed_time": "12.34 sec", "output": "output/generated_video_20250922-141500.mp4", "save_to_disk": true }
    ```

  - Response without path provided

    ```json
    { "message": "Video generated successfully", "elapsed_time": "12.34 sec", "output": "<base64-encoded-video>", "save_to_disk": false, "format": "mp4" }
    ```

- Asynchronous Response
  - TBD
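Clients can branch on the `save_to_disk` flag to resolve either response shape to a local file. A minimal sketch, with `save_video` as a hypothetical client-side helper:

```python
import base64
import pathlib

def save_video(response: dict, out_path: str = "decoded_video.mp4") -> str:
    """Resolve a /generatevideo response to a file path on the client.

    If the server already saved to disk, 'output' is a server-side path;
    otherwise 'output' is base64-encoded video data we decode locally.
    (Hypothetical helper, not part of the service.)
    """
    if response.get("save_to_disk"):
        return response["output"]  # path already written by the server
    data = base64.b64decode(response["output"])
    pathlib.Path(out_path).write_bytes(data)
    return out_path
```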
Error Handling
| HTTP Status | Condition | Response |
|---|---|---|
| 400 | Missing or invalid prompt/parameters | {"detail": "Prompt cannot be empty"} |
| 500 | Model execution error | {"detail": "Error generating video: <error_message>"} |
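The two documented statuses map naturally to distinct client-side exceptions. A sketch of hypothetical handling (the function name is illustrative):

```python
def raise_for_generation_error(status: int, body: dict) -> None:
    """Translate documented /generatevideo error statuses into exceptions.

    400 -> ValueError (caller sent a bad request, e.g. empty prompt)
    500 -> RuntimeError (model execution failed server-side)
    """
    detail = body.get("detail", "")
    if status == 400:
        raise ValueError(f"Bad request: {detail}")
    if status == 500:
        raise RuntimeError(f"Server error: {detail}")
    if status != 200:
        raise RuntimeError(f"Unexpected status {status}: {detail}")
```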
Alternatives Considered
ComfyUI style workflow request like:
POST /prompt
Content-Type: application/json
```json
{
  "1": {
    "class_type": "TextPrompt",
    "inputs": {
      "text": "A fox running through a forest"
    }
  },
  "2": {
    "class_type": "CogVideoX",
    "inputs": {
      "prompt": ["1", 0],
      "duration": 5,
      "fps": 12,
      "width": 720,
      "height": 480,
      "seed": 123
    }
  }
}
```
This forces ComfyUI to be deployed along with AIBrix.
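The node-reference convention in the workflow above (`["1", 0]` meaning output 0 of node `"1"`) can be resolved with a small recursive graph walk. A minimal sketch, where node execution is a stand-in (a `TextPrompt` node just emits its text; other node types echo their resolved inputs):

```python
def resolve_graph(graph: dict) -> dict:
    """Resolve ComfyUI-style node references into concrete values.

    A reference is a 2-element list [node_id, output_index]; each node's
    outputs are modeled as a list. Execution here is a placeholder.
    """
    results: dict = {}

    def run_node(node_id: str) -> list:
        if node_id in results:  # memoize so shared upstream nodes run once
            return results[node_id]
        node = graph[node_id]
        inputs = {}
        for key, value in node["inputs"].items():
            if isinstance(value, list) and len(value) == 2 and value[0] in graph:
                ref_id, out_idx = value
                inputs[key] = run_node(ref_id)[out_idx]
            else:
                inputs[key] = value
        if node["class_type"] == "TextPrompt":
            results[node_id] = [inputs["text"]]
        else:
            results[node_id] = [inputs]
        return results[node_id]

    for node_id in graph:
        run_node(node_id)
    return results
```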
Deploying along with ComfyUI
ComfyUI could be used as a frontend client that connects to a third-party service API; see link1 and link2.
Currently, ComfyUI runs and manages models in a single instance (model management is described here). To support a distributed environment, we either need to update ComfyUI so it supports distributed serving (through a library such as xfuser), or let ComfyUI call backend services through an API (an API node, see below); the backend services could then deploy large models in a distributed fashion and/or run multiple replicas to serve requests. One project that attempted distributed ComfyUI can be found here.
To adapt to ComfyUI, we could first expose this API as a service and forward ComfyUI's requests through a customized external API using ComfyUI's API node. It would look somewhat like the following:
