Release 0.7 is a major release focused on completing the transition to OpenAI API conformance, introducing comprehensive observability metrics, and cleaning up the API. It removes the fine-tuning API, completes the FastAPI router migration, removes legacy providers (TGI, HuggingFace), renames core concepts for clarity, and adds structured logging via structlog.

### Highlights

- **Agents API renamed to Responses API** aligning with OpenAI naming ([#5195](https://github.com/llamastack/llama-stack/pull/5195))
- **Reasoning output support** in Responses API ([#5206](https://github.com/llamastack/llama-stack/pull/5206))
- **Comprehensive observability metrics** for API, inference, and vector IO ([#5201](https://github.com/llamastack/llama-stack/pull/5201), [#5320](https://github.com/llamastack/llama-stack/pull/5320), [#5096](https://github.com/llamastack/llama-stack/pull/5096))
- **Structured logging via structlog** with key-value output ([#5215](https://github.com/llamastack/llama-stack/pull/5215))
- **Inline neural rerank for RAG** without external services ([#4877](https://github.com/llamastack/llama-stack/pull/4877))
- **Inline Docling provider** for structure-aware PDF parsing ([#5049](https://github.com/llamastack/llama-stack/pull/5049))
- **Infinispan vector-io provider** for distributed vector storage ([#4839](https://github.com/llamastack/llama-stack/pull/4839))
- **Connector API promoted to v1beta** ([#5129](https://github.com/llamastack/llama-stack/pull/5129))
- **FastAPI router migration complete** with `@webmethod` removal ([#5248](https://github.com/llamastack/llama-stack/pull/5248))
- **Performance**: lazy-loading of torch, numpy, faiss, and braintrust to reduce startup memory ([#5116](https://github.com/llamastack/llama-stack/pull/5116), [#5118](https://github.com/llamastack/llama-stack/pull/5118), [#5078](https://github.com/llamastack/llama-stack/pull/5078))

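The lazy-loading item in the highlights can be illustrated with a generic sketch. It does not reproduce the code from the linked PRs: the `lazy_module` helper is hypothetical, and the stdlib `statistics` module stands in for a heavy dependency such as numpy. The idea is only that the import cost is paid on first use rather than at startup.

```python
import importlib
from functools import lru_cache
from types import ModuleType


@lru_cache(maxsize=None)
def lazy_module(name: str) -> ModuleType:
    """Import `name` on first use and cache the module object (hypothetical helper)."""
    return importlib.import_module(name)


def mean(values: list[float]) -> float:
    # `statistics` is a stand-in for a heavy dependency; it is only
    # imported when this function is first called, not at module load.
    return lazy_module("statistics").mean(values)
```

Code paths that never call `mean` never trigger the import, which is how deferring imports can lower startup memory.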
### Breaking Changes

| Change | Type | PR |
|--------|------|-----|
| Fine-tuning API removed | Hard | [#5104](https://github.com/llamastack/llama-stack/pull/5104) |
| `meta-reference` providers renamed to `builtin` | Hard | [#5131](https://github.com/llamastack/llama-stack/pull/5131) |
| `knowledge_search` renamed to `file_search` | Hard | [#5186](https://github.com/llamastack/llama-stack/pull/5186) |
| Agents API renamed to Responses API | Hard | [#5195](https://github.com/llamastack/llama-stack/pull/5195) |
| `tool_groups` removed from public API | Hard | [#4997](https://github.com/llamastack/llama-stack/pull/4997) |
| TGI and HuggingFace providers removed | Hard | [#5333](https://github.com/llamastack/llama-stack/pull/5333) |
| `register`/`unregister` model endpoints removed | Hard | [#5341](https://github.com/llamastack/llama-stack/pull/5341) |
| `@webmethod` decorator removed | Hard | [#5248](https://github.com/llamastack/llama-stack/pull/5248) |
| `rag-runtime` provider renamed to `file-search` | Hard | [#5187](https://github.com/llamastack/llama-stack/pull/5187) |
| Duplicate `dataset_id` parameter removed | Hard | [#4849](https://github.com/llamastack/llama-stack/pull/4849) |
| `/files/{file_id}` GET response unified | Hard | [#5154](https://github.com/llamastack/llama-stack/pull/5154) |
| OpenAI API schema transforms | Hard | [#5166](https://github.com/llamastack/llama-stack/pull/5166) |
| `starter-gpu` distribution removed | Hard | [#5279](https://github.com/llamastack/llama-stack/pull/5279) |
| `sentence_transformers` `trust_remote_code` defaults to `False` | Behavior | [#4602](https://github.com/llamastack/llama-stack/pull/4602) |

See the [full release notes](docs/releases/RELEASE_NOTES_0.7.md) for migration instructions and a detailed upgrade guide.
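
As a rough starting point for the three renames listed under Breaking Changes, the sketch below rewrites old identifiers in configuration text. It assumes the renames can be applied as whole-token substitutions, which may not hold for every configuration; the `migrate_config_text` helper is hypothetical and not part of the project, and the full release notes remain the authoritative migration guide.

```python
import re

# Old -> new identifiers, taken from the Breaking Changes table.
# Treating them as plain token renames in config text is an assumption.
RENAMES = {
    "meta-reference": "builtin",
    "knowledge_search": "file_search",
    "rag-runtime": "file-search",
}

# Match whole identifiers only, so a longer name that merely contains
# an old identifier (e.g. "my-meta-reference-x") is left untouched.
_PATTERN = re.compile(
    r"(?<![\w-])(" + "|".join(re.escape(old) for old in RENAMES) + r")(?![\w-])"
)


def migrate_config_text(text: str) -> tuple[str, int]:
    """Return the rewritten text and the number of replacements made."""
    return _PATTERN.subn(lambda m: RENAMES[m.group(1)], text)
```

The returned count makes it easy to review which files changed before committing. Removed features (the fine-tuning API, `tool_groups`, the TGI and HuggingFace providers) cannot be handled by renaming and need manual changes.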
Release 0.5 brings significant improvements to API consistency, OpenAI conformance, provider capabilities, and a major architectural refactoring of all APIs to use FastAPI routers.