Description
Describe the bug
Attempting to use the LoRA adapters I have created for meta-llama/Llama-3.2-1B, I get an error suggesting the model is not supported with LoRA; however, the documentation suggests otherwise:
| Organization | Base Model Name | Base Model String | Quantization |
|---|---|---|---|
| Meta | Llama 3.2 1B Instruct | meta-llama/Llama-3.2-1B-Instruct | FP8 |
| Meta | Llama 3.2 3B Instruct | meta-llama/Llama-3.2-3B-Instruct | FP8 |
| Meta | Llama 3.1 8B Instruct | meta-llama/Meta-Llama-3.1-8B-Instruct | FP8 |
| Meta | Llama 3.1 70B Instruct | meta-llama/Meta-Llama-3.1-70B-Instruct | FP8 |
| Alibaba | Qwen2.5 14B Instruct | Qwen/Qwen2.5-14B-Instruct | FP8 |
| Alibaba | Qwen2.5 72B Instruct | Qwen/Qwen2.5-72B-Instruct | FP8 |
To Reproduce
import os
from together import Together

client = Together(api_key="_key")

user_prompt = "debate the pros and cons of AI"

response = client.chat.completions.create(
    model="$account/Llama-3.2-1B-Instruct-Test2031-60596f98",
    messages=[
        {
            "role": "user",
            "content": user_prompt,
        }
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
Expected behavior
A valid chat completion should be returned.
Instead, I get the following error:
InvalidRequestError: Error code: 400 - {"message": "meta-llama/Llama-3.2-1B is not supported for LORA", "type_": "invalid_request_error", "param": "model", "code": "lora_model"}
together library version -> 1.3.10
python version -> 3.11
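For reference, this is roughly how I checked which base model the adapter is tied to. The `client.fine_tuning.retrieve` call, the placeholder job ID, and the `model` / `output_name` field names are my assumptions from reading the SDK, so treat this as a sketch rather than the exact API:

from together import Together

client = Together(api_key="_key")

# Hypothetical job ID for illustration -- substitute the real fine-tune job ID.
# Assumes the SDK exposes fine_tuning.retrieve() and that the returned object
# carries the base model and the output/adapter name; field names may differ.
job = client.fine_tuning.retrieve("ft-xxxxxxxx")

print(job.model)        # base model the adapter was trained on
print(job.output_name)  # adapter/model name used at inference time

If the base model comes back as meta-llama/Llama-3.2-1B rather than meta-llama/Llama-3.2-1B-Instruct, that might explain the rejection, since only the Instruct variant appears in the table above.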
I have also noticed that the documentation suggests using "https://api.together.xyz/v1/completions" when making requests with curl. Is that actually the expected endpoint here, or should we be using "https://api.together.xyz/v1/chat/completions", which is the endpoint we use for all the other instruct models?
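For completeness, this is the raw-HTTP shape of the request I would expect to send to the chat endpoint (plain `requests`, same payload as the SDK call above). Whether this endpoint or /v1/completions is the right one for LoRA adapters is exactly the question, so this is a sketch, not confirmed usage:

import requests

API_KEY = "_key"  # redacted

# Same request as the SDK snippet above, but against the chat completions
# endpoint -- the endpoint choice is the open question here.
resp = requests.post(
    "https://api.together.xyz/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "$account/Llama-3.2-1B-Instruct-Test2031-60596f98",
        "messages": [{"role": "user", "content": "debate the pros and cons of AI"}],
        "max_tokens": 512,
        "temperature": 0.7,
    },
    timeout=60,
)
print(resp.status_code, resp.json())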