
Add OpenAI o3 & o4-mini #10065


Merged

Conversation

@PeterDaveHello (Contributor) commented Apr 16, 2025

Title

Add OpenAI o3 & 4o-mini

Reference:

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/litellm/ directory; adding at least 1 test is a hard requirement - see details
  • I have added a screenshot of my new test passing locally
  • My PR passes all unit tests (make test-unit) - see https://docs.litellm.ai/docs/extras/contributing_code
  • My PR's scope is as isolated as possible; it only solves 1 specific problem

Type

🆕 New Feature

Changes

Added:

  • "o3"
  • "o3-2025-04-16"
  • "o4-mini"
  • "o4-mini-2025-04-16"


@PeterDaveHello (Contributor, Author)

cc @krrishdholakia @ishaan-jaff

"max_tokens": 100000,
"max_input_tokens": 200000,
"max_output_tokens": 100000,
"input_cost_per_token": 0.00001,
@krrishdholakia (Contributor) commented Apr 16, 2025 (review comment on the diff excerpt above)


Thanks @PeterDaveHello - can you please use scientific notation (e.g. 1e-6) and mention the supported_endpoints, modalities, and output_modalities for these models?

@PeterDaveHello (Contributor, Author)

Certainly, I’m happy to help with that. Could you also please tell me when I should use scientific notation here?
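
For illustration, a rough sketch of what one of these entries could look like with scientific notation and the extra fields requested above, written as a Python dict; only the token limits and input cost mirror the diff excerpt, while the output cost, endpoint, and modality values are assumptions rather than the merged JSON:

# Hypothetical o3 entry; values marked "assumed" are illustrative placeholders.
o3_entry = {
    "max_tokens": 100000,
    "max_input_tokens": 200000,
    "max_output_tokens": 100000,
    "input_cost_per_token": 1e-5,   # same value as 0.00001, just in scientific notation
    "output_cost_per_token": 4e-5,  # assumed list price; verify against OpenAI pricing
    "litellm_provider": "openai",
    "mode": "chat",
    "supported_endpoints": ["/v1/chat/completions", "/v1/responses"],  # assumed
    "supported_modalities": ["text", "image"],                         # assumed
    "supported_output_modalities": ["text"],                           # assumed
}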

@elabbarw (Contributor)

Hey guys, can you please include azure/ models as well? Always assume there's an Azure-hosted model available shortly after OpenAI 😄

@krrishdholakia merged commit 5c078af into BerriAI:main on Apr 16, 2025 (5 checks passed)
@PeterDaveHello deleted the OpenAI-o3-and-4o-mini branch on April 16, 2025 at 19:40
@krrishdholakia (Contributor)

Hey @elabbarw good point - do you see the pricing for the azure models though?

@PeterDaveHello (Contributor, Author)

Their list price should be the same.

@elabbarw (Contributor)

Prices have historically always been the same. So far I've got the 4.1 models live in Sweden Central and the US regions. I'm expecting to see the new o-series models soon.
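
Until azure/ entries exist in the cost map, one possible stopgap is to register the alias locally; this is only a sketch, assuming litellm.register_model accepts the same cost-map schema and that the Azure list prices match OpenAI's as discussed above:

import litellm

# Sketch: register an Azure-hosted alias with the same assumed list prices.
litellm.register_model({
    "azure/o3": {
        "max_tokens": 100000,
        "max_input_tokens": 200000,
        "max_output_tokens": 100000,
        "input_cost_per_token": 1e-5,
        "output_cost_per_token": 4e-5,   # assumed, same as the OpenAI entry
        "litellm_provider": "azure",
        "mode": "chat",
    },
})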

@jrkropp commented Apr 16, 2025

Could you also make sure they support 'reasoning_effort'? I added the models manually and get the error below when trying to use reasoning_effort. I'm creating a model called o4-mini-high, which is technically just o4-mini with reasoning_effort set to high.


400: litellm.UnsupportedParamsError: openai does not support parameters: ['reasoning_effort'], for model=o4-mini. To drop these, set litellm.drop_params=True or for proxy:

litellm_settings:
  drop_params: true

If you want to use these params dynamically send allowed_openai_params=['reasoning_effort'] in your request. Received Model Group=o4-mini-high
Available Model Group Fallbacks=None
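
For reference, the two workarounds named in the error message, as a short sketch (the o4-mini-high alias is the custom one described above; you would use one option or the other):

import litellm
from litellm import completion

# Option 1 (global): drop unsupported params, per the error message
# litellm.drop_params = True

# Option 2 (per request): explicitly allow the param on this call
response = completion(
    model="o4-mini",
    messages=[{"role": "user", "content": "Hello"}],
    reasoning_effort="high",
    allowed_openai_params=["reasoning_effort"],
)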

@krrishdholakia (Contributor)

Ack @wagnerjt
Hey @jrkropp, I think we need to update our o-series check (it currently explicitly checks for 'o1' and 'o3' in the model string)
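
A rough illustration of the kind of substring check being described; the helper name and exact logic here are assumptions, not litellm's actual implementation:

def is_o_series_model(model: str) -> bool:
    # Hypothetical sketch: the reported issue is that only "o1" and "o3" were
    # matched, so "o4-mini" fell through; adding "o4" to the prefixes fixes that.
    return any(prefix in model for prefix in ("o1", "o3", "o4"))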

@PeterDaveHello changed the title from "Add OpenAI o3 & 4o-mini" to "Add OpenAI o3 & o4-mini" on Apr 17, 2025
@miguelwon

Hi,
I'm getting the error shown below with "o3", but only when stream=True. My organization is verified, and if I make the request with the openai package directly it works fine. I'm on the latest version, 1.66.2.

from litellm import completion
response = completion(
    model="o3",
    messages=[{"role":"user","content":"Hello"}],
    stream=True
)
for chunk in response:
    print(chunk)
---------------------------------------------------------------------------
BadRequestError                           Traceback (most recent call last)
File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/llms/openai/openai.py:724, in OpenAIChatCompletion.completion(self, model_response, timeout, optional_params, litellm_params, logging_obj, model, messages, print_verbose, api_key, api_base, api_version, dynamic_params, azure_ad_token, acompletion, logger_fn, headers, custom_prompt_dict, client, organization, custom_llm_provider, drop_params)
    723             else:
--> 724                 raise e
    725 except OpenAIError as e:

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/llms/openai/openai.py:607, in OpenAIChatCompletion.completion(self, model_response, timeout, optional_params, litellm_params, logging_obj, model, messages, print_verbose, api_key, api_base, api_version, dynamic_params, azure_ad_token, acompletion, logger_fn, headers, custom_prompt_dict, client, organization, custom_llm_provider, drop_params)
    606 elif stream is True and fake_stream is False:
--> 607     return self.streaming(
    608         logging_obj=logging_obj,
    609         headers=headers,
    610         data=data,
    611         model=model,
    612         api_base=api_base,
    613         api_key=api_key,
    614         api_version=api_version,
    615         timeout=timeout,
    616         client=client,
    617         max_retries=max_retries,
    618         organization=organization,
    619         stream_options=stream_options,
    620     )
    621 else:

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/llms/openai/openai.py:885, in OpenAIChatCompletion.streaming(self, logging_obj, timeout, data, model, api_key, api_base, api_version, organization, client, max_retries, headers, stream_options)
    875 logging_obj.pre_call(
    876     input=data["messages"],
    877     api_key=api_key,
   (...)
    883     },
    884 )
--> 885 headers, response = self.make_sync_openai_chat_completion_request(
    886     openai_client=openai_client,
    887     data=data,
    888     timeout=timeout,
    889     logging_obj=logging_obj,
    890 )
    892 logging_obj.model_call_details["response_headers"] = headers

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/litellm_core_utils/logging_utils.py:149, in track_llm_api_timing.<locals>.decorator.<locals>.sync_wrapper(*args, **kwargs)
    148 try:
--> 149     result = func(*args, **kwargs)
    150     return result

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/llms/openai/openai.py:471, in OpenAIChatCompletion.make_sync_openai_chat_completion_request(self, openai_client, data, timeout, logging_obj)
    470 else:
--> 471     raise e

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/llms/openai/openai.py:453, in OpenAIChatCompletion.make_sync_openai_chat_completion_request(self, openai_client, data, timeout, logging_obj)
    452 try:
--> 453     raw_response = openai_client.chat.completions.with_raw_response.create(
    454         **data, timeout=timeout
    455     )
    457     if hasattr(raw_response, "headers"):

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/openai/_legacy_response.py:364, in to_raw_response_wrapper.<locals>.wrapped(*args, **kwargs)
    362 kwargs["extra_headers"] = extra_headers
--> 364 return cast(LegacyAPIResponse[R], func(*args, **kwargs))

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/openai/_utils/_utils.py:279, in required_args.<locals>.inner.<locals>.wrapper(*args, **kwargs)
    278     raise TypeError(msg)
--> 279 return func(*args, **kwargs)

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/openai/resources/chat/completions/completions.py:914, in Completions.create(self, messages, model, audio, frequency_penalty, function_call, functions, logit_bias, logprobs, max_completion_tokens, max_tokens, metadata, modalities, n, parallel_tool_calls, prediction, presence_penalty, reasoning_effort, response_format, seed, service_tier, stop, store, stream, stream_options, temperature, tool_choice, tools, top_logprobs, top_p, user, web_search_options, extra_headers, extra_query, extra_body, timeout)
    913 validate_response_format(response_format)
--> 914 return self._post(
    915     "/chat/completions",
    916     body=maybe_transform(
    917         {
    918             "messages": messages,
    919             "model": model,
    920             "audio": audio,
    921             "frequency_penalty": frequency_penalty,
    922             "function_call": function_call,
    923             "functions": functions,
    924             "logit_bias": logit_bias,
    925             "logprobs": logprobs,
    926             "max_completion_tokens": max_completion_tokens,
    927             "max_tokens": max_tokens,
    928             "metadata": metadata,
    929             "modalities": modalities,
    930             "n": n,
    931             "parallel_tool_calls": parallel_tool_calls,
    932             "prediction": prediction,
    933             "presence_penalty": presence_penalty,
    934             "reasoning_effort": reasoning_effort,
    935             "response_format": response_format,
    936             "seed": seed,
    937             "service_tier": service_tier,
    938             "stop": stop,
    939             "store": store,
    940             "stream": stream,
    941             "stream_options": stream_options,
    942             "temperature": temperature,
    943             "tool_choice": tool_choice,
    944             "tools": tools,
    945             "top_logprobs": top_logprobs,
    946             "top_p": top_p,
    947             "user": user,
    948             "web_search_options": web_search_options,
    949         },
    950         completion_create_params.CompletionCreateParams,
    951     ),
    952     options=make_request_options(
    953         extra_headers=extra_headers, extra_query=extra_query, extra_body=extra_body, timeout=timeout
    954     ),
    955     cast_to=ChatCompletion,
    956     stream=stream or False,
    957     stream_cls=Stream[ChatCompletionChunk],
    958 )

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/openai/_base_client.py:1242, in SyncAPIClient.post(self, path, cast_to, body, options, files, stream, stream_cls)
   1239 opts = FinalRequestOptions.construct(
   1240     method="post", url=path, json_data=body, files=to_httpx_files(files), **options
   1241 )
-> 1242 return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/openai/_base_client.py:919, in SyncAPIClient.request(self, cast_to, options, remaining_retries, stream, stream_cls)
    917     retries_taken = 0
--> 919 return self._request(
    920     cast_to=cast_to,
    921     options=options,
    922     stream=stream,
    923     stream_cls=stream_cls,
    924     retries_taken=retries_taken,
    925 )

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/openai/_base_client.py:1023, in SyncAPIClient._request(self, cast_to, options, retries_taken, stream, stream_cls)
   1022     log.debug("Re-raising status error")
-> 1023     raise self._make_status_error_from_response(err.response) from None
   1025 return self._process_response(
   1026     cast_to=cast_to,
   1027     options=options,
   (...)
   1031     retries_taken=retries_taken,
   1032 )

BadRequestError: Error code: 400 - {'error': {'message': 'Your organization must be verified to stream this model. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization.', 'type': 'invalid_request_error', 'param': 'stream', 'code': 'unsupported_value'}}

During handling of the above exception, another exception occurred:

OpenAIError                               Traceback (most recent call last)
File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/main.py:1765, in completion(model, messages, timeout, temperature, top_p, n, stream, stream_options, stop, max_completion_tokens, max_tokens, modalities, prediction, audio, presence_penalty, frequency_penalty, logit_bias, user, reasoning_effort, response_format, seed, tools, tool_choice, logprobs, top_logprobs, parallel_tool_calls, deployment_id, extra_headers, functions, function_call, base_url, api_version, api_key, model_list, thinking, **kwargs)
   1759     logging.post_call(
   1760         input=messages,
   1761         api_key=api_key,
   1762         original_response=str(e),
   1763         additional_args={"headers": headers},
   1764     )
-> 1765     raise e
   1767 if optional_params.get("stream", False):
   1768     ## LOGGING

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/main.py:1738, in completion(model, messages, timeout, temperature, top_p, n, stream, stream_options, stop, max_completion_tokens, max_tokens, modalities, prediction, audio, presence_penalty, frequency_penalty, logit_bias, user, reasoning_effort, response_format, seed, tools, tool_choice, logprobs, top_logprobs, parallel_tool_calls, deployment_id, extra_headers, functions, function_call, base_url, api_version, api_key, model_list, thinking, **kwargs)
   1737 try:
-> 1738     response = openai_chat_completions.completion(
   1739         model=model,
   1740         messages=messages,
   1741         headers=headers,
   1742         model_response=model_response,
   1743         print_verbose=print_verbose,
   1744         api_key=api_key,
   1745         api_base=api_base,
   1746         acompletion=acompletion,
   1747         logging_obj=logging,
   1748         optional_params=optional_params,
   1749         litellm_params=litellm_params,
   1750         logger_fn=logger_fn,
   1751         timeout=timeout,  # type: ignore
   1752         custom_prompt_dict=custom_prompt_dict,
   1753         client=client,  # pass AsyncOpenAI, OpenAI client
   1754         organization=organization,
   1755         custom_llm_provider=custom_llm_provider,
   1756     )
   1757 except Exception as e:
   1758     ## LOGGING - log the original exception returned

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/llms/openai/openai.py:735, in OpenAIChatCompletion.completion(self, model_response, timeout, optional_params, litellm_params, logging_obj, model, messages, print_verbose, api_key, api_base, api_version, dynamic_params, azure_ad_token, acompletion, logger_fn, headers, custom_prompt_dict, client, organization, custom_llm_provider, drop_params)
    734     error_headers = getattr(error_response, "headers", None)
--> 735 raise OpenAIError(
    736     status_code=status_code,
    737     message=error_text,
    738     headers=error_headers,
    739     body=error_body,
    740 )

OpenAIError: Error code: 400 - {'error': {'message': 'Your organization must be verified to stream this model. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization.', 'type': 'invalid_request_error', 'param': 'stream', 'code': 'unsupported_value'}}

During handling of the above exception, another exception occurred:

BadRequestError                           Traceback (most recent call last)
Cell In[31], line 2
      1 from litellm import completion
----> 2 response = completion(
      3     model="o3",
      4     messages=[{"role":"user","content":"Hello"}],
      5     stream=True
      6 )
      7 for chunk in response:
      8     print(chunk)

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/utils.py:1247, in client.<locals>.wrapper(*args, **kwargs)
   1243 if logging_obj:
   1244     logging_obj.failure_handler(
   1245         e, traceback_exception, start_time, end_time
   1246     )  # DO NOT MAKE THREADED - router retry fallback relies on this!
-> 1247 raise e

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/utils.py:1125, in client.<locals>.wrapper(*args, **kwargs)
   1123         print_verbose(f"Error while checking max token limit: {str(e)}")
   1124 # MODEL CALL
-> 1125 result = original_function(*args, **kwargs)
   1126 end_time = datetime.datetime.now()
   1127 if "stream" in kwargs and kwargs["stream"] is True:

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/main.py:3150, in completion(model, messages, timeout, temperature, top_p, n, stream, stream_options, stop, max_completion_tokens, max_tokens, modalities, prediction, audio, presence_penalty, frequency_penalty, logit_bias, user, reasoning_effort, response_format, seed, tools, tool_choice, logprobs, top_logprobs, parallel_tool_calls, deployment_id, extra_headers, functions, function_call, base_url, api_version, api_key, model_list, thinking, **kwargs)
   3147     return response
   3148 except Exception as e:
   3149     ## Map to OpenAI Exception
-> 3150     raise exception_type(
   3151         model=model,
   3152         custom_llm_provider=custom_llm_provider,
   3153         original_exception=e,
   3154         completion_kwargs=args,
   3155         extra_kwargs=kwargs,
   3156     )

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py:2214, in exception_type(model, original_exception, custom_llm_provider, completion_kwargs, extra_kwargs)
   2212 if exception_mapping_worked:
   2213     setattr(e, "litellm_response_headers", litellm_response_headers)
-> 2214     raise e
   2215 else:
   2216     for error_type in litellm.LITELLM_EXCEPTION_TYPES:

File ~/opt/miniconda3/envs/dsta/lib/python3.10/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py:328, in exception_type(model, original_exception, custom_llm_provider, completion_kwargs, extra_kwargs)
    323 elif (
    324     "invalid_request_error" in error_str
    325     and "Incorrect API key provided" not in error_str
    326 ):
    327     exception_mapping_worked = True
--> 328     raise BadRequestError(
    329         message=f"{exception_provider} - {message}",
    330         llm_provider=custom_llm_provider,
    331         model=model,
    332         response=getattr(original_exception, "response", None),
    333         litellm_debug_info=extra_information,
    334         body=getattr(original_exception, "body", None),
    335     )
    336 elif (
    337     "Web server is returning an unknown error" in error_str
    338     or "The server had an error processing your request." in error_str
    339 ):
    340     exception_mapping_worked = True

BadRequestError: litellm.BadRequestError: OpenAIException - Your organization must be verified to stream this model. Please go to: https://platform.openai.com/settings/organization/general and click on Verify Organization.

@krrishdholakia (Contributor)

Hey @miguelwon, that seems like an OpenAI error. litellm._turn_on_debug() should show you the raw request being made.
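
For anyone hitting the same thing, a minimal way to enable the debug helper mentioned above, reusing the streaming call from the earlier report:

import litellm
from litellm import completion

litellm._turn_on_debug()  # prints the raw request/response for troubleshooting

response = completion(
    model="o3",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in response:
    print(chunk)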

@miguelwon

Thanks @krrishdholakia. It is working now.
Yes, it was very likely on their side. I had just completed the organization verification, so there was probably some delay in the permissions update.

@valda-z commented Apr 17, 2025

Can you please fix this for Azure as well? The same logic exists in o_series_transformation.py for Azure as for OpenAI.

@krrishdholakia (Contributor)

Ack @valda-z Fix pushed on main

@valda-z commented Apr 18, 2025

Thanks! Any plans to publish a new version to pip?

@krrishdholakia (Contributor)

Yes - hopefully today

@valda-z commented Apr 18, 2025

Wow :) !!
