Pull Request: Update Hugging Face Integration for LLM Inference #383
base: main
Conversation
…ialization in HuggingFaceProvider
Walkthrough
The changes introduce HuggingFace as a new provider for language model inference and embedding generation in the backend. This includes new provider classes, configuration options, environment variable samples, and updates to the agent manager logic to support selecting HuggingFace based on configuration or environment flags.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant AgentManager
    participant HuggingFaceProvider
    participant HuggingFace API
    User->>AgentManager: Request model inference
    AgentManager->>HuggingFaceProvider: Select provider (if use_huggingface)
    HuggingFaceProvider->>HuggingFace API: POST prompt and parameters
    HuggingFace API-->>HuggingFaceProvider: Return generated text
    HuggingFaceProvider-->>AgentManager: Return response
    AgentManager-->>User: Return generated text
```

```mermaid
sequenceDiagram
    participant User
    participant EmbeddingManager
    participant HuggingFaceEmbeddingProvider
    participant SentenceTransformer
    User->>EmbeddingManager: Request text embedding
    EmbeddingManager->>HuggingFaceEmbeddingProvider: Select provider (if use_huggingface)
    HuggingFaceEmbeddingProvider->>SentenceTransformer: Encode text
    SentenceTransformer-->>HuggingFaceEmbeddingProvider: Return embedding
    HuggingFaceEmbeddingProvider-->>EmbeddingManager: Return embedding
    EmbeddingManager-->>User: Return embedding
```
Actionable comments posted: 7
🧹 Nitpick comments (3)
apps/backend/.env.sample (1)
6-9: Inconsistent environment variable naming convention.
The environment variables use mixed naming conventions - USE_HUGGINGFACE and HF_API_KEY are uppercase while hf_inference_url is lowercase. Consider using consistent naming for better maintainability.

Apply this diff to standardize the naming:

```diff
-hf_inference_url="https://api-inference.huggingface.co/models/"
+HF_INFERENCE_URL="https://api-inference.huggingface.co/models/"
```

apps/backend/app/agent/providers/huggingface.py (2)
4-4: Remove unused import.
The OpenAI import is not used in this file and should be removed.

Apply this diff:

```diff
-from openai import OpenAI
```
🧰 Tools
🪛 Ruff (0.11.9)
4-4: openai.OpenAI imported but unused
Remove unused import: openai.OpenAI
(F401)
48-52: Improve response validation with better error messages.
The response validation is good but could provide more specific error messages for debugging.

Apply this diff:

```diff
 if not response_data or not isinstance(response_data, list) or len(response_data) == 0:
-    raise ProviderError("Invalid response format from Hugging Face API")
+    raise ProviderError(f"Invalid response format from Hugging Face API: expected non-empty list, got {type(response_data)}")
 if "generated_text" not in response_data[0]:
-    raise ProviderError("Missing 'generated_text' in Hugging Face API response")
+    raise ProviderError(f"Missing 'generated_text' in Hugging Face API response. Available keys: {list(response_data[0].keys())}")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
apps/backend/.env.sample (1 hunks)
apps/backend/app/agent/manager.py (3 hunks)
apps/backend/app/agent/providers/huggingface.py (1 hunks)
apps/backend/app/core/config.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
apps/backend/app/agent/providers/huggingface.py
4-4: openai.OpenAI imported but unused
Remove unused import: openai.OpenAI
(F401)
🪛 Pylint (3.3.7)
apps/backend/app/agent/providers/huggingface.py
[error] 7-7: Attempted relative import beyond top-level package
(E0402)
[error] 8-8: Attempted relative import beyond top-level package
(E0402)
[refactor] 12-12: Too few public methods (1/2)
(R0903)
[refactor] 56-56: Too few public methods (1/2)
(R0903)
🔇 Additional comments (4)
apps/backend/app/agent/manager.py (2)
8-8: LGTM: Clean import addition.
The import statement properly adds the new HuggingFace provider classes.
31-31: ✅ Verification successful
Verify the default HuggingFace model availability.
The default model "microsoft/Phi-3-mini-4k-instruct" should be verified to ensure it's available via the HuggingFace Inference API.
🌐 Web query:
Is the model "microsoft/Phi-3-mini-4k-instruct" available on Hugging Face Inference API?
💡 Result:
Yes, the model "microsoft/Phi-3-mini-4k-instruct" is available on Hugging Face and can be accessed via the Hugging Face Inference API.
- The official Hugging Face model card details the model and explicitly lists "4K [HF]" under its available formats, indicating support directly through Hugging Face[1].
- The model is also included in announcement materials specifying its availability on Hugging Face, Azure AI Studio, and other platforms[5].
- Multiple technical resources and guides reference using the model with Hugging Face, covering both Transformers library loading and ONNX Runtime support[7][10].
Therefore, "microsoft/Phi-3-mini-4k-instruct" is not only downloadable but also ready for inference via the Hugging Face Inference API and Hugging Face Spaces[1][5][7].
Citations:
- 1: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
- 2: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf
- 3: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx
- 4: https://huggingface.co/spaces/way2mhemanth/microsoft-Phi-3-mini-4k-instruct
- 5: https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/
- 6: https://techcommunity.microsoft.com/blog/machinelearningblog/affordable-innovation-unveiling-the-pricing-of-phi-3-slms-on-models-as-a-service/4156495
- 7: https://replicate.com/microsoft/phi-3-mini-4k-instruct/readme
- 8: https://huggingface.co/Sreenington/Phi-3-mini-4k-instruct-AWQ
- 9: https://github.com/marketplace/models/azureml/Phi-3-mini-4k-instruct
- 10: https://github.com/amd/RyzenAI-SW/blob/main/example/llm/cpu/Phi_3_mini_4k_instruct.md
Default Hugging Face model availability confirmed
The model "microsoft/Phi-3-mini-4k-instruct" is hosted on Hugging Face (https://huggingface.co/microsoft/Phi-3-mini-4k-instruct) and fully supported by the Inference API. No changes required.

apps/backend/app/agent/providers/huggingface.py (2)
15-17: LGTM: Proper API key validation.
Good practice to validate the API key presence during initialization and raise a clear error.

65-66: LGTM: Proper async embedding generation.
Good use of thread pool for CPU-intensive embedding generation and proper conversion to list format.
```python
use_huggingface: bool = False
hf_api_key: Optional[str]
hf_inference_url: str = "https://router.huggingface.co/hf-inference/models/"
```
💡 Verification agent
🧩 Analysis chain
URL inconsistency between config and environment sample.
The default hf_inference_url in config.py differs from the one in .env.sample:
- Config: "https://router.huggingface.co/hf-inference/models/"
- .env.sample: "https://api-inference.huggingface.co/models/"
This inconsistency could lead to configuration issues. Please verify which URL is correct and ensure consistency across both files.
🌐 Web query:
What is the correct base URL for Hugging Face Inference API in 2024?
💡 Result:
In 2024, the base URL for Hugging Face's Inference API is:
https://api-inference.huggingface.co/models/{model_id}
Replace {model_id} with the identifier of the model you wish to use. For example, to use the meta-llama/Meta-Llama-3-8B-Instruct model, the URL would be:
https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct
Ensure you include your Hugging Face API token in the request headers for authentication. For detailed information on using the Inference API, refer to the Hugging Face documentation. (huggingface.co)
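For reference, a minimal call against this endpoint looks roughly like the sketch below; the model id and prompt are placeholders, and the payload shape ({"inputs": ...}) follows the public Inference API documentation.

```python
# Hedged sketch of a direct Inference API call; model id and prompt are placeholders.
import os
import requests

url = "https://api-inference.huggingface.co/models/microsoft/Phi-3-mini-4k-instruct"
headers = {"Authorization": f"Bearer {os.environ['HF_API_KEY']}"}
payload = {"inputs": "Explain retrieval-augmented generation in one sentence."}

resp = requests.post(url, headers=headers, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())  # typically a list like [{"generated_text": "..."}]
```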
Update default Hugging Face Inference API URL
Please align the default hf_inference_url in apps/backend/app/core/config.py with the official Hugging Face Inference API endpoint (https://api-inference.huggingface.co/models/), as confirmed by the docs. Also ensure .env.sample and any related documentation use the same URL.
Affected location:
apps/backend/app/core/config.py, line defining hf_inference_url
Suggested diff:
```diff
- hf_inference_url: str = "https://router.huggingface.co/hf-inference/models/"
+ hf_inference_url: str = "https://api-inference.huggingface.co/models/"
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-use_huggingface: bool = False
-hf_api_key: Optional[str]
-hf_inference_url: str = "https://router.huggingface.co/hf-inference/models/"
+use_huggingface: bool = False
+hf_api_key: Optional[str]
+hf_inference_url: str = "https://api-inference.huggingface.co/models/"
```
🤖 Prompt for AI Agents
In apps/backend/app/core/config.py around lines 18 to 21, the default
hf_inference_url is set to "https://router.huggingface.co/hf-inference/models/",
which is inconsistent with the official Hugging Face Inference API URL and the
.env.sample file. Update the hf_inference_url default value to
"https://api-inference.huggingface.co/models/" to match the official endpoint.
Also verify and update the .env.sample and any related documentation to use this
same URL for consistency.
```python
use_hf = kwargs.get("use_huggingface", os.getenv("USE_HUGGINGFACE") == "true")
if use_hf:
    model = kwargs.get("model", "microsoft/Phi-3-mini-4k-instruct")
    return HuggingFaceProvider(model_name=model)
```
🛠️ Refactor suggestion
Improve boolean environment variable parsing.
The current implementation only accepts lowercase "true" which is fragile. Consider using more robust boolean parsing that accepts common variations.
Apply this diff to improve boolean parsing:
```diff
-use_hf = kwargs.get("use_huggingface", os.getenv("USE_HUGGINGFACE") == "true")
+use_hf = kwargs.get("use_huggingface", os.getenv("USE_HUGGINGFACE", "false").lower() in ("true", "1", "yes", "on"))
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-use_hf = kwargs.get("use_huggingface", os.getenv("USE_HUGGINGFACE") == "true")
+use_hf = kwargs.get(
+    "use_huggingface",
+    os.getenv("USE_HUGGINGFACE", "false")
+    .lower() in ("true", "1", "yes", "on")
+)
 if use_hf:
     model = kwargs.get("model", "microsoft/Phi-3-mini-4k-instruct")
     return HuggingFaceProvider(model_name=model)
```
🤖 Prompt for AI Agents
In apps/backend/app/agent/manager.py around lines 29 to 32, the environment
variable USE_HUGGINGFACE is currently checked by comparing its value to the
lowercase string "true", which is fragile. Improve this by normalizing the
environment variable value to lowercase and checking if it matches any common
truthy strings like "true", "1", or "yes". Update the use_hf assignment to parse
the environment variable more robustly to handle these variations.
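A shared helper along these lines would keep the parsing uniform across both call sites; the function name and placement are illustrative, not part of this PR.

```python
# Illustrative helper (not in this PR): centralizes truthy-string parsing of env flags.
import os

def env_flag(name: str, default: str = "false") -> bool:
    """Return True when the variable holds a common truthy value ("true", "1", "yes", "on")."""
    return os.getenv(name, default).strip().lower() in ("true", "1", "yes", "on")

# Both managers could then read:
# use_hf = kwargs.get("use_huggingface", env_flag("USE_HUGGINGFACE"))
```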
```python
use_hf = kwargs.get("use_huggingface", os.getenv("USE_HUGGINGFACE") == "true")
if use_hf:
    model = kwargs.get("embedding_model", "sentence-transformers/all-MiniLM-L6-v2")
    return HuggingFaceEmbeddingProvider(embedding_model=model)
```
🛠️ Refactor suggestion
Consistent boolean parsing needed.
Same boolean parsing issue exists here as in the AgentManager. Apply the same improvement for consistency.
Apply this diff:
```diff
-use_hf = kwargs.get("use_huggingface", os.getenv("USE_HUGGINGFACE") == "true")
+use_hf = kwargs.get("use_huggingface", os.getenv("USE_HUGGINGFACE", "false").lower() in ("true", "1", "yes", "on"))
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-use_hf = kwargs.get("use_huggingface", os.getenv("USE_HUGGINGFACE") == "true")
+use_hf = kwargs.get(
+    "use_huggingface",
+    os.getenv("USE_HUGGINGFACE", "false").lower() in ("true", "1", "yes", "on"),
+)
 if use_hf:
     model = kwargs.get("embedding_model", "sentence-transformers/all-MiniLM-L6-v2")
     return HuggingFaceEmbeddingProvider(embedding_model=model)
```
🤖 Prompt for AI Agents
In apps/backend/app/agent/manager.py around lines 64 to 67, the boolean parsing
for the "use_huggingface" flag is inconsistent and may cause errors. Update the
code to parse the environment variable "USE_HUGGINGFACE" using a consistent
method that correctly interprets string values like "true" or "false" as
booleans, matching the approach used in AgentManager. This ensures reliable and
uniform boolean handling across the codebase.
```python
response = requests.post(
    f"{os.getenv('HF_INFERENCE_URL')}{self._model_name}",
    headers=headers,
    json=payload
)
```
🛠️ Refactor suggestion
Add timeout to HTTP request for security and reliability.
The HTTP request lacks a timeout parameter, which could lead to hanging requests and potential security issues.
Apply this diff:
```diff
 response = requests.post(
     f"{os.getenv('HF_INFERENCE_URL')}{self._model_name}",
     headers=headers,
-    json=payload
+    json=payload,
+    timeout=30
 )
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 response = requests.post(
     f"{os.getenv('HF_INFERENCE_URL')}{self._model_name}",
     headers=headers,
-    json=payload
+    json=payload,
+    timeout=30
 )
```
🤖 Prompt for AI Agents
In apps/backend/app/agent/providers/huggingface.py around lines 41 to 45, the
requests.post call lacks a timeout parameter, which can cause the request to
hang indefinitely. Add a timeout argument to the requests.post call with a
reasonable value (e.g., timeout=10) to ensure the request fails gracefully if
the server does not respond in time.
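Beyond the timeout itself, the call could be wrapped so network failures surface as the provider's own ProviderError; this is a sketch only, assuming the surrounding method's headers, payload, and self._model_name.

```python
# Sketch only: headers, payload, self._model_name, and ProviderError come from the surrounding provider code.
try:
    response = requests.post(
        f"{os.getenv('HF_INFERENCE_URL')}{self._model_name}",
        headers=headers,
        json=payload,
        timeout=30,  # seconds; tune for your deployment
    )
    response.raise_for_status()
except requests.exceptions.Timeout as e:
    raise ProviderError("Hugging Face API request timed out") from e
except requests.exceptions.RequestException as e:
    raise ProviderError(f"Hugging Face API request failed: {e}") from e
```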
```python
self._api_key = api_key or os.getenv("HF_API_KEY")
if not self._api_key:
    raise ProviderError("Hugging Face API key is missing")
self._model = SentenceTransformer(embedding_model)
```
🛠️ Refactor suggestion
Add error handling for SentenceTransformer model loading.
The SentenceTransformer model loading could fail (network issues, invalid model name) but there's no explicit error handling during initialization.
Apply this diff:
```diff
-self._model = SentenceTransformer(embedding_model)
+try:
+    self._model = SentenceTransformer(embedding_model)
+except Exception as e:
+    raise ProviderError(f"Failed to load SentenceTransformer model '{embedding_model}': {e}") from e
```
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
-self._model = SentenceTransformer(embedding_model)
+try:
+    self._model = SentenceTransformer(embedding_model)
+except Exception as e:
+    raise ProviderError(
+        f"Failed to load SentenceTransformer model '{embedding_model}': {e}"
+    ) from e
```
🤖 Prompt for AI Agents
In apps/backend/app/agent/providers/huggingface.py at line 61, the
SentenceTransformer model loading lacks error handling, which can cause
unhandled exceptions if loading fails. Wrap the SentenceTransformer
initialization in a try-except block to catch exceptions such as network errors
or invalid model names, and handle them appropriately by logging the error or
raising a custom exception with a clear message.
```python
opts = {
    "temperature": generation_args.get("temperature", 0.7),
    "top_p": generation_args.get("top_p", 0.9),
    "max_tokens": generation_args.get("max_length", 20000),
```
🛠️ Refactor suggestion
Parameter name inconsistency.
The parameter is named max_length
in the generation args but max_tokens
in the options dictionary. This inconsistency could cause confusion.
Apply this diff for consistency:
-"max_tokens": generation_args.get("max_length", 20000),
+"max_tokens": generation_args.get("max_tokens", 20000),
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
"max_tokens": generation_args.get("max_length", 20000), | |
"max_tokens": generation_args.get("max_tokens", 20000), |
🤖 Prompt for AI Agents
In apps/backend/app/agent/providers/huggingface.py at line 24, the parameter key
in the options dictionary is named "max_tokens" while the generation_args uses
"max_length". To fix this inconsistency, rename the key "max_tokens" to
"max_length" in the options dictionary to match the parameter name used in
generation_args, ensuring consistent naming throughout the code.
```python
}
# Send the request to the Hugging Face Inference API
response = requests.post(
    f"{os.getenv('HF_INFERENCE_URL')}{self._model_name}",
```
💡 Verification agent
🧩 Analysis chain
Environment variable name inconsistency.
The code uses HF_INFERENCE_URL but the config and .env.sample use different variations. This needs to be consistent across all files.
🏁 Script executed:
```bash
#!/bin/bash
# Check for environment variable usage consistency
echo "Checking HF_INFERENCE_URL usage:"
rg -n "HF_INFERENCE_URL|hf_inference_url" --type py
echo -e "\nChecking .env files:"
fd -e env -e sample | xargs cat | grep -i inference
```
Length of output: 476
🏁 Script executed:
```bash
#!/bin/bash
# Display the contents of config.py to verify how hf_inference_url is defined and its env var mapping
sed -n '1,200p' apps/backend/app/core/config.py
```
Length of output: 2000
```bash
#!/bin/bash
# Check for .env.sample for inference URL variable definitions
echo "Checking for .env.sample:"
if [[ -f .env.sample ]]; then
  echo ".env.sample contents (first 200 lines):"
  sed -n '1,200p' .env.sample
else
  echo ".env.sample not found at repository root"
fi
echo -e "\nSearching all files (including hidden) for 'inference_url':"
grep -R -n "inference_url" . --exclude-dir=.git --exclude-dir=node_modules
```
```bash
#!/bin/bash
# Locate and display any .env, .env.sample or similar files to verify inference URL vars
echo "Searching for environment files:"
find . -type f \( -name ".env.sample" -o -name ".env" -o -name ".env.*" \) \
  ! -path "./.git/*" ! -path "./node_modules/*" \
  -print \
  -exec echo "---- Contents of {} ----" \; \
  -exec sed -n '1,200p' {} \; \
  -exec echo "" \;
```
🏁 Script executed:
```bash
#!/bin/bash
# Locate and display any .env, .env.sample, or related environment files to verify variable names
find . -type f \( -iname ".env.sample" -o -iname ".env" -o -iname ".env.*" \) \
  -not -path "./.git/*" -not -path "./node_modules/*" | while read file; do
  echo "---- Contents of: $file ----"
  sed -n '1,200p' "$file"
  echo
done
```
Length of output: 806
Standardize HF_INFERENCE_URL across config, .env.sample, and provider
The backend config defines a hf_inference_url setting (env var HF_INFERENCE_URL), but the .env.sample uses lowercase hf_inference_url and the provider directly calls os.getenv. Let's align everything:
• apps/backend/.env.sample
  - Rename the key to HF_INFERENCE_URL to match Pydantic's expected env var.
• apps/backend/app/agent/providers/huggingface.py
  - Import and use settings.hf_inference_url instead of os.getenv('HF_INFERENCE_URL') so all env var logic is centralized.
Locations to update:
apps/backend/.env.sample
apps/backend/app/agent/providers/huggingface.py
Suggested diffs:
```diff
--- a/apps/backend/.env.sample
+++ b/apps/backend/.env.sample
@@ # HuggingFace Configuration
-USE_HUGGINGFACE=false
-HF_API_KEY=""
-hf_inference_url="https://api-inference.huggingface.co/models/"
+USE_HUGGINGFACE=false
+HF_API_KEY=""
+HF_INFERENCE_URL="https://api-inference.huggingface.co/models/"
--- a/apps/backend/app/agent/providers/huggingface.py
+++ b/apps/backend/app/agent/providers/huggingface.py
@@
-from os import getenv
+from core.config import settings
@@ def _build_url(self) -> str:
-    return f"{os.getenv('HF_INFERENCE_URL')}{self._model_name}"
+    return f"{settings.hf_inference_url}{self._model_name}"
```
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In apps/backend/app/agent/providers/huggingface.py at line 42 and
apps/backend/.env.sample, standardize the environment variable name for the
inference URL to HF_INFERENCE_URL (uppercase) to match the backend config.
Update the .env.sample file to rename the key to HF_INFERENCE_URL, and in
huggingface.py replace direct os.getenv calls with importing and using
settings.hf_inference_url to centralize environment variable access and ensure
consistency.
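For context, the lowercase field names in config.py map to the uppercase environment variables through pydantic's settings machinery (env lookup is case-insensitive by default); below is a minimal sketch of the assumed Settings shape, which may differ from the actual file.

```python
# Minimal sketch of the assumed Settings shape; field names mirror the snippet under review.
from typing import Optional
from pydantic_settings import BaseSettings  # pydantic v2; pydantic v1 exposes BaseSettings from pydantic itself

class Settings(BaseSettings):
    use_huggingface: bool = False
    hf_api_key: Optional[str] = None
    hf_inference_url: str = "https://api-inference.huggingface.co/models/"

settings = Settings()  # HF_INFERENCE_URL in the environment populates hf_inference_url
```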
Thanks for the PR and feature addition. I'll discuss this with the team whether we'll be implementing this feature or not.
Overview
This PR implements the Hugging Face provider for text generation using the Hugging Face Inference API. It includes:
- HuggingFaceProvider class that handles text generation with configurable parameters.
- HuggingFaceEmbeddingProvider class for text embeddings using SentenceTransformer.

Implementation Details
- Requires the HF_API_KEY environment variable.
- Set the USE_HUGGINGFACE environment variable to true if you intend to use a HuggingFace model.

Testing
Please verify the implementation works with:
Dependencies
Added sentence_transformers package for embeddings. Hence, run the following command if you intend to use HuggingFace.
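The command is most likely `pip install sentence-transformers` (or the project's equivalent dependency-manager invocation).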
Summary by CodeRabbit
New Features
Chores