Commit 31a54fb

Merge pull request #28 from codelion/feat-hallucination-detector
Feat hallucination detector
2 parents 4b30a10 + 707be0b commit 31a54fb

3 files changed: 787 additions, 1 deletion


README.md

Lines changed: 52 additions & 0 deletions
@@ -115,6 +115,56 @@ The system combines three key components:

## Adaptive Classification with LLMs

### Hallucination Detector

The adaptive classifier can detect hallucinations in language model outputs, especially in Retrieval-Augmented Generation (RAG) scenarios. Despite incorporating external knowledge sources, LLMs often still generate content that isn't supported by the provided context. Our hallucination detector identifies when a model's output contains information that goes beyond what's present in the source material.

The classifier categorizes text into:

- **HALLUCINATED**: Output contains information not supported by or contradictory to the provided context
- **NOT_HALLUCINATED**: Output is faithfully grounded in the provided context

Our hallucination detector has been trained and evaluated on the RAGTruth benchmark, which provides a standardized dataset for assessing hallucination detection across different task types:

#### Performance Across Tasks

| Task Type      | Precision  | Recall     | F1 Score   |
|----------------|------------|------------|------------|
| QA             | 35.50%     | 45.11%     | 39.74%     |
| Summarization  | 22.18%     | 96.91%     | 36.09%     |
| Data-to-Text   | 65.00%     | 100.0%     | 78.79%     |
| **Overall**    | **40.89%** | **80.68%** | **51.54%** |

The detector shows particularly high recall (80.68% overall), making it effective at catching potential hallucinations, with strong performance on data-to-text generation tasks. The adaptive nature of the classifier means it continues to improve as it processes more examples, making it ideal for production environments where user feedback can be incorporated.

```python
from adaptive_classifier import AdaptiveClassifier

# Load the hallucination detector
detector = AdaptiveClassifier.from_pretrained("adaptive-classifier/llm-hallucination-detector")

# Detect hallucinations in RAG output
context = "France is a country in Western Europe. Its capital is Paris. The population of France is about 67 million people."
query = "What is the capital of France and its population?"
response = "The capital of France is Paris. The population is 70 million."

# Format input as expected by the model
input_text = f"Context: {context}\nQuestion: {query}\nAnswer: {response}"

# Get hallucination prediction
prediction = detector.predict(input_text)
# Returns: [('HALLUCINATED', 0.72), ('NOT_HALLUCINATED', 0.28)]

# Example handling logic
if prediction[0][0] == 'HALLUCINATED' and prediction[0][1] > 0.6:
    print("Warning: Response may contain hallucinations")
    # Implement safeguards: request human review, add disclaimer, etc.
```
This system can be integrated into RAG pipelines as a safety layer, LLM evaluation frameworks, or content moderation systems. The ability to detect hallucinations helps build more trustworthy AI systems, particularly for applications in domains like healthcare, legal, finance, and education where factual accuracy is critical.
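As an illustration, the sketch below shows how the detector might sit behind a RAG pipeline as a post-generation guard; the `guard_rag_response` helper, its return format, and the 0.6 threshold are illustrative choices, not part of the library.

```python
from adaptive_classifier import AdaptiveClassifier

# Illustrative RAG safety layer (helper name, threshold, and return format are hypothetical).
detector = AdaptiveClassifier.from_pretrained("adaptive-classifier/llm-hallucination-detector")

def guard_rag_response(context: str, query: str, response: str, threshold: float = 0.6) -> dict:
    """Flag a generated response that may not be grounded in the retrieved context."""
    # Same input format as the example above
    input_text = f"Context: {context}\nQuestion: {query}\nAnswer: {response}"
    label, score = detector.predict(input_text)[0]  # top prediction
    flagged = label == "HALLUCINATED" and score >= threshold
    return {
        "response": response,
        "flagged": flagged,
        # Downstream code can route flagged answers to human review or attach a disclaimer.
        "note": "This answer may not be fully supported by the retrieved context." if flagged else None,
    }
```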
The detector can be easily fine-tuned on domain-specific data, making it adaptable to specialized use cases where the definition of hallucination may differ from general contexts.
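For example, here is a minimal sketch of domain adaptation, assuming the classifier exposes the incremental `add_examples(texts, labels)` and `save(path)` methods of the base `AdaptiveClassifier` API; the medical-domain examples and save path are invented for illustration.

```python
from adaptive_classifier import AdaptiveClassifier

detector = AdaptiveClassifier.from_pretrained("adaptive-classifier/llm-hallucination-detector")

# Hypothetical reviewed feedback from a medical RAG deployment,
# formatted the same way as the detector's inputs above.
domain_texts = [
    "Context: The patient was prescribed 10 mg of lisinopril daily.\n"
    "Question: What dose was prescribed?\nAnswer: 20 mg of lisinopril daily.",
    "Context: The trial enrolled 250 participants across 5 sites.\n"
    "Question: How many participants were enrolled?\nAnswer: 250 participants.",
]
domain_labels = ["HALLUCINATED", "NOT_HALLUCINATED"]

# Incrementally add the reviewed examples so the detector adapts to the domain
# (assumes the standard add_examples interface of the adaptive classifier).
detector.add_examples(domain_texts, domain_labels)

# Persist the updated detector for reuse (method and path shown are assumptions).
detector.save("./hallucination-detector-medical")
```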
### LLM Configuration Optimization

The adaptive classifier can also be used to predict optimal configurations for Language Models. Our research shows that model configurations, particularly temperature settings, can significantly impact response quality. Using the adaptive classifier, we can automatically predict the best temperature range for different types of queries:
@@ -223,6 +273,8 @@ This real-world evaluation demonstrates that adaptive classification can signifi
- [Lamini Classifier Agent Toolkit](https://www.lamini.ai/blog/classifier-agent-toolkit)
- [Protoformer: Embedding Prototypes for Transformers](https://arxiv.org/abs/2206.12710)
- [Overcoming catastrophic forgetting in neural networks](https://arxiv.org/abs/1612.00796)
- [RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models](https://arxiv.org/abs/2401.00396)
- [LettuceDetect: A Hallucination Detection Framework for RAG Applications](https://arxiv.org/abs/2502.17125)

## Citation
