ReasonIR

ReasonIR-8B is the first retriever specifically trained for general reasoning tasks, achieving state-of-the-art retrieval performance on BRIGHT, a reasoning-intensive retrieval benchmark. When used for retrieval-augmented generation (RAG), ReasonIR-8B also brings substantial gains on MMLU and GPQA.

Paper: https://arxiv.org/abs/2504.20595
Model: https://huggingface.co/reasonir/ReasonIR-8B
Data: https://huggingface.co/datasets/reasonir/reasonir-data

General Usage

Make sure to install transformers>=4.47.0 first!
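
To confirm the requirement before loading the model, the installed version can be compared against the minimum. The following is a minimal sketch using only the standard library. The helper names (`parse_version`, `meets_minimum`, `installed_transformers_version`) are hypothetical and not part of this repository, and the parser only reads the leading numeric components, so a pre-release such as `4.47.0rc1` is treated like `4.47.0`.

```python
from importlib import metadata


def parse_version(text, width=3):
    """Return the leading numeric components of a version string as a tuple.

    '4.48.0.dev0' -> (4, 48, 0); '4.47' -> (4, 47, 0).
    Suffixes such as 'rc1' or 'dev0' are ignored (a simplifying assumption).
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad with zeros so that '4.47' and '4.47.0' compare as equal.
    while len(parts) < width:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum="4.47.0"):
    """True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def installed_transformers_version():
    """Return the installed transformers version string, or None if absent."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_transformers_version()
    if version is None or not meets_minimum(version):
        print("Please install transformers>=4.47.0, found:", version)
    else:
        print("transformers", version, "satisfies the requirement")
```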

Transformers

from transformers import AutoModel

# trust_remote_code=True loads the custom bidirectional encoder; torch_dtype="auto" enables bf16.
model = AutoModel.from_pretrained("reasonir/ReasonIR-8B", torch_dtype="auto", trust_remote_code=True)
model = model.to("cuda")
model.eval()

query = "The quick brown fox jumps over the lazy dog."
document = "The quick brown fox jumps over the lazy dog."
query_instruction = ""
doc_instruction = ""

# Encode the query and the document, then score the pair with a dot product.
query_emb = model.encode(query, instruction=query_instruction)
doc_emb = model.encode(document, instruction=doc_instruction)
sim = query_emb @ doc_emb.T

When using AutoModel, it is important to:

Include trust_remote_code=True to make sure our custom bidirectional encoding architecture is used.
Use torch_dtype="auto" so that bf16 is activated (by default torch will use fp32).
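
The snippet above scores a single query against a single document. For a small corpus, the same idea extends to ranking several documents by similarity. The sketch below is illustrative only: `rank_documents` and its helpers are hypothetical names, the embeddings are assumed to be available as plain lists of floats, and the vectors are L2-normalized first so that the dot product becomes cosine similarity (whether `model.encode` already returns normalized vectors is not stated here).

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(v):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def rank_documents(query_emb, doc_embs, top_k=None):
    """Return (doc_index, cosine_score) pairs sorted from best to worst.

    query_emb: list of floats
    doc_embs:  list of embeddings, one per document
    top_k:     keep only the best k results (None keeps all)
    """
    q = l2_normalize(query_emb)
    scored = [(i, dot(q, l2_normalize(d))) for i, d in enumerate(doc_embs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real embeddings.
    query = [1.0, 0.0]
    docs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    for index, score in rank_documents(query, docs):
        print(index, round(score, 4))
```

For large corpora a vectorized matrix product or an approximate nearest-neighbor index would be the more practical choice; the loop here is only meant to make the scoring step explicit.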

Sentence Transformers

Ordinary retrieval models that use mean pooling can be used with SentenceTransformer automatically once they are published on Hugging Face.

from sentence_transformers import SentenceTransformer

model_kwargs = {"torch_dtype": "auto"}
model = SentenceTransformer("reasonir/ReasonIR-8B", trust_remote_code=True, model_kwargs=model_kwargs)
model.set_pooling_include_prompt(include_prompt=False)  # exclude the prompt during pooling

query = "The quick brown fox jumps over the lazy dog."
document = "The quick brown fox jumps over the lazy dog."
query_instruction = ""
doc_instruction = ""

# SentenceTransformer.encode takes the instruction text through its `prompt` argument.
query_emb = model.encode(query, prompt=query_instruction)
doc_emb = model.encode(document, prompt=doc_instruction)
sim = query_emb @ doc_emb.T

It is important to also include trust_remote_code=True and torch_dtype="auto" as discussed earlier.

NOTE: SentenceTransformer seems to produce a very slight floating-point discrepancy (because it does not support bf16 precision), though this should not affect the results in general.
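
One way to check that such a discrepancy is negligible is to encode the same text through both paths and compare the two vectors. The sketch below is a hedged illustration rather than part of this repository: the function names are hypothetical, the embeddings are assumed to have been converted to flat lists of floats beforehand, and the `min_cosine=0.999` threshold is an arbitrary assumption that should be tuned to the application.

```python
import math


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def embeddings_agree(a, b, min_cosine=0.999):
    """True if the two embeddings point in nearly the same direction.

    min_cosine is an assumed tolerance, not a value taken from the paper.
    """
    return cosine_similarity(a, b) >= min_cosine


if __name__ == "__main__":
    # Toy vectors standing in for the same text encoded by the two code paths.
    from_transformers = [0.5, -0.25, 1.0]
    from_sentence_transformers = [0.5005, -0.2498, 0.9997]
    print("max abs diff:", max_abs_diff(from_transformers, from_sentence_transformers))
    print("agree:", embeddings_agree(from_transformers, from_sentence_transformers))
```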

Evaluations

Please refer to the instructions in evaluation/.

Synthetic Data Generation

Please refer to the instructions in synthetic_data_generation/.

Test Time Scaling Techniques

Please refer to the instructions in test_time_techniques/.

Retriever Training

Please refer to the instructions in training/.

Citation

@article{shao2025reasonir,
      title={ReasonIR: Training Retrievers for Reasoning Tasks}, 
      author={Rulin Shao and Rui Qiao and Varsha Kishore and Niklas Muennighoff and Xi Victoria Lin and Daniela Rus and Bryan Kian Hsiang Low and Sewon Min and Wen-tau Yih and Pang Wei Koh and Luke Zettlemoyer},
      year={2025},
      journal={arXiv preprint arXiv:2504.20595},
      url={https://arxiv.org/abs/2504.20595}, 
}

License

ReasonIR is licensed under the FAIR Noncommercial Research License, as found in the LICENSE file.

Acknowledgments

We thank the following great open-source repositories:
