RAM (RAG As continually updated Memory) is an innovative, training-free RAG framework with a continually updated memory. Inspired by the human pedagogical process, RAM uses recursive, reasoning-based retrieval and reflection to continually update and enrich its dynamic memory. Notably, RAM learns from users' communicative feedback for further self-improvement, a process we call communicative learning. Extensive experiments with both simulated and real users demonstrate significant improvements over traditional RAG and self-knowledge methods, particularly on false-premise and multi-hop questions. Furthermore, RAM adapts well to various feedback and retrieval strategies, showcasing its potential for advancing AI capabilities in dynamic knowledge acquisition and lifelong learning.
- 📊 Data
- ⚙ Installation
- 🚀 Quick Start
- 👩‍🏫 Different types of feedback in RAM
- 🔍 Different retrieval strategies
- Human Study
- 📝 Citation
- 📣 Contacts
We use two datasets, FreshQA and MQuAKE, and prepare the relevant knowledge as follows:
RAM/data
├─freshqa_2104 # old knowledge of FreshQA before April 2021
├─mquake_2104 # old knowledge of MQuAKE before April 2021
│
│ freshqa.csv # QA pairs selected from FreshQA whose sources are Wikipedia
│ freshqa_groundtruth.txt # ground truth of FreshQA
│ freshqa_IDs.txt # IDs of the questions selected for use from the original FreshQA dataset
│ freshqa_QR.json # related question mappings in FreshQA
│ mquake.csv # QA pairs sampled from the original MQuAKE for use
│ mquake_groundtruth.json # ground truth of MQuAKE
# clone RAM
git clone https://github.com/bigai-nlco/RAM.git
cd RAM
# create conda env
conda create -n RAM python=3.10
conda activate RAM
# install package
pip install -r requirements.txt
# export openai key
export OPENAI_API_KEY="[your_openai_api_key]"
In this work, we use all-MiniLM-L6-v2 to construct the vector database and bert-base-nli-mean-tokens to calculate the semantic similarity between feedback and ground truth.
You can download these two models from all-MiniLM-L6-v2 and bert-base-nli-mean-tokens, and you should place both under inference/.
If you want to use other models, you can change them at line 81 and line 88 in inference/llm.py.
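For reference, the semantic-similarity computation can be reproduced with the sentence-transformers library. Below is a minimal sketch, assuming the model has been placed at inference/bert-base-nli-mean-tokens as described above; the variable names and example strings are illustrative, not the repo's actual code.

```python
from sentence_transformers import SentenceTransformer, util

# Load the similarity model from the local path described above.
model = SentenceTransformer("inference/bert-base-nli-mean-tokens")

feedback = "Qatar hosted the most recent FIFA World Cup."      # illustrative feedback
ground_truth = "The 2022 FIFA World Cup was held in Qatar."    # illustrative ground truth

# Encode both texts into mean-pooled sentence embeddings.
embeddings = model.encode([feedback, ground_truth])

# Cosine similarity between the two embeddings, in [-1, 1].
score = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"semantic similarity: {score:.4f}")
```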
Before you start the project, you should encode each article in the datasets as an embedding in the ChromaDB vector database.
python data_preparation.py --vectordb_name chroma_db
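To illustrate what this step does, here is a minimal sketch of building such a database with LangChain's Chroma wrapper; the document contents, metadata fields, and persist directory below are illustrative assumptions, not the actual contents of data_preparation.py.

```python
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Chroma

# Embed articles with the local all-MiniLM-L6-v2 model described above.
embeddings = HuggingFaceEmbeddings(model_name="inference/all-MiniLM-L6-v2")

# Illustrative articles; the real script reads them from RAM/data.
docs = [
    Document(page_content="Article text ...", metadata={"title": "Example article"}),
]

# Build and persist the vector database under the given name.
db = Chroma.from_documents(docs, embeddings, persist_directory="chroma_db")
db.persist()
```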
We test LLMs using the Python scripts under inference/ with the following methods. Select the method for evaluation via --method and the dataset via --dataset. Let's take FreshQA as an example:
For Self-knowledge:
python inference/base.py --dataset freshqa --result_path ./result/freshqa_self_knowledge.json --method Self-knowledge --model_path Meta-Llama-3-8B-Instruct
For RAG-only:
python inference/base.py --dataset freshqa --result_path ./result/freshqa_rag_only.json --method RAG-only --vectordb_name chroma_db --model_path Meta-Llama-3-8B-Instruct
For RAM-R3:
python inference/RAM.py --dataset freshqa --result_path ./result/freshqa_ram_r3.json --method RAM_R3 --vectordb_name chroma_db --model_path Meta-Llama-3-8B-Instruct
For RAM-R3 with Hints:
python inference/RAM.py --dataset freshqa --result_path ./result/freshqa_ram_r3_hints.json --method RAM_R3_hints --vectordb_name chroma_db --model_path Meta-Llama-3-8B-Instruct
For RAM:
python inference/RAM.py --dataset freshqa --result_path ./result/freshqa_ram.json --method RAM --vectordb_name chroma_db --model_path Meta-Llama-3-8B-Instruct
Open-source models are downloaded to and loaded from models/ by default; you can change the path via --model_path. You can download Llama-3-8B-Instruct, Llama-2-7B, Llama-2-13B, and Vicuna-7B from Hugging Face. For Vicuna-7B, you should modify the prompt according to its instruction-tuning format. You can also set the output path via --result_path.
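As an illustration of the Vicuna prompt modification, the sketch below wraps a question in the Vicuna v1.x chat template. The template text follows the publicly documented FastChat format; it is an assumption for illustration, not taken from this repo.

```python
# Illustrative Vicuna-v1.x chat template (an assumption based on the
# publicly documented FastChat format, not this repo's actual prompt).
VICUNA_TEMPLATE = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the "
    "user's questions. USER: {question} ASSISTANT:"
)

prompt = VICUNA_TEMPLATE.format(question="Who hosted the 2022 FIFA World Cup?")
```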
The models and metrics use their default parameter settings. All experiments can be run on a single A100 GPU with 80GB of memory.
We test performance with different types of feedback using the Python scripts under feedback/ with four methods; select the method via --feedback_method.
For Hints:
Hints is the basic setting of our method, as used by the "For RAM" command in Step 3 of Quick Start.
For Hints without ground truth:
python inference/RAM.py --dataset freshqa --result_path ./result/freshqa_ram_wo_groundtruth.json --method RAM_wo_groundtruth --vectordb_name chroma_db --model_path Meta-Llama-3-8B-Instruct
For Direct answer:
python feedback/RAM.py --dataset freshqa --result_path ./result/freshqa_DA.json --vectordb_name chroma_db --model_path Meta-Llama-3-8B-Instruct --feedback_method Direct_answer
For No feedback:
python feedback/RAM.py --dataset freshqa --result_path ./result/freshqa_NA.json --vectordb_name chroma_db --model_path Meta-Llama-3-8B-Instruct --feedback_method NA
We also evaluate RAM using four distinct retrieval types: Default (Stuff), Map-reduce, Refine, and Map-rerank. For detailed information about these settings, please refer to this link.
Except for the Default setting, you can replace lines 256 to 268 in inference/agents.py with the code below.
# Retrieve the documents relevant to the query.
docs = self.retriever.get_relevant_documents(argument)

# Record the title of each retrieved document.
tmp = []
for doc in docs:
    doc_details = doc.to_json()['kwargs']
    tmp.append(doc_details['metadata']['title'])
self.title = tmp

# Build the QA chain; replace "map_reduce" with "refine" or "map_rerank" as needed.
chain = load_qa_chain(self.llm, chain_type="map_reduce")
result = chain.run(input_documents=docs, question=self.hfeedback + ' ' + self.question)
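As a rough guide to these chain types in LangChain: Stuff places all retrieved documents into a single prompt; Map-reduce runs the LLM over each document separately and then combines the per-document answers; Refine iterates over the documents, updating a running answer; and Map-rerank answers from each document with a confidence score and returns the highest-scoring answer.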
The graphical interactive interfaces for RAM are presented below.
The figure below illustrates the effectiveness of communicative learning with real users across two datasets and four distinct settings.
If you would like to use our data or find our work interesting, please cite:
@article{li2024ram,
title={RAM: Retrieval-augmented Generation As Continually Updated Memory via Communicative Learning},
author={Li, Jiaqi and Wang, Xiaobo and Wang, Zihao and Zheng, Zilong},
journal={arXiv preprint arXiv:2404.12045},
year={2024}
}
We sincerely appreciate the users for their valuable contributions to the human study in RAM. We are happy to answer any questions about RAM: [email protected]