This repository contains the code and data for the paper "LinuxFLBench: Benchmarking and Enhancing LLM-based Agents in Localizing Linux Kernel Bugs".
LINUXFLBENCH is a new benchmark of 250 Fault Localization tasks derived from real-world Linux kernel bugs.
- The dataset is located at `dataset/LINUXFLBENCH_dataset.jsonl`, in JSON Lines format (a loading sketch follows this list).
- Each line is a real Linux kernel bug sample, with fields including:
  - `id`: Bug ID
  - `title`: Bug title
  - `description`: Detailed bug description
  - `Kernel Version`: The version of the Linux kernel in which the bug occurred (e.g., 5.6.7)
  - `patch`: Patch content for the fix
  - `paths`: Source file paths involved (i.e., the localization target files)
  - `methods`: Function names involved
  - Additional metadata: kernel version, component, hardware, etc.
- The dataset covers various kernel versions and is suitable for evaluating LLM- and agent-based fault localization in large, complex systems (i.e., the Linux kernel).
- The source code for different Linux kernel versions can be downloaded from here.
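The following is a minimal loading sketch using the `jsonlines` package from the dependency list; it relies only on the field names documented above, and everything else in it is illustrative:

```python
import jsonlines

# Iterate over the benchmark: one JSON object (one bug sample) per line.
with jsonlines.open("dataset/LINUXFLBENCH_dataset.jsonl") as reader:
    for bug in reader:
        # Documented fields: id, title, description, Kernel Version,
        # patch, paths, methods, plus extra metadata (component, hardware, ...).
        print(bug["id"], bug["Kernel Version"])
        print("  ground-truth files:", bug["paths"])
```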
The main code is under the `code/` directory, organized as follows:

- `scale/`: Candidate file expansion and reasoning
  - `scaling_candidates_with_dir.py`: Directory-based candidate expansion
  - `scaling_candidates_with_guess.py`: LLM-based candidate expansion
- `merge/`: Multi-method result fusion and reranking
  - `merge.py`: Fusion of multiple ranking results
  - `rerank.py`: LLM-based candidate reranking
- `mail/`: Mail-related scripts (see the retrieval sketch after this list)
  - `mails_retrieval.py`: Retrieves relevant emails from the mail dataset based on queries
  - `search_mails_bm25s.py`: BM25-based mail search utilities
- `method_fl/`: Method-level fault localization based on the predicted code files
  - `method_localize.py`: Method-level fault localization script
- `eval/`: Evaluation and metrics
  - `evaluate.py`: Main evaluation script
  - `evaluation_metrics.py`: Common metrics such as Recall@K and MRR
- `utils.py`, `file_parser.py`: General utility functions
- The mail data for retrieval can be downloaded from here.
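For orientation, here is a minimal BM25 retrieval sketch in the spirit of `search_mails_bm25s.py`. It assumes the `bm25s` package (suggested by the script name) and a hypothetical in-memory list of mail texts; it is not the repository's actual utility:

```python
import bm25s

# Hypothetical corpus: plain-text bodies of kernel mailing-list messages.
mails = [
    "ext4: fix use-after-free in ext4_xattr_set_entry",
    "sched/fair: avoid division by zero during load balancing",
    "usb: xhci: handle NULL pointer dereference on disconnect",
]

# Build a BM25 index over the tokenized mails.
retriever = bm25s.BM25()
retriever.index(bm25s.tokenize(mails, stopwords="en"))

# Retrieve the top-k mails for a bug-report query; results are corpus indices.
query = "NULL pointer dereference when unplugging a USB device"
indices, scores = retriever.retrieve(bm25s.tokenize(query), k=2)

for idx, score in zip(indices[0], scores[0]):
    print(f"{score:.2f}  {mails[idx]}")
```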
- Candidate Expansion: Use scripts in `scale/` to expand candidate file lists for each bug (e.g., Directory-Aware Expansion, Potential Cause Expansion).
- Candidate Integration: Use scripts in `merge/` to fuse multiple candidate ranking results and rerank them with an LLM (an illustrative fusion sketch follows this list).
- Evaluation: Use scripts in `eval/` to evaluate the final results with metrics such as Recall@K and MRR.
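The integration step merges several ranked candidate lists into a single ranking. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), a common list-fusion technique; it is not necessarily the scheme implemented in `merge.py`:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked file lists; a higher fused score means a better candidate.

    rankings: list of ranked lists of file paths (best first).
    k: smoothing constant in the standard RRF score 1 / (k + rank).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, path in enumerate(ranking, start=1):
            scores[path] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Illustrative inputs: rankings from directory-based and LLM-based expansion.
fused = reciprocal_rank_fusion([
    ["fs/ext4/xattr.c", "fs/ext4/inode.c", "mm/filemap.c"],
    ["fs/ext4/xattr.c", "mm/filemap.c", "fs/ext4/super.c"],
])
print(fused)
```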
All experimental results are located in the `result/` directory and can be used for reproduction.
This project requires Python 3.8+ and the following packages:
- openai
- jsonlines
Install dependencies with pip:

```bash
pip install openai jsonlines
```
Some scripts require an OpenAI API key and `base_url` to be configured; see each script's arguments for details.
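A minimal configuration sketch for the v1-style `openai` client is shown below; the environment-variable names and model name are illustrative (the actual scripts take these values via command-line arguments such as `--api_key` and `--gpt_base_url`):

```python
import os
from openai import OpenAI

# Point the client at any OpenAI-compatible endpoint.
client = OpenAI(
    api_key=os.environ["OPENAI_API_KEY"],
    base_url=os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Which kernel source file most likely contains this bug? ..."}],
)
print(response.choices[0].message.content)
```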
Example: Directory-Aware Expansion
```bash
python code/scale/scaling_candidates_with_dir.py \
  --data_path dataset/LINUXFLBENCH_dataset.jsonl \
  --save_path results/dir_scaling.jsonl \
  --gpt_base_url https://api.openai.com/v1 \
  --api_key YOUR_API_KEY \
  --kernel_path /path/to/linux/kernel/
```
Evaluate the results:
```bash
python code/eval/evaluate.py --path results/dir_scaling.jsonl
```
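For reference, here is a self-contained sketch of the two reported metrics over ranked file lists; the function names and example paths are illustrative and are not the code in `evaluation_metrics.py`:

```python
def recall_at_k(ranked_files, gold_files, k):
    """Fraction of ground-truth files that appear in the top-k predictions."""
    hits = sum(1 for f in gold_files if f in ranked_files[:k])
    return hits / len(gold_files)

def mrr(ranked_files, gold_files):
    """Reciprocal rank of the first ground-truth file in the ranking (0 if none)."""
    for rank, f in enumerate(ranked_files, start=1):
        if f in gold_files:
            return 1.0 / rank
    return 0.0

# Illustrative example: a predicted ranking vs. the ground-truth `paths` field.
pred = ["drivers/usb/core/hub.c", "fs/ext4/xattr.c", "kernel/sched/fair.c"]
gold = ["fs/ext4/xattr.c"]
print(recall_at_k(pred, gold, k=1))  # 0.0
print(recall_at_k(pred, gold, k=3))  # 1.0
print(mrr(pred, gold))               # 0.5
```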
For more details, usage, or questions, please open an issue or contact the authors.