
SPO | Self-Supervised Prompt Optimization

Paper Demo ModelScope

An automated prompt engineering tool for Large Language Models (LLMs), designed for universal domain adaptation.

A next-generation prompt engineering system implementing Self-Supervised Prompt Optimization (SPO). Achieves state-of-the-art performance with 17.8–90.9× higher cost efficiency than conventional methods. 🚀

[Figure: Framework of SPO]

✨ Core Advantages

  • 💸 Ultra-Low Cost - $0.15 per task optimization
  • 🏷️ Zero Supervision - No ground truth/human feedback required
  • ⚡ Universal Adaptation - Closed & open-ended tasks supported
  • 🔄 Self-Evolving - Auto-optimization via LLM-as-judge mechanism

🔗 Quick Links

📊 Experiment

Closed Tasks

[Figures: SPO closed task table and SPO closed task figure]

SPO demonstrates superior cost efficiency, requiring only 1.1% to 5.6% of the cost of state-of-the-art methods while maintaining competitive performance.

Open-ended Tasks

[Figure: Open-ended task results]

SPO significantly improves model performance across all model configurations in open-ended tasks.

🚀 Quick Start

1. Configure Your API Key ⚙️

Configure LLM parameters in config/config2.yaml (see examples/spo/config2.example.yaml for reference)
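
A minimal sketch of what config/config2.yaml may look like, assuming a MetaGPT-style llm block; the exact key names (api_type, base_url, etc.) are assumptions here, so verify them against examples/spo/config2.example.yaml:

llm:
  api_type: "openai"                      # provider type; assumption, confirm in the example file
  model: "gpt-4o-mini"                    # default model name
  base_url: "https://api.openai.com/v1"   # API endpoint
  api_key: "sk-..."                       # your API key (keep it out of version control)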

2. Define Your Iteration Template 📝

Create an iteration template file settings/task_name.yaml:

prompt: |
  Please solve the following problem.

requirements: |
  ...

count: None

qa:
  - question: |
      ...
    answer: |
      ...

  - question: |
      ...
    answer: |
      ...

Notes:

  • prompt: Initial prompt for iteration
  • requirements: Desired effects/outcomes (e.g., generate more thinking, use more humorous language)
  • count: Target word count for the generated prompt (e.g., 50). Set to None for no limit
  • qa: QA pairs used for iteration; include an appropriate number of pairs (typically 3)
    • question: Questions from the dataset used for iteration
    • answer: Corresponding answers. These can contain desired thinking patterns or responses instead of actual answers, or can be left empty. See settings/Navigate.yaml for reference
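
For instance, a hypothetical settings/Poem.yaml could look like the sketch below; the requirement text and QA pairs are illustrative, not taken from the repository:

prompt: |
  Please write a poem on the given topic.

requirements: |
  Use vivid imagery, keep a consistent rhythm, and end with an unexpected final line.

count: None

qa:
  - question: |
      Write a short poem about the ocean at night.
    answer: |
      Three stanzas; the last line should quietly reverse the mood of the first.

  - question: |
      Write a short poem about a city waking up.
    answer: |
      Focus on small sensory details rather than a literal description.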

3. Implement the PromptOptimizer 🔧

You have three ways to run the PromptOptimizer:

Option 1: Python Script

from components.optimizer import PromptOptimizer
from utils.llm_client import SPO_LLM

if __name__ == "__main__":
  # Initialize LLM settings
  SPO_LLM.initialize(
    optimize_kwargs={"model": "claude-3-5-sonnet-20240620", "temperature": 0.7},
    evaluate_kwargs={"model": "gpt-4o-mini", "temperature": 0.3},
    execute_kwargs={"model": "gpt-4o-mini", "temperature": 0},
    mode = "base_model"
  )

  # Create and run optimizer
  optimizer = PromptOptimizer(
    optimized_path="workspace",  # Output directory
    initial_round=1,  # Starting round
    max_rounds=10,  # Maximum optimization rounds
    template="Poem.yaml",  # Template file
    name="Poem",  # Project name
  )

  optimizer.optimize()

Option 2: Command Line Interface

python -m optimize

Available command line options:

--opt-model        Model for optimization (default: claude-3-5-sonnet-20240620)
--opt-temp         Temperature for optimization (default: 0.7)
--eval-model       Model for evaluation (default: gpt-4o-mini)
--eval-temp        Temperature for evaluation (default: 0.3)
--exec-model       Model for execution (default: gpt-4o-mini)
--exec-temp        Temperature for execution (default: 0)
--workspace        Output directory path (default: workspace)
--initial-round    Initial round number (default: 1)
--max-rounds       Maximum number of rounds (default: 10)
--template         Template file name (default: Poem.yaml)
--name             Project name (default: Poem)
--mode             Execution model mode: base_model or reasoning_model (default: base_model)

For help:

python -m optimize --help
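
For example, a run that overrides several of the defaults above (the values shown are only illustrative) might be:

python -m optimize \
  --opt-model claude-3-5-sonnet-20240620 \
  --eval-model gpt-4o-mini \
  --exec-model gpt-4o-mini \
  --template Poem.yaml \
  --name Poem \
  --max-rounds 10 \
  --workspace workspace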

Option 3: Streamlit Web Interface

For a more user-friendly experience, you can use the Streamlit web interface to configure and run the optimizer.

First, install Streamlit:

pip install "streamlit~=1.42.0"

Then run the web interface:

python -m streamlit run app.py

4. View Results

workspace
  └── Project_name
      └── prompts
          ├── results.json
          ├── round_1
          │   ├── answers.txt
          │   └── prompt.txt
          ├── round_2
          │   ├── answers.txt
          │   └── prompt.txt
          ├── round_3
          │   ├── answers.txt
          │   └── prompt.txt
          ├── ...
          └── round_n
              ├── answers.txt
              └── prompt.txt
  • results.json: Stores whether each iteration round was judged successful and other related information
  • prompt.txt: The optimized prompt for the corresponding round
  • answers.txt: The output results generated using the prompt for the corresponding round
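
As a quick way to inspect a finished run, a short script like this sketch can load results.json; the file's exact schema is not documented here, so the code only prints each entry rather than assuming specific field names:

import json
from pathlib import Path

# Minimal sketch: print the per-round records from a finished run.
# The path assumes the Quick Start settings (output dir "workspace", project name "Poem").
results_path = Path("workspace") / "Poem" / "prompts" / "results.json"

with results_path.open() as f:
    results = json.load(f)

# Each entry records whether a round was judged successful plus related info;
# print them as-is instead of guessing key names.
for entry in results:
    print(entry)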

5. About the Reasoning Model Mode

You can control the execution model's output mode via the --mode parameter (or mode argument in Python):

  • base_model: Only returns the model's main content.
  • reasoning_model: If the model supports it, returns both the reasoning process (reasoning_content) and the main content.

Example:

python -m optimize --mode reasoning_model

Or in Python:

SPO_LLM.initialize(
    ...,
    mode="reasoning_model"
)

Our Exploration: SPO and Reasoning Models

We investigated how Self-Supervised Prompt Optimization (SPO) impacts different types of Large Language Models, particularly focusing on advanced Reasoning Models versus more general Base Models. Our key findings include:

  • Output Refinement vs. Core Logic Change (Reasoning Models): For sophisticated Reasoning Models, SPO excels at refining output structure, style, and adherence to specific formats (e.g., successful in role-playing, MT-Bench formatting). However, it does not fundamentally alter their core "thought paths" or internal reasoning logic. Even with highly structured prompts, the underlying problem-solving approach of these models remains largely consistent.

  • Limited Impact on Inherent Reasoning Flaws (Reasoning Models): SPO showed limited ability to correct inherent logical errors or fill knowledge gaps in Reasoning Models for complex tasks like advanced mathematical reasoning (MATH) or deep knowledge QA (GPQA). If a model inherently struggled with a concept, SPO couldn't typically "teach" it to solve the problem correctly.

  • Guiding Reasoning (Base Models): In contrast, for Base Models, SPO appears more effective in guiding the actual reasoning process, helping them construct more structured and accurate responses by providing clearer paths.

  • Differential Mechanism: This suggests SPO acts more as an "output customizer" and "constraint enforcer" for already capable Reasoning Models, whereas for Base Models, it can serve as a more direct "reasoning guide."

In essence: while SPO is a powerful tool for prompt optimization, its primary benefits and operational mechanisms differ based on the target LLM's existing reasoning capabilities. For Reasoning Models, SPO is highly effective for output control and customization, but less so for fundamentally enhancing their core logical problem-solving abilities if those abilities are already limited. For detailed experimental setups, specific prompt examples, and full result tables, please refer to our full research notes: https://bcniea0qxkrv.feishu.cn/wiki/K2lMwya6diDy7ek94ZRcqxa8nsb?from=from_copylink

For more details or to discuss further, feel free to reach out to @Rubbisheep.

Citation

If you use SPO in your research, please cite our paper:

@misc{xiang2025spo,
      title={Self-Supervised Prompt Optimization}, 
      author={Jinyu Xiang and Jiayi Zhang and Zhaoyang Yu and Fengwei Teng and Jinhao Tu and Xinbing Liang and Sirui Hong and Chenglin Wu and Yuyu Luo},
      year={2025},
      eprint={2502.06855},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.06855}, 
}
