Reflection Agent Pattern

The Reflection pattern implements a generate-critique-refine cycle where an agent creates an initial response, reflects on its quality, and iteratively improves it based on self-critique.

Overview

Best For: Tasks requiring high-quality outputs where iterative refinement adds value

Complexity: ⭐⭐ Moderate (Good balance of simplicity and power)

Cost: $$$ Higher (Multiple LLM calls per cycle)

When to Use Reflection

Ideal Use Cases

✅ Content generation and editing

Agent generates draft content
Critiques quality, clarity, and completeness
Refines output to meet high standards

✅ Code review and improvement

Generates initial code solution
Identifies bugs, inefficiencies, or style issues
Produces improved version

✅ Creative writing

Creates initial draft
Evaluates narrative flow, character development
Refines story elements

✅ Document preparation

Generates reports or proposals
Critiques structure, argumentation, evidence
Polishes final version

When NOT to Use Reflection

❌ Time-sensitive tasks → Use ReAct for faster results ❌ Tasks requiring external tools → Use ReAct with tool access ❌ Learning from multiple trials → Use Reflexion for memory-based learning ❌ Simple queries → Direct LLM call sufficient

How Reflection Works

The Generate-Reflect-Refine Cycle

┌─────────────────────────────────────────┐
│                                         │
│  1. GENERATE: Create initial output    │
│     "Draft blog post about AI ethics"   │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│                                         │
│  2. REFLECT: Critique the output        │
│     "Missing concrete examples, too     │
│      abstract, could improve structure" │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│                                         │
│  3. CHECK: Needs refinement?            │
│     Analyze critique for improvement    │
│     opportunities                       │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│                                         │
│  4. REFINE: Improve based on critique   │
│     Add examples, restructure, clarify  │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
              [Repeat up to max_reflection_cycles]

Theoretical Foundation

The Reflection pattern is inspired by metacognitive learning and self-regulated problem solving. Key principles:

Self-evaluation: The agent assesses its own work objectively
Iterative improvement: Multiple refinement cycles lead to better outcomes
Quality awareness: Explicit critique makes quality criteria transparent
Adaptive refinement: Agent learns what aspects need improvement

Algorithm

def reflection_loop(task, max_cycles=1):
    """Simplified Reflection algorithm"""

    # Generate initial output
    output = llm_generate(task)

    for cycle in range(max_cycles):
        # Reflect on current output
        reflection = llm_reflect(task, output)

        # Check if refinement is needed
        needs_refinement = evaluate_reflection(reflection)

        if not needs_refinement:
            return output

        # Refine based on critique
        output = llm_refine(task, output, reflection)

    return output

API Reference

Class: `ReflectionAgent`

from agent_patterns.patterns import ReflectionAgent

agent = ReflectionAgent(
    llm_configs: Dict[str, Dict[str, Any]],
    max_reflection_cycles: int = 1,
    prompt_dir: str = "prompts",
    custom_instructions: Optional[str] = None,
    prompt_overrides: Optional[Dict[str, Dict[str, str]]] = None
)

Parameters

Parameter	Type	Required	Description
`llm_configs`	`Dict[str, Dict[str, Any]]`	Yes	LLM configs for “documentation” and “reflection” roles
`max_reflection_cycles`	`int`	No	Maximum number of refine iterations (default: 1)
`prompt_dir`	`str`	No	Custom prompt directory (default: “prompts”)
`custom_instructions`	`str`	No	Instructions appended to system prompts
`prompt_overrides`	`Dict`	No	Override specific prompts programmatically

LLM Roles

documentation: Used for generating initial output and refinements
reflection: Used for critiquing the output

Methods

run(input_data: str) -> str

Executes the Reflection pattern on the given input.

Parameters:
- input_data (str): The task or content request
Returns: str - The final refined output
Raises: ValueError if graph not built

build_graph() -> None

Builds the LangGraph state graph. Called automatically during initialization.

Complete Examples

Basic Usage

from agent_patterns.patterns import ReflectionAgent

# Configure LLMs
llm_configs = {
    "documentation": {
        "provider": "openai",
        "model": "gpt-4",
        "temperature": 0.7,
    },
    "reflection": {
        "provider": "openai",
        "model": "gpt-4",
        "temperature": 0.3,  # Lower temp for more consistent critique
    }
}

# Create agent
agent = ReflectionAgent(
    llm_configs=llm_configs,
    max_reflection_cycles=1
)

# Generate and refine content
result = agent.run("""
Write a technical blog post about the benefits of microservices architecture.
Target audience: Senior developers and tech leads.
""")

print(result)
# Output will be a refined blog post that has been self-critiqued and improved

With Custom Instructions

# Add domain-specific quality criteria
writing_guidelines = """
You are a technical writer. Follow these quality standards:

GENERATION:
- Use clear, concrete examples
- Avoid jargon without explanation
- Structure with clear headings
- Include code snippets where relevant

REFLECTION:
- Check for technical accuracy
- Verify examples are practical
- Ensure logical flow
- Identify areas lacking clarity

REFINEMENT:
- Address all critique points
- Maintain the original intent
- Improve without over-complicating
"""

agent = ReflectionAgent(
    llm_configs=llm_configs,
    max_reflection_cycles=2,  # Allow 2 refinement cycles
    custom_instructions=writing_guidelines
)

result = agent.run("Explain database indexing to junior developers")

With Prompt Overrides

# Customize the reflection prompt for code review
overrides = {
    "Reflect": {
        "system": """You are an expert code reviewer. Provide constructive critique focusing on:
- Code correctness and edge cases
- Performance and efficiency
- Readability and maintainability
- Best practices and design patterns
- Security considerations
""",
        "user": """Original task: {task}

Current code:
{output}

Provide a detailed code review. For each issue, explain:
1. What the problem is
2. Why it matters
3. How to fix it

Your critique:"""
    },
    "Refine": {
        "system": "You are an expert programmer. Improve the code based on the review feedback.",
        "user": """Task: {task}

Current code:
{output}

Code review feedback:
{reflection}

Produce improved code that addresses all feedback while maintaining functionality."""
    }
}

agent = ReflectionAgent(
    llm_configs=llm_configs,
    max_reflection_cycles=1,
    prompt_overrides=overrides
)

result = agent.run("""
Write a Python function to find the longest common substring
between two strings. Include error handling.
""")

Customizing Prompts

Understanding the System Prompt Structure

Version 0.2.0 introduces enterprise-grade prompts with a comprehensive 9-section structure. Each system prompt is now 150-300+ lines (compared to ~32 lines previously), providing significantly better guidance to the LLM.

The 9-Section Comprehensive Structure

All Reflection system prompts now follow this proven architecture:

Role and Identity - Clear definition of the agent’s purpose and capabilities
Core Capabilities - Explicit CAN/CANNOT boundaries to prevent hallucination
Process - Step-by-step workflow guidance for consistent execution
Output Format - Precise specifications for structured responses
Decision-Making Guidelines - Context-specific rules and best practices
Quality Standards - Clear criteria for excellent vs. poor outputs
Edge Cases - Built-in error handling and special situation guidance
Examples - 2-3 concrete examples demonstrating expected behavior
Critical Reminders - Key points emphasized for reliability

Benefits: Increased reliability, better transparency, improved robustness, and backward compatibility. No code changes required to benefit from enhanced prompts.

Understanding Reflection Prompts

The Reflection pattern uses three prompt templates (all now with comprehensive 9-section structure):

Generate/system.md & user.md: Initial content generation

Now includes all 9 sections for better generation quality
Sets the agent’s role, capabilities, and boundaries
Produces first draft with clear quality standards

Reflect/system.md & user.md: Critique generation

Comprehensive guidance on analyzing outputs
Identifies strengths and weaknesses systematically
Suggests improvements with examples

Refine/system.md & user.md: Improvement generation

Detailed process for addressing feedback
Takes original output and critique
Produces improved version with quality checks

Method 1: Custom Instructions

Add guidelines without changing core prompts:

agent = ReflectionAgent(
    llm_configs=llm_configs,
    custom_instructions="""
    QUALITY CRITERIA:
    - Accuracy: All facts must be verifiable
    - Clarity: Use simple language where possible
    - Completeness: Address all aspects of the task
    - Conciseness: Avoid unnecessary verbosity

    When refining, prioritize clarity and accuracy over length.
    """
)

Method 2: Prompt Overrides

Replace prompts entirely:

overrides = {
    "Generate": {
        "system": "You are a creative writer specializing in short stories.",
        "user": "Write a short story based on: {task}\n\nYour story:"
    },
    "Reflect": {
        "system": "You are a literary critic. Evaluate stories for narrative quality.",
        "user": """Story prompt: {task}

Current story:
{output}

Critique this story's:
1. Character development
2. Plot structure
3. Pacing and tension
4. Dialogue quality
5. Descriptive language

Your critique:"""
    },
    "Refine": {
        "system": "You are a skilled editor. Improve stories based on feedback.",
        "user": """Story prompt: {task}

Current story:
{output}

Editorial feedback:
{reflection}

Rewrite the story addressing all feedback points:"""
    }
}

agent = ReflectionAgent(
    llm_configs=llm_configs,
    prompt_overrides=overrides,
    max_reflection_cycles=2
)

Method 3: Custom Prompt Directory

For extensive customization:

my_prompts/
└── ReflectionAgent/
    ├── Generate/
    │   ├── system.md
    │   └── user.md
    ├── Reflect/
    │   ├── system.md
    │   └── user.md
    └── Refine/
        ├── system.md
        └── user.md

agent = ReflectionAgent(
    llm_configs=llm_configs,
    prompt_dir="my_prompts"
)

Setting Agent Goals

Via Task Description

The most direct way to set goals:

# Specific quality requirements
agent.run("""
Write a product description for wireless headphones that:
1. Highlights key features (battery life, sound quality, comfort)
2. Uses persuasive but honest language
3. Is 150-200 words
4. Targets audiophile consumers
""")

# Technical requirements
agent.run("""
Create a Python class for a binary search tree with:
- Insert, delete, and search methods
- In-order traversal
- Comprehensive docstrings
- Type hints
- Unit test examples
""")

Via Custom Instructions

Set persistent quality standards:

agent = ReflectionAgent(
    llm_configs=llm_configs,
    custom_instructions="""
    GOAL: Produce publication-ready technical documentation

    GENERATION STANDARDS:
    - Use active voice
    - Include concrete examples
    - Structure with clear hierarchy
    - Add code snippets for technical concepts

    REFLECTION FOCUS:
    - Verify technical accuracy
    - Check completeness of examples
    - Ensure proper formatting
    - Validate code snippets

    REFINEMENT PRIORITIES:
    1. Correctness
    2. Clarity
    3. Completeness
    4. Consistency
    """
)

Via System Prompt Override

Set goals at the system level for each phase:

overrides = {
    "Generate": {
        "system": """You are a technical documentation writer. Your goal is to create
clear, accurate, and comprehensive documentation that helps developers understand
complex concepts quickly. Always include practical examples."""
    },
    "Reflect": {
        "system": """You are a senior technical reviewer. Your goal is to ensure
documentation meets enterprise standards for accuracy, completeness, and clarity.
Be thorough but constructive in your critique."""
    },
    "Refine": {
        "system": """You are an expert technical editor. Your goal is to transform
good documentation into excellent documentation by addressing all feedback while
maintaining readability and practical value."""
    }
}

Advanced Usage

Multiple Reflection Cycles

# For complex tasks requiring multiple iterations
agent = ReflectionAgent(
    llm_configs=llm_configs,
    max_reflection_cycles=3
)

# Each cycle provides another opportunity for improvement
result = agent.run("""
Write a comprehensive research paper introduction about quantum computing,
including background, significance, and research questions.
""")

Role-Specific LLM Configurations

# Use different models for different roles
llm_configs = {
    "documentation": {
        "provider": "openai",
        "model": "gpt-4",  # Stronger model for generation
        "temperature": 0.8,  # Higher creativity
    },
    "reflection": {
        "provider": "openai",
        "model": "gpt-3.5-turbo",  # Cheaper model for critique
        "temperature": 0.2,  # More focused critique
    }
}

agent = ReflectionAgent(llm_configs=llm_configs)

Combining with External Validation

# Use custom logic to decide if refinement is needed
class CustomReflectionAgent(ReflectionAgent):
    def _check_refinement_needed(self, state):
        """Override with custom validation logic"""
        reflection = state.get("reflection", "")
        output = state.get("initial_output") or state.get("refined_output", "")

        # Custom checks
        needs_refinement = (
            len(output) < 500 or  # Too short
            "incomplete" in reflection.lower() or
            "error" in reflection.lower() or
            not self._has_code_examples(output)
        )

        state["needs_refinement"] = needs_refinement
        return state

    def _has_code_examples(self, text):
        """Check if output contains code blocks"""
        return "```" in text or "    " in text

agent = CustomReflectionAgent(llm_configs=llm_configs)

Performance Considerations

Cost Optimization

Reflection makes multiple LLM calls (generate + reflect + refine) × cycles:

# Minimize cycles for routine tasks
agent = ReflectionAgent(
    llm_configs=llm_configs,
    max_reflection_cycles=1  # Just one improvement pass
)

# Use cheaper model for reflection
llm_configs = {
    "documentation": {
        "provider": "openai",
        "model": "gpt-4",
    },
    "reflection": {
        "provider": "openai",
        "model": "gpt-3.5-turbo",  # Cheaper for critique
    }
}

Cost per task:

1 cycle: 3 LLM calls (generate + reflect + refine)
2 cycles: 5 LLM calls (generate + reflect + refine + reflect + refine)
3 cycles: 7 LLM calls

Quality vs Cost Tradeoff

# High-quality output (expensive)
premium_agent = ReflectionAgent(
    llm_configs={"documentation": {...}, "reflection": {...}},
    max_reflection_cycles=3
)

# Balanced (moderate cost)
standard_agent = ReflectionAgent(
    llm_configs={"documentation": {...}, "reflection": {...}},
    max_reflection_cycles=1
)

# Fast and cheap (skip reflection entirely - use direct LLM)
# Don't use Reflection pattern for simple tasks

When to Skip Reflection

Not all tasks benefit from reflection:

# Good for Reflection: Complex, high-stakes content
agent.run("Write a legal disclaimer for software liability")  # ✅

# Overkill for Reflection: Simple queries
agent.run("What is 2+2?")  # ❌ Too simple, use direct LLM
agent.run("List Python's built-in data types")  # ❌ Factual, no refinement needed

Comparison with Other Patterns

Aspect	Reflection	Reflexion	Self-Discovery
Approach	Single-task iteration	Multi-trial learning	Reasoning module selection
Memory	Within-task only	Across trials	No memory
Best For	Quality refinement	Learning from failures	Complex reasoning
Cost	Medium-High	High	Medium-High
Iterations	1-3 cycles	3-10 trials	Single pass
Strengths	Polished output	Adaptive learning	Diverse perspectives

Common Pitfalls

1. Insufficient Reflection Cycles

❌ Bad: Using 1 cycle for complex tasks

agent = ReflectionAgent(llm_configs=llm_configs, max_reflection_cycles=1)
agent.run("Write a comprehensive 50-page technical specification")

✅ Good: Allocate cycles based on task complexity

agent = ReflectionAgent(llm_configs=llm_configs, max_reflection_cycles=3)
agent.run("Write a comprehensive technical specification with examples")

2. Vague Reflection Criteria

❌ Bad: Generic reflection instructions

custom_instructions = "Make it better"

✅ Good: Specific quality criteria

custom_instructions = """
REFLECTION CRITERIA:
- Technical accuracy: Are all facts correct?
- Completeness: Are all required sections present?
- Clarity: Can the target audience understand it?
- Examples: Are there sufficient practical examples?
"""

4. Same Temperature for All Roles

❌ Bad: Same settings for generation and critique

llm_configs = {
    "documentation": {"provider": "openai", "model": "gpt-4", "temperature": 0.7},
    "reflection": {"provider": "openai", "model": "gpt-4", "temperature": 0.7}
}

✅ Good: Lower temperature for consistent critique

llm_configs = {
    "documentation": {"provider": "openai", "model": "gpt-4", "temperature": 0.8},
    "reflection": {"provider": "openai", "model": "gpt-4", "temperature": 0.2}
}