# STORM Agent Pattern

The **STORM** (Synthesis of Topic Outlines through Retrieval and Multi-perspective question asking) pattern creates comprehensive, well-researched reports by exploring topics from multiple perspectives, retrieving information systematically, and synthesizing it into structured documents.

## Overview

**Best For**: Creating comprehensive multi-perspective research reports and articles

**Complexity**: ⭐⭐⭐ Advanced (Multi-stage research workflow)

**Cost**: $$$$ Very High (Many LLM calls + retrieval operations)

## When to Use STORM

### Ideal Use Cases

✅ **Research report generation**
- Generates structured outlines
- Explores multiple viewpoints
- Retrieves comprehensive information
- Synthesizes into coherent reports

✅ **Wikipedia-style articles**
- Systematic topic coverage
- Multiple expert perspectives
- Well-cited, comprehensive content
- Structured sections and subsections

✅ **Technical documentation**
- Multi-perspective analysis (architect, developer, operator)
- Comprehensive topic coverage
- Research-backed content
- Organized presentation

✅ **Market research reports**
- Analyst, customer, competitor perspectives
- Data-driven insights
- Structured findings
- Synthesized recommendations

### When NOT to Use STORM

- ❌ **Simple queries** → Use direct LLM or ReAct
- ❌ **No retrieval needed** → Use Reflection or Plan & Solve
- ❌ **Single perspective sufficient** → Use simpler patterns
- ❌ **Time-sensitive tasks** → Too many stages and calls

## How STORM Works

### The Multi-Stage Research Workflow

```
┌─────────────────────────────────────────┐
│                                         │
│  1. GENERATE OUTLINE                    │
│     Topic: "Quantum Computing"          │
│     Sections: Introduction, History,    │
│               Applications, Challenges   │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│                                         │
│  2. GENERATE PERSPECTIVES               │
│     - Researcher perspective            │
│     - Practitioner perspective          │
│     - Industry expert perspective       │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│                                         │
│  3. GENERATE QUESTIONS                  │
│     For each section × each perspective │
│     Researcher + Introduction:          │
│       "What are the theoretical         │
│        foundations?"                    │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│                                         │
│  4. EXECUTE SEARCHES                    │
│     Run retrieval for all questions     │
│     Gather information from sources     │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│                                         │
│  5. SYNTHESIZE SECTIONS                 │
│     Combine multi-perspective info      │
│     into each section                   │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│                                         │
│  6. COMPILE REPORT                      │
│     Assemble all sections into          │
│     final coherent document             │
│                                         │
└─────────────────────────────────────────┘
```

### Theoretical Foundation

STORM is based on the paper ["Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models"](https://arxiv.org/abs/2402.14207). Its key principles:

1. **Multi-perspective inquiry**: Different viewpoints reveal different insights
2. **Systematic coverage**: Structured outline ensures completeness
3. **Research-backed**: Retrieval grounds content in actual information
4. **Hierarchical synthesis**: Build from questions → sections → final report

### Algorithm

```python
def storm_workflow(topic, perspectives, retrieval_tool):
    """Simplified STORM algorithm"""

    # Stage 1: Plan structure
    outline = llm_generate_outline(topic)

    # Stage 2: Identify perspectives
    active_perspectives = select_perspectives(topic, perspectives)

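    # Stage 3: Generate questions for each (section, perspective) pair
    # (sections and perspectives must be hashable, e.g. use their names as keys)
    questions = {}
    for section in outline:
        questions[section] = {}  # initialize before nested assignment
        for perspective in active_perspectives:
            qs = llm_generate_questions(topic, section, perspective)
            questions[section][perspective] = qs

    # Stage 4: Retrieve information
    search_results = {}
    for section, persp_qs in questions.items():
        search_results[section] = {}
        for perspective, qs in persp_qs.items():
            search_results[section][perspective] = []
            for q in qs:
                info = retrieval_tool(q)
                search_results[section][perspective].append(info)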

    # Stage 5: Synthesize sections
    synthesized_sections = {}
    for section, persp_results in search_results.items():
        section_content = llm_synthesize_section(
            topic, section, persp_results
        )
        synthesized_sections[section] = section_content

    # Stage 6: Compile final report
    final_report = llm_compile_report(topic, synthesized_sections)

    return final_report
```

## API Reference

### Class: `STORMAgent`

```python
from agent_patterns.patterns import STORMAgent

# Constructor signature, annotated for reference (not runnable as written)
STORMAgent(
    llm_configs: Dict[str, Dict[str, Any]],
    retrieval_tools: Optional[Dict[str, Callable]] = None,
    perspectives: Optional[List[Dict[str, str]]] = None,
    prompt_dir: str = "prompts",
    custom_instructions: Optional[str] = None,
    prompt_overrides: Optional[Dict[str, Dict[str, str]]] = None
)
```

#### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `llm_configs` | `Dict[str, Dict[str, Any]]` | Yes | LLM configs for "thinking" and "documentation" roles |
| `retrieval_tools` | `Dict[str, Callable]` | No | Dictionary mapping tool names to retrieval functions |
| `perspectives` | `List[Dict]` | No | Custom perspective definitions (uses defaults if None) |
| `prompt_dir` | `str` | No | Custom prompt directory (default: "prompts") |
| `custom_instructions` | `str` | No | Instructions appended to system prompts |
| `prompt_overrides` | `Dict` | No | Override specific prompts programmatically |

#### Default Perspectives

The agent includes 4 default perspectives:

1. **expert**: Technical expert with deep domain knowledge
2. **practitioner**: Professional practitioner applying concepts
3. **researcher**: Academic researcher studying the topic
4. **critic**: Critical analyst examining limitations and challenges

#### LLM Roles

- **thinking**: Used for outline, perspectives, questions, and planning
- **documentation**: Used for section synthesis and report compilation

#### Methods

**`run(input_data: str) -> str`**

Executes the STORM pattern on the given topic.

- **Parameters**:
  - `input_data` (str): The topic for the research report
- **Returns**: str - The final compiled report
- **Raises**: ValueError if graph not built

**`build_graph() -> None`**

Builds the LangGraph state graph. Called automatically during initialization.

## Complete Examples

### Basic Usage

```python
from agent_patterns.patterns import STORMAgent

import requests

# Define retrieval tool
def search_web(query: str) -> str:
    """Search the web and return relevant information"""
    # Placeholder endpoint: use an actual search API in production
    response = requests.get(
        "https://api.search.com/search",
        params={"q": query},  # lets requests URL-encode the query
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["snippet"]

# Configure LLMs
llm_configs = {
    "thinking": {
        "provider": "openai",
        "model": "gpt-4",
        "temperature": 0.7,
    },
    "documentation": {
        "provider": "openai",
        "model": "gpt-4",
        "temperature": 0.7,
    }
}

# Create agent
agent = STORMAgent(
    llm_configs=llm_configs,
    retrieval_tools={"search": search_web}
)

# Generate comprehensive report
report = agent.run("Artificial Intelligence in Healthcare")

print(report)
# Output: Multi-section report with:
# - Introduction from multiple perspectives
# - Applications (expert, practitioner views)
# - Challenges and limitations (critic view)
# - Research directions (researcher view)
# - Conclusion synthesizing all perspectives
```

### With Custom Perspectives

```python
# Define domain-specific perspectives
healthcare_perspectives = [
    {
        "name": "physician",
        "description": "Medical doctor focused on clinical applications"
    },
    {
        "name": "patient_advocate",
        "description": "Patient representative concerned with access and safety"
    },
    {
        "name": "health_economist",
        "description": "Economist analyzing cost-effectiveness and policy"
    },
    {
        "name": "medical_researcher",
        "description": "Clinical researcher studying evidence and outcomes"
    },
    {
        "name": "tech_implementer",
        "description": "IT professional implementing health technology"
    }
]

agent = STORMAgent(
    llm_configs=llm_configs,
    retrieval_tools={"search": search_web},
    perspectives=healthcare_perspectives
)

report = agent.run("Telemedicine Adoption Post-Pandemic")
```
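
Because perspectives are plain dictionaries, a misspelled key or a duplicated name is easy to miss. The sketch below checks a list before it is handed to the agent. `validate_perspectives` is a hypothetical helper, not part of `agent_patterns`, and it assumes only the `name`/`description` shape shown above; whether the library validates perspectives itself is not covered here.

```python
from typing import Dict, List


def validate_perspectives(perspectives: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Check that each perspective has a unique, non-empty name and a description.

    Hypothetical helper: not part of agent_patterns.
    """
    seen = set()
    for i, p in enumerate(perspectives):
        for key in ("name", "description"):
            value = p.get(key)
            if not isinstance(value, str) or not value.strip():
                raise ValueError(f"perspective #{i} needs a non-empty '{key}'")
        if p["name"] in seen:
            raise ValueError(f"duplicate perspective name: {p['name']!r}")
        seen.add(p["name"])
    # Return the same list so the call can be used inline.
    return perspectives
```

Since the function returns its input unchanged, it can wrap the argument directly, for example `perspectives=validate_perspectives(healthcare_perspectives)`.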

### With Multiple Retrieval Tools

```python
def search_academic(query: str) -> str:
    """Search academic databases"""
    # Integration with PubMed, arxiv, etc.
    return f"Academic research about: {query}"

def search_news(query: str) -> str:
    """Search news articles"""
    # Integration with news APIs
    return f"Recent news about: {query}"

def search_patents(query: str) -> str:
    """Search patent databases"""
    # Integration with patent databases
    return f"Patents related to: {query}"

# Agent will use 'search' tool by default, but you can customize
retrieval_tools = {
    "search": search_academic,
    "news_search": search_news,
    "patent_search": search_patents
}

agent = STORMAgent(
    llm_configs=llm_configs,
    retrieval_tools=retrieval_tools
)

# Modify agent to use different tools for different sections
# (requires custom prompt overrides or subclassing)
```
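
Since the agent uses the `search` entry by default, one way to draw on several sources without subclassing is to merge them behind a single callable. The following is a minimal sketch under that assumption; `combine_tools` is a hypothetical helper, not part of `agent_patterns`.

```python
from typing import Callable, Dict


def combine_tools(tools: Dict[str, Callable[[str], str]]) -> Callable[[str], str]:
    """Return one retrieval function that queries every tool and labels each result."""

    def combined(query: str) -> str:
        parts = []
        for name, tool in tools.items():
            try:
                result = tool(query)
            except Exception as exc:  # keep going if one source fails
                result = f"(error: {exc})"
            # Label each block so the synthesis stage can tell sources apart.
            parts.append(f"[{name}]\n{result}")
        return "\n\n".join(parts)

    return combined
```

Passing `retrieval_tools={"search": combine_tools(retrieval_tools)}` would then send every question to all three sources. The trade-off is cost: each question triggers one call per source, so the retrieval count grows with the number of tools.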

### With Custom Instructions

```python
research_guidelines = """
You are generating high-quality research reports. Follow these guidelines:

OUTLINE GENERATION:
- Create 5-7 main sections
- Each section should have 2-4 subsections
- Structure should tell a complete story

QUESTION GENERATION:
- Ask specific, answerable questions
- Focus on factual information
- Avoid yes/no questions
- Target 3-5 questions per section/perspective

SECTION SYNTHESIS:
- Integrate perspectives smoothly
- Cite different viewpoints explicitly
- Resolve contradictions when possible
- Maintain objectivity

REPORT COMPILATION:
- Ensure smooth transitions between sections
- Create compelling introduction and conclusion
- Maintain consistent tone throughout
- Format with clear headings and structure
"""

agent = STORMAgent(
    llm_configs=llm_configs,
    retrieval_tools={"search": search_web},
    custom_instructions=research_guidelines
)

report = agent.run("Blockchain Technology in Supply Chain Management")
```

## Retrieval Tool Guidelines

### Tool Function Signature

```python
def retrieval_tool(query: str) -> str:
    """
    Retrieve information for a query.

    Args:
        query: The search query or question

    Returns:
        Retrieved information as a string
    """
    # Retrieval implementation
    return result_string
```

### Tool Best Practices

1. **Return comprehensive info**: Include enough context for synthesis
2. **Handle errors gracefully**: Return informative error messages
3. **Be consistent**: Always return strings
4. **Add source attribution**: Include sources in retrieved content when possible

### Example Retrieval Tools

```python
def wiki_search(query: str) -> str:
    """Search Wikipedia for information"""
    try:
        import wikipedia
        results = wikipedia.summary(query, sentences=5)
        return results
    except Exception as e:
        return f"Wikipedia search error: {str(e)}"

def arxiv_search(query: str) -> str:
    """Search arXiv for academic papers"""
    try:
        import arxiv
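        search = arxiv.Search(query=query, max_results=3)
        summaries = []
        # Search.results() is deprecated in recent releases of the arxiv package;
        # fetch results through a Client instead.
        for result in arxiv.Client().results(search):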
            summaries.append(f"{result.title}: {result.summary[:200]}")
        return "\n\n".join(summaries)
    except Exception as e:
        return f"arXiv search error: {str(e)}"

def google_search(query: str) -> str:
    """Search Google and return snippets"""
    try:
        from googlesearch import search
        results = []
        for url in search(query, num_results=5):
            # Fetch and extract snippet
            results.append(f"Source: {url}")
        return "\n".join(results)
    except Exception as e:
        return f"Search error: {str(e)}"
```

## Customizing Prompts

### Understanding the System Prompt Structure

Version 0.2.0 introduces **enterprise-grade prompts** with a comprehensive 9-section structure. Each system prompt is now 150-300+ lines (compared to ~32 lines previously), providing significantly better guidance to the LLM.

**The 9-Section Comprehensive Structure**: All STORM system prompts now include Role and Identity, Core Capabilities (CAN/CANNOT boundaries), Process, Output Format, Decision-Making Guidelines, Quality Standards, Edge Cases, Examples, and Critical Reminders.

**Benefits**: Increased reliability, better transparency, improved robustness, and backward compatibility.

### Understanding STORM Prompts

STORM uses five prompt templates for different stages (all now with comprehensive 9-section structure):

1. **GenerateOutline**: Creates hierarchical document structure with detailed quality standards
2. **GeneratePerspectives**: Selects relevant viewpoints using systematic process guidance
3. **GenerateQuestions**: Creates questions from each perspective with examples
4. **SynthesizeSection**: Combines multi-perspective info with edge case handling
5. **CompileReport**: Assembles final document with quality criteria

### Method 1: Custom Instructions

```python
agent = STORMAgent(
    llm_configs=llm_configs,
    retrieval_tools={"search": search_web},
    custom_instructions="""
    Target audience: Business executives and decision-makers
    Tone: Professional, authoritative, accessible
    Format: Executive summary + detailed sections + recommendations
    Citation style: Inline references to perspectives
    """
)
```

### Method 2: Prompt Overrides

```python
overrides = {
    "GenerateOutline": {
        "system": "You create well-structured research outlines for business reports.",
        "user": """Topic: {topic}

Create a comprehensive outline with:
1. Executive Summary
2. Background and Context
3. Current State Analysis
4. Key Challenges
5. Opportunities
6. Recommendations
7. Conclusion

For sections 2-6, include 3-4 subsections.

Your outline:"""
    },
    "GenerateQuestions": {
        "system": "You generate insightful research questions.",
        "user": """Topic: {topic}
Section: {section}
Subsections: {subsections}

Perspective: {perspective_name} - {perspective_description}

Generate 4-5 specific research questions from this perspective that will help
create comprehensive content for this section. Make questions actionable and
focused on gathering concrete information.

Your questions (one per line, numbered):"""
    }
}

agent = STORMAgent(
    llm_configs=llm_configs,
    retrieval_tools={"search": search_web},
    prompt_overrides=overrides
)
```

### Method 3: Custom Prompt Directory

```
my_prompts/
└── STORMAgent/
    ├── GenerateOutline/
    │   ├── system.md
    │   └── user.md
    ├── GeneratePerspectives/
    │   ├── system.md
    │   └── user.md
    ├── GenerateQuestions/
    │   ├── system.md
    │   └── user.md
    ├── SynthesizeSection/
    │   ├── system.md
    │   └── user.md
    └── CompileReport/
        ├── system.md
        └── user.md
```

## Setting Agent Goals

### Via Topic Description

Provide a detailed topic description that states the requirements:

```python
agent.run("""
Topic: Remote Work Technology Trends 2024-2025

Scope:
- Focus on collaboration tools
- Include security considerations
- Cover hybrid work models
- Address productivity metrics

Audience: IT decision-makers and HR leaders

Desired length: Comprehensive (8-10 sections)
""")
```

### Via Custom Instructions

```python
agent = STORMAgent(
    llm_configs=llm_configs,
    retrieval_tools={"search": search_web},
    custom_instructions="""
    GOAL: Create authoritative, well-researched industry reports

    PERSPECTIVE SELECTION:
    - Always include industry expert and practitioner
    - Add researcher for academic topics
    - Add critic for balanced analysis

    QUESTION QUALITY:
    - Target specific, verifiable facts
    - Include "how" and "why" questions
    - Avoid vague or open-ended questions

    SYNTHESIS QUALITY:
    - Integrate all perspectives smoothly
    - Highlight areas of agreement and disagreement
    - Support claims with retrieved information
    - Maintain objectivity

    OUTPUT FORMAT:
    - Clear section headings
    - Smooth paragraph flow
    - Professional tone
    - Logical progression of ideas
    """
)
```

## Advanced Usage

### Custom Perspective Selection

```python
# Industry-specific perspectives
fintech_perspectives = [
    {"name": "regulator", "description": "Financial regulatory expert"},
    {"name": "security_expert", "description": "Cybersecurity specialist"},
    {"name": "customer", "description": "End-user perspective"},
    {"name": "developer", "description": "Platform developer"}
]

agent = STORMAgent(
    llm_configs=llm_configs,
    retrieval_tools={"search": search_web},
    perspectives=fintech_perspectives
)
```

### Caching Retrieval Results

```python
class CachedSTORMAgent(STORMAgent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.retrieval_cache = {}

    def _retrieve_information(self, query: str) -> str:
        """Override to add caching"""
        if query in self.retrieval_cache:
            return self.retrieval_cache[query]

        result = super()._retrieve_information(query)
        self.retrieval_cache[query] = result
        return result

agent = CachedSTORMAgent(
    llm_configs=llm_configs,
    retrieval_tools={"search": search_web}
)
```
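
The subclass above depends on overriding `_retrieve_information`, an internal method. As an alternative sketch that needs nothing from the agent, caching can be applied to the retrieval function itself with the standard library's `functools.lru_cache`. `make_cached_tool` is a hypothetical helper name.

```python
import functools
from typing import Callable


def make_cached_tool(tool: Callable[[str], str], maxsize: int = 1024) -> Callable[[str], str]:
    """Wrap a retrieval function so repeated queries are served from memory."""

    @functools.lru_cache(maxsize=maxsize)
    def cached(query: str) -> str:
        # Only reached on a cache miss; identical query strings hit the cache.
        return tool(query)

    return cached
```

For example, `retrieval_tools={"search": make_cached_tool(search_web)}`. Note that this caches whatever the tool returns, including error strings, so a tool that reports failures as text may need those excluded before caching.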

## Performance Considerations

### Cost Analysis

STORM is expensive due to many stages:

**Per report cost**:
- Generate outline: 1 LLM call
- Generate perspectives: 1 LLM call
- Generate questions: S × P calls (S=sections, P=perspectives)
- Synthesize sections: S calls
- Compile report: 1 call
- **Total LLM calls**: S × P + S + 3, roughly 15-30 for a typical report (for example, 6 sections × 3 perspectives gives 18 + 6 + 3 = 27)
- **Retrieval calls**: Q × S × P (Q=questions per section/perspective)
  - Example: 4 questions × 6 sections × 3 perspectives = 72 retrievals
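
The per-stage breakdown above can be turned into a quick estimate before committing to a run. This is a sketch that simply adds up the listed stages; `estimate_storm_calls` and `StormCost` are illustrative names, and the actual number of calls depends on how many questions the LLM really generates.

```python
from dataclasses import dataclass


@dataclass
class StormCost:
    llm_calls: int
    retrieval_calls: int


def estimate_storm_calls(sections: int, perspectives: int, questions: int) -> StormCost:
    """Estimate call counts from the per-stage breakdown."""
    llm_calls = (
        1                           # generate outline
        + 1                         # generate perspectives
        + sections * perspectives   # generate questions (S × P)
        + sections                  # synthesize sections (S)
        + 1                         # compile report
    )
    retrieval_calls = questions * sections * perspectives  # Q × S × P
    return StormCost(llm_calls, retrieval_calls)


print(estimate_storm_calls(sections=6, perspectives=3, questions=4))
# StormCost(llm_calls=27, retrieval_calls=72)
```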

**Optimization strategies**:

```python
# 1. Limit perspectives
focused_perspectives = [
    {"name": "expert", "description": "Domain expert"},
    {"name": "practitioner", "description": "Hands-on practitioner"}
]  # Just 2 instead of 4

# 2. Use cheaper model for synthesis
llm_configs = {
    "thinking": {"provider": "openai", "model": "gpt-4"},
    "documentation": {"provider": "openai", "model": "gpt-3.5-turbo"}
}

# 3. Reduce questions per section
# Override GenerateQuestions to request 2-3 questions instead of 4-5

# 4. Cache retrieval results (see Advanced Usage)
```

### When to Use STORM

| Use Case | STORM? | Alternative |
|----------|--------|-------------|
| Wikipedia-style article | ✅ Yes | - |
| Comprehensive research report | ✅ Yes | - |
| Quick summary | ❌ No | Direct LLM |
| No retrieval needed | ❌ No | Reflection, Plan & Solve |
| Single perspective sufficient | ❌ No | ReAct with tools |
| Cost-sensitive | ❌ No | Simpler patterns |

## Comparison with Other Patterns

| Aspect | STORM | ReAct | Self-Discovery |
|--------|-------|-------|----------------|
| **Purpose** | Comprehensive reports | Dynamic tool use | Complex reasoning |
| **Retrieval** | Core feature | Via tools | Not supported |
| **Perspectives** | Multi-viewpoint | Single agent | Module-based |
| **Structure** | Hierarchical outline | Adaptive | Reasoning plan |
| **Cost** | Very High | Medium | High |
| **Best For** | Research reports | Interactive tasks | Novel problems |

## Common Pitfalls

### 1. Insufficient Retrieval

❌ **Bad**: No retrieval tools configured
```python
agent = STORMAgent(llm_configs=llm_configs)  # No tools!
```

✅ **Good**: Provide effective retrieval
```python
agent = STORMAgent(
    llm_configs=llm_configs,
    retrieval_tools={"search": search_web}
)
```

### 2. Too Many Perspectives

❌ **Bad**: Overwhelming number of viewpoints
```python
perspectives = [...]  # 8+ perspectives: too many!
```

✅ **Good**: 2-4 complementary perspectives
```python
perspectives = [
    {"name": "expert", "description": "..."},
    {"name": "practitioner", "description": "..."},
    {"name": "critic", "description": "..."}
]
```

### 3. Vague Topics

❌ **Bad**: Overly broad or vague
```python
agent.run("Technology")  # Way too broad
```

✅ **Good**: Specific, scoped topics
```python
agent.run("""
Cloud-Native Application Development:
Focus on containerization, microservices, and DevOps practices
""")
```

### 4. Poor Question Generation

❌ **Bad**: Allowing yes/no or vague questions

✅ **Good**: Override to ensure quality questions
```python
overrides = {
    "GenerateQuestions": {
        "user": """...
Generate 3-4 specific questions that:
- Start with "What", "How", or "Why"
- Target factual, verifiable information
- Are specific to this section and perspective

Your questions:"""
    }
}
```
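
A prompt can ask for good questions but cannot guarantee them, so a heuristic filter on the generated questions is a possible second line of defense. The sketch below is an assumption-heavy illustration: `filter_questions` is a hypothetical helper, the opener list is a rough English-only heuristic, and where it would hook into the agent (for example in a subclass) is not something the library documents here.

```python
import re
from typing import List

# Auxiliary verbs that typically open a yes/no question.
_YES_NO_OPENERS = re.compile(
    r"^(is|are|was|were|do|does|did|can|could|will|would|should|has|have|had)\b",
    re.IGNORECASE,
)


def filter_questions(questions: List[str], min_words: int = 4) -> List[str]:
    """Drop questions that look like yes/no questions or are too short to be specific."""
    kept = []
    for q in questions:
        # Strip list numbering such as "1." or "2)" left over from LLM output.
        text = re.sub(r"^\s*\d+[.)]\s*", "", q).strip()
        if len(text.split()) < min_words:
            continue
        if _YES_NO_OPENERS.match(text):
            continue
        kept.append(text)
    return kept
```

Dropping a question also saves its retrieval call, so a filter like this reduces cost as well as noise.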

## Troubleshooting

### Shallow or Generic Content

**Symptom**: Report lacks depth despite retrieval

**Solutions**:
```python
# 1. Improve question quality via prompts
# 2. Use better retrieval tools
# 3. Add more specific perspectives
# 4. Override synthesis to emphasize depth

custom_instructions = """
SYNTHESIS REQUIREMENTS:
- Include specific examples and data
- Cite information from multiple perspectives
- Provide detailed explanations
- Support claims with retrieved evidence
"""
```

### Disjointed Sections

**Symptom**: Sections don't flow well together

**Solutions**:
```python
# Override CompileReport for better integration
overrides = {
    "CompileReport": {
        "user": """Topic: {topic}

Sections:
{sections}

Compile these into a cohesive report with:
1. Smooth transitions between sections
2. Consistent narrative arc
3. Clear introduction setting context
4. Strong conclusion tying everything together

Your compiled report:"""
    }
}
```

### Retrieval Failures

**Symptom**: Many retrieval errors or poor results

**Solutions**:
```python
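def robust_search(query: str) -> str:
    """Search with fallback strategies.

    primary_search_api, fallback_search_api, and basic_search are placeholders
    for your own retrieval functions.
    """
    try:
        # Primary search
        return primary_search_api(query)
    except Exception:  # avoid bare except, which also traps KeyboardInterrupt
        try:
            # Fallback search
            return fallback_search_api(query)
        except Exception:
            # Last resort: reformulate and try again
            reformulated = f"information about {query}"
            return basic_search(reformulated)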
```
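
Transient failures such as timeouts or rate limits are often better handled by retrying the same source than by switching to another one. The following sketch wraps any retrieval function with exponential backoff; `with_retries` is a hypothetical helper, and the default attempt count and delay are arbitrary starting points.

```python
import time
from typing import Callable


def with_retries(
    tool: Callable[[str], str],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> Callable[[str], str]:
    """Retry a flaky retrieval function with exponential backoff."""

    def wrapped(query: str) -> str:
        last_error = None
        for attempt in range(attempts):
            try:
                return tool(query)
            except Exception as exc:
                last_error = exc
                if attempt < attempts - 1:
                    # Wait base_delay, then 2x, 4x, ... before the next try.
                    time.sleep(base_delay * (2 ** attempt))
        # Return a string rather than raising, so synthesis can continue.
        return f"Retrieval failed after {attempts} attempts: {last_error}"

    return wrapped
```

It composes with `robust_search` above, for example `retrieval_tools={"search": with_retries(robust_search)}`.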

## Next Steps

- Try the [complete examples](../examples/storm-examples.md)
- Learn about [ReAct](react.md) for simpler tool-based workflows
- Explore [Self-Discovery](self-discovery.md) for reasoning without retrieval
- Read the [original paper](https://arxiv.org/abs/2402.14207)

## References

- Original paper: [Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models](https://arxiv.org/abs/2402.14207)
- Stanford STORM project: [https://storm.genie.stanford.edu/](https://storm.genie.stanford.edu/)
- Related: Multi-document synthesis and question generation research