LLM Compiler Agent Pattern

The LLM Compiler pattern treats a multi-tool workflow the way a compiler treats a program: a planner constructs a Directed Acyclic Graph (DAG) of tool calls with explicit dependencies, and an executor then runs the nodes in topological order. Independent tools can run in parallel while every dependency is still respected.

Overview

Best For: Complex multi-tool workflows with parallelizable steps

Complexity: ⭐⭐⭐ Advanced (DAG construction and execution)

Cost: $$ Medium (Efficient execution despite complexity)

When to Use LLM Compiler

Ideal Use Cases

✅ Parallel tool execution

Multiple independent tool calls
Can execute simultaneously
Respects dependencies when present
Maximizes efficiency

✅ Complex data pipelines

Multiple processing steps
Clear dependencies between steps
Benefits from parallel execution
Structured workflow

✅ Multi-source data gathering

Fetch from multiple sources
Some sources independent
Combine results systematically
Optimize execution time

✅ Workflow orchestration

Complex task dependencies
Want optimal execution order
Need to maximize parallelism
Clear input/output relationships

When NOT to Use LLM Compiler

❌ Simple sequential tasks → Use Plan & Solve
❌ Highly dynamic workflows → Use ReAct
❌ Few tools needed → Overhead not worthwhile
❌ Unknown dependencies → Hard to construct DAG upfront

How LLM Compiler Works

The DAG Construction and Execution

TASK: "Get weather in NYC and LA, calculate average temperature"

┌─────────────────────────────────────────┐
│  PHASE 1: PLANNER CREATES DAG           │
│                                         │
│  NODE: node1                            │
│  TOOL: get_weather                      │
│  ARGS: {"location": "NYC"}              │
│  DEPENDS_ON: []                         │
│                                         │
│  NODE: node2                            │
│  TOOL: get_weather                      │
│  ARGS: {"location": "LA"}               │
│  DEPENDS_ON: []                         │
│                                         │
│  NODE: node3                            │
│  TOOL: calculate                        │
│  ARGS: {"expr": "(#node1 + #node2) / 2"}│
│  DEPENDS_ON: [node1, node2]             │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│  PHASE 2: EXECUTOR (Topological Order)  │
│                                         │
│  Iteration 1: Execute ready nodes       │
│  ├─ node1: weather("NYC") → "72°F"      │
│  └─ node2: weather("LA") → "85°F"       │
│  (Parallel execution!)                  │
│                                         │
│  Iteration 2: node1, node2 complete     │
│  └─ node3: calculate("(72+85)/2")       │
│     → "78.5°F"                          │
│                                         │
│  All nodes complete!                    │
│                                         │
└─────────────────┬───────────────────────┘
                  ↓
┌─────────────────────────────────────────┐
│  PHASE 3: SYNTHESIZER                   │
│                                         │
│  Results:                               │
│  - node1: 72°F                          │
│  - node2: 85°F                          │
│  - node3: 78.5°F                        │
│                                         │
│  Final Answer:                          │
│  "NYC weather: 72°F, LA weather: 85°F,  │
│   Average: 78.5°F"                      │
│                                         │
└─────────────────────────────────────────┘
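The Phase 1 output is plain text, so it has to be turned into node records before the executor can use it. The following is a minimal sketch of such a parser, assuming the NODE/TOOL/ARGS/DEPENDS_ON layout shown in the diagram. `parse_plan` is a hypothetical helper written for illustration; it is not taken from the library.

```python
import json
import re


def parse_plan(plan_text: str) -> list[dict]:
    """Parse NODE/TOOL/ARGS/DEPENDS_ON blocks into node dictionaries."""
    nodes = []
    current = None
    for raw_line in plan_text.splitlines():
        line = raw_line.strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank lines and anything without a "KEY: value" shape
        key, value = key.strip().upper(), value.strip()
        if key == "NODE":
            current = {"id": value, "tool": None, "args": {}, "depends_on": []}
            nodes.append(current)
        elif current is None:
            continue  # a field that appears before the first NODE line
        elif key == "TOOL":
            current["tool"] = value
        elif key == "ARGS":
            current["args"] = json.loads(value)
        elif key == "DEPENDS_ON":
            # Accepts both [node1, node2] and ["node1", "node2"]
            current["depends_on"] = re.findall(r"[\w-]+", value)
    return nodes
```

Malformed JSON in an ARGS line raises `json.JSONDecodeError`, which gives a caller a clear point at which to retry planning.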

Key Concepts

Directed Acyclic Graph (DAG):

Nodes represent tool calls
Edges represent dependencies
No cycles (acyclic)
Enables topological ordering

Topological Execution:

Execute nodes when dependencies satisfied
Parallel execution of independent nodes
Efficient use of resources
Guaranteed correct ordering
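One way to picture topological execution is to group nodes into "waves", where every node in a wave has all of its dependencies in earlier waves. The sketch below does this with a Kahn-style loop. `execution_waves` is a hypothetical name, and the node dictionaries follow the `id` / `depends_on` shape used in this guide.

```python
def execution_waves(nodes: list[dict]) -> list[list[str]]:
    """Group node ids into waves; nodes in the same wave can run in parallel."""
    remaining = {n["id"]: set(n["depends_on"]) for n in nodes}
    done: set[str] = set()
    waves = []
    while remaining:
        # A node is ready once all of its dependencies have completed.
        ready = sorted(nid for nid, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError(f"Cycle or unknown dependency among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for nid in ready:
            del remaining[nid]
    return waves
```

For the weather example, this yields one wave containing node1 and node2, followed by a wave containing node3.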

Dependency Resolution:

#node1 in parameters references another node’s output
Automatically resolved when node1 completes
Enables data flow through DAG
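A rough sketch of how `#node_id` substitution could work is shown below. It is an illustration under assumptions, not the library's actual resolver: it assumes `#` followed by word characters always denotes a node reference, and it raises `KeyError` when the referenced node has no result yet.

```python
import re

REF_PATTERN = re.compile(r"#(\w+)")


def resolve_references(args, results: dict):
    """Replace #node_id placeholders in args with values from results."""
    if isinstance(args, dict):
        return {k: resolve_references(v, results) for k, v in args.items()}
    if isinstance(args, list):
        return [resolve_references(v, results) for v in args]
    if isinstance(args, str):
        whole = REF_PATTERN.fullmatch(args)
        if whole:
            # The argument is exactly one reference: keep the result's type.
            return results[whole.group(1)]
        # Otherwise substitute each reference as text inside the string.
        return REF_PATTERN.sub(lambda m: str(results[m.group(1)]), args)
    return args
```

Keeping the original type for whole-value references lets a downstream tool receive a list or number directly, while embedded references (as in the `calculate` expression) are spliced in as text.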

Theoretical Foundation

Based on the paper “An LLM Compiler for Parallel Function Calling”. Inspired by compiler optimization techniques.

Key principles:

Static analysis: Determine dependencies upfront
Optimization: Identify parallelizable operations
Efficient execution: Run independent operations simultaneously
Correctness: Respect all dependencies

Algorithm

def llm_compiler(task, tools):
    """Simplified LLM Compiler algorithm"""

    # Phase 1: Construct DAG
    dag = planner_llm_generate_graph(task, tools)
    # dag = {
    #   "nodes": [
    #     {"id": "node1", "tool": "search", "args": {...}, "depends_on": []},
    #     {"id": "node2", "tool": "calc", "args": {"x": "#node1"}, "depends_on": ["node1"]},
    #   ]
    # }

    # Phase 2: Execute in topological order
    results = {}

    while not all_nodes_complete(dag, results):
        # Find nodes ready to execute (dependencies satisfied)
        ready_nodes = [
            n for n in dag["nodes"]
            if n["id"] not in results
            and all(dep in results for dep in n["depends_on"])
        ]

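        # Stop instead of looping forever if a cycle or unknown dependency blocks progress
        if not ready_nodes:
            raise RuntimeError("No runnable nodes: circular or missing dependency")

        # Execute ready nodes (can be parallelized)
        for node in ready_nodes: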
            # Resolve dependencies in arguments
            resolved_args = resolve_references(node["args"], results)

            # Execute tool
            result = tools[node["tool"]](**resolved_args)
            results[node["id"]] = result

    # Phase 3: Synthesize final answer
    final_answer = synthesizer_llm(task, dag, results)

    return final_answer
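The loop above runs ready nodes one at a time. As a sketch of what "can be parallelized" might look like, the version below runs each batch of ready nodes concurrently with `asyncio.gather`. It covers only the execution phase and makes several assumptions: tools are ordinary blocking callables, only whole-value `"#node_id"` arguments are resolved, and `execute_dag` is a hypothetical name.

```python
import asyncio


async def execute_dag(nodes: list[dict], tools: dict) -> dict:
    """Run each batch of ready nodes concurrently; return results by node id."""
    results: dict = {}
    while len(results) < len(nodes):
        ready = [
            n for n in nodes
            if n["id"] not in results
            and all(dep in results for dep in n["depends_on"])
        ]
        if not ready:
            raise RuntimeError("No runnable nodes: cycle or missing dependency")

        async def run(node: dict):
            # Simplification: only whole-value "#node_id" references are resolved.
            args = {
                k: results[v[1:]] if isinstance(v, str) and v.startswith("#") else v
                for k, v in node["args"].items()
            }
            # to_thread keeps blocking tool functions off the event loop.
            return await asyncio.to_thread(tools[node["tool"]], **args)

        outputs = await asyncio.gather(*(run(n) for n in ready))
        for node, output in zip(ready, outputs):
            results[node["id"]] = output
    return results
```

Results are written only after the whole batch finishes, so nodes within a batch never observe each other's partial output.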

API Reference

Class: `LLMCompilerAgent`

from agent_patterns.patterns import LLMCompilerAgent

agent = LLMCompilerAgent(
    llm_configs: Dict[str, Dict[str, Any]],
    tools: Optional[Dict[str, Callable]] = None,
    prompt_dir: str = "prompts",
    custom_instructions: Optional[str] = None,
    prompt_overrides: Optional[Dict[str, Dict[str, str]]] = None
)

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| `llm_configs` | `Dict[str, Dict[str, Any]]` | Yes | LLM configs for “thinking” and “documentation” roles |
| `tools` | `Dict[str, Callable]` | No | Dictionary mapping tool names to functions |
| `prompt_dir` | `str` | No | Custom prompt directory (default: “prompts”) |
| `custom_instructions` | `str` | No | Instructions appended to system prompts |
| `prompt_overrides` | `Dict` | No | Override specific prompts programmatically |

LLM Roles

thinking: Used for planning (DAG generation)
documentation: Used for synthesizing final answer

Methods

run(input_data: str) -> str

Executes the LLM Compiler pattern on the given input.

Parameters:
- input_data (str): The task requiring multiple tools
Returns: str - The final synthesized answer
Raises: ValueError if graph not built

build_graph() -> None

Builds the LangGraph state graph. Called automatically during initialization.

Complete Examples

Basic Usage

from agent_patterns.patterns import LLMCompilerAgent

# Define tools
def search_tool(query: str) -> str:
    """Search for information"""
    # API call
    return f"Search results for: {query}"

def calculate_tool(expression: str) -> float:
    """Evaluate mathematical expression"""
    return eval(expression)  # Demo only: eval() runs arbitrary code; use a restricted evaluator in production

def get_price(product: str) -> float:
    """Get product price"""
    prices = {"laptop": 999, "phone": 699, "tablet": 499}
    return prices.get(product, 0)

# Configure LLMs
llm_configs = {
    "thinking": {
        "provider": "openai",
        "model": "gpt-4",
        "temperature": 0.3,
    },
    "documentation": {
        "provider": "openai",
        "model": "gpt-4",
        "temperature": 0.7,
    }
}

# Create agent
agent = LLMCompilerAgent(
    llm_configs=llm_configs,
    tools={
        "search": search_tool,
        "calculate": calculate_tool,
        "get_price": get_price,
    }
)

# Execute complex workflow
result = agent.run("""
Find the prices of laptop, phone, and tablet.
Calculate the total cost if I buy one of each.
Also search for information about each product's warranty.
Provide a summary with total cost and warranty info.
""")

print(result)
# Agent will:
# 1. PLAN: Create DAG with parallel price lookups and searches
# 2. EXECUTE: Run get_price and search calls in parallel
#    Then calculate total (depends on prices)
# 3. SYNTHESIZE: Combine all results into summary
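The `calculate_tool` in this example relies on `eval`, which executes arbitrary Python. One possible replacement is a small evaluator built on the standard `ast` module that accepts only numeric literals and basic arithmetic. `safe_calculate` is a hypothetical name, and the set of allowed operators is an assumption chosen for illustration.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_calculate(expression: str) -> float:
    """Evaluate +, -, *, / on numeric literals without calling eval()."""

    def _eval(node: ast.AST):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                return value
        elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access and everything else are rejected.
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval").body)
```

Because the parameter is still named `expression`, this function could be registered under the `"calculate"` key in place of `calculate_tool` without changing the rest of the example.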

With Custom Instructions

data_pipeline_instructions = """
You are orchestrating data processing pipelines.

DAG CONSTRUCTION:
- Identify all data sources (parallel)
- Identify processing steps (sequential when dependent)
- Identify aggregation steps (after all data ready)
- Maximize parallelism where safe

TOOL EXECUTION:
- Respect all dependencies
- Never execute before dependencies ready
- Handle errors gracefully

SYNTHESIS:
- Present data clearly
- Highlight key insights
- Show data lineage
"""

agent = LLMCompilerAgent(
    llm_configs=llm_configs,
    tools=tools,
    custom_instructions=data_pipeline_instructions
)

result = agent.run("""
Analyze sales data:
1. Fetch sales from Q1, Q2, Q3, Q4 (parallel)
2. Calculate total annual sales
3. Calculate quarter-over-quarter growth rates
4. Identify best and worst performing quarters
5. Generate executive summary
""")

With Prompt Overrides

# Customize DAG planning
overrides = {
    "PlanGraph": {
        "system": """You are an expert at constructing execution graphs for
multi-tool workflows. Create DAGs that maximize parallelism while respecting
all dependencies.""",
        "user": """Task: {task}

Available tools:
{tools}

Create a DAG (Directed Acyclic Graph) for this task.

For each node in the graph, specify:
NODE: <unique_id>
TOOL: <tool_name>
ARGS: <JSON args, use #node_id to reference other nodes>
DEPENDS_ON: <list of node_ids this depends on, or []>

Keep independent operations parallelizable: list a node in DEPENDS_ON only when
its output is actually needed.

Your DAG:"""
    },
    "Synthesize": {
        "system": "You synthesize results from complex workflows into clear answers.",
        "user": """Task: {task}

Execution results:
{results}

Create a comprehensive answer that:
1. Addresses the original task completely
2. Presents information logically
3. Highlights key findings
4. Shows how results relate to each other

Your answer:"""
    }
}

agent = LLMCompilerAgent(
    llm_configs=llm_configs,
    tools=tools,
    prompt_overrides=overrides
)

Tool Definition Guidelines

Tool Function Signature

def tool_name(param1: str, param2: int = 0) -> Any:
    """
    Clear description of what the tool does.

    Args:
        param1: Description of parameter 1
        param2: Description of parameter 2 (optional)

    Returns:
        Result (can be any type, will be converted to string)
    """
    # Tool implementation
    return result

Dependency References

A node's arguments can reference another node's output using #node_id:

# In DAG:
# NODE: node1
# TOOL: get_data
# ARGS: {"source": "api"}
# DEPENDS_ON: []
#
# NODE: node2
# TOOL: process_data
# ARGS: {"data": "#node1"}  # References node1's output
# DEPENDS_ON: [node1]

# When executing node2, #node1 is replaced with actual result

Customizing Prompts

Understanding the System Prompt Structure

Version 0.2.0 introduces enterprise-grade prompts with a comprehensive 9-section structure. Each prompt now runs to 150-300+ lines, compared with roughly 32 lines before.

The 9-Section Structure: All prompts include Role and Identity, Core Capabilities, Process, Output Format, Decision-Making Guidelines, Quality Standards, Edge Cases, Examples, and Critical Reminders.

Benefits: Better reliability and robustness.

Understanding LLM Compiler Prompts

Uses two main prompts (both now with comprehensive 9-section structure):

PlanGraph: Planner LLM creates DAG structure with detailed quality standards and edge case handling
Synthesize: Synthesizer LLM combines results with systematic process guidance

Method 1: Custom Instructions

agent = LLMCompilerAgent(
    llm_configs=llm_configs,
    tools=tools,
    custom_instructions="""
    OPTIMIZATION GOAL: Maximize parallelism
    CORRECTNESS GOAL: Respect all dependencies
    CLARITY GOAL: Clear, structured final answers
    """
)

Method 2: Prompt Overrides

See “With Prompt Overrides” example above.

Method 3: Custom Prompt Directory

my_prompts/
└── LLMCompilerAgent/
    ├── PlanGraph/
    │   ├── system.md
    │   └── user.md
    └── Synthesize/
        ├── system.md
        └── user.md
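The layout above can be created by hand or with a short script. The sketch below builds the directory tree with empty `system.md` and `user.md` files for both steps. `scaffold_prompt_dir` is a hypothetical helper; the directory and file names are taken from the tree shown here.

```python
from pathlib import Path


def scaffold_prompt_dir(root: str) -> list[Path]:
    """Create empty system.md/user.md files for each LLM Compiler step."""
    created = []
    for step in ("PlanGraph", "Synthesize"):
        step_dir = Path(root) / "LLMCompilerAgent" / step
        step_dir.mkdir(parents=True, exist_ok=True)
        for name in ("system.md", "user.md"):
            path = step_dir / name
            path.touch(exist_ok=True)  # leaves existing prompt files untouched
            created.append(path)
    return created
```

After filling in the files, the directory would be passed to the agent through the `prompt_dir` constructor parameter, for example `prompt_dir="my_prompts"`.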

Setting Agent Goals

Via Task Description

# Clear task with sub-goals
agent.run("""
Goal: Compare three cloud providers (AWS, GCP, Azure)

Sub-tasks (can be parallelized):
1. Get pricing for each provider
2. Get features for each provider
3. Search for reviews of each provider

Then:
4. Create comparison matrix
5. Generate recommendation

Provide detailed comparison.
""")

Via Custom Instructions

agent = LLMCompilerAgent(
    llm_configs=llm_configs,
    tools=tools,
    custom_instructions="""
    GOAL: Efficient, parallel execution of multi-tool workflows

    PLANNING:
    - Identify all independent operations
    - Enable maximum parallelism
    - Clear dependency chains

    EXECUTION:
    - Respect topological order
    - Handle errors without blocking entire workflow

    OUTPUT:
    - Comprehensive synthesis
    - Clear presentation
    - Actionable insights
    """
)

Advanced Usage

Parallel Execution Simulation

# The current implementation executes ready nodes sequentially.
# Because the DAG makes independent nodes explicit, the executor can be
# overridden to run them in parallel, as sketched here.

class ParallelLLMCompilerAgent(LLMCompilerAgent):
    def _executor_dispatch(self, state):
        """Override to add parallel execution"""
        import concurrent.futures

        graph = state["execution_graph"]
        results = state["node_results"]

        with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
            while not self._all_complete(graph, results):
                # Find ready nodes
                ready_nodes = self._get_ready_nodes(graph, results)

                if not ready_nodes:
                    break

                # Submit all ready nodes to executor
                futures = {
                    executor.submit(
                        self._execute_tool,
                        node["tool"],
                        node["args"],
                        results
                    ): node
                    for node in ready_nodes
                }

                # Collect results
                for future in concurrent.futures.as_completed(futures):
                    node = futures[future]
                    result = future.result()
                    results[node["id"]] = result

        state["node_results"] = results
        return state

agent = ParallelLLMCompilerAgent(llm_configs=llm_configs, tools=tools)

DAG Visualization

class VisualizingLLMCompilerAgent(LLMCompilerAgent):
    def run(self, input_data):
        """Override to visualize DAG"""
        result = super().run(input_data)

        # Access DAG (would need to store during execution)
        print("\n=== Execution DAG ===")
        self._print_dag()

        return result

    def _print_dag(self):
        """Print DAG structure"""
        # Implementation to visualize the execution graph
        pass

agent = VisualizingLLMCompilerAgent(llm_configs=llm_configs, tools=tools)
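The `_print_dag` method above is left as a stub. One way it might be filled in is to render the graph as Graphviz DOT text, which can be printed or pasted into any DOT viewer. The function below is a standalone sketch: `dag_to_dot` is a hypothetical name, and it assumes nodes carry the `id`, `tool` and `depends_on` fields used throughout this guide.

```python
def dag_to_dot(nodes: list[dict]) -> str:
    """Render the execution graph as Graphviz DOT text."""
    lines = ["digraph execution_dag {"]
    for node in nodes:
        # One labelled vertex per tool call.
        lines.append(f'  "{node["id"]}" [label="{node["id"]}: {node["tool"]}"];')
    for node in nodes:
        for dep in node["depends_on"]:
            # Edges point from a dependency to the node that consumes it.
            lines.append(f'  "{dep}" -> "{node["id"]}";')
    lines.append("}")
    return "\n".join(lines)
```

Only string formatting is involved, so no Graphviz installation is needed to produce the text.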

Performance Considerations

Cost Analysis

LLM Compiler cost:

Plan DAG: 1 LLM call
Execute tools: 0 LLM calls (just tool execution)
Synthesize: 1 LLM call
Total: 2 LLM calls (like REWOO)

Efficiency gains:

Parallel execution reduces wall-clock time
Only 2 LLM calls regardless of tool count
Optimal execution order minimizes waste
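To make the wall-clock claim concrete, the sketch below compares the time needed to run every tool one after another with the length of the longest dependency chain, which is the best case when independent nodes run in parallel. The durations are made-up illustrative values, `estimate_times` is a hypothetical helper, and the calculation assumes an acyclic graph with no limit on concurrent workers.

```python
def estimate_times(nodes: list[dict], durations: dict[str, float]) -> tuple[float, float]:
    """Return (sequential_time, parallel_time) for a DAG with known node durations."""
    by_id = {n["id"]: n for n in nodes}
    finish: dict[str, float] = {}

    def finish_time(node_id: str) -> float:
        # A node finishes after its slowest dependency plus its own duration.
        if node_id not in finish:
            deps = by_id[node_id]["depends_on"]
            start = max((finish_time(d) for d in deps), default=0.0)
            finish[node_id] = start + durations[node_id]
        return finish[node_id]

    sequential = sum(durations[n["id"]] for n in nodes)
    parallel = max(finish_time(n["id"]) for n in nodes)
    return sequential, parallel


# Hypothetical durations in seconds for the weather example.
nodes = [
    {"id": "node1", "depends_on": []},
    {"id": "node2", "depends_on": []},
    {"id": "node3", "depends_on": ["node1", "node2"]},
]
sequential, parallel = estimate_times(nodes, {"node1": 2.0, "node2": 3.0, "node3": 0.5})
print(sequential, parallel)  # 5.5 3.5
```

With these numbers the two weather lookups overlap, so the parallel estimate is bounded by the slower lookup plus the calculation.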

When LLM Compiler Excels

✅ Many independent tools: Parallel execution shines
✅ Complex dependencies: DAG handles correctly
✅ Time-sensitive: Parallelism speeds up execution
✅ Clear structure: Can plan DAG upfront

When to Use Other Patterns

| Scenario | Better Pattern | Reason |
|---|---|---|
| Dynamic workflow | ReAct | Can’t plan DAG upfront |
| Simple sequence | Plan & Solve | DAG overhead unnecessary |
| No tools | Self-Discovery, Reflection | LLM Compiler needs tools |
| Unknown dependencies | ReAct | Adaptive approach better |

Comparison with Other Patterns

| Aspect | LLM Compiler | REWOO | ReAct |
|---|---|---|---|
| Planning | DAG construction | Linear with placeholders | Adaptive |
| Execution | Topological order | Sequential | Iterative |
| Parallelism | Explicit support | No | No |
| LLM Calls | 2 (fixed) | 2 (fixed) | N + 1 |
| Dependencies | Explicit in DAG | Implicit in placeholders | Adaptive |
| Best For | Complex workflows | Batch operations | Dynamic exploration |

Common Pitfalls

1. Circular Dependencies

❌ Bad: Creating cycles in DAG

# NODE: node1 depends on node2
# NODE: node2 depends on node1
# → Impossible to execute!

✅ Good: Acyclic dependencies

# NODE: node1 depends on []
# NODE: node2 depends on [node1]
# NODE: node3 depends on [node1, node2]

2. Missing Dependencies

❌ Bad: Not specifying required dependencies

# NODE: node2 uses #node1 in args
# DEPENDS_ON: []  # Missing node1!

✅ Good: Explicit dependencies

# NODE: node2 uses #node1 in args
# DEPENDS_ON: [node1]  # ✅ Correct
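This mistake can be caught mechanically before execution by comparing the `#node_id` references found in a node's arguments with its declared dependencies. The check below is a sketch: `missing_dependencies` is a hypothetical helper, and it assumes every `#` followed by word characters inside the arguments is a node reference.

```python
import re


def missing_dependencies(nodes: list[dict]) -> dict[str, set[str]]:
    """Map node id -> referenced node ids that are absent from its depends_on."""
    problems = {}
    for node in nodes:
        # Scan the stringified args so nested values are covered as well.
        referenced = set(re.findall(r"#(\w+)", str(node["args"])))
        missing = referenced - set(node["depends_on"])
        if missing:
            problems[node["id"]] = missing
    return problems
```

An empty dictionary means every reference is backed by a declared dependency. A planner wrapper could either add the missing entries or ask the LLM to regenerate the plan.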

3. Over-Sequencing

❌ Bad: Making everything depend on everything

✅ Good: Only specify actual dependencies

# If node2 and node3 are independent:
# NODE: node2 DEPENDS_ON: []
# NODE: node3 DEPENDS_ON: []
# → Can execute in parallel!

4. Incorrect Reference Syntax

❌ Bad: Wrong reference format

# ARGS: {"data": "node1"}  # Missing #

✅ Good: Correct reference

# ARGS: {"data": "#node1"}  # ✅ Will be resolved

Troubleshooting

DAG Parsing Failures

Symptom: Can’t extract DAG from plan

Solutions:

# Strengthen PlanGraph prompt format
overrides = {
    "PlanGraph": {
        "user": """...
STRICT FORMAT (follow exactly):

NODE: node1
TOOL: tool_name
ARGS: {"param": "value"}
DEPENDS_ON: []

NODE: node2
TOOL: tool_name
ARGS: {"param": "#node1"}
DEPENDS_ON: [node1]

(Blank line between nodes)

Your DAG:"""
    }
}

Execution Hangs

Symptom: Some nodes never execute

Solutions:

# Check for:
# 1. Circular dependencies (impossible to resolve)
# 2. Missing tools (can't execute)
# 3. Incorrect dependency specification

# Add validation:
class ValidatingLLMCompilerAgent(LLMCompilerAgent):
    def _planner_generate_graph(self, state):
        state = super()._planner_generate_graph(state)

        # Validate DAG
        if self._has_cycles(state["execution_graph"]):
            state["error"] = "Circular dependencies detected"

        return state
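The validation example calls `self._has_cycles`, whose implementation is not shown here. If it needs to be written, a depth-first search with three node states is a common approach. The standalone function below is a sketch under that assumption: it takes a list of node dictionaries with `id` and `depends_on` fields, which may differ from the actual shape of `state["execution_graph"]`.

```python
def has_cycles(nodes: list[dict]) -> bool:
    """Return True if the depends_on edges contain a cycle."""
    deps = {n["id"]: n["depends_on"] for n in nodes}
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node_id: WHITE for node_id in deps}

    def visit(node_id: str) -> bool:
        color[node_id] = GRAY
        for dep in deps[node_id]:
            state = color.get(dep, BLACK)  # unknown ids cannot form a cycle
            if state == GRAY:
                return True  # reached a node that is still on the current path
            if state == WHITE and visit(dep):
                return True
        color[node_id] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in deps)
```

References to node ids that do not exist are ignored by this check, so they need separate validation.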

Poor Parallelism

Symptom: Nodes execute sequentially despite being independent

Solutions:

# Emphasize parallelism in planning
custom_instructions = """
DAG PLANNING:
When creating the DAG, actively look for opportunities for parallel execution.
If two nodes don't depend on each other, neither should appear in the other's DEPENDS_ON list.
"""

Next Steps

Try the complete examples
Learn about REWOO for simpler batch execution
Explore ReAct for dynamic tool workflows
Read the original paper

References

Original paper: An LLM Compiler for Parallel Function Calling
DAG concepts: Directed Acyclic Graphs
Topological sorting: Topological Sort
Compiler optimization techniques applied to LLM workflows
