# LLM Compiler Agent Pattern The **LLM Compiler** pattern treats multi-tool workflows like a compiler, constructing a Directed Acyclic Graph (DAG) of tool calls with explicit dependencies, then executing nodes in topological order. This enables parallel execution of independent tools while respecting dependencies. ## Overview **Best For**: Complex multi-tool workflows with parallelizable steps **Complexity**: ⭐⭐⭐ Advanced (DAG construction and execution) **Cost**: $$ Medium (Efficient execution despite complexity) ## When to Use LLM Compiler ### Ideal Use Cases ✅ **Parallel tool execution** - Multiple independent tool calls - Can execute simultaneously - Respects dependencies when present - Maximizes efficiency ✅ **Complex data pipelines** - Multiple processing steps - Clear dependencies between steps - Benefits from parallel execution - Structured workflow ✅ **Multi-source data gathering** - Fetch from multiple sources - Some sources independent - Combine results systematically - Optimize execution time ✅ **Workflow orchestration** - Complex task dependencies - Want optimal execution order - Need to maximize parallelism - Clear input/output relationships ### When NOT to Use LLM Compiler ❌ **Simple sequential tasks** → Use Plan & Solve ❌ **Highly dynamic workflows** → Use ReAct ❌ **Few tools needed** → Overhead not worthwhile ❌ **Unknown dependencies** → Hard to construct DAG upfront ## How LLM Compiler Works ### The DAG Construction and Execution ``` TASK: "Get weather in NYC and LA, calculate average temperature" ┌─────────────────────────────────────────┐ │ PHASE 1: PLANNER CREATES DAG │ │ │ │ NODE: node1 │ │ TOOL: get_weather │ │ ARGS: {"location": "NYC"} │ │ DEPENDS_ON: [] │ │ │ │ NODE: node2 │ │ TOOL: get_weather │ │ ARGS: {"location": "LA"} │ │ DEPENDS_ON: [] │ │ │ │ NODE: node3 │ │ TOOL: calculate │ │ ARGS: {"expr": "(#node1 + #node2) / 2"}│ │ DEPENDS_ON: [node1, node2] │ │ │ └─────────────────┬───────────────────────┘ ↓ ┌─────────────────────────────────────────┐ │ PHASE 2: EXECUTOR (Topological Order) │ │ │ │ Iteration 1: Execute ready nodes │ │ ├─ node1: weather("NYC") → "72°F" │ │ └─ node2: weather("LA") → "85°F" │ │ (Parallel execution!) │ │ │ │ Iteration 2: node1, node2 complete │ │ └─ node3: calculate("(72+85)/2") │ │ → "78.5°F" │ │ │ │ All nodes complete! │ │ │ └─────────────────┬───────────────────────┘ ↓ ┌─────────────────────────────────────────┐ │ PHASE 3: SYNTHESIZER │ │ │ │ Results: │ │ - node1: 72°F │ │ - node2: 85°F │ │ - node3: 78.5°F │ │ │ │ Final Answer: │ │ "NYC weather: 72°F, LA weather: 85°F, │ │ Average: 78.5°F" │ │ │ └─────────────────────────────────────────┘ ``` ### Key Concepts **Directed Acyclic Graph (DAG)**: - Nodes represent tool calls - Edges represent dependencies - No cycles (acyclic) - Enables topological ordering **Topological Execution**: - Execute nodes when dependencies satisfied - Parallel execution of independent nodes - Efficient use of resources - Guaranteed correct ordering **Dependency Resolution**: - `#node1` in parameters references another node's output - Automatically resolved when node1 completes - Enables data flow through DAG ### Theoretical Foundation Based on the paper ["An LLM Compiler for Parallel Function Calling"](https://arxiv.org/abs/2312.04511). Inspired by compiler optimization techniques. Key principles: 1. **Static analysis**: Determine dependencies upfront 2. **Optimization**: Identify parallelizable operations 3. **Efficient execution**: Run independent operations simultaneously 4. **Correctness**: Respect all dependencies ### Algorithm ```python def llm_compiler(task, tools): """Simplified LLM Compiler algorithm""" # Phase 1: Construct DAG dag = planner_llm_generate_graph(task, tools) # dag = { # "nodes": [ # {"id": "node1", "tool": "search", "args": {...}, "depends_on": []}, # {"id": "node2", "tool": "calc", "args": {"x": "#node1"}, "depends_on": ["node1"]}, # ] # } # Phase 2: Execute in topological order results = {} while not all_nodes_complete(dag, results): # Find nodes ready to execute (dependencies satisfied) ready_nodes = [ n for n in dag["nodes"] if n["id"] not in results and all(dep in results for dep in n["depends_on"]) ] # Execute ready nodes (can be parallelized) for node in ready_nodes: # Resolve dependencies in arguments resolved_args = resolve_references(node["args"], results) # Execute tool result = tools[node["tool"]](**resolved_args) results[node["id"]] = result # Phase 3: Synthesize final answer final_answer = synthesizer_llm(task, dag, results) return final_answer ``` ## API Reference ### Class: `LLMCompilerAgent` ```python from agent_patterns.patterns import LLMCompilerAgent agent = LLMCompilerAgent( llm_configs: Dict[str, Dict[str, Any]], tools: Optional[Dict[str, Callable]] = None, prompt_dir: str = "prompts", custom_instructions: Optional[str] = None, prompt_overrides: Optional[Dict[str, Dict[str, str]]] = None ) ``` #### Parameters | Parameter | Type | Required | Description | |-----------|------|----------|-------------| | `llm_configs` | `Dict[str, Dict[str, Any]]` | Yes | LLM configs for "thinking" and "documentation" roles | | `tools` | `Dict[str, Callable]` | No | Dictionary mapping tool names to functions | | `prompt_dir` | `str` | No | Custom prompt directory (default: "prompts") | | `custom_instructions` | `str` | No | Instructions appended to system prompts | | `prompt_overrides` | `Dict` | No | Override specific prompts programmatically | #### LLM Roles - **thinking**: Used for planning (DAG generation) - **documentation**: Used for synthesizing final answer #### Methods **`run(input_data: str) -> str`** Executes the LLM Compiler pattern on the given input. - **Parameters**: - `input_data` (str): The task requiring multiple tools - **Returns**: str - The final synthesized answer - **Raises**: ValueError if graph not built **`build_graph() -> None`** Builds the LangGraph state graph. Called automatically during initialization. ## Complete Examples ### Basic Usage ```python from agent_patterns.patterns import LLMCompilerAgent # Define tools def search_tool(query: str) -> str: """Search for information""" # API call return f"Search results for: {query}" def calculate_tool(expression: str) -> float: """Evaluate mathematical expression""" return eval(expression) # Use safe_eval in production def get_price(product: str) -> float: """Get product price""" prices = {"laptop": 999, "phone": 699, "tablet": 499} return prices.get(product, 0) # Configure LLMs llm_configs = { "thinking": { "provider": "openai", "model": "gpt-4", "temperature": 0.3, }, "documentation": { "provider": "openai", "model": "gpt-4", "temperature": 0.7, } } # Create agent agent = LLMCompilerAgent( llm_configs=llm_configs, tools={ "search": search_tool, "calculate": calculate_tool, "get_price": get_price, } ) # Execute complex workflow result = agent.run(""" Find the prices of laptop, phone, and tablet. Calculate the total cost if I buy one of each. Also search for information about each product's warranty. Provide a summary with total cost and warranty info. """) print(result) # Agent will: # 1. PLAN: Create DAG with parallel price lookups and searches # 2. EXECUTE: Run get_price and search calls in parallel # Then calculate total (depends on prices) # 3. SYNTHESIZE: Combine all results into summary ``` ### With Custom Instructions ```python data_pipeline_instructions = """ You are orchestrating data processing pipelines. DAG CONSTRUCTION: - Identify all data sources (parallel) - Identify processing steps (sequential when dependent) - Identify aggregation steps (after all data ready) - Maximize parallelism where safe TOOL EXECUTION: - Respect all dependencies - Never execute before dependencies ready - Handle errors gracefully SYNTHESIS: - Present data clearly - Highlight key insights - Show data lineage """ agent = LLMCompilerAgent( llm_configs=llm_configs, tools=tools, custom_instructions=data_pipeline_instructions ) result = agent.run(""" Analyze sales data: 1. Fetch sales from Q1, Q2, Q3, Q4 (parallel) 2. Calculate total annual sales 3. Calculate quarter-over-quarter growth rates 4. Identify best and worst performing quarters 5. Generate executive summary """) ``` ### With Prompt Overrides ```python # Customize DAG planning overrides = { "PlanGraph": { "system": """You are an expert at constructing execution graphs for multi-tool workflows. Create DAGs that maximize parallelism while respecting all dependencies.""", "user": """Task: {task} Available tools: {tools} Create a DAG (Directed Acyclic Graph) for this task. For each node in the graph, specify: NODE: TOOL: ARGS: DEPENDS_ON: Make independent operations parallelizable by having empty or non-overlapping dependencies. Your DAG:""" }, "Synthesize": { "system": "You synthesize results from complex workflows into clear answers.", "user": """Task: {task} Execution results: {results} Create a comprehensive answer that: 1. Addresses the original task completely 2. Presents information logically 3. Highlights key findings 4. Shows how results relate to each other Your answer:""" } } agent = LLMCompilerAgent( llm_configs=llm_configs, tools=tools, prompt_overrides=overrides ) ``` ## Tool Definition Guidelines ### Tool Function Signature ```python def tool_name(param1: str, param2: int = 0) -> Any: """ Clear description of what the tool does. Args: param1: Description of parameter 1 param2: Description of parameter 2 (optional) Returns: Result (can be any type, will be converted to string) """ # Tool implementation return result ``` ### Dependency References Tools can reference other node outputs using `#node_id`: ```python # In DAG: # NODE: node1 # TOOL: get_data # ARGS: {"source": "api"} # DEPENDS_ON: [] # # NODE: node2 # TOOL: process_data # ARGS: {"data": "#node1"} # References node1's output # DEPENDS_ON: [node1] # When executing node2, #node1 is replaced with actual result ``` ## Customizing Prompts ### Understanding the System Prompt Structure Version 0.2.0 introduces **enterprise-grade prompts** with a comprehensive 9-section structure (150-300+ lines vs ~32 lines). **The 9-Section Structure**: All prompts include Role and Identity, Core Capabilities, Process, Output Format, Decision-Making Guidelines, Quality Standards, Edge Cases, Examples, and Critical Reminders. **Benefits**: Better reliability and robustness. ### Understanding LLM Compiler Prompts Uses two main prompts (both now with comprehensive 9-section structure): 1. **PlanGraph**: Planner LLM creates DAG structure with detailed quality standards and edge case handling 2. **Synthesize**: Synthesizer LLM combines results with systematic process guidance ### Method 1: Custom Instructions ```python agent = LLMCompilerAgent( llm_configs=llm_configs, tools=tools, custom_instructions=""" OPTIMIZATION GOAL: Maximize parallelism CORRECTNESS GOAL: Respect all dependencies CLARITY GOAL: Clear, structured final answers """ ) ``` ### Method 2: Prompt Overrides See "With Prompt Overrides" example above. ### Method 3: Custom Prompt Directory ```bash my_prompts/ └── LLMCompilerAgent/ ├── PlanGraph/ │ ├── system.md │ └── user.md └── Synthesize/ ├── system.md └── user.md ``` ## Setting Agent Goals ### Via Task Description ```python # Clear task with sub-goals agent.run(""" Goal: Compare three cloud providers (AWS, GCP, Azure) Sub-tasks (can be parallelized): 1. Get pricing for each provider 2. Get features for each provider 3. Search for reviews of each provider Then: 4. Create comparison matrix 5. Generate recommendation Provide detailed comparison. """) ``` ### Via Custom Instructions ```python agent = LLMCompilerAgent( llm_configs=llm_configs, tools=tools, custom_instructions=""" GOAL: Efficient, parallel execution of multi-tool workflows PLANNING: - Identify all independent operations - Enable maximum parallelism - Clear dependency chains EXECUTION: - Respect topological order - Handle errors without blocking entire workflow OUTPUT: - Comprehensive synthesis - Clear presentation - Actionable insights """ ) ``` ## Advanced Usage ### Parallel Execution Simulation ```python # Current implementation executes sequentially # But DAG enables parallel execution in production class ParallelLLMCompilerAgent(LLMCompilerAgent): def _executor_dispatch(self, state): """Override to add parallel execution""" import concurrent.futures graph = state["execution_graph"] results = state["node_results"] with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor: while not self._all_complete(graph, results): # Find ready nodes ready_nodes = self._get_ready_nodes(graph, results) if not ready_nodes: break # Submit all ready nodes to executor futures = { executor.submit( self._execute_tool, node["tool"], node["args"], results ): node for node in ready_nodes } # Collect results for future in concurrent.futures.as_completed(futures): node = futures[future] result = future.result() results[node["id"]] = result state["node_results"] = results return state agent = ParallelLLMCompilerAgent(llm_configs=llm_configs, tools=tools) ``` ### DAG Visualization ```python class VisualizingLLMCompilerAgent(LLMCompilerAgent): def run(self, input_data): """Override to visualize DAG""" result = super().run(input_data) # Access DAG (would need to store during execution) print("\n=== Execution DAG ===") self._print_dag() return result def _print_dag(self): """Print DAG structure""" # Implementation to visualize the execution graph pass agent = VisualizingLLMCompilerAgent(llm_configs=llm_configs, tools=tools) ``` ## Performance Considerations ### Cost Analysis LLM Compiler cost: - Plan DAG: 1 LLM call - Execute tools: 0 LLM calls (just tool execution) - Synthesize: 1 LLM call - **Total: 2 LLM calls** (like REWOO) **Efficiency gains**: - Parallel execution reduces wall-clock time - Only 2 LLM calls regardless of tool count - Optimal execution order minimizes waste ### When LLM Compiler Excels ✅ **Many independent tools**: Parallel execution shines ✅ **Complex dependencies**: DAG handles correctly ✅ **Time-sensitive**: Parallelism speeds up execution ✅ **Clear structure**: Can plan DAG upfront ### When to Use Other Patterns | Scenario | Better Pattern | Reason | |----------|---------------|---------| | Dynamic workflow | ReAct | Can't plan DAG upfront | | Simple sequence | Plan & Solve | DAG overhead unnecessary | | No tools | Self-Discovery, Reflection | LLM Compiler needs tools | | Unknown dependencies | ReAct | Adaptive approach better | ## Comparison with Other Patterns | Aspect | LLM Compiler | REWOO | ReAct | |--------|--------------|-------|-------| | **Planning** | DAG construction | Linear with placeholders | Adaptive | | **Execution** | Topological order | Sequential | Iterative | | **Parallelism** | Explicit support | No | No | | **LLM Calls** | 2 (fixed) | 2 (fixed) | N + 1 | | **Dependencies** | Explicit in DAG | Implicit in placeholders | Adaptive | | **Best For** | Complex workflows | Batch operations | Dynamic exploration | ## Common Pitfalls ### 1. Circular Dependencies ❌ **Bad**: Creating cycles in DAG ```python # NODE: node1 depends on node2 # NODE: node2 depends on node1 # → Impossible to execute! ``` ✅ **Good**: Acyclic dependencies ```python # NODE: node1 depends on [] # NODE: node2 depends on [node1] # NODE: node3 depends on [node1, node2] ``` ### 2. Missing Dependencies ❌ **Bad**: Not specifying required dependencies ```python # NODE: node2 uses #node1 in args # DEPENDS_ON: [] # Missing node1! ``` ✅ **Good**: Explicit dependencies ```python # NODE: node2 uses #node1 in args # DEPENDS_ON: [node1] # ✅ Correct ``` ### 3. Over-Sequencing ❌ **Bad**: Making everything depend on everything ✅ **Good**: Only specify actual dependencies ```python # If node2 and node3 are independent: # NODE: node2 DEPENDS_ON: [] # NODE: node3 DEPENDS_ON: [] # → Can execute in parallel! ``` ### 4. Incorrect Reference Syntax ❌ **Bad**: Wrong reference format ```python # ARGS: {"data": "node1"} # Missing # ``` ✅ **Good**: Correct reference ```python # ARGS: {"data": "#node1"} # ✅ Will be resolved ``` ## Troubleshooting ### DAG Parsing Failures **Symptom**: Can't extract DAG from plan **Solutions**: ```python # Strengthen PlanGraph prompt format overrides = { "PlanGraph": { "user": """... STRICT FORMAT (follow exactly): NODE: node1 TOOL: tool_name ARGS: {"param": "value"} DEPENDS_ON: [] NODE: node2 TOOL: tool_name ARGS: {"param": "#node1"} DEPENDS_ON: [node1] (Blank line between nodes) Your DAG:""" } } ``` ### Execution Hangs **Symptom**: Some nodes never execute **Solutions**: ```python # Check for: # 1. Circular dependencies (impossible to resolve) # 2. Missing tools (can't execute) # 3. Incorrect dependency specification # Add validation: class ValidatingLLMCompilerAgent(LLMCompilerAgent): def _planner_generate_graph(self, state): state = super()._planner_generate_graph(state) # Validate DAG if self._has_cycles(state["execution_graph"]): state["error"] = "Circular dependencies detected" return state ``` ### Poor Parallelism **Symptom**: Nodes execute sequentially despite being independent **Solutions**: ```python # Emphasize parallelism in planning custom_instructions = """ DAG PLANNING: When creating the DAG, actively look for opportunities for parallel execution. If two nodes don't depend on each other, they should have independent DEPENDS_ON lists. """ ``` ## Next Steps - Try the [complete examples](../examples/llm-compiler-examples.md) - Learn about [REWOO](rewoo.md) for simpler batch execution - Explore [ReAct](react.md) for dynamic tool workflows - Read the [original paper](https://arxiv.org/abs/2312.04511) ## References - Original paper: [An LLM Compiler for Parallel Function Calling](https://arxiv.org/abs/2312.04511) - DAG concepts: [Directed Acyclic Graphs](https://en.wikipedia.org/wiki/Directed_acyclic_graph) - Topological sorting: [Topological Sort](https://en.wikipedia.org/wiki/Topological_sorting) - Compiler optimization techniques applied to LLM workflows