Best Practices for Agent Patterns

Comprehensive guide to best practices, optimization strategies, common pitfalls, and production-ready patterns for building robust AI agent systems.

Overview

This guide consolidates learnings from:

  • Production deployments

  • Performance optimization

  • Cost management

  • Quality assurance

  • Security considerations

  • Maintainability

Design Principles

1. Choose the Right Pattern for the Task

Decision Framework:

def select_pattern(task_characteristics):
    """Select appropriate pattern based on task."""

    if task_characteristics["requires_tools"]:
        if task_characteristics["cost_sensitive"]:
            return "REWOO"  # Plan all tool calls upfront
        else:
            return "ReAct"  # Iterative tool use

    elif task_characteristics["requires_learning"]:
        return "Reflexion"  # Learn from failures

    elif task_characteristics["requires_quality"]:
        return "Reflection"  # Iterative refinement

    elif task_characteristics["requires_research"]:
        return "STORM"  # Multi-perspective research

    elif task_characteristics["complex_reasoning"]:
        return "SelfDiscovery"  # Adaptive strategies

    else:
        return "Reflection"  # General purpose

2. Start Simple, Then Optimize

# Phase 1: Proof of Concept
agent = ReflectionAgent(
    llm_configs=simple_config,
    max_reflection_cycles=1
)

# Phase 2: Production MVP
agent = ReflectionAgent(
    llm_configs=production_config,
    max_reflection_cycles=2,
    custom_instructions=domain_guidelines
)

# Phase 3: Optimized Production
agent = ReflectionAgent(
    llm_configs=optimized_config,
    max_reflection_cycles=2,
    custom_instructions=domain_guidelines,
    prompt_overrides=tested_overrides
)

3. Separate Concerns

# Good: Separation of concerns
class AgentOrchestrator:
    """Orchestrates agent execution with separate responsibilities."""

    def __init__(self):
        self.agent = self._create_agent()
        self.validator = InputValidator()
        self.logger = StructuredLogger()
        self.metrics = MetricsCollector()
        self.cache = ResultCache()

    def execute(self, task: str) -> str:
        # Validate
        validated_task = self.validator.validate(task)

        # Check cache
        cached = self.cache.get(task)
        if cached:
            return cached

        # Execute
        self.logger.log_start(task)
        self.metrics.start_timer()

        try:
            result = self.agent.run(validated_task)

            self.metrics.record_success()
            self.logger.log_success(result)
            self.cache.set(task, result)

            return result

        except Exception as e:
            self.metrics.record_failure()
            self.logger.log_error(e)
            raise

Prompt Engineering

1. Be Specific and Clear

Bad: Vague prompts

custom_instructions = "Be helpful and accurate"

Good: Specific, actionable guidelines

custom_instructions = """
Guidelines:
1. Cite sources for all factual claims
2. Use bullet points for lists of 3+ items
3. Define technical terms on first use
4. Provide concrete examples
5. Acknowledge uncertainties explicitly
"""

2. Use Hierarchical Goals

custom_instructions = """
PRIMARY GOALS (non-negotiable):
1. Accuracy: All facts must be verifiable
2. Safety: Never provide harmful information

SECONDARY GOALS (important):
3. Clarity: 8th grade reading level
4. Completeness: Address all aspects of question

TERTIARY GOALS (nice-to-have):
5. Brevity: Concise without sacrificing clarity
6. Examples: Concrete examples for abstract concepts
"""

3. Provide Examples in Prompts

prompt_override = {
    "Generate": {
        "system": """Generate structured responses.

Example format:
## Summary
- Key point 1
- Key point 2

## Details
Detailed explanation...

## Conclusion
Final thoughts...""",
        "user": "Topic: {task}\n\nProvide structured response:"
    }
}

4. Test Prompts Systematically

def test_prompt_variations(task, variations):
    """A/B test prompt variations."""

    results = {}

    for name, prompts in variations.items():
        agent = ReflectionAgent(
            llm_configs=test_config,
            prompt_overrides=prompts
        )

        result = agent.run(task)

        results[name] = {
            "result": result,
            "length": len(result),
            "quality_score": evaluate_quality(result)
        }

    return results

Configuration Management

1. Use Environment-Based Configs

# config.py
import os

class Config:
    """Environment-based configuration."""

    def __init__(self, environment: str = None):
        self.env = environment or os.getenv("ENVIRONMENT", "development")

    @property
    def llm_configs(self):
        if self.env == "production":
            return {
                "thinking": {
                    "provider": "openai",
                    "model": "gpt-4-turbo",
                    "temperature": 0.7,
                    "max_tokens": 2000
                }
            }
        else:
            return {
                "thinking": {
                    "provider": "openai",
                    "model": "gpt-3.5-turbo",
                    "temperature": 0.0,  # Deterministic for testing
                    "max_tokens": 1000
                }
            }

# Usage
config = Config()
agent = ReActAgent(llm_configs=config.llm_configs, tools=tools)

2. Version Your Configurations

# configs/v1.py
LLM_CONFIGS_V1 = {
    "thinking": {"provider": "openai", "model": "gpt-3.5-turbo"}
}

# configs/v2.py
LLM_CONFIGS_V2 = {
    "thinking": {"provider": "openai", "model": "gpt-4"}
}

# configs/v3.py (current)
LLM_CONFIGS_V3 = {
    "thinking": {
        "provider": "anthropic",
        "model": "claude-3-opus-20240229"
    }
}

# Usage
from configs.v3 import LLM_CONFIGS_V3 as LLM_CONFIGS

3. Centralize Domain Instructions

# domain_instructions.py
class DomainInstructions:
    """Centralized domain-specific instructions."""

    MEDICAL = """
    DOMAIN: Medical
    - Cite evidence-based sources
    - Include medical disclaimers
    - Recommend professional consultation
    Version: 2.1
    Updated: 2024-01-15
    """

    LEGAL = """
    DOMAIN: Legal
    - Specify jurisdiction
    - Include legal disclaimers
    - Cite laws and regulations
    Version: 1.3
    Updated: 2024-01-10
    """

    FINANCIAL = """
    DOMAIN: Financial
    - Discuss risks comprehensively
    - Include regulatory disclosures
    - No specific investment advice
    Version: 1.5
    Updated: 2024-01-20
    """

# Usage
agent = SelfDiscoveryAgent(
    llm_configs=configs,
    custom_instructions=DomainInstructions.MEDICAL
)

Error Handling

1. Always Use Try-Catch

def safe_run(agent, task, fallback="Service unavailable"):
    """Run agent with comprehensive error handling."""

    try:
        return agent.run(task)

    except RateLimitError as e:
        logger.warning(f"Rate limit hit: {e}")
        time.sleep(60)  # Wait and retry
        return agent.run(task)

    except Timeout as e:
        logger.error(f"Timeout: {e}")
        return fallback

    except ValidationError as e:
        logger.error(f"Validation error: {e}")
        raise  # Re-raise validation errors

    except Exception as e:
        logger.error(f"Unexpected error: {e}", exc_info=True)
        return fallback

2. Implement Circuit Breakers

class CircuitBreaker:
    """Circuit breaker for agent calls."""

    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.failures = 0
        self.last_failure = None
        self.state = "closed"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if time.time() - self.last_failure > self.timeout:
                self.state = "half-open"
            else:
                raise Exception("Circuit breaker OPEN")

        try:
            result = func(*args, **kwargs)

            if self.state == "half-open":
                self.state = "closed"
                self.failures = 0

            return result

        except Exception as e:
            self.failures += 1
            self.last_failure = time.time()

            if self.failures >= self.failure_threshold:
                self.state = "open"

            raise

breaker = CircuitBreaker()
result = breaker.call(agent.run, task)

3. Graceful Degradation

def run_with_degradation(task):
    """Try progressively simpler approaches."""

    strategies = [
        ("complex", create_complex_agent),
        ("standard", create_standard_agent),
        ("simple", create_simple_agent),
        ("fallback", create_fallback_response)
    ]

    for name, strategy in strategies:
        try:
            logger.info(f"Trying {name} strategy")
            return strategy()(task)

        except Exception as e:
            logger.warning(f"{name} strategy failed: {e}")
            continue

    return "All strategies failed. Please try again later."

Performance Optimization

1. Cache Aggressively

from functools import lru_cache
import hashlib

class CachedAgent:
    """Agent with caching."""

    def __init__(self, agent):
        self.agent = agent
        self.cache = {}

    def run(self, task: str, use_cache: bool = True) -> str:
        if not use_cache:
            return self.agent.run(task)

        cache_key = hashlib.sha256(task.encode()).hexdigest()

        if cache_key in self.cache:
            logger.info("Cache hit")
            return self.cache[cache_key]

        result = self.agent.run(task)
        self.cache[cache_key] = result

        return result

2. Use Appropriate Models

def get_model_for_task(task_complexity: str):
    """Select model based on complexity."""

    models = {
        "simple": "gpt-3.5-turbo",      # Fast, cheap
        "moderate": "gpt-4",             # Balanced
        "complex": "gpt-4-turbo",        # Best quality
        "creative": "gpt-4",             # High temperature
        "analytical": "claude-3-opus"    # Strong reasoning
    }

    return models.get(task_complexity, "gpt-4")

3. Limit Iterations Appropriately

# Bad: Unlimited iterations
agent = ReActAgent(llm_configs=configs, tools=tools, max_iterations=100)

# Good: Reasonable limits
agent = ReActAgent(llm_configs=configs, tools=tools, max_iterations=5)

# Better: Dynamic limits based on task
def get_max_iterations(task):
    if "simple" in task.lower():
        return 3
    elif "complex" in task.lower():
        return 10
    else:
        return 5

agent = ReActAgent(
    llm_configs=configs,
    tools=tools,
    max_iterations=get_max_iterations(task)
)

4. Parallel Processing When Possible

from concurrent.futures import ThreadPoolExecutor, as_completed

def process_batch(tasks, agent):
    """Process multiple tasks in parallel."""

    results = {}

    with ThreadPoolExecutor(max_workers=5) as executor:
        future_to_task = {
            executor.submit(agent.run, task): task
            for task in tasks
        }

        for future in as_completed(future_to_task):
            task = future_to_task[future]
            try:
                results[task] = future.result()
            except Exception as e:
                logger.error(f"Task failed: {task[:50]}: {e}")
                results[task] = None

    return results

Cost Management

1. Monitor and Track Costs

def estimate_cost(input_text, output_text, model):
    """Estimate API cost."""

    PRICING = {
        "gpt-4": {"input": 0.03, "output": 0.06},
        "gpt-3.5-turbo": {"input": 0.0015, "output": 0.002},
        "claude-3-opus": {"input": 0.015, "output": 0.075}
    }

    input_tokens = len(input_text.split()) * 1.3
    output_tokens = len(output_text.split()) * 1.3

    pricing = PRICING.get(model, PRICING["gpt-4"])

    return (
        (input_tokens / 1000) * pricing["input"] +
        (output_tokens / 1000) * pricing["output"]
    )

class CostTrackingAgent:
    def __init__(self, agent, model):
        self.agent = agent
        self.model = model
        self.total_cost = 0.0

    def run(self, task):
        result = self.agent.run(task)

        cost = estimate_cost(task, result, self.model)
        self.total_cost += cost

        logger.info(f"Cost: ${cost:.4f}, Total: ${self.total_cost:.4f}")

        return result

2. Use Cost-Effective Patterns

# Expensive: ReAct with many iterations
expensive = ReActAgent(
    llm_configs={"thinking": {"model": "gpt-4"}},
    tools=tools,
    max_iterations=10  # Many LLM calls
)

# Cheaper: REWOO (plans all tools upfront)
cheaper = REWOOAgent(
    llm_configs={"thinking": {"model": "gpt-4"}},
    tools=tools  # Single planning call + tool execution
)

3. Implement Budgets

class BudgetedAgent:
    """Agent with cost budget."""

    def __init__(self, agent, budget: float):
        self.agent = agent
        self.budget = budget
        self.spent = 0.0

    def run(self, task):
        if self.spent >= self.budget:
            raise BudgetExceededError(
                f"Budget ${self.budget:.2f} exceeded (spent ${self.spent:.2f})"
            )

        result = self.agent.run(task)

        # Estimate and track cost
        cost = estimate_cost(task, result, "gpt-4")
        self.spent += cost

        return result

Testing

1. Test at Multiple Levels

# Unit tests: Test components in isolation
def test_agent_initialization():
    agent = ReflectionAgent(llm_configs=test_configs)
    assert agent.max_reflection_cycles == 1

# Integration tests: Test workflows with mocks
@patch.object(ReflectionAgent, "_get_llm")
def test_reflection_workflow(mock_llm):
    # Test full workflow
    pass

# E2E tests: Test with real LLMs (expensive)
@pytest.mark.slow
def test_real_llm():
    agent = ReflectionAgent(llm_configs=real_configs)
    result = agent.run("Test task")
    assert result is not None

2. Use Deterministic Settings for Tests

test_configs = {
    "thinking": {
        "provider": "openai",
        "model": "gpt-3.5-turbo",
        "temperature": 0.0,  # Deterministic!
        "max_tokens": 500
    }
}

3. Mock Expensive Operations

@patch.object(Agent, "_get_llm")
def test_with_mock(mock_get_llm):
    mock_llm = Mock()
    mock_llm.invoke.return_value = Mock(content="Test response")
    mock_get_llm.return_value = mock_llm

    agent = create_agent()
    result = agent.run("test")

    assert result == "Test response"

Security

1. Validate All Inputs

def validate_input(task: str, max_length: int = 10000):
    """Validate and sanitize input."""

    if not task or not task.strip():
        raise ValueError("Empty input")

    if len(task) > max_length:
        raise ValueError(f"Input too long (max {max_length})")

    # Check for injection attempts
    forbidden = ["<script>", "eval(", "exec(", "system("]
    for pattern in forbidden:
        if pattern in task.lower():
            raise ValueError(f"Forbidden pattern: {pattern}")

    return task.strip()

2. Sanitize Outputs

def sanitize_output(result: str):
    """Remove sensitive data from output."""

    import re

    # Remove API keys
    result = re.sub(r'sk-[a-zA-Z0-9]{48}', '[API_KEY_REDACTED]', result)

    # Remove credit cards
    result = re.sub(r'\d{16}', '[CC_REDACTED]', result)

    # Remove SSNs
    result = re.sub(r'\d{3}-\d{2}-\d{4}', '[SSN_REDACTED]', result)

    return result

3. Implement Rate Limiting

from collections import defaultdict
import time

class RateLimiter:
    def __init__(self, max_requests=100, window=3600):
        self.max_requests = max_requests
        self.window = window
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        now = time.time()
        window_start = now - self.window

        # Clean old requests
        self.requests[user_id] = [
            t for t in self.requests[user_id]
            if t > window_start
        ]

        if len(self.requests[user_id]) >= self.max_requests:
            return False

        self.requests[user_id].append(now)
        return True

Monitoring and Observability

1. Structured Logging

import logging
import json

logger = logging.getLogger(__name__)

def log_agent_execution(agent_type, task, result, duration):
    """Log agent execution with structured data."""

    logger.info(json.dumps({
        "event": "agent_execution",
        "agent_type": agent_type,
        "task_length": len(task),
        "result_length": len(result),
        "duration_ms": duration * 1000,
        "timestamp": time.time()
    }))

2. Metrics Collection

from prometheus_client import Counter, Histogram

request_count = Counter(
    'agent_requests_total',
    'Total agent requests',
    ['pattern', 'status']
)

request_duration = Histogram(
    'agent_duration_seconds',
    'Agent execution duration',
    ['pattern']
)

# Usage
with request_duration.labels(pattern="reflection").time():
    try:
        result = agent.run(task)
        request_count.labels(pattern="reflection", status="success").inc()
    except Exception:
        request_count.labels(pattern="reflection", status="error").inc()
        raise

3. Distributed Tracing

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

def run_with_tracing(agent, task):
    """Run agent with distributed tracing."""

    with tracer.start_as_current_span("agent_execution") as span:
        span.set_attribute("pattern", type(agent).__name__)
        span.set_attribute("task_length", len(task))

        try:
            result = agent.run(task)
            span.set_attribute("success", True)
            return result
        except Exception as e:
            span.set_attribute("success", False)
            span.record_exception(e)
            raise

Common Pitfalls

1. Not Handling Rate Limits

Bad: No rate limit handling

result = agent.run(task)  # Crashes on rate limit

Good: Handle rate limits with retry

def run_with_retry(agent, task, max_retries=3):
    for i in range(max_retries):
        try:
            return agent.run(task)
        except RateLimitError:
            if i < max_retries - 1:
                time.sleep(2 ** i)  # Exponential backoff
            else:
                raise

2. Ignoring Costs

Bad: No cost awareness

agent = ReActAgent(
    llm_configs={"thinking": {"model": "gpt-4"}},
    max_iterations=100  # Could be very expensive!
)

Good: Cost-conscious configuration

agent = ReActAgent(
    llm_configs={"thinking": {"model": "gpt-3.5-turbo"}},
    max_iterations=5
)

3. No Input Validation

Bad: Raw user input

result = agent.run(user_input)  # Could be malicious

Good: Validated input

validated = validate_input(user_input)
result = agent.run(validated)

4. Poor Error Messages

Bad: Generic errors

except Exception:
    return "Error occurred"

Good: Informative errors

except RateLimitError:
    return "Rate limit exceeded. Please try again in 60 seconds."
except Timeout:
    return "Request timeout. Try a simpler query."
except Exception as e:
    logger.error(f"Unexpected error: {e}", exc_info=True)
    return "Unexpected error. Support has been notified."

5. Not Testing with Real Scenarios

Always test with realistic inputs:

test_cases = [
    ("simple query", "What is Python?"),
    ("complex query", "Explain quantum computing with examples"),
    ("edge case", "a" * 10000),  # Very long input
    ("empty", ""),
    ("special chars", "Test with émojis 🎉 and symbols @#$%")
]

for name, test_input in test_cases:
    try:
        result = agent.run(test_input)
        print(f"✓ {name}: Passed")
    except Exception as e:
        print(f"✗ {name}: Failed - {e}")

Production Checklist

Before deploying to production:

  • ✅ Input validation implemented

  • ✅ Output sanitization in place

  • ✅ Error handling comprehensive

  • ✅ Rate limiting configured

  • ✅ Cost tracking enabled

  • ✅ Logging structured

  • ✅ Metrics being collected

  • ✅ Caching implemented

  • ✅ Circuit breakers in place

  • ✅ Health checks working

  • ✅ Security review completed

  • ✅ Load testing done

  • ✅ Monitoring configured

  • ✅ Alerts set up

  • ✅ Documentation updated

Next Steps

  • Review specific guides for detailed information

  • Test thoroughly before production

  • Monitor and iterate on production data

  • Join community for latest practices

Summary

Golden Rules:

  1. Choose the right pattern for the task

  2. Start simple, optimize based on data

  3. Always validate inputs and sanitize outputs

  4. Implement comprehensive error handling

  5. Monitor costs and performance

  6. Test at multiple levels

  7. Use structured logging

  8. Cache aggressively

  9. Limit iterations appropriately

  10. Security first, always