Skip to main content

Error Patterns Guide

Use this guide when you encounter a specific symptom and need to identify the root cause and fix.

Symptom-to-Error Mapping


Infinite Loops & Recursion

Symptom: "Recursion limit exceeded" or "Max iterations reached"

Error Code: RECURSION_000

Symptoms:

  • Graph runs forever until hitting limit
  • Logs show repeated identical or similar tool calls
  • Response never completes

Root Causes:

CauseDescription
Tool selection loopModel keeps calling the same tool with same inputs
Routing loopConditional edges create infinite path
Missing END conditionGraph never terminates
Self-referential toolTool indirectly calls itself

Debugging Steps:

  1. Enable verbose logging to see the loop pattern
  2. Check if same tool is called repeatedly
  3. Verify routing conditions are not self-referential
  4. Confirm END condition exists for all paths

Fix:

# Check your graph routing
graph.add_conditional_edges(
"AGENT",
route_by_response, # Make sure this terminates
{
"continue": "TOOL",
"done": END, # This must be reachable
}
)

# Add recursion limit
app = graph.compile(recursion_limit=50)

Storage & Persistence

Symptom: "Thread not found" or "Resource not found"

Error Code: STORAGE_NOT_FOUND_000

Symptoms:

  • API returns 404 for thread operations
  • Conversation history is empty
  • Checkpoint retrieval fails

Root Causes:

CauseDescription
Wrong thread_idTypo or stale ID
Thread deletedThread was manually removed
No checkpointerStorage not configured
Checkpoint expiredTTL exceeded

Debugging Steps:

  1. Verify thread_id is correct and valid
  2. Check if checkpointer is configured in agentflow.json
  3. Verify storage backend is accessible
  4. Check for checkpoint TTL settings

Fix:

# List available threads
curl http://localhost:8000/v1/threads

# Check thread state
curl http://localhost:8000/v1/threads/{thread_id}/state

Symptom: "Connection timeout" or "Temporary failure"

Error Code: STORAGE_TRANSIENT_000

Symptoms:

  • Intermittent failures
  • Operations fail under load
  • Database connection errors

Root Causes:

CauseDescription
Network latencySlow connection to storage backend
Lock contentionHigh concurrency causing locks
Resource exhaustionMemory or connection pool limits

Debugging Steps:

  1. Check database connection health
  2. Monitor connection pool utilization
  3. Review retry logic in your code
  4. Check for high concurrency issues

Fix: Implement retry with exponential backoff:

from agentflow.core.exceptions import TransientStorageError

async def retry_operation(operation, max_retries=3):
for attempt in range(max_retries):
try:
return await operation()
except TransientStorageError:
if attempt == max_retries - 1:
raise
await asyncio.sleep(2 ** attempt) # 1s, 2s, 4s

Symptom: "Failed to serialize" or "Invalid data format"

Error Code: STORAGE_SERIALIZATION_000

Symptoms:

  • Checkpoint save fails
  • State restoration fails
  • Messages cannot be decoded

Root Causes:

CauseDescription
Schema mismatchState schema changed without migration
Corrupt dataCheckpoint data is corrupted
Unsupported typeState contains non-serializable object

Debugging Steps:

  1. Check schema version compatibility
  2. Verify checkpoint data integrity
  3. Review state schema for unsupported types
  4. Check logs for specific serialization failure

Fix:

# Ensure state uses serializable types
class AgentState(TypedDict):
messages: list[str] # Use JSON-serializable types
context: dict[str, Any]
# Avoid: functions, custom classes, open connections

Symptom: "Schema version mismatch" or "Migration required"

Error Code: STORAGE_SCHEMA_000

Symptoms:

  • "Schema out of date" errors
  • Migration warnings on startup
  • Checkpoint operations fail after upgrade

Root Causes:

CauseDescription
Upgrade without migrationAgentFlow upgraded, DB not migrated
Version mismatchCode and database out of sync
Corrupt version tableSchema version tracking corrupted

Debugging Steps:

  1. Check AgentFlow version
  2. Verify database schema version
  3. Look for migration warnings in logs

Fix: Run migrations after upgrading AgentFlow:

# Check current schema version
# Look for migration output during upgrade

# After upgrading AgentFlow
pip install --upgrade agentflow
# Migrations should run automatically, or check docs for manual steps

Validation & Security

Symptom: "Input validation failed" or "Prompt injection detected"

Error Code: VALIDATION_000

Symptoms:

  • User input is rejected
  • "Bad word detected" errors
  • Security policy violations

Violation Types:

TypeDescriptionExample
prompt_injectionDirect or indirect injection"Ignore previous instructions"
jailbreakAttempt to bypass safetyRole-play to override system
content_policyPolicy violationBlocked content patterns
encoding_attackObfuscated contentBase64 encoded instructions
delimiter_confusionConflicting markersNested special characters
payload_splittingDistributed attackSplit across multiple inputs
system_leakPrompt extraction attempt"What are your instructions?"

Debugging Steps:

  1. Check validation logs for specific violation type
  2. Review user input that triggered the error
  3. Determine if input is legitimate or attack
  4. Adjust validation strictness if needed

Fix:

# Disable strict mode for development (not recommended for production)
from agentflow.utils.callbacks import CallbackManager
from agentflow.utils.validators import PromptInjectionValidator

callback_manager = CallbackManager()
validator = PromptInjectionValidator(strict_mode=False) # Logs but doesn't block
callback_manager.register_input_validator(validator)

Media & Files

Symptom: "Model does not support media" or "Unsupported media input"

Error Code: MEDIA_000

Symptoms:

  • Image/video/audio input fails
  • Model-specific capability errors
  • Media transport mode errors

Root Causes:

CauseDescription
Model lacks capabilitye.g., non-vision model for images
Wrong source typeURL not supported, use file_id
Transport failureAll transport modes failed

Supported Capabilities by Model:

ModelVisionAudioDocument
gpt-4oYesNoYes
gpt-4o-miniYesNoYes
gpt-4-turboYesNoYes
gemini-1.5-proYesYesYes
gemini-1.5-flashYesYesYes
claude-3-opusYesNoNo
claude-3-sonnetYesNoNo

Debugging Steps:

  1. Check model capabilities
  2. Verify input source type (URL vs file_id)
  3. Check transport modes tried
  4. Consider using a different model

Fix:

# Use a vision-capable model for images
agent = Agent(
model="gpt-4o", # Supports vision
# ...
)

# Or upload file and use file_id
from agentflow.storage.media import MediaStorage
media = MediaStorage()
file_id = await media.upload(file_path)
# Use file_id in message instead of URL

Node & Tool Execution

Symptom: "Node execution failed" or "Tool error"

Error Code: NODE_000

Symptoms:

  • Tool call returns error
  • Node operation fails
  • Partial execution before failure

Root Causes:

CauseDescription
Tool runtime errorException in tool function
Invalid tool inputTool received malformed parameters
Tool timeoutTool exceeded time limit
Missing dependenciesTool requires unavailable resource

Debugging Steps:

  1. Check tool error message in logs
  2. Verify tool function signature
  3. Test tool in isolation
  4. Review tool timeout settings

Fix:

from agentflow.core.exceptions import NodeError

try:
result = await tool.execute(input_data)
except NodeError as e:
# Handle node-specific error
logger.error(f"Node error: {e.error_code}")
except Exception as e:
# Handle unexpected errors
raise NodeError(
message=f"Tool execution failed: {str(e)}",
context={"tool_name": tool.name}
)

Quick Reference Tables

Error by Symptom

SymptomError CodeAction
Infinite loopRECURSION_000Add recursion limit, fix routing
Not foundSTORAGE_NOT_FOUND_000Verify thread_id, check storage
TimeoutSTORAGE_TRANSIENT_000Retry with backoff
Serialization failSTORAGE_SERIALIZATION_000Fix state schema
Schema mismatchSTORAGE_SCHEMA_000Run migrations
Validation blockedVALIDATION_000Check input, adjust validators
Media unsupportedMEDIA_000Use capable model
Node errorNODE_000Check tool implementation
Graph errorGRAPH_000Review graph configuration

Error by HTTP Status

StatusLikely ErrorCode
400ValidationVALIDATION_000
404Not foundSTORAGE_NOT_FOUND_000
429Rate limitSTORAGE_TRANSIENT_000
500Graph/NodeGRAPH_000, NODE_000
503TransientSTORAGE_TRANSIENT_000

Common Fixes

Always

  • Check logs first for specific error messages
  • Note the error code for programmatic handling
  • Use to_dict() for structured error inspection

For Recursion Issues

# Add explicit termination
app = graph.compile(recursion_limit=50)

# Add timeout
app = graph.compile(recursion_limit=50, execution_timeout=60)

For Storage Issues

# Configure proper checkpointer
app = graph.compile(
checkpointer=PostgresCheckpointer(connection_string=dsn)
)

# Handle transient errors
@retry(attempts=3, backoff=2)
async def save_checkpoint():
await checkpointer.save(thread_id, state)

For Validation Issues

# For false positives, add to allowlist
validator = PromptInjectionValidator(
allowlist_patterns=[r"ignore previous"] # If legitimate
)

# Or disable strict mode for specific inputs
validator = PromptInjectionValidator(strict_mode=False)