Design Checklists
Use these checklists to make sure your GenAI system design is complete before implementation.
Use Case Fit Checklist
Before starting any GenAI project, answer these questions:
Problem Fit
- Is the task mostly deterministic or does it require judgment?
- Would a rule-based system work? If not, why?
- Is the task something LLMs are good at (generation, summarization, classification)?
- Is the task something LLMs struggle with (precise calculations, real-time data)?
LLM vs. Workflow Decision
- Use a workflow if the task is mostly sequential, the steps are known, and no real judgment is needed
- Use a single agent if the task has branching, needs tool use, or involves context-dependent decisions
- Use multiple agents if the tasks require different expertise, parallel processing, or complex handoffs
Output Requirements
- Do you need structured output (JSON, specific format)?
- Is freeform text acceptable?
- Do you need citations or source tracking?
- Are there latency requirements?
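If structured output is required, it helps to validate it before anything downstream consumes it. The following is a minimal sketch using only the standard library; the `REQUIRED_FIELDS` schema (an `answer` string plus a `sources` list for citation tracking) is a hypothetical example, not a prescribed format.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"answer": str, "sources": list}


def parse_structured_output(raw: str) -> dict:
    """Parse model output as JSON and check required fields and their types."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name!r} must be of type {expected_type.__name__}")
    return data
```

A caller can catch `ValueError` and decide whether to re-prompt the model or fail the request.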
Model Selection Checklist
Capability Requirements
- What modality do you need? (text only, images, audio, video)
- Do you need function calling / tool use support?
- Do you need structured outputs?
- What context window size do you need?
Cost and Latency
- What is your latency budget? (real-time < 1s, async < 30s, batch acceptable)
- What is your cost ceiling per request?
- Can you cache prompts to reduce costs?
- Is latency or cost more critical for your use case?
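To check a cost ceiling per request, a rough estimate can be computed from token counts. This sketch assumes per-million-token pricing with a discounted rate for cached input tokens; the `Pricing` class and every rate passed to it are hypothetical placeholders, so substitute your provider's published prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens."""
    input_per_mtok: float
    output_per_mtok: float
    cached_input_per_mtok: float


def request_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0) -> float:
    """Estimate the cost of one request; cached tokens are a subset of input tokens."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * pricing.input_per_mtok
             + cached_tokens * pricing.cached_input_per_mtok
             + output_tokens * pricing.output_per_mtok)
    return total / 1_000_000
```

Comparing the estimate with and without `cached_tokens` gives a quick sense of how much prompt caching could save for a given prompt shape.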
Quality Requirements
- What quality level is acceptable?
- Do you need reasoning capabilities (o1, o3, Claude extended thinking)?
- Have you tested multiple models on your specific use case?
Prompt Design Checklist
Structure
- Is the system instruction clear and specific?
- Are there explicit format instructions?
- Is the task clearly defined at the end?
- Have you placed important instructions at the beginning and end?
Examples
- Do you have few-shot examples for complex formats?
- Are examples representative of edge cases?
- Are examples in the same format you expect?
Context Management
- Is all necessary context included?
- Is unnecessary context removed?
- Is the most important information last (recency effect)?
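One way to apply the structure and context items above is to assemble the prompt programmatically. This is only a sketch: the function name `build_prompt` and the section labels (`Context:`, `Task:`, `Reminder:`) are illustrative choices, and whether repeating the instruction at the end helps should be tested on your own model and task.

```python
def build_prompt(instruction: str, context_chunks: list[str], task: str) -> str:
    """Place the key instruction first, context in the middle, and the task last."""
    # Drop empty chunks so unnecessary context does not pad the prompt.
    context = "\n\n".join(c.strip() for c in context_chunks if c.strip())
    parts = [instruction.strip()]
    if context:
        parts.append("Context:\n" + context)
    parts.append("Task: " + task.strip())
    # Repeat the key instruction so it appears at both the beginning and the end.
    parts.append("Reminder: " + instruction.strip())
    return "\n\n".join(parts)
```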
Tool Design Checklist
Safety
- Are destructive operations (DELETE, DROP) restricted?
- Is PII handling prohibited or limited?
- Are there rate limits or quotas?
- Can you trace tool calls to audit logs?
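As an illustration of restricting destructive operations and tracing tool calls, the sketch below allows only single `SELECT` statements and records every attempt. The names `guarded_query` and `audit_log` are hypothetical, and a string check like this is deliberately naive: it complements, and does not replace, read-only database credentials.

```python
import re
from datetime import datetime, timezone

# Allow-list approach: only statements that start with SELECT are accepted.
READ_ONLY = re.compile(r"^\s*SELECT\b", re.IGNORECASE)

# In-memory audit trail for the sketch; a real system would use durable storage.
audit_log: list[dict] = []


def guarded_query(sql: str, caller: str) -> bool:
    """Return True if the statement may run; record every attempt either way."""
    body = sql.strip().rstrip(";")
    # Reject stacked statements such as "SELECT 1; DROP TABLE t".
    allowed = bool(READ_ONLY.match(body)) and ";" not in body
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "caller": caller,
        "sql": sql,
        "allowed": allowed,
    })
    return allowed
```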
Design Quality
- Do tool names clearly describe their function?
- Are parameters typed and constrained?
- Are descriptions explicit about preconditions and side effects?
- Can tools be called idempotently?
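The design-quality items can be made concrete with a small tool that validates its parameters and is safe to call twice. Everything here is hypothetical (the `RefundTool` name, its parameters, and the 50,000-cent cap); the point is the pattern of constrained inputs plus an idempotency key.

```python
from dataclasses import dataclass, field


@dataclass
class RefundTool:
    """Hypothetical tool: issue a refund at most once per idempotency key."""
    max_amount_cents: int = 50_000
    side_effects: int = 0  # counts how many refunds were actually issued
    _results: dict = field(default_factory=dict)

    def call(self, order_id: str, amount_cents: int, idempotency_key: str) -> dict:
        # Typed and constrained parameters: reject bad input before any side effect.
        if not isinstance(order_id, str) or not order_id:
            raise ValueError("order_id must be a non-empty string")
        if isinstance(amount_cents, bool) or not isinstance(amount_cents, int):
            raise ValueError("amount_cents must be an integer")
        if not 0 < amount_cents <= self.max_amount_cents:
            raise ValueError(f"amount_cents must be in 1..{self.max_amount_cents}")
        # Idempotency: a repeated key returns the stored result without a new refund.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.side_effects += 1
        result = {"order_id": order_id, "refunded_cents": amount_cents, "status": "ok"}
        self._results[idempotency_key] = result
        return result
```

With this shape, an agent that retries after a timeout does not trigger a second refund as long as it reuses the same key.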
Error Handling
- What happens when a tool fails?
- Can the agent retry, or is manual intervention needed?
- Are error messages user-friendly?
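A sketch of one possible answer to the questions above: retry transient failures with exponential backoff, and return a structured error that flags manual intervention otherwise. `TransientToolError` and `call_with_retry` are hypothetical names, and how a real tool signals a transient failure is an assumption.

```python
import time


class TransientToolError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, rate limits)."""


def call_with_retry(tool, *args, attempts=3, base_delay=0.5, sleep=time.sleep, **kwargs):
    """Call a tool, retrying transient errors; always return a structured result."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return {"ok": True, "result": tool(*args, **kwargs)}
        except TransientToolError as exc:
            if attempt == attempts:
                return {"ok": False, "needs_human": True,
                        "error": f"gave up after {attempts} attempts: {exc}"}
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
        except Exception as exc:
            # Non-transient failure: retrying will not help, so escalate immediately.
            return {"ok": False, "needs_human": True, "error": str(exc)}
```

Returning a dictionary instead of raising lets the agent read the error text and decide what to tell the user.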
Retrieval Design Checklist
Data Preparation
- Is your data clean and well-structured?
- Are chunk boundaries semantically coherent?
- Do chunks have metadata for filtering and citation?
- Is the data fresh enough for your use case?
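To illustrate semantically coherent chunk boundaries with metadata, the sketch below splits on blank lines (paragraphs) and attaches a source and index to each chunk. The function name, the metadata keys, and the `max_chars` default are assumptions; a paragraph longer than `max_chars` is kept whole instead of being cut mid-sentence.

```python
def chunk_paragraphs(text: str, source: str, max_chars: int = 500) -> list[dict]:
    """Group whole paragraphs into chunks of roughly max_chars, with citation metadata."""
    chunks: list[dict] = []
    current: list[str] = []

    def flush() -> None:
        if current:
            chunks.append({
                "text": "\n\n".join(current),
                "source": source,              # used for filtering and citation
                "chunk_index": len(chunks),
            })
            current.clear()

    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk if adding this paragraph would exceed the limit.
        if current and len("\n\n".join(current + [para])) > max_chars:
            flush()
        current.append(para)
    flush()
    return chunks
```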
Retrieval Strategy
- Have you chosen an appropriate chunk size?
- Do you need hybrid search (vector + keyword)?
- Do you need reranking?
- How many chunks do you retrieve?
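If hybrid search is needed, the vector and keyword result lists have to be merged somehow. Reciprocal rank fusion is one common option, shown here as a sketch; the document does not prescribe it, and the constant `k = 60` and the `top_n` default are conventional starting points to tune, not requirements.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Merge ranked lists of document IDs by summing 1 / (k + rank) per list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]
```

For example, given a vector ranking `["a", "b", "c"]` and a keyword ranking `["b", "d", "a"]`, documents that appear in both lists (`b` and `a`) rank ahead of those found by only one retriever. The `top_n` parameter corresponds to the "how many chunks do you retrieve" question.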
Grounding
- Does the model cite sources?
- Can users verify citations?
- Is hallucination risk mitigated?
State and Memory Checklist
State Management
- What state needs to persist across interactions?
- Is state stored in the thread or in external memory?
- Do you need checkpointing for resumability?
Memory Design
- What memories should be short-term (thread)?
- What memories should be long-term (store)?
- When should memories be summarized vs. stored raw?
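One simple policy for the summarize-versus-raw question is to keep the most recent messages verbatim and compress everything older. The sketch below assumes a caller-supplied `summarize` function (in practice probably an LLM call); `compact_history` and the summary prefix are hypothetical names.

```python
from typing import Callable


def compact_history(messages: list[str], keep_last: int,
                    summarize: Callable[[list[str]], str]) -> list[str]:
    """Keep the last keep_last messages raw and replace older ones with one summary."""
    if keep_last < 0:
        raise ValueError("keep_last must be non-negative")
    if len(messages) <= keep_last:
        return list(messages)
    split = len(messages) - keep_last
    older, recent = messages[:split], messages[split:]
    return ["Summary of earlier conversation: " + summarize(older)] + recent
```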
Streaming
- Should responses be streamed for better UX?
- Can the client handle streaming?
- Do you need to update UI during generation?
Multi-Agent Design Checklist
Architecture Decision
- Have you justified why a single agent isn't sufficient?
- Is the task complex enough to warrant multiple agents?
- Can a router classify and delegate tasks?
Handoff Design
- Are handoff points clearly defined?
- Does context transfer correctly between agents?
- Is failure handling defined for handoff failures?
Human-in-the-Loop
- Are there approval gates for risky actions?
- Can users interrupt and correct agent behavior?
- Is checkpointing enabled for resumability?
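An approval gate can be as small as a check before execution. In this sketch the set of risky action names and the `approve` and `run` callables are all hypothetical; in a real system `approve` would pause the agent and wait for a human decision.

```python
from typing import Any, Callable

# Hypothetical list of actions that require human sign-off.
RISKY_ACTIONS = {"delete_record", "send_payment"}


def execute_action(action: str, payload: dict,
                   approve: Callable[[str, dict], bool],
                   run: Callable[[str, dict], Any]) -> dict:
    """Run an action, asking for approval first if it is on the risky list."""
    if action in RISKY_ACTIONS and not approve(action, payload):
        return {"status": "rejected", "action": action}
    return {"status": "done", "result": run(action, payload)}
```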
Evaluation Checklist
Test Coverage
- Do you have a golden dataset of examples?
- Are edge cases covered?
- Are failure modes tested?
Quality Metrics
- Have you defined success criteria?
- Can you measure accuracy, relevance, or task completion?
- Do you have automated regression tests?
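A golden dataset and a success criterion can be combined into an automated regression check. The sketch below uses exact match after normalization, which suits classification-style tasks; the `evaluate` function, the case format, and the 0.9 threshold are assumptions, and open-ended generation usually needs a different scoring method.

```python
from typing import Callable


def evaluate(model: Callable[[str], str], golden: list[dict], threshold: float = 0.9) -> dict:
    """Score a model against golden cases and report whether it meets the threshold."""
    if not golden:
        raise ValueError("golden dataset is empty")
    failures = []
    for case in golden:
        output = model(case["input"])
        # Normalize whitespace and case so trivial differences do not count as failures.
        if output.strip().lower() != case["expected"].strip().lower():
            failures.append(case["input"])
    accuracy = 1 - len(failures) / len(golden)
    return {"accuracy": accuracy, "passed": accuracy >= threshold, "failures": failures}
```

Running this in CI against every prompt or model change turns the success criterion into a gate instead of a guideline.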
Monitoring
- Can you track output quality over time?
- Are there alerts for quality degradation?
- Can you sample and review outputs?
Security Checklist
Prompt Injection
- Have you tested whether user input can override system instructions or otherwise manipulate system behavior?
- Are retrieved documents sanitized?
- Do you validate and constrain outputs?
Access Control
- Is authentication required?
- Are there authorization levels?
- Can you audit who accessed what?
Data Safety
- Is PII handled appropriately?
- Are API keys and secrets protected?
- Is sensitive data kept out of logs, or redacted before logging?
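One way to keep sensitive data out of logs is to redact it in a logging filter. This sketch uses the standard `logging` module; the two regular expressions are illustrative only (the `sk-` / `key-` secret format is a hypothetical example) and will not catch every kind of PII or credential.

```python
import logging
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Hypothetical secret format: "sk-" or "key-" followed by 8+ alphanumerics.
SECRET = re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{8,}\b")


def redact(text: str) -> str:
    """Replace email addresses and key-like tokens with placeholders."""
    return SECRET.sub("[SECRET]", EMAIL.sub("[EMAIL]", text))


class RedactingFilter(logging.Filter):
    """Redact the fully formatted message before any handler sees it."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.msg = redact(record.getMessage())
        record.args = ()  # message is already formatted
        return True
```

Attaching the filter to a handler with `handler.addFilter(RedactingFilter())` applies it to every record that handler emits.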
Deployment Checklist
Infrastructure
- Do you have deployment documentation?
- Is the system containerized?
- Are there rollback procedures?
Monitoring
- Are there logs for debugging?
- Can you trace requests end-to-end?
- Are there cost and usage alerts?
Reliability
- Is there retry logic for transient failures?
- Are there circuit breakers for downstream failures?
- Is there a runbook for common issues?
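For the circuit-breaker item, a minimal sketch follows. It opens after a number of consecutive failures, rejects calls while open, and allows one trial call after a cool-down. The class names, thresholds, and injectable `clock` are assumptions for illustration; production systems often use an existing library instead.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            raise CircuitOpenError("downstream marked unhealthy; call skipped")
        # Closed, or cool-down elapsed (half-open): let the call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```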
Quick Reference: Decision Tree
Start: Do you need an LLM?
│
├─ No → Use traditional software
│
└─ Yes → Is the task deterministic?
         │
         ├─ Yes → Is it sequential?
         │        ├─ Yes → Workflow (no LLM needed)
         │        └─ No → Use LLM for specific steps only
         │
         └─ No → Does it need tool use or context?
                 │
                 ├─ No → Simple prompting + structured output
                 │
                 └─ Yes → Does it need multiple capabilities?
                          │
                          ├─ No → Single agent
                          │
                          └─ Yes → Multiple agents with routing
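The decision tree above can also be written as a small function, which is handy for design-review tooling or documentation tests. The function and parameter names are hypothetical; the branches and returned labels mirror the tree.

```python
def choose_architecture(needs_llm: bool,
                        deterministic: bool = False,
                        sequential: bool = False,
                        needs_tools_or_context: bool = False,
                        multiple_capabilities: bool = False) -> str:
    """Walk the quick-reference decision tree and return the suggested approach."""
    if not needs_llm:
        return "Use traditional software"
    if deterministic:
        if sequential:
            return "Workflow (no LLM needed)"
        return "Use LLM for specific steps only"
    if not needs_tools_or_context:
        return "Simple prompting + structured output"
    if not multiple_capabilities:
        return "Single agent"
    return "Multiple agents with routing"
```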