Design Checklists

Use these checklists to make sure your GenAI system design is complete before implementation.

Use Case Fit Checklist

Before starting any GenAI project, answer these questions:

Problem Fit

Is the task mostly deterministic or does it require judgment?
Would a rule-based system work? If not, why?
Is the task something LLMs are good at (generation, summarization, classification)?
Is the task something LLMs struggle with (precise calculations, real-time data)?

LLM vs. Workflow Decision

Use a workflow if: Task is mostly sequential, steps are known, no real judgment needed
Use a single agent if: Task has branching, needs tool use, context-dependent decisions
Use multiple agents if: Tasks require different expertise, parallel processing needed, or complex handoffs

Output Requirements

Do you need structured output (JSON, specific format)?
Is freeform text acceptable?
Do you need citations or source tracking?
Are there latency requirements?

Model Selection Checklist

Capability Requirements

What modality do you need? (text only, images, audio, video)
Do you need function calling / tool use support?
Do you need structured outputs?
What context window size do you need?

Cost and Latency

What is your latency budget? (real-time < 1s, async < 30s, batch acceptable)
What is your cost ceiling per request?
Can you cache prompts to reduce costs?
Is latency or cost more critical for your use case?

Quality Requirements

What quality level is acceptable?
Do you need reasoning capabilities (o1, o3, Claude extended thinking)?
Have you tested multiple models on your specific use case?

Prompt Design Checklist

Structure

Is the system instruction clear and specific?
Are there explicit format instructions?
Is the task clearly defined at the end?
Have you placed important instructions at the beginning and end?

Examples

Do you have few-shot examples for complex formats?
Are examples representative of edge cases?
Are examples in the same format you expect?

Context Management

Is all necessary context included?
Is unnecessary context removed?
Is the most important information last (recency effect)?

Tool Design Checklist

Safety

Are destructive operations (DELETE, DROP) restricted?
Is PII handling prohibited or limited?
Are there rate limits or quotas?
Can you trace tool calls to audit logs?

Design Quality

Do tool names clearly describe their function?
Are parameters typed and constrained?
Are descriptions explicit about preconditions and side effects?
Can tools be called idempotently?

Error Handling

What happens when a tool fails?
Can the agent retry, or is manual intervention needed?
Are error messages user-friendly?

Retrieval Design Checklist

Data Preparation

Is your data clean and well-structured?
Are chunk boundaries semantically coherent?
Do chunks have metadata for filtering and citation?
Is the data fresh enough for your use case?

Retrieval Strategy

Have you chosen an appropriate chunk size?
Do you need hybrid search (vector + keyword)?
Do you need reranking?
How many chunks do you retrieve?

Grounding

Does the model cite sources?
Can users verify citations?
Is hallucination risk mitigated?

State and Memory Checklist

State Management

What state needs to persist across interactions?
Is state stored in thread or in external memory?
Do you need checkpointing for resumability?

Memory Design

What memories should be short-term (thread)?
What memories should be long-term (store)?
When should memories be summarized vs. stored raw?

Streaming

Should responses be streamed for better UX?
Can the client handle streaming?
Do you need to update UI during generation?

Multi-Agent Design Checklist

Architecture Decision

Have you justified why single agent isn't sufficient?
Is the task complex enough to warrant multiple agents?
Can a router classify and delegate tasks?

Handoff Design

Are handoff points clearly defined?
Does context transfer correctly between agents?
Is failure handling defined for handoff failures?

Human-in-the-Loop

Are there approval gates for risky actions?
Can users interrupt and correct agent behavior?
Is checkpointing enabled for resumability?

Evaluation Checklist

Test Coverage

Do you have golden dataset examples?
Are edge cases covered?
Are failure modes tested?

Quality Metrics

Have you defined success criteria?
Can you measure accuracy, relevance, or task completion?
Do you have automated regression tests?

Monitoring

Can you track output quality over time?
Are there alerts for quality degradation?
Can you sample and review outputs?

Security Checklist

Prompt Injection

Can user input manipulate system behavior?
Are retrieved documents sanitized?
Do you validate and constrain outputs?

Access Control

Is authentication required?
Are there authorization levels?
Can you audit who accessed what?

Data Safety

Is PII handled appropriately?
Are API keys and secrets protected?
Is sensitive data logged?

Deployment Checklist

Infrastructure

Do you have deployment documentation?
Is the system containerized?
Are there rollback procedures?

Monitoring

Are there logs for debugging?
Can you trace requests end-to-end?
Are there cost and usage alerts?

Reliability

Is there retry logic for transient failures?
Are there circuit breakers for downstream failures?
Is there a runbook for common issues?

Quick Reference: Decision Tree

Start: Do you need an LLM?
│
├─ No → Use traditional software
│
└─ Yes → Is the task deterministic?
         │
         ├─ Yes → Is it sequential?
         │        ├─ Yes → Workflow (no LLM needed)
         │        └─ No → Use LLM for specific steps only
         │
         └─ No → Does it need tool use or context?
                 │
                 ├─ No → Simple prompting + structured output
                 │
                 └─ Yes → Does it need multiple capabilities?
                          │
                          ├─ No → Single agent
                          │
                          └─ Yes → Multiple agents with routing

Use Case Fit Checklist​

Problem Fit​

LLM vs. Workflow Decision​

Output Requirements​

Model Selection Checklist​

Capability Requirements​

Cost and Latency​

Quality Requirements​

Prompt Design Checklist​

Structure​

Examples​

Context Management​

Tool Design Checklist​

Safety​

Design Quality​

Error Handling​

Retrieval Design Checklist​

Data Preparation​

Retrieval Strategy​

Grounding​

State and Memory Checklist​

State Management​

Memory Design​

Streaming​

Multi-Agent Design Checklist​

Architecture Decision​

Handoff Design​

Human-in-the-Loop​

Evaluation Checklist​

Test Coverage​

Quality Metrics​

Monitoring​

Security Checklist​

Prompt Injection​

Access Control​

Data Safety​

Deployment Checklist​

Infrastructure​

Monitoring​

Reliability​

Quick Reference: Decision Tree​

Related Resources​

Use Case Fit Checklist

Problem Fit

LLM vs. Workflow Decision

Output Requirements

Model Selection Checklist

Capability Requirements

Cost and Latency

Quality Requirements

Prompt Design Checklist

Structure

Examples

Context Management

Tool Design Checklist

Safety

Design Quality

Error Handling

Retrieval Design Checklist

Data Preparation

Retrieval Strategy

Grounding

State and Memory Checklist

State Management

Memory Design

Streaming

Multi-Agent Design Checklist

Architecture Decision

Handoff Design

Human-in-the-Loop

Evaluation Checklist

Test Coverage

Quality Metrics

Monitoring

Security Checklist

Prompt Injection

Access Control

Data Safety

Deployment Checklist

Infrastructure

Monitoring

Reliability

Quick Reference: Decision Tree

Related Resources