Glossary
This glossary defines terms used across both the Beginner and Advanced GenAI courses. It is a quick reference, not a replacement for the full explanations in each lesson.
A
Agent
A system that uses an LLM to decide which actions to take, can use tools, maintains state across calls, and can operate with varying levels of autonomy.
Related: Beginner Lesson 1: Use cases, models, and the LLM app lifecycle
Agentic RAG
A retrieval-augmented generation pattern where the agent decides what to retrieve, when to retrieve, and whether to retrieve again. Contrast with 2-step RAG where retrieval happens once before generation.
Related: Advanced Lesson 4: Knowledge systems and advanced RAG
Attention
A mechanism in transformer models that allows each token to attend to all other tokens in the sequence, enabling the model to understand relationships and context.
Related: Shared: Transformer basics
B
Bounded Autonomy
Designing agent behavior so that it operates within explicit limits (step budgets, tool restrictions, approval gates) rather than running unbounded until completion.
Related: Advanced Lesson 2: Single-agent runtime and bounded autonomy
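As a toy illustration (not AgentFlow code; the loop and policy functions are made up), a step budget is just a hard cap on the agent loop:

```python
MAX_STEPS = 5  # explicit step budget: the agent may not loop past this

def run_agent(decide_next_action):
    """Run an agent loop that stops when it finishes or the budget runs out."""
    for step in range(MAX_STEPS):
        action = decide_next_action(step)
        if action == "done":
            return f"finished at step {step}"
    return "step budget exhausted"

# A policy that never finishes: the bound stops it anyway.
print(run_agent(lambda step: "continue"))  # step budget exhausted
# A policy that finishes on step 2.
print(run_agent(lambda step: "done" if step == 2 else "continue"))  # finished at step 2
```

Tool restrictions and approval gates follow the same shape: checks around the loop body, not trust in the model to stop itself.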
BPE (Byte Pair Encoding)
A common tokenization algorithm that splits text into subword units based on frequency. Most modern LLMs use BPE or similar algorithms.
Related: Shared: Tokenization and context windows
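A minimal sketch of one BPE training step (a toy over characters, not a production tokenizer): count adjacent pairs, then merge the most frequent pair into a new symbol.

```python
from collections import Counter

def most_frequent_pair(tokens):
    """Count adjacent symbol pairs and return the most frequent one."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return max(pairs, key=pairs.get)

def merge_pair(tokens, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i < len(tokens) - 1 and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("banana")            # ['b', 'a', 'n', 'a', 'n', 'a']
pair = most_frequent_pair(tokens)  # ('a', 'n') occurs twice
tokens = merge_pair(tokens, pair)
print(tokens)                      # ['b', 'an', 'an', 'a']
```

Real BPE repeats this merge step thousands of times over a large corpus to build the vocabulary.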
C
Checkpoint
A snapshot of agent state that enables resuming execution after interruption. Checkpoints store the full conversation history and any intermediate variables.
Related: Beginner Lesson 5: State, memory, threads, and streaming
Chunk
A segment of a document created during the ingestion process. Chunk size and overlap affect retrieval quality.
Related: Shared: Chunking and retrieval primitives
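A minimal character-based chunker (a toy; production pipelines usually chunk by tokens or sentence boundaries) shows how size and overlap interact:

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into fixed-size character chunks; consecutive chunks share `overlap` characters."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

print(chunk_text("abcdefghij", size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Larger overlap reduces the chance a relevant passage is cut in half, at the cost of more chunks to embed and store.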
Citation
A reference to the source document or passage used to ground an LLM's response. Citations increase trust and enable fact-checking.
Related: Beginner Lesson 4: Retrieval, grounding, and citations
Context Window
The maximum number of tokens (input + output) that an LLM can process in a single request. Context window size limits how much information you can pass to the model.
Related: Shared: Tokenization and context windows
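One common way to respect the limit is to trim old history until the remainder fits. A sketch (the whitespace token counter is a stand-in; real apps use the model's tokenizer):

```python
def trim_to_budget(messages, max_tokens, count_tokens=lambda m: len(m.split())):
    """Keep the most recent messages whose combined token count fits the budget."""
    kept, total = [], 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

history = ["first message here", "second one", "latest user question"]
print(trim_to_budget(history, max_tokens=6))
# ['second one', 'latest user question']
```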
Cosine Similarity
A measure of similarity between two vectors, ranging from -1 to 1. In retrieval, documents with high cosine similarity to a query are considered relevant.
Related: Shared: Embeddings and similarity
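The formula is the dot product divided by the product of the vector norms. A self-contained sketch (toy 2-d vectors; real embeddings have hundreds of dimensions):

```python
import math

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (|a| * |b|)"""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [1, 0]))   # 1.0  (same direction)
print(cosine_similarity([1, 0], [0, 1]))   # 0.0  (orthogonal)
print(cosine_similarity([1, 0], [-1, 0]))  # -1.0 (opposite)
```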
D
Deterministic Workflow
A predefined sequence of steps with no LLM decision-making. Contrast with agentic systems where the model chooses actions.
Related: Advanced Lesson 1: Agentic product fit and system boundaries
Durable Execution
An execution model where long-running tasks survive server restarts and can be resumed from the last checkpoint.
Related: Advanced Lesson 7: Memory, checkpoints, artifacts, and durable execution
E
Embedding
A numerical representation of text (or other data) as a vector of numbers. Semantically similar texts have embeddings that are close in vector space.
Related: Shared: Embeddings and similarity
Eval (Evaluation)
A systematic test of LLM output quality. Evals can be golden dataset comparisons, regression tests, or human ratings.
Related: Beginner Lesson 7: Evals, safety, cost, and release
F
Fan-out / Fan-in
A pattern where one agent distributes work to multiple parallel agents (fan-out) and collects results (fan-in). Common in manager-specialist architectures.
Related: Advanced Lesson 5: Router, manager, and specialist patterns
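The pattern can be sketched with a thread pool (the `run_specialist` function is a hypothetical stand-in for a call to a specialist agent):

```python
from concurrent.futures import ThreadPoolExecutor

def run_specialist(task):
    """Stand-in for dispatching one task to a specialist agent."""
    return f"result for {task}"

tasks = ["summarize", "translate", "classify"]

# Fan-out: dispatch each task to a worker. Fan-in: map collects
# results in the original task order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(run_specialist, tasks))

print(results)
```

The manager agent plays the role of the pool here: it splits the work, waits for all specialists, then merges their answers.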
G
Grounding
Ensuring an LLM's output is based on reliable information. Grounding techniques include retrieval from knowledge bases, citations, and structured constraints.
Related: Beginner Lesson 4: Retrieval, grounding, and citations
H
Handoff
The transfer of control or context between agents. Handoffs can be peer-to-peer (equal agents transfer conversation) or hierarchical (manager delegates to specialist).
Related: Advanced Lesson 6: Handoffs, human review, and control surfaces
Human-in-the-loop
Patterns where a human approves, edits, or interrupts agent actions. Requires durable state and clear interrupt semantics.
Related: Advanced Lesson 6: Handoffs, human review, and control surfaces
Hybrid RAG
A retrieval strategy that combines vector similarity search with keyword/exact-match search for better coverage.
Related: Advanced Lesson 4: Knowledge systems and advanced RAG
I
Instruction Hierarchy
The precedence order for different instruction types. System instructions typically take priority over user instructions, which take priority over examples.
Related: Shared: Prompt and output patterns cheatsheet
M
MCP (Model Context Protocol)
A standard protocol for connecting AI models to external tools and data sources. AgentFlow supports MCP-style integration.
Related: Beginner Lesson 3: Tools, files, and MCP basics
Memory
Storage of conversation context or learned information. In AgentFlow, memory can be short-term (thread state) or long-term (persistent store).
Related: Beginner Lesson 5: State, memory, threads, and streaming
Multimodal
The ability to process multiple types of input (text, images, audio, video) or generate multiple types of output. Contrast with text-only models.
Related: Beginner Lesson 6: Multimodal and client/server integration
N
Next-token Prediction
The core task LLMs are trained on: predicting the most likely next token given all previous tokens. This probabilistic nature is why outputs vary.
Related: Shared: LLM basics for engineers
P
Positional Encoding
A mechanism that gives transformer models information about token positions. Without it, attention would be position-agnostic.
Related: Shared: Transformer basics
Prompt Injection
A security risk where an attacker embeds malicious instructions in user input or retrieved documents to manipulate LLM behavior.
Related: Beginner Lesson 7: Evals, safety, cost, and release
Prompt Template
A parameterized prompt structure where variables are replaced at runtime. Templates enable reusable prompt engineering.
Related: Beginner Lesson 2: Prompting, context engineering, and structured outputs
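A minimal example using the standard library (the template text and variable names are made up for illustration):

```python
from string import Template

# $tone and $question are placeholders filled in at runtime.
template = Template("Answer in a $tone tone.\n\nQuestion: $question")

prompt = template.substitute(tone="friendly", question="What is a token?")
print(prompt)
```

Keeping templates separate from the substituted values makes prompts reviewable, versionable, and reusable across requests.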
R
RAG (Retrieval-Augmented Generation)
A pattern where an LLM retrieves relevant documents before generating a response. RAG helps ground outputs in specific knowledge.
Related: Beginner Lesson 4: Retrieval, grounding, and citations; Advanced Lesson 4: Knowledge systems and advanced RAG
ReAct (Reasoning + Acting)
An agent pattern where the model alternates between reasoning about the current state and taking actions. Popular for tool-use agents.
Related: Advanced Lesson 2: Single-agent runtime and bounded autonomy
Reranking
A secondary ranking step that reorders retrieved documents using a more expensive but more accurate model.
Related: Shared: Chunking and retrieval primitives
S
Schema Validation
Checking that LLM output conforms to an expected structure. Structured outputs rely on schema validation for reliability.
Related: Beginner Lesson 2: Prompting, context engineering, and structured outputs
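A toy validator (the schema here is a plain dict of expected keys and types, not a real JSON Schema library) shows the basic shape: parse, then check before trusting the output.

```python
import json

SCHEMA = {"name": str, "age": int}  # toy schema: required keys and their types

def validate(raw):
    """Parse LLM output as JSON and check it against SCHEMA."""
    data = json.loads(raw)
    for key, expected_type in SCHEMA.items():
        if key not in data:
            raise ValueError(f"missing key: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for {key}")
    return data

print(validate('{"name": "Ada", "age": 36}'))  # {'name': 'Ada', 'age': 36}
```

On failure, a common strategy is to feed the error message back to the model and ask it to retry.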
Specialist
A focused agent designed for a specific task or domain. Specialists are often used in manager-specialist architectures.
Related: Advanced Lesson 5: Router, manager, and specialist patterns
State
The data that persists across agent interactions. In AgentFlow, state includes conversation history, variables, and tool results.
Related: Beginner Lesson 5: State, memory, threads, and streaming
Streaming
A response pattern where tokens are sent incrementally to the client, improving perceived latency for long responses.
Related: Beginner Lesson 5: State, memory, threads, and streaming
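In Python, streaming responses are naturally modeled as generators. A sketch (words stand in for tokens; the delay simulates generation time):

```python
import time

def stream_tokens(text):
    """Yield a response one word at a time, simulating incremental token delivery."""
    for word in text.split():
        yield word + " "
        time.sleep(0.01)  # simulated per-token generation delay

# A client consumes chunks as they arrive instead of waiting for the full response.
response = "".join(stream_tokens("Tokens arrive incrementally"))
print(response)
```

The first chunk reaches the user almost immediately, which is why streaming improves perceived latency even though total generation time is unchanged.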
Structured Output
LLM responses constrained to a specific schema (JSON, XML, enum values). Structured outputs improve reliability over freeform text.
Related: Beginner Lesson 2: Prompting, context engineering, and structured outputs
T
Thread
A persistent conversation context identified by a unique ID. Threads enable conversation continuity and state restoration.
Related: Beginner Lesson 5: State, memory, threads, and streaming
Token
The basic unit of text processed by an LLM. Tokens are not exactly words—a token is typically a few characters or a fraction of a word.
Related: Shared: Tokenization and context windows
Tool
A callable function that extends LLM capabilities beyond text generation. Tools enable agents to interact with external systems.
Related: Beginner Lesson 3: Tools, files, and MCP basics
Top-k Retrieval
Returning the k most similar documents from a vector store based on embedding similarity.
Related: Shared: Chunking and retrieval primitives
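A sketch over a toy in-memory "store" (the 2-d embeddings and document IDs are made up; real stores index millions of high-dimensional vectors):

```python
import heapq
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Toy vector store: (doc_id, embedding) pairs.
store = [("doc1", [1.0, 0.0]), ("doc2", [0.7, 0.7]), ("doc3", [0.0, 1.0])]

def top_k(query_vec, k=2):
    """Return the k documents most similar to the query embedding."""
    return heapq.nlargest(k, store, key=lambda item: cosine(query_vec, item[1]))

print([doc_id for doc_id, _ in top_k([1.0, 0.1], k=2)])  # ['doc1', 'doc2']
```

Production vector stores use approximate nearest-neighbor indexes rather than this exhaustive scan, trading a little recall for large speedups.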
Transformer
The neural network architecture underlying modern LLMs. Transformers use self-attention to process sequences in parallel.
Related: Shared: Transformer basics
V
Vector Store
A database optimized for storing and searching embeddings. Common vector stores include Qdrant, Pinecone, and pgvector.
Related: Shared: Embeddings and similarity
2
2-step RAG
The classic retrieval-augmented generation pattern: retrieve relevant documents, then generate a response using those documents as context. Contrast with agentic RAG.
Related: Beginner Lesson 4: Retrieval, grounding, and citations; Advanced Lesson 4: Knowledge systems and advanced RAG