Glossary

This glossary defines terms used across both the Beginner and Advanced GenAI courses. It is a quick reference, not a replacement for the full explanations in each lesson.

A

Agent

A system that uses an LLM to decide which actions to take, can use tools, maintains state across calls, and can operate with varying levels of autonomy.

Agentic RAG

A retrieval-augmented generation pattern where the agent decides what to retrieve, when to retrieve, and whether to retrieve again. Contrast with 2-step RAG, where retrieval happens once before generation.

Attention

A mechanism in transformer models that allows each token to attend to all other tokens in the sequence, enabling the model to understand relationships and context.

Related: Shared: Transformer basics
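
As a rough illustration of the Attention entry, the sketch below computes single-head scaled dot-product attention on plain Python lists. It assumes the query, key, and value vectors are already given; the learned projections, masking, and multiple heads of a real transformer are left out, and the function names are illustrative only.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Single-head scaled dot-product attention (no masking, no projections)."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Score this query against every key in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Two identical keys receive equal weight, so the output is the mean of the values.
print(attention([[1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [4.0, 6.0]]))
```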

B

Bounded Autonomy

Designing agent behavior so that it operates within explicit limits (step budgets, tool restrictions, approval gates) rather than running unbounded until completion.
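
A minimal sketch of two of the limits named above, a step budget and a tool restriction. The `choose_action` callable is a hypothetical stand-in for the LLM's decision, and the tool names are made up for illustration; this is not AgentFlow's API.

```python
from typing import Callable

# Hypothetical allow-list; a real agent would load this from configuration.
ALLOWED_TOOLS = {"search", "calculator"}


def run_bounded(
    choose_action: Callable[[list[str]], tuple[str, str]],
    max_steps: int = 5,
) -> list[str]:
    """Run an agent loop that stops at `max_steps` and refuses unlisted tools.

    `choose_action` receives the history so far and returns (tool_name, argument).
    The tool name "finish" ends the run.
    """
    history: list[str] = []
    for _ in range(max_steps):
        tool, arg = choose_action(history)
        if tool == "finish":
            history.append(f"finish: {arg}")
            return history
        if tool not in ALLOWED_TOOLS:
            # Tool restriction: record the refusal instead of executing.
            history.append(f"blocked: {tool}")
            continue
        history.append(f"{tool}({arg})")
    # Step budget: the loop ended without the agent finishing on its own.
    history.append("stopped: step budget exhausted")
    return history


# A policy that never finishes is cut off after three steps.
print(run_bounded(lambda history: ("search", "q"), max_steps=3))
```

Blocked calls still consume a step here, so a misbehaving policy cannot loop forever by repeatedly requesting a forbidden tool.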

BPE (Byte Pair Encoding)

A common tokenization algorithm that splits text into subword units based on frequency. Most modern LLMs use BPE or similar algorithms.

C

Checkpoint

A snapshot of agent state that enables resuming execution after interruption. Checkpoints store the full conversation history and any intermediate variables.

Chunk

A segment of a document created during the ingestion process. Chunk size and overlap affect retrieval quality.
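
To make chunk size and overlap concrete, here is a small sketch that splits text into fixed-size character windows. Character-based splitting is an assumption made for brevity; real ingestion pipelines often split on tokens, sentences, or document structure instead.

```python
def chunk_text(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split `text` into windows of `chunk_size` characters.

    Consecutive chunks share `overlap` characters, so content that straddles
    a boundary appears in both neighbors.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```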

Citation

A reference to the source document or passage used to ground an LLM's response. Citations increase trust and enable fact-checking.

Context Window

The maximum number of tokens (input + output) that an LLM can process in a single request. Context window size limits how much information you can pass to the model.

Cosine Similarity

A measure of similarity between two vectors, ranging from -1 to 1. In retrieval, documents whose embeddings have high cosine similarity to the query embedding are considered relevant.
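
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal pure-Python sketch follows; production systems typically use a vectorized library or the vector store's built-in metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors `a` and `b`."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # same direction: 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # orthogonal: 0.0
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # opposite direction: -1.0
```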

D

Deterministic Workflow

A predefined sequence of steps with no LLM decision-making. Contrast with agentic systems where the model chooses actions.

Durable Execution

An execution model where long-running tasks survive server restarts and can be resumed from the last checkpoint.

E

Embedding

A numerical representation of text (or other data) as a vector of numbers. Semantically similar texts have embeddings that are close in vector space.

Eval (Evaluation)

A systematic test of LLM output quality. Evals can be golden dataset comparisons, regression tests, or human ratings.

F

Fan-out / Fan-in

A pattern where one agent distributes work to multiple parallel agents (fan-out) and collects results (fan-in). Common in manager-specialist architectures.
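
The pattern can be sketched with a thread pool. Here `run_specialist` is a hypothetical placeholder for a call to a specialist agent, and threads are just one possible way to run the work in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def run_specialist(task: str) -> str:
    """Hypothetical stand-in for a specialist agent handling one task."""
    return f"result for {task}"


def fan_out_fan_in(tasks: list[str], max_workers: int = 4) -> list[str]:
    """Distribute tasks to parallel workers, then collect results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Fan-out: each task is submitted to the pool.
        # Fan-in: map() yields results in the same order as `tasks`.
        return list(pool.map(run_specialist, tasks))


print(fan_out_fan_in(["summarize report", "check citations"]))
```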

G

Grounding

Ensuring an LLM's output is based on reliable information. Grounding techniques include retrieval from knowledge bases, citations, and structured constraints.

H

Handoff

The transfer of control or context between agents. Handoffs can be peer-to-peer (equal agents transfer conversation) or hierarchical (manager delegates to specialist).

Human-in-the-loop

Patterns where a human approves, edits, or interrupts agent actions. Requires durable state and clear interrupt semantics.

Hybrid RAG

A retrieval strategy that combines vector similarity search with keyword/exact-match search for better coverage.
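
The definition does not say how the two result lists are merged. One common option, shown here as an assumption rather than as the course's method, is reciprocal rank fusion (RRF): each document scores 1 / (k + rank) in every list it appears in, and the scores are summed. The constant `k = 60` is a conventional default.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one ranking.

    A document's fused score is the sum of 1 / (k + rank) over every list
    that contains it, with ranks starting at 1.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]   # hypothetical vector-search results
keyword_hits = ["b", "d", "a"]  # hypothetical keyword-search results
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['b', 'a', 'd', 'c']
```

Documents found by both searches ("a" and "b") rise above documents found by only one, which is the coverage benefit the definition describes.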

I

Instruction Hierarchy

The precedence order for different instruction types. System instructions typically take priority over user instructions, which take priority over examples.

M

MCP (Model Context Protocol)

A standard protocol for connecting AI models to external tools and data sources. AgentFlow supports MCP-style integration.

Memory

Storage of conversation context or learned information. In AgentFlow, memory can be short-term (thread state) or long-term (persistent store).

Multimodal

The ability to process multiple types of input (text, images, audio, video) or generate multiple types of output. Contrast with text-only models.

N

Next-token Prediction

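The core task LLMs are trained on: predicting a probability distribution over the next token given all previous tokens. Outputs vary between runs because generation usually samples from that distribution rather than always choosing the single most likely token.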
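
A small sketch of why outputs vary: the model's raw scores (logits) are turned into probabilities with softmax, and the next token is drawn at random from them. The vocabulary and logits below are invented for illustration, and temperature scaling is one common sampling control among several.

```python
import math
import random


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(vocab, logits, temperature=1.0, rng=None):
    """Draw one token from the distribution implied by `logits`."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


vocab = ["cat", "dog", "car"]  # made-up vocabulary
logits = [2.0, 1.0, 0.1]       # made-up scores for the next token
print(softmax(logits))                       # probabilities that sum to 1
print(sample_next_token(vocab, logits))      # may differ from run to run
print(sample_next_token(vocab, logits, temperature=0.1))  # almost always "cat"
```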

P

Positional Encoding

A mechanism that gives transformer models information about token positions. Without it, attention would be position-agnostic.

Related: Shared: Transformer basics

Prompt Injection

A security risk where an attacker embeds malicious instructions in user input or retrieved documents to manipulate LLM behavior.

Prompt Template

A parameterized prompt structure where variables are replaced at runtime. Templates enable reusable prompt engineering.
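
For example, a template can be built with Python's standard `string.Template`. The prompt wording and variable names here are illustrative, not taken from the course material.

```python
from string import Template

# $doc_type, $max_sentences, and $text are filled in at runtime.
SUMMARY_PROMPT = Template(
    "Summarize the following $doc_type in at most $max_sentences sentences:\n\n$text"
)


def render_summary_prompt(doc_type: str, text: str, max_sentences: int = 3) -> str:
    """Fill the template; substitute() raises KeyError if a variable is missing."""
    return SUMMARY_PROMPT.substitute(
        doc_type=doc_type, max_sentences=max_sentences, text=text
    )


print(render_summary_prompt("email", "Hi team, the launch moved to Friday.", 2))
```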

R

RAG (Retrieval-Augmented Generation)

A pattern where relevant documents are retrieved and passed to an LLM as context before it generates a response. RAG helps ground outputs in specific knowledge.

ReAct (Reasoning + Acting)

An agent pattern where the model alternates between reasoning about the current state and taking actions. Popular for tool-use agents.

Reranking

A secondary ranking step that reorders retrieved documents using a more expensive but more accurate model.

S

Schema Validation

Checking that LLM output conforms to an expected structure. Structured outputs rely on schema validation for reliability.
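
A minimal sketch of schema validation using only the standard library. The expected fields are a hypothetical schema; real projects often use a dedicated validation library instead of hand-written checks.

```python
import json

# Hypothetical schema: field name -> required Python type.
EXPECTED_FIELDS = {"title": str, "priority": int, "tags": list}


def validate_output(raw: str) -> dict:
    """Parse LLM output as JSON and check it against EXPECTED_FIELDS."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")

    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    return data


print(validate_output('{"title": "Fix login", "priority": 2, "tags": ["bug"]}'))
```

When validation fails, a caller can retry the request or return an error instead of passing malformed data downstream.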

Specialist

A focused agent designed for a specific task or domain. Specialists are often used in manager-specialist architectures.

State

The data that persists across agent interactions. In AgentFlow, state includes conversation history, variables, and tool results.

Streaming

A response pattern where tokens are sent incrementally to the client, improving perceived latency for long responses.

Structured Output

LLM responses constrained to a specific schema (JSON, XML, enum values). Structured outputs improve reliability over freeform text.

T

Thread

A persistent conversation context identified by a unique ID. Threads enable conversation continuity and state restoration.

Token

The basic unit of text processed by an LLM. Tokens are not exactly words: a token is typically a few characters or a fraction of a word.

Tool

A callable function that extends LLM capabilities beyond text generation. Tools enable agents to interact with external systems.

Top-k Retrieval

Returning the k most similar documents from a vector store based on embedding similarity.
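
A brute-force sketch of top-k retrieval over an in-memory dictionary of embeddings, using cosine similarity as the score. The document IDs and two-dimensional vectors are invented for illustration; real vector stores use approximate indexes rather than scanning every document.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query: list[float], docs: dict[str, list[float]], k: int) -> list[tuple[str, float]]:
    """Return the k (doc_id, score) pairs most similar to `query`, best first."""
    scored = ((cosine(query, vec), doc_id) for doc_id, vec in docs.items())
    return [(doc_id, score) for score, doc_id in heapq.nlargest(k, scored)]


docs = {"a": [1.0, 0.0], "b": [0.8, 0.6], "c": [0.0, 1.0]}  # made-up embeddings
print(top_k([1.0, 0.0], docs, k=2))  # "a" first, then "b"
```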

Transformer

The neural network architecture underlying modern LLMs. Transformers use self-attention to process sequences in parallel.

Related: Shared: Transformer basics

V

Vector Store

A database optimized for storing and searching embeddings. Common vector stores include Qdrant, Pinecone, and pgvector.

2

2-step RAG

The classic retrieval-augmented generation pattern: retrieve relevant documents, then generate a response using those documents as context. Contrast with agentic RAG.
