The Three Layers of Memory in PyAgenity¶

PyAgenity implements a sophisticated three-tier memory architecture that mirrors how humans process and retain information. Understanding this layered approach is crucial for building effective agents that can maintain context, learn from interactions, and provide personalized experiences.

The Memory Hierarchy: A Conceptual Foundation¶

Think of an intelligent agent as having three different types of memory, each serving distinct purposes:

1. Working Memory (Short-term Context) Like holding a conversation in your mind, this is the immediate context that drives current interactions. It's fast, temporary, and directly influences what the agent says next.

2. Session Memory (Conversation History) Similar to remembering what happened in a meeting, this preserves the flow and history of interactions for reference, debugging, and user interface purposes.

3. Knowledge Memory (Long-term Storage) Like accumulated wisdom and learned facts, this stores insights, preferences, and knowledge that span multiple conversations and enhance future interactions.

Why This Architecture Matters¶

This separation isn't just about technical organization—it reflects different temporal needs and access patterns in agent behavior:

Working memory needs to be fast and contextually relevant for real-time decision making
Session memory serves persistence and auditability without overwhelming the agent's thinking process
Knowledge memory enables learning and personalization across conversation boundaries

Let's explore how each layer works in practice.

Layer 1: Working Memory - The Agent's Active Thoughts¶

Working memory in PyAgenity is embodied by the AgentState, which holds the current conversation context as a living, breathing entity.

from pyagenity.state import AgentState
from pyagenity.utils import Message

# The agent's working memory
state = AgentState()
state.context = [
    Message.text_message("What's the weather like?", role="user"),
    Message.text_message("Let me check that for you.", role="assistant")
]

The Dynamic Nature of Working Memory¶

What makes working memory special is its dynamic, evolving nature. Unlike static data storage, the agent's context:

Grows with each interaction (user messages, assistant responses, tool calls)
Transforms through processing (the agent reasons about and responds to context)
Adapts through trimming (older context gets summarized or removed when limits are reached)

# Context evolves through the conversation
state.context.append(tool_call_message)
state.context.append(tool_result_message)  
state.context.append(final_response_message)

The Context Management Challenge¶

A critical challenge emerges: context windows have limits. As conversations grow, you need strategies to maintain relevance without losing important information. This is where context management becomes crucial:

# Context managers handle the "forgetting" process
from pyagenity.state import BaseContextManager

class SummaryContextManager(BaseContextManager):
    async def atrim_context(self, state):
        if len(state.context) > 50:
            # Summarize older messages, keep recent ones
            summary = await summarize_messages(state.context[:30])
            state.context_summary = summary
            state.context = state.context[30:]  # Keep recent context
        return state

The beauty of this approach is that context management is pluggable—you can implement different strategies (summarization, token-based trimming, importance scoring) without changing your core agent logic.

Layer 2: Session Memory - The Conversation Chronicle¶

While working memory focuses on what the agent is thinking right now, session memory preserves the complete interaction history for different purposes entirely.

Why Separate Session Memory?¶

Think about the difference between: - What you need to remember to continue a conversation effectively (working memory) - What you might want to review later, debug, or show in a user interface (session memory)

Session memory serves persistence, auditability, and user experience rather than immediate decision-making.

from pyagenity.checkpointer import PgCheckpointer

# Session memory persists the full interaction history
checkpointer = PgCheckpointer(postgres_dsn="postgresql://...")

# This stores every message, state transition, and execution detail
await checkpointer.aput_messages(config, messages)
await checkpointer.aput_state(config, final_state)

The Dual Storage Strategy¶

Here's a key insight: PyAgenity uses a two-tier persistence strategy within session memory itself:

Fast Cache (Redis) - For active conversations and immediate retrieval
Durable Storage (PostgreSQL) - For permanent record-keeping

# Fast retrieval from cache during active conversation
cached_state = await checkpointer.aget_state_cache(config)

# Durable persistence for long-term storage  
await checkpointer.aput_state(config, state)  # Writes to both cache and DB

This design optimizes for both speed and durability—active conversations stay fast while ensuring nothing is ever truly lost.

Layer 3: Knowledge Memory - The Agent's Learned Wisdom¶

Knowledge memory transcends individual conversations. It's where agents develop persistent understanding, store user preferences, and build contextual intelligence that improves over time.

Beyond Conversation Boundaries¶

Unlike working memory (single conversation) and session memory (conversation history), knowledge memory operates across multiple conversations, users, and time periods.

from pyagenity.store import QdrantStore

# Knowledge that persists across conversations
store = QdrantStore(collection_name="user_preferences")

# Store learned insights
await store.astore(
    config={"user_id": "alice"},
    content="Alice prefers concise technical explanations",
    memory_type=MemoryType.SEMANTIC,
    category="communication_style"
)

# Retrieve relevant knowledge in future conversations
relevant_memories = await store.asearch(
    config={"user_id": "alice"}, 
    query="how should I explain technical concepts?",
    limit=3
)

Retrieval Strategies and Intelligence¶

Knowledge memory isn't just storage—it's intelligent retrieval. Different situations call for different memory access patterns:

Similarity Search: Find semantically related information
Temporal Retrieval: Access recent or time-relevant memories
Hybrid Approaches: Combine multiple retrieval strategies

# Flexible retrieval strategies
memories = await store.asearch(
    config=config,
    query="user interface preferences",
    retrieval_strategy=RetrievalStrategy.HYBRID,
    memory_type=MemoryType.SEMANTIC,
    limit=5
)

The Integration Pattern: How the Layers Work Together¶

The real power emerges when these three memory layers work in harmony. Here's a typical interaction flow:

1. Context Assembly Phase¶

# Start with current working memory
state = current_agent_state

# Optionally enrich with relevant knowledge
if should_use_knowledge:
    relevant_memories = await store.asearch(config, query=state.context[-1].text())
    # Inject relevant memories into system prompts

2. Processing Phase¶

# Agent processes with full context awareness
response = await agent_function(state, config)

3. Persistence Phase¶

# Update working memory
state.context.append(response)

# Persist to session memory  
await checkpointer.aput_state(config, state)

# Extract insights for knowledge memory
if important_information_learned:
    await store.astore(config, insight, memory_type=MemoryType.SEMANTIC)

4. Context Management Phase¶

# Trim working memory if needed
if context_manager:
    state = await context_manager.atrim_context(state)

Design Principles and Implications¶

This three-tier architecture embodies several key design principles:

Separation of Concerns¶

Each memory layer has a distinct purpose, preventing interference and enabling optimization

Performance Optimization¶

Fast access patterns for immediate needs, efficient storage for long-term retention

Flexible Integration¶

Layers can be used independently or together, supporting various application architectures

Scalability Boundaries¶

Clear boundaries enable different scaling strategies for different memory types

Developer Experience¶

The abstraction matches mental models of how intelligent systems should work

When to Use Each Layer¶

Understanding when and why to engage each memory layer is crucial for effective agent design:

Use Working Memory When:¶

Making immediate responses and decisions
Maintaining conversation flow and coherence
Processing current context for LLM interactions
Managing real-time state transitions

Use Session Memory When:¶

Building user interfaces that show conversation history
Implementing conversation resume functionality
Debugging agent behavior and decision paths
Compliance and audit requirements need full interaction records

Use Knowledge Memory When:¶

Personalizing experiences across sessions
Building agents that learn and improve over time
Implementing recommendation systems
Creating persistent user preferences and profiles

The key insight is that these layers serve different stakeholders and use cases—the agent itself, the application interface, and the overall system intelligence.

Conclusion: Building Memory-Aware Agents¶

PyAgenity's three-tier memory architecture provides a foundation for building truly intelligent agents that can:

Think clearly with focused working memory
Remember completely with persistent session memory
Learn continuously with accumulated knowledge memory

By understanding these layers and their interactions, you can design agents that not only respond intelligently in the moment but also grow wiser over time—much like human intelligence itself.