Lesson 6: Multimodal and Client/Server Integration

Learning Outcome

By the end of this lesson, you will be able to:

Build multimodal agents that handle images and files
Connect agents to frontend clients via API
Design clean separation between UI and agent logic

Prerequisites

Lesson 5: State and memory
Streaming concepts

Concept: GenAI Apps Are Not Just Text Chat

Modern GenAI applications often combine several modalities, such as text, images, documents, and audio.

Common Multimodal Patterns

Pattern	Input	Output	Use Case
Vision + Chat	Image + text	Text	Screenshot analysis
Document Q&A	PDF + question	Text + citations	Contract review
Image Generation	Text prompt	Image	Creative tools
Voice Assistant	Audio	Audio	Hands-free interaction

Concept: File Upload Workflows

Typical File Handling Pipeline
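A typical pipeline validates the upload, stores it, extracts its content, and prepares that content for the agent, for example by chunking it for embedding. The sketch below is a minimal illustration that makes several assumptions: plain-text files only, a local directory as storage, and hypothetical names (`ingest`, `chunk_text`, `IngestedFile`) and limits chosen for the example rather than taken from AgentFlow.

```python
import os
import tempfile
import uuid
from dataclasses import dataclass, field

MAX_BYTES = 1_000_000  # Hypothetical size limit (about 1 MB)
ALLOWED_EXTENSIONS = {".txt", ".md"}  # This sketch only handles plain text


@dataclass
class IngestedFile:
    file_id: str
    path: str
    chunks: list[str] = field(default_factory=list)


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    if not text:
        return []
    step = size - overlap
    # Stop once the previous window already reaches the end of the text
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def ingest(filename: str, data: bytes, storage_dir: str) -> IngestedFile:
    """Validate, store, extract, and chunk one uploaded file."""
    # 1. Validate: extension and size
    ext = os.path.splitext(filename)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {ext or '(none)'}")
    if len(data) > MAX_BYTES:
        raise ValueError("file too large")

    # 2. Store under a server-generated ID instead of the client's filename
    file_id = str(uuid.uuid4())
    path = os.path.join(storage_dir, file_id + ext)
    with open(path, "wb") as f:
        f.write(data)

    # 3. Extract content (UTF-8 text here; PDFs and images need their own extractors)
    text = data.decode("utf-8")

    # 4. Chunk so each piece can later be embedded and stored in a vector store
    return IngestedFile(file_id=file_id, path=path, chunks=chunk_text(text))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        result = ingest("notes.txt", b"hello " * 200, tmp)
        print(result.file_id, len(result.chunks))
```

Step 3 is where per-type handling matters: each file type needs its own extractor.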

File Type Handling

File Type	How to Handle	Considerations
Images	Vision API, base64 encoding	Size limits, processing cost
PDF	Text extraction, OCR	Complex layouts harder
Code files	Direct text reading	Preserve syntax
CSV/JSON	Structured parsing	Validate schema
Documents	Convert to markdown	Preserve formatting
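The table maps naturally onto a dispatch dictionary keyed by file extension. Here is a minimal sketch covering only the rows that the Python standard library can handle (code files, CSV, JSON, images). PDF extraction and document-to-markdown conversion need third-party libraries and are left out. The handler names and the `required_keys` schema check are illustrative assumptions.

```python
import base64
import csv
import io
import json
import os
from typing import Any, Callable


def handle_text(data: bytes) -> str:
    """Code files: read as text unchanged so syntax and indentation survive."""
    return data.decode("utf-8")


def handle_csv(data: bytes) -> list[dict[str, str]]:
    """CSV: parse rows into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(data.decode("utf-8"))))


def handle_json(data: bytes, required_keys: tuple[str, ...] = ()) -> Any:
    """JSON: parse, then run a minimal schema check on top-level keys."""
    obj = json.loads(data)
    if required_keys:
        if not isinstance(obj, dict):
            raise ValueError("expected a JSON object")
        missing = [k for k in required_keys if k not in obj]
        if missing:
            raise ValueError(f"missing keys: {missing}")
    return obj


def handle_image(data: bytes) -> str:
    """Images: base64-encode for use in a vision model's data URL."""
    return base64.b64encode(data).decode("ascii")


# Extension -> handler; extend with PDF/OCR or document converters as needed
HANDLERS: dict[str, Callable[[bytes], Any]] = {
    ".py": handle_text,
    ".ts": handle_text,
    ".csv": handle_csv,
    ".json": handle_json,
    ".png": handle_image,
    ".jpg": handle_image,
    ".jpeg": handle_image,
}


def process_file(filename: str, data: bytes) -> Any:
    """Route a file to its handler based on its (case-insensitive) extension."""
    ext = os.path.splitext(filename)[1].lower()
    handler = HANDLERS.get(ext)
    if handler is None:
        raise ValueError(f"no handler for {ext or '(no extension)'}")
    return handler(data)
```

Keeping the routing in one table makes it easy to see which types are supported and to reject everything else explicitly.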

Concept: Client/Server Architecture

Clean Separation of Concerns

Responsibilities

Layer	Responsibilities
Client	UI rendering, user input, streaming display, local caching
API	Request routing, auth, rate limiting, response formatting
Agent	LLM calls, tool execution, state management
Storage	File storage, vector store, checkpoint storage
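The split between the API and agent layers can be sketched in plain Python without a web framework. In this minimal sketch the agent knows nothing about tokens or status codes, and the API layer only authenticates, rate-limits, routes, and formats. `EchoAgent`, `ApiLayer`, and the token and sliding-window scheme are hypothetical stand-ins for a real agent and real middleware.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field


class EchoAgent:
    """Agent layer: owns LLM calls, tools, and state (an echo stands in for the LLM)."""

    def __init__(self) -> None:
        self.threads: dict[str, list[str]] = defaultdict(list)

    def process(self, thread_id: str, message: str) -> str:
        self.threads[thread_id].append(message)
        return f"echo #{len(self.threads[thread_id])}: {message}"


@dataclass
class ApiLayer:
    """API layer: auth, rate limiting, routing, and response formatting only."""

    agent: EchoAgent
    valid_tokens: set[str]
    max_requests: int = 5          # Hypothetical limit per token...
    window_seconds: float = 60.0   # ...within this sliding window
    _calls: dict[str, deque] = field(default_factory=lambda: defaultdict(deque))

    def handle_chat(self, token: str, thread_id: str, message: str) -> dict:
        if token not in self.valid_tokens:
            return {"status": 401, "error": "unauthorized"}

        # Sliding-window rate limit: forget timestamps older than the window
        now = time.monotonic()
        calls = self._calls[token]
        while calls and now - calls[0] > self.window_seconds:
            calls.popleft()
        if len(calls) >= self.max_requests:
            return {"status": 429, "error": "rate limit exceeded"}
        calls.append(now)

        # Route to the agent and format a transport-friendly response
        reply = self.agent.process(thread_id, message)
        return {"status": 200, "response": reply, "thread_id": thread_id}
```

Because `EchoAgent` never sees tokens or status codes, the same agent could sit behind FastAPI, a CLI, or a test harness unchanged.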

Example: Multimodal Agent

Image Understanding

from agentflow.core.llm import OpenAIModel
from agentflow.core.state import Message
import base64
import mimetypes

def encode_image(image_path: str) -> str:
    """Read an image file and return its contents as a base64 string."""
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

class VisionAgent:
    def __init__(self):
        self.llm = OpenAIModel("gpt-4o")  # Vision-capable model

    def analyze_image(self, image_path: str, question: str) -> str:
        """Analyze an image and answer questions about it."""
        image_data = encode_image(image_path)
        # Label the data URL with the real MIME type (e.g. image/png), not always JPEG
        mime_type, _ = mimetypes.guess_type(image_path)
        mime_type = mime_type or "image/jpeg"

        messages = [
            Message(role="user", content=[
                {"type": "text", "text": question},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime_type};base64,{image_data}"},
                },
            ])
        ]

        return self.llm.generate(messages=messages)
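The File Type Handling table flags size limits and processing cost for images, so it helps to reject oversized or unsupported files before encoding them. A minimal standard-library sketch that builds the same `image_url` content part used above; the 20 MB cap and the allowed-type set are assumptions for illustration, so check your model provider's documented limits.

```python
import base64
import mimetypes
import os
import tempfile

MAX_IMAGE_BYTES = 20 * 1024 * 1024  # Assumed cap; check your provider's limit
ALLOWED_IMAGE_TYPES = {"image/jpeg", "image/png", "image/gif", "image/webp"}


def image_content_part(image_path: str) -> dict:
    """Build an OpenAI-style image_url content part, rejecting unsupported files."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type not in ALLOWED_IMAGE_TYPES:
        raise ValueError(f"unsupported image type: {mime_type}")
    size = os.path.getsize(image_path)
    if size > MAX_IMAGE_BYTES:
        raise ValueError(f"image is {size} bytes; limit is {MAX_IMAGE_BYTES}")
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "screenshot.png")
        with open(path, "wb") as f:
            f.write(b"\x89PNG\r\n\x1a\n")  # PNG signature only, enough for this check
        print(image_content_part(path)["image_url"]["url"][:30])
```

Checking the size with `os.path.getsize` before reading avoids loading a huge file into memory only to reject it.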

Document Q&A

from typing import Optional

import PyPDF2
from agentflow.core.llm import OpenAIModel
from agentflow.core.state import Message
from agentflow.core.tools import tool, ToolResult

llm = OpenAIModel("gpt-4o")  # Model used to answer questions about documents

class DocumentProcessor:
    def extract_text(self, file_path: str, max_pages: int = 10) -> str:
        """Extract text from the first `max_pages` pages of a PDF."""
        with open(file_path, "rb") as f:
            reader = PyPDF2.PdfReader(f)
            # Scanned (image-only) pages yield little or no text and need OCR
            pages = [page.extract_text() or "" for page in reader.pages[:max_pages]]
        return "\n\n".join(pages)

@tool(name="read_document", description="Read text from a document file")
def read_document(file_path: str, question: Optional[str] = None) -> ToolResult:
    """Read a document and optionally answer a question about it."""
    processor = DocumentProcessor()

    try:
        content = processor.extract_text(file_path)

        if question:
            # Answer the question using the extracted text as context
            prompt = f"Based on this document, answer: {question}\n\n{content}"
            response = llm.generate(messages=[Message(role="user", content=prompt)])
            return ToolResult(result=response)
        return ToolResult(result=content[:5000])  # Limit output size
    except Exception as e:
        return ToolResult(error=str(e))
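The pattern table lists "Text + citations" as the output of Document Q&A. One way to support citations, sketched here under assumptions: keep the page number with each chunk of extracted text, pick the chunks that share the most words with the question, and label them so the model can cite pages. Word-overlap scoring is a deliberately naive stand-in for embedding search, and `Chunk`, `chunk_pages`, and `build_cited_context` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    page: int  # 1-based page number, used for citations
    text: str


def chunk_pages(pages: list[str], max_words: int = 200) -> list[Chunk]:
    """Split each page's text into word-bounded chunks tagged with the page number."""
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        words = text.split()
        for i in range(0, len(words), max_words):
            chunks.append(Chunk(page=page_no, text=" ".join(words[i:i + max_words])))
    return chunks


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words, used for a rough relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_cited_context(question: str, chunks: list[Chunk], top_k: int = 3) -> str:
    """Rank chunks by word overlap with the question and label each with its page."""
    q = _tokens(question)
    # sorted() is stable, so ties keep document order
    ranked = sorted(chunks, key=lambda c: len(q & _tokens(c.text)), reverse=True)
    return "\n\n".join(f"[page {c.page}] {c.text}" for c in ranked[:top_k])
```

The returned string could replace the raw `content` in the `read_document` prompt, together with an instruction to cite page numbers in brackets.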

Example: API Server with Streaming

FastAPI Server Setup

from typing import Optional

from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

from agent import agent  # Assumed: your agent instance is created in agent.py

app = FastAPI()

class ChatRequest(BaseModel):
    message: str
    thread_id: str
    user_id: Optional[str] = None

@app.post("/api/chat")
async def chat(request: ChatRequest):
    """Non-streaming chat endpoint."""
    result = await agent.ainvoke(
        thread_id=request.thread_id,
        message=request.message,
        user_id=request.user_id
    )
    return {"response": result["response"], "thread_id": request.thread_id}

@app.post("/api/chat/stream")
async def chat_stream(request: ChatRequest):
    """Streaming chat endpoint using Server-Sent Events."""
    async def generate():
        async for chunk in agent.astream(
            thread_id=request.thread_id,
            message=request.message
        ):
            # Each SSE event is a "data: <payload>" line followed by a blank line
            yield f"data: {chunk.json()}\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream"
    )
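The `data: ...\n\n` string in `generate()` follows Server-Sent Events framing: an event is one or more `data:` lines followed by a blank line, and a payload that contains newlines must be split across several `data:` lines. A small helper, sketched here as an assumption rather than as part of AgentFlow, keeps that framing in one place and adds an optional `event:` name.

```python
import json
from typing import Any, Optional


def format_sse(data: Any, event: Optional[str] = None) -> str:
    """Frame one Server-Sent Event; non-string data is JSON-encoded first."""
    payload = data if isinstance(data, str) else json.dumps(data)
    lines = []
    if event:
        lines.append(f"event: {event}")
    # SSE treats CR, LF, and CRLF as line breaks, so normalize, then give
    # every payload line its own "data:" prefix
    normalized = payload.replace("\r\n", "\n").replace("\r", "\n")
    lines.extend(f"data: {line}" for line in normalized.split("\n"))
    # A blank line terminates the event
    return "\n".join(lines) + "\n\n"
```

A client that joins the `data:` lines with newlines recovers the original payload. With this helper, the `yield` in `generate()` could become `yield format_sse(payload)` for any JSON-serializable `payload`.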

File Upload Endpoint

import os
import uuid

from fastapi import File, HTTPException, UploadFile

UPLOAD_DIR = "/secure_storage"
ALLOWED_TYPES = {"image/jpeg", "image/png", "application/pdf", "text/plain"}

@app.post("/api/upload")
async def upload_file(file: UploadFile = File(...)):
    """Upload a file for processing."""
    # Validate the declared content type (client-supplied, so only a first check)
    if file.content_type not in ALLOWED_TYPES:
        raise HTTPException(400, f"File type {file.content_type} not allowed")

    # Save the file under a server-generated name
    file_id = await save_file(file)

    return {"file_id": file_id, "filename": file.filename}

async def save_file(file: UploadFile) -> str:
    """Save an uploaded file to storage and return its generated ID."""
    # A random UUID avoids trusting the client's filename (no path traversal)
    file_id = str(uuid.uuid4())
    path = os.path.join(UPLOAD_DIR, file_id)

    os.makedirs(UPLOAD_DIR, exist_ok=True)

    # UploadFile.read() is a coroutine, so save_file must be async and awaited
    content = await file.read()
    with open(path, "wb") as f:
        f.write(content)

    return file_id
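`file.content_type` comes from the client's request, so it only says what the client claims. A stricter check, sketched under the assumption that the same four types are allowed, inspects the file's leading bytes ("magic numbers") and enforces a size cap. The PNG, JPEG, and PDF signatures below are the standard file headers; the 10 MB limit and the "decodes as UTF-8 with no NUL bytes" test for `text/plain` are assumptions.

```python
from typing import Optional

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # Assumed limit for this sketch

# Leading bytes that identify each binary format
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "image/png",
    b"\xff\xd8\xff": "image/jpeg",
    b"%PDF-": "application/pdf",
}


def sniff_type(content: bytes) -> Optional[str]:
    """Return the detected MIME type, or None if the bytes match no allowed type."""
    for signature, mime_type in SIGNATURES.items():
        if content.startswith(signature):
            return mime_type
    # Heuristic: UTF-8 text without NUL bytes counts as plain text
    if b"\x00" not in content:
        try:
            content.decode("utf-8")
            return "text/plain"
        except UnicodeDecodeError:
            pass
    return None


def validate_upload(content: bytes, declared_type: Optional[str]) -> str:
    """Check the size and that the sniffed type matches what the client declared."""
    if len(content) > MAX_UPLOAD_BYTES:
        raise ValueError("file too large")
    detected = sniff_type(content)
    if detected is None or detected != declared_type:
        raise ValueError(f"declared {declared_type}, detected {detected}")
    return detected
```

In `upload_file`, one option is to read the bytes once, call `validate_upload(content, file.content_type)`, and turn a `ValueError` into `HTTPException(400, ...)` before anything is written to storage.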

Example: Frontend Client Integration

React Client Component

import { useState } from 'react';
import { AgentFlowClient } from '@10xscale/agentflow-client';

type Message = { role: 'user' | 'assistant'; content: string };

const client = new AgentFlowClient({
  baseUrl: 'http://localhost:8000'
});

export function ChatInterface() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [input, setInput] = useState('');
  const [threadId] = useState('user-123-thread-1'); // Use a real per-user thread ID in production

  const sendMessage = async () => {
    const text = input.trim();
    if (!text) return;

    setMessages(prev => [...prev, { role: 'user', content: text }]);
    setInput('');

    try {
      // Streaming response
      const stream = client.streamChat({
        threadId,
        message: text
      });

      // Empty assistant message that the stream fills in
      setMessages(prev => [...prev, { role: 'assistant', content: '' }]);

      for await (const chunk of stream) {
        // Replace the last message with a new object instead of mutating it:
        // in-place mutation bypasses React's change detection and doubles
        // text under StrictMode, which runs updater functions twice
        setMessages(prev => {
          const last = prev[prev.length - 1];
          return [...prev.slice(0, -1), { ...last, content: last.content + chunk.content }];
        });
      }
    } catch (error) {
      console.error('Chat error:', error);
    }
  };

  return (
    <div className="chat-container">
      <div className="messages">
        {messages.map((m, i) => (
          <div key={i} className={`message ${m.role}`}>
            {m.content}
          </div>
        ))}
      </div>
      <div className="input-area">
        <input
          value={input}
          onChange={e => setInput(e.target.value)}
          onKeyDown={e => e.key === 'Enter' && sendMessage()}
          placeholder="Type a message..."
        />
        <button onClick={sendMessage}>Send</button>
      </div>
    </div>
  );
}
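The component relies on `client.streamChat` to turn the `text/event-stream` response into chunks. To show what a client does with that raw stream, here is a minimal SSE parser in Python. It illustrates the format and is not the AgentFlow client's implementation; the assumption that each payload is JSON with a `content` field mirrors the `chunk.content` access in the component.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete event from an iterable of SSE lines."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the event; dispatch it if it carried data
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
        elif line.startswith("data:"):
            value = line[5:]
            # One leading space after the colon is not part of the value
            data_lines.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) and ":" comments are ignored here
    # An event without a terminating blank line is incomplete and is dropped


def accumulate(stream_text: str) -> str:
    """Concatenate the assumed `content` field of each JSON chunk."""
    return "".join(
        json.loads(payload).get("content", "")
        for payload in parse_sse(stream_text.splitlines())
    )
```

Appending each chunk's `content` as it arrives is exactly what the `setMessages` updater in the component does, one event at a time.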

Exercise: Build a Full-Stack Chat App

Your Task

Build a complete chat application with:

Backend — FastAPI server with:
- Chat endpoint (streaming)
- File upload endpoint
- Thread management
Frontend — React component with:
- Message display
- Streaming text
- File upload button
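For the thread management item, one possible starting point is an in-memory store keyed by thread ID. This sketch is a placeholder under stated assumptions: nothing is persisted and there is no locking across processes, so in a full app this role would move to the checkpoint storage listed in the layer table. `Thread` and `ThreadStore` are hypothetical names.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Thread:
    thread_id: str
    user_id: Optional[str]
    messages: list[dict[str, str]] = field(default_factory=list)


class ThreadStore:
    """In-memory thread registry; all data is lost when the process restarts."""

    def __init__(self) -> None:
        self._threads: dict[str, Thread] = {}

    def create(self, user_id: Optional[str] = None) -> Thread:
        thread = Thread(thread_id=str(uuid.uuid4()), user_id=user_id)
        self._threads[thread.thread_id] = thread
        return thread

    def get(self, thread_id: str) -> Thread:
        if thread_id not in self._threads:
            raise KeyError(f"unknown thread: {thread_id}")
        return self._threads[thread_id]

    def append(self, thread_id: str, role: str, content: str) -> None:
        self.get(thread_id).messages.append({"role": role, "content": content})

    def list_for_user(self, user_id: str) -> list[Thread]:
        return [t for t in self._threads.values() if t.user_id == user_id]

    def delete(self, thread_id: str) -> None:
        self._threads.pop(thread_id, None)
```

Generating thread IDs on the server, rather than hardcoding them in the client, keeps one user from guessing another user's thread.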

Template Structure

my-agent-app/
├── backend/
│   ├── main.py          # FastAPI app
│   ├── agent.py         # Agent logic
│   └── models.py        # Pydantic models
├── frontend/
│   ├── Chat.tsx         # Main component
│   └── api.ts           # Client wrapper
└── docker-compose.yml

What You Learned

GenAI is multimodal — Text is just one modality
Files need processing — Extract content, generate embeddings
Clean architecture — Separate client, API, and agent responsibilities
Streaming improves UX — Send tokens as they arrive

Common Failure Mode

Thick frontend, thin backend

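Keep business logic (prompting, tool calls, state) out of the frontend. The client should render and collect input; the agent on the backend should do the work: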

# ❌ Thin backend - logic in frontend
@app.post("/chat")
def chat(message: str):
    return {"response": "OK"}  # Frontend does everything!

# ✅ Thick backend - logic in agent
@app.post("/chat")
def chat(message: str, thread_id: str):
    result = agent.process(thread_id, message)
    return {"response": result}

Next Step

Continue to Lesson 7: Evals, safety, cost, and release to learn how to ship with confidence.

Or Explore

Connect Client Tutorial — Full client integration
Playground Tutorial — Using the playground
