System Architecture
Papr Memory is a new type of database for AI that connects context across sources and predicts what users need before they ask.
The Problem: Traditional RAG systems return fragments because conversation history, documents, and structured data live in separate silos. Retrieval works in demos but breaks in production—it's inaccurate and slow.
The Solution: The Predictive Memory Graph—a layer that maps real relationships across all your data and anticipates what users will ask next.
The Result: #1 on Stanford's STaRK benchmark (91%+ accuracy) with <150ms retrieval (when cached). Gets better as your memory grows, not worse.
Why Not Just Use SQLite + Vector Search?
Many teams start with simple memory (SQLite + keyword search, then add a vector DB). This works for prototypes but fails in predictable ways in production:
What Breaks with Simple Approaches
| Scenario | Simple Stack (SQLite + BM25) | Adding Vector Search | Papr's Approach |
|---|---|---|---|
| User asks "refund policy" but you stored "return process" | ❌ No match (keyword only) | ✅ Finds via semantic similarity | ✅ Hybrid retrieval (keyword + vector + graph) |
| Cross-session memory: "What did we discuss about travel?" (from 3 sessions ago) | ❌ Returns fragments, no connection to planning context | ❌ Returns similar text, but misses relationships | ✅ Knowledge graph links Trip → Japan → Conversation across sessions |
| Memory drift: User says "prefer email" (turn 1), later says "SMS for urgent" (turn 50) | ❌ Returns both, LLM picks randomly | ❌ Returns both based on similarity | ✅ Graph tracks provenance, GraphQL resolves conflicts ("most recent by topic") |
| Context explosion: After 100 turns, 50 relevant memories = 200K tokens, 8 sec latency | ❌ Must manually paginate or truncate | ❌ Still returns too much | ✅ Predictive caching (<150ms) + toon format (30-60% token reduction) |
| Procedural memory: Agent should remember "always check account status before refunds" | ❌ No structured learning | ❌ Retrieves similar text, but no enforcement | ✅ Agent memories (role="assistant") + retrieval before decisions |
| Multi-tenant isolation: User A shouldn't see User B's data | ❌ Manual filtering (error-prone) | ❌ Still manual filtering | ✅ Built-in namespace boundaries + ACLs |
Papr includes the simple approach (keyword search, event storage) plus the sophistication you'd eventually build.
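The first row of the table can be seen in a toy sketch: keyword overlap alone misses "refund policy" when you stored "return process", while adding a semantic layer recovers the match. Here a hand-built synonym map stands in for a real embedding model; all names are illustrative, not Papr's API.

```python
# Toy illustration of keyword-only vs hybrid matching.
# The "semantic" layer is a hand-built synonym map standing in for real vectors.

def keyword_score(query, doc):
    """Fraction of query words that literally appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

# Illustrative synonym groups: words mapping to the same canonical form match.
SYNONYMS = {"refund": "return", "policy": "process"}

def semantic_score(query, doc):
    """Keyword score after mapping words to canonical synonyms."""
    norm = lambda w: SYNONYMS.get(w, w)
    q = {norm(w) for w in query.lower().split()}
    d = {norm(w) for w in doc.lower().split()}
    return len(q & d) / len(q)

def hybrid_score(query, doc, alpha=0.5):
    """Blend exact-match and semantic signals, as a hybrid retriever would."""
    return alpha * keyword_score(query, doc) + (1 - alpha) * semantic_score(query, doc)

doc = "our return process takes 14 days"
print(keyword_score("refund policy", doc))  # 0.0 -- keyword-only misses it
print(hybrid_score("refund policy", doc))   # 0.5 -- the semantic layer recovers the match
```

A real system replaces the synonym map with vector similarity and adds graph signals, but the blending idea is the same.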
Three Input Paths
Papr accepts content through three pathways, each optimized for different use cases:
1. Documents (POST /v1/document)
Upload PDFs and Word documents for intelligent processing.
How it works:
- System analyzes document content
- Decides what information is worth remembering
- Creates structured memories with hierarchical organization
- Extracts entities and relationships automatically
- Custom schemas (when provided) guide what to extract
Best for: Contracts, reports, research papers, specifications, meeting notes
```python
response = client.document.upload(
    file=open("contract.pdf", "rb"),
    hierarchical_enabled=True,
    simple_schema_mode=True
)
```
2. Messages API (POST /v1/messages)
Store chat messages with automatic session management and memory creation.
How it works:
- Stores chat messages with role (user/assistant) and session grouping
- Automatically analyzes conversation for important information
- Creates memories from significant messages (with `process_messages: true`)
- Provides built-in conversation history and compression
- Hierarchical summaries for efficient LLM context
Best for: Chat applications, conversational AI, customer support, dialogue systems
```python
# Store user message
client.messages.store(
    content="I prefer email notifications over SMS",
    role="user",
    session_id="conv_123",
    external_user_id="user_456",
    process_messages=True  # Automatically create memories
)

# Get conversation history
history = client.messages.get_history(session_id="conv_123", limit=20)

# Compress long conversations
compressed = client.messages.compress_session(session_id="conv_123")
context = compressed.context_for_llm  # Use in LLM prompts
```
3. Direct Memory API (POST /v1/memory)
Explicitly create memories with full control over content and structure.
How it works:
- Direct memory creation without analysis
- Full control over content, metadata, and graph structure
- Ideal for structured data and agent self-documentation
Best for: Explicit facts, structured data you control, agent's own reasoning and learnings, non-conversational content
```python
# Agent documents its own workflow
client.memory.add(
    content="When handling refund requests: 1) Check account status, 2) Verify purchase date, 3) Apply refund policy based on timeframe",
    metadata={
        "role": "assistant",  # Agent memory
        "category": "learning"
    },
    memory_policy={
        "mode": "auto",
        "schema_id": "workflow_schema"  # Guide entity extraction
    }
)
```
The Predictive Memory Graph
Once content enters the system, the memory engine processes it through the Predictive Memory Graph—a layer that maps real relationships across all your data.
How It Works
Traditional systems rely on vector search alone. Great for similar text, terrible for connected context. Some add a knowledge graph, but those graphs become brittle as data sources multiply.
Papr's approach: The Predictive Memory Graph connects everything:
- A line of code → ties to a support ticket
- Support ticket → ties to a conversation with an AI coding agent
- AI conversation → ties to a Slack thread
- Slack thread → ties to a design decision from months ago
Your knowledge becomes one connected story.
Three Intelligence Layers
1. Vector Embeddings
Semantic similarity across all content types. Find information based on meaning, not just keywords.
2. Knowledge Graph
Real relationships between entities. Not just "similar text" but actual connections: Person WORKS_ON Project, Meeting DISCUSSED Feature.
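"Real relationships" can be pictured as a tiny in-memory triple store (a toy illustration, not Papr's internals): entities are nodes, and typed edges like WORKS_ON or DISCUSSED connect them.

```python
# Minimal triple store: (subject, relation, object) edges, queried by pattern.
# All entity names are made up for illustration.
edges = [
    ("alice", "WORKS_ON", "checkout-service"),
    ("standup-0412", "DISCUSSED", "one-click-pay"),
    ("one-click-pay", "PART_OF", "checkout-service"),
]

def related(entity, relation=None):
    """Entities connected from `entity`, optionally filtered by relation type."""
    return [o for s, r, o in edges if s == entity and (relation is None or r == relation)]

print(related("alice", "WORKS_ON"))        # ['checkout-service']
print(related("standup-0412", "DISCUSSED"))  # ['one-click-pay']
```

Vector search would tell you "alice" and "standup-0412" are unrelated text; the graph tells you they connect through checkout-service in two hops.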
3. Predictive Layer
Anticipates what users will ask next and pre-caches context:
- Analyzes query patterns
- Predicts likely follow-up questions
- Pre-loads connected context
- Result: <150ms retrieval when prediction hits (cached)
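The pre-caching loop above can be sketched as follows. The hard-coded follow-up map stands in for Papr's learned query-pattern models; everything here is illustrative.

```python
# Toy predictive cache: after answering a query, pre-compute likely follow-ups
# so the next retrieval is a cache hit. The prediction is a hard-coded map
# standing in for a learned model of query patterns.
FOLLOW_UPS = {"trip to japan": ["japan itinerary", "japan budget"]}

cache = {}

def retrieve(query):
    if query in cache:                       # prediction hit: no retrieval work
        return cache[query], "cached"
    result = f"memories for: {query}"        # stand-in for real vector+graph search
    for follow_up in FOLLOW_UPS.get(query, []):
        cache[follow_up] = f"memories for: {follow_up}"  # pre-warm predicted queries
    return result, "computed"

print(retrieve("trip to japan")[1])  # computed
print(retrieve("japan budget")[1])   # cached -- pre-loaded by the prediction step
```

When the prediction hits, retrieval is a dictionary lookup instead of a full search, which is what makes the sub-150ms cached path possible.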
This is why Papr ranks #1 on Stanford's STaRK benchmark with 91%+ accuracy—and why it gets better as your memory grows.
Custom Schemas
When you define custom schemas for your domain (e.g., legal contracts, medical records, code functions), they guide:
- What entities to extract
- What relationships to identify
- How to structure the knowledge graph
- Property validation and consistency
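As a sketch of how a schema guides extraction and validation, here is a hypothetical schema shape (illustrative only, not Papr's exact wire format) and a naive property check:

```python
# Hypothetical schema: declares which entities, properties, and relationships
# extraction should look for in a legal-contracts domain.
contract_schema = {
    "entities": {
        "Party":  {"properties": ["name", "role"]},
        "Clause": {"properties": ["title", "effective_date"]},
    },
    "relationships": [
        {"type": "SIGNED_BY", "from": "Clause", "to": "Party"},
    ],
}

def validate_entity(schema, kind, props):
    """Reject extracted entities carrying properties the schema doesn't declare."""
    allowed = set(schema["entities"][kind]["properties"])
    return set(props) <= allowed

print(validate_entity(contract_schema, "Party", {"name": "Acme"}))         # True
print(validate_entity(contract_schema, "Party", {"favorite_color": "x"}))  # False
```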
Private by Design
Papr includes ACLs, namespace boundaries, and strict permission management from day one.
How It Works:
- AI agents only access what you authorize
- User permissions are respected across queries
- Data never leaks across users in the same namespace
- Namespace isolation for multi-tenancy
Result: Safe for production multi-tenant applications where data privacy is critical.
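The difference between manual filtering and built-in boundaries can be sketched in a few lines (a toy store, not Papr's implementation): when every read is keyed by (namespace, user), other users' data is structurally unreachable rather than merely filtered out.

```python
# Toy namespace-scoped store: isolation is enforced at the storage layer,
# not left to caller discipline.
from collections import defaultdict

store = defaultdict(list)  # (namespace, user_id) -> memories

def add_memory(namespace, user_id, content):
    store[(namespace, user_id)].append(content)

def search(namespace, user_id, query):
    # The key lookup itself is the boundary: there is no code path that
    # reaches another user's bucket.
    return [m for m in store[(namespace, user_id)] if query in m]

add_memory("tenant_a", "user_a", "prefers email")
add_memory("tenant_a", "user_b", "prefers SMS")

print(search("tenant_a", "user_a", "prefers"))  # ['prefers email'] only
```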
User Memories vs Agent Memories
Papr supports two types of memories, stored and queried the same way:
User Memories: Information about the user
- Preferences and settings
- Conversation history
- Personal context
- User-specific facts
Agent Memories: Agent documents its own intelligence
- Workflows and procedures
- Learnings from interactions
- Reasoning patterns that worked
- Self-improvement insights
This dual memory system enables agents to not just personalize for users, but to learn and improve their own capabilities over time.
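The dual-memory split can be sketched as a single store where the only distinction is the role in metadata, mirroring the `role="user"` / `role="assistant"` convention shown earlier (toy data, illustrative only):

```python
# One store, two memory types, distinguished by role metadata.
memories = [
    {"role": "user",      "content": "Prefers email notifications"},
    {"role": "assistant", "content": "Check account status before processing refunds"},
]

def recall(role):
    """Filter memories by who they describe: the user or the agent itself."""
    return [m["content"] for m in memories if m["role"] == role]

print(recall("user"))       # personalization context
print(recall("assistant"))  # the agent's own learned procedures
```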
Two Query Modes
Once memories are stored, you can query them in two powerful ways:
Natural Language Search (POST /v1/memory/search)
Ask questions in natural language and get relevant memories plus related graph entities.
How it works:
- Combines vector similarity search with graph relationships
- Returns both semantic matches and connected entities
- Predictive caching makes retrieval lightning-fast
- Agentic graph search can understand ambiguous references
Best for: Finding relevant context, answering questions, RAG (Retrieval Augmented Generation), contextual memory retrieval
```python
search_response = client.memory.search(
    query="What are the customer's preferences for notifications?",
    user_id="user_123",
    enable_agentic_graph=True,
    max_memories=20,
    max_nodes=15
)
```
GraphQL (POST /v1/graphql)
Run structured queries for analytics, aggregations, and relationship analysis.
How it works:
- Query your knowledge graph with GraphQL syntax
- Run aggregations and joins across entities
- Analyze complex relationships
- Extract structured insights
Best for: Analytics, insights, structured data extraction, dashboards, multi-hop queries
```python
response = client.graphql.query(
    query="""
    query GetCustomerInsights($customerId: ID!) {
      customer(id: $customerId) {
        name
        preferences {
          notifications
          communication_channel
        }
        interactions_aggregate {
          count
        }
      }
    }
    """,
    variables={"customerId": "cust_123"}
)
```
When to Use What
Input Pathways
| Use Case | Recommended Input | Why |
|---|---|---|
| Process PDFs, Word docs | Documents endpoint | Intelligent analysis extracts structured information |
| Store conversation history | Messages/Direct Memory | Capture dialogue context |
| Explicit facts you control | Direct Memory | Full control over structure |
| Agent self-documentation | Direct Memory | Agent documents workflows, learnings |
| Domain-specific extraction | Documents + Custom Schema | Schema guides what to extract |
Query Modes
| Use Case | Recommended Query | Why |
|---|---|---|
| Find relevant context | Natural Language Search | Semantic + graph combined |
| Answer questions | Natural Language Search | Best for RAG applications |
| Analytics & insights | GraphQL | Structured queries, aggregations |
| Relationship analysis | GraphQL | Multi-hop queries across entities |
| Build dashboards | GraphQL | Complex data extraction |
Complete Architecture Flow
Key Differentiators
Predictive Context Caching: Our predictive models anticipate what context you'll need and cache it in advance, making retrieval lightning-fast.
Intelligent Analysis: System automatically decides what's worth remembering when you upload documents or messages—no manual tagging required.
Dual Memory Types: Support for both user memories (personalization) and agent memories (self-improvement), enabling agents that learn and evolve.
Flexible Querying: Choose natural language search for RAG or GraphQL for analytics—same data, different access patterns.
Custom Domain Ontologies: Define your domain's entities and relationships once, and the system uses them to guide extraction across all content.
Next Steps
- Memory Management - Learn CRUD operations
- Document Processing - Upload and process documents
- Custom Schemas - Define domain ontologies
- Graph Generation - Control knowledge graph creation
- GraphQL Analysis - Query insights
- Quick Start - Get started in 15 minutes