System Architecture
Papr Memory is a new type of database for AI that connects context across sources and predicts what users need before they ask.
The Problem: Traditional RAG systems return fragments because conversation history, documents, and structured data live in separate silos. Retrieval works in demos but breaks in production—it's inaccurate and slow.
The Solution: The Predictive Memory Graph—a layer that maps real relationships across all your data and anticipates what users will ask next.
The Result: #1 on Stanford's STaRK benchmark (91%+ accuracy) with <150ms retrieval (when cached). Gets better as your memory grows, not worse.
Why Not Just Use SQLite + Vector Search?
Many teams start with simple memory (SQLite + keyword search, then add a vector DB). This works for prototypes but fails in predictable ways in production:
What Breaks with Simple Approaches
| Scenario | Simple Stack (SQLite + BM25) | Adding Vector Search | Papr's Approach |
|---|---|---|---|
| User asks "refund policy" but you stored "return process" | ❌ No match (keyword only) | ✅ Finds via semantic similarity | ✅ Hybrid retrieval (keyword + vector + graph) |
| Cross-session memory: "What did we discuss about travel?" (from 3 sessions ago) | ❌ Returns fragments, no connection to planning context | ❌ Returns similar text, but misses relationships | ✅ Knowledge graph links Trip → Japan → Conversation across sessions |
| Memory drift: User says "prefer email" (turn 1), later says "SMS for urgent" (turn 50) | ❌ Returns both, LLM picks randomly | ❌ Returns both based on similarity | ✅ Graph tracks provenance, GraphQL resolves conflicts ("most recent by topic") |
| Context explosion: After 100 turns, 50 relevant memories = 200K tokens, 8 sec latency | ❌ Must manually paginate or truncate | ❌ Still returns too much | ✅ Predictive caching (<150ms) + toon format (30-60% token reduction) |
| Procedural memory: Agent should remember "always check account status before refunds" | ❌ No structured learning | ❌ Retrieves similar text, but no enforcement | ✅ Agent memories (role="assistant") + retrieval before decisions |
| Multi-tenant isolation: User A shouldn't see User B's data | ❌ Manual filtering (error-prone) | ❌ Still manual filtering | ✅ Built-in namespace boundaries + ACLs |
Papr includes the simple approach (keyword search, event storage) plus the sophistication you'd eventually build.
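The first row of the table can be seen in a toy sketch: keyword overlap alone misses "refund policy" when you stored "return process", while adding a semantic layer recovers the match. Here a hand-built synonym map stands in for a real embedding model; all names are illustrative, not Papr's API.

```python
# Toy illustration of keyword-only vs hybrid matching.
# The "semantic" layer is a hand-built synonym map standing in for real vectors.

def keyword_score(query, doc):
    """Fraction of query words that literally appear in the document."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q)

# Illustrative synonym groups: words mapping to the same canonical form match.
SYNONYMS = {"refund": "return", "policy": "process"}

def semantic_score(query, doc):
    """Keyword score after mapping words to canonical synonyms."""
    norm = lambda w: SYNONYMS.get(w, w)
    q = {norm(w) for w in query.lower().split()}
    d = {norm(w) for w in doc.lower().split()}
    return len(q & d) / len(q)

def hybrid_score(query, doc, alpha=0.5):
    """Blend exact-match and semantic signals, as a hybrid retriever would."""
    return alpha * keyword_score(query, doc) + (1 - alpha) * semantic_score(query, doc)

doc = "our return process takes 14 days"
print(keyword_score("refund policy", doc))  # 0.0 -- keyword-only misses it
print(hybrid_score("refund policy", doc))   # 0.5 -- the semantic layer recovers the match
```

A real system replaces the synonym map with vector similarity and adds graph signals, but the blending idea is the same.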
Three Input Paths
Papr accepts content through three pathways, each optimized for different use cases:
1. Documents (POST /v1/document)
Upload PDFs and Word documents for intelligent processing.
How it works:
- System analyzes document content
- Decides what information is worth remembering
- Creates structured memories with hierarchical organization
- Extracts entities and relationships automatically
- Custom schemas (when provided) guide what to extract
Best for: Contracts, reports, research papers, specifications, meeting notes
```python
response = client.document.upload(
    file=open("contract.pdf", "rb"),
    hierarchical_enabled=True,
    simple_schema_mode=True
)
```
2. Messages API (POST /v1/messages)
Store chat messages with automatic session management and memory creation.
How it works:
- Stores chat messages with role (user/assistant) and session grouping
- Automatically analyzes conversation for important information
- Creates memories from significant messages (with `process_messages: true`)
- Provides built-in conversation history and compression
- Hierarchical summaries for efficient LLM context
Best for: Chat applications, conversational AI, customer support, dialogue systems
```python
# Store user message
client.messages.store(
    content="I prefer email notifications over SMS",
    role="user",
    session_id="conv_123",
    external_user_id="user_456",
    process_messages=True  # Automatically create memories
)

# Get conversation history
history = client.messages.get_history(session_id="conv_123", limit=20)

# Compress long conversations
compressed = client.messages.compress_session(session_id="conv_123")
context = compressed.context_for_llm  # Use in LLM prompts
```
3. Direct Memory API (POST /v1/memory)
Explicitly create memories with full control over content and structure.
How it works:
- Direct memory creation without analysis
- Full control over content, metadata, and graph structure
- Ideal for structured data and agent self-documentation
Best for: Explicit facts, structured data you control, agent's own reasoning and learnings, non-conversational content
```python
# Agent documents its own workflow
client.memory.add(
    content="When handling refund requests: 1) Check account status, 2) Verify purchase date, 3) Apply refund policy based on timeframe",
    metadata={
        "role": "assistant",  # Agent memory
        "category": "learning"
    },
    memory_policy={
        "mode": "auto",
        "schema_id": "workflow_schema"  # Guide entity extraction
    }
)
```
The Predictive Memory Graph
Once content enters the system, the memory engine processes it through the Predictive Memory Graph—a layer that maps real relationships across all your data.
How It Works
Traditional systems rely on vector search alone. Great for similar text, terrible for connected context. Some add a knowledge graph, but those graphs become brittle as data sources multiply.
Papr's approach: The Predictive Memory Graph connects everything:
- A line of code → ties to a support ticket
- Support ticket → ties to a conversation with an AI coding agent
- AI conversation → ties to a Slack thread
- Slack thread → ties to a design decision from months ago
Your knowledge becomes one connected story.
Three Intelligence Layers
1. Vector Embeddings
Semantic similarity across all content types. Find information based on meaning, not just keywords.
2. Knowledge Graph
Real relationships between entities. Not just "similar text" but actual connections: Person WORKS_ON Project, Meeting DISCUSSED Feature.
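"Real relationships" can be pictured as a tiny in-memory triple store (a toy illustration, not Papr's internals): entities are nodes, and typed edges like WORKS_ON or DISCUSSED connect them.

```python
# Minimal triple store: (subject, relation, object) edges, queried by pattern.
# All entity names are made up for illustration.
edges = [
    ("alice", "WORKS_ON", "checkout-service"),
    ("standup-0412", "DISCUSSED", "one-click-pay"),
    ("one-click-pay", "PART_OF", "checkout-service"),
]

def related(entity, relation=None):
    """Entities connected from `entity`, optionally filtered by relation type."""
    return [o for s, r, o in edges if s == entity and (relation is None or r == relation)]

print(related("alice", "WORKS_ON"))        # ['checkout-service']
print(related("standup-0412", "DISCUSSED"))  # ['one-click-pay']
```

Vector search would tell you "alice" and "standup-0412" are unrelated text; the graph tells you they connect through checkout-service in two hops.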
3. Predictive Layer
Anticipates what users will ask next and pre-caches context:
- Analyzes query patterns
- Predicts likely follow-up questions
- Pre-loads connected context
- Result: <150ms retrieval when prediction hits (cached)
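The pre-caching loop above can be sketched as follows. The hard-coded follow-up map stands in for Papr's learned query-pattern models; everything here is illustrative.

```python
# Toy predictive cache: after answering a query, pre-compute likely follow-ups
# so the next retrieval is a cache hit. The prediction is a hard-coded map
# standing in for a learned model of query patterns.
FOLLOW_UPS = {"trip to japan": ["japan itinerary", "japan budget"]}

cache = {}

def retrieve(query):
    if query in cache:                       # prediction hit: no retrieval work
        return cache[query], "cached"
    result = f"memories for: {query}"        # stand-in for real vector+graph search
    for follow_up in FOLLOW_UPS.get(query, []):
        cache[follow_up] = f"memories for: {follow_up}"  # pre-warm predicted queries
    return result, "computed"

print(retrieve("trip to japan")[1])  # computed
print(retrieve("japan budget")[1])   # cached -- pre-loaded by the prediction step
```

When the prediction hits, retrieval is a dictionary lookup instead of a full search, which is what makes the sub-150ms cached path possible.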
This is why Papr ranks #1 on Stanford's STaRK benchmark with 91%+ accuracy—and why it gets better as your memory grows.
Custom Schemas
When you define custom schemas for your domain (e.g., legal contracts, medical records, code functions), they guide:
- What entities to extract
- What relationships to identify
- How to structure the knowledge graph
- Property validation and consistency
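As a sketch of how a schema guides extraction and validation, here is a hypothetical schema shape (illustrative only, not Papr's exact wire format) and a naive property check:

```python
# Hypothetical schema: declares which entities, properties, and relationships
# extraction should look for in a legal-contracts domain.
contract_schema = {
    "entities": {
        "Party":  {"properties": ["name", "role"]},
        "Clause": {"properties": ["title", "effective_date"]},
    },
    "relationships": [
        {"type": "SIGNED_BY", "from": "Clause", "to": "Party"},
    ],
}

def validate_entity(schema, kind, props):
    """Reject extracted entities carrying properties the schema doesn't declare."""
    allowed = set(schema["entities"][kind]["properties"])
    return set(props) <= allowed

print(validate_entity(contract_schema, "Party", {"name": "Acme"}))         # True
print(validate_entity(contract_schema, "Party", {"favorite_color": "x"}))  # False
```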
Private by Design
Papr includes ACLs, namespace boundaries, and strict permission management from day one.
How It Works:
- AI agents only access what you authorize
- User permissions are respected across queries
- Data never leaks across users in the same namespace
- Namespace isolation for multi-tenancy
Result: Safe for production multi-tenant applications where data privacy is critical.
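The difference between manual filtering and built-in boundaries can be sketched in a few lines (a toy store, not Papr's implementation): when every read is keyed by (namespace, user), other users' data is structurally unreachable rather than merely filtered out.

```python
# Toy namespace-scoped store: isolation is enforced at the storage layer,
# not left to caller discipline.
from collections import defaultdict

store = defaultdict(list)  # (namespace, user_id) -> memories

def add_memory(namespace, user_id, content):
    store[(namespace, user_id)].append(content)

def search(namespace, user_id, query):
    # The key lookup itself is the boundary: there is no code path that
    # reaches another user's bucket.
    return [m for m in store[(namespace, user_id)] if query in m]

add_memory("tenant_a", "user_a", "prefers email")
add_memory("tenant_a", "user_b", "prefers SMS")

print(search("tenant_a", "user_a", "prefers"))  # ['prefers email'] only
```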
User Memories vs Agent Memories
Papr supports two types of memories, stored and queried the same way:
User Memories: Information about the user
- Preferences and settings
- Conversation history
- Personal context
- User-specific facts
Agent Memories: Agent documents its own intelligence
- Workflows and procedures
- Learnings from interactions
- Reasoning patterns that worked
- Self-improvement insights
This dual memory system enables agents to not just personalize for users, but to learn and improve their own capabilities over time.
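The dual-memory split can be sketched as a single store where the only distinction is the role in metadata, mirroring the `role="user"` / `role="assistant"` convention shown earlier (toy data, illustrative only):

```python
# One store, two memory types, distinguished by role metadata.
memories = [
    {"role": "user",      "content": "Prefers email notifications"},
    {"role": "assistant", "content": "Check account status before processing refunds"},
]

def recall(role):
    """Filter memories by who they describe: the user or the agent itself."""
    return [m["content"] for m in memories if m["role"] == role]

print(recall("user"))       # personalization context
print(recall("assistant"))  # the agent's own learned procedures
```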
Two Query Modes
Once memories are stored, you can query them in two powerful ways:
Natural Language Search (POST /v1/memory/search)
Ask questions in natural language and get relevant memories plus related graph entities.
How it works:
- Combines vector similarity search with graph relationships
- Returns both semantic matches and connected entities
- Predictive caching makes retrieval lightning-fast
- Agentic graph search can understand ambiguous references
Best for: Finding relevant context, answering questions, RAG (Retrieval Augmented Generation), contextual memory retrieval
```python
search_response = client.memory.search(
    query="What are the customer's preferences for notifications?",
    user_id="user_123",
    enable_agentic_graph=True,
    max_memories=20,
    max_nodes=15
)
```
GraphQL (POST /v1/graphql)
Run structured queries for analytics, aggregations, and relationship analysis.
How it works:
- Query your knowledge graph with GraphQL syntax
- Run aggregations and joins across entities
- Analyze complex relationships
- Extract structured insights
Best for: Analytics, insights, structured data extraction, dashboards, multi-hop queries
```python
response = client.graphql.query(
    query="""
    query GetCustomerInsights($customerId: ID!) {
      customer(id: $customerId) {
        name
        preferences {
          notifications
          communication_channel
        }
        interactions_aggregate {
          count
        }
      }
    }
    """,
    variables={"customerId": "cust_123"}
)
```
When to Use What
Input Pathways
| Use Case | Recommended Input | Why |
|---|---|---|
| Process PDFs, Word docs | Documents endpoint | Intelligent analysis extracts structured information |
| Store conversation history | Messages/Direct Memory | Capture dialogue context |
| Explicit facts you control | Direct Memory | Full control over structure |
| Agent self-documentation | Direct Memory | Agent documents workflows, learnings |
| Domain-specific extraction | Documents + Custom Schema | Schema guides what to extract |
Query Modes
| Use Case | Recommended Query | Why |
|---|---|---|
| Find relevant context | Natural Language Search | Semantic + graph combined |
| Answer questions | Natural Language Search | Best for RAG applications |
| Analytics & insights | GraphQL | Structured queries, aggregations |
| Relationship analysis | GraphQL | Multi-hop queries across entities |
| Build dashboards | GraphQL | Complex data extraction |
Complete Architecture Flow
Key Differentiators
Predictive Context Caching: Our predictive models anticipate what context you'll need and cache it in advance, making retrieval lightning-fast.
Intelligent Analysis: System automatically decides what's worth remembering when you upload documents or messages—no manual tagging required.
Dual Memory Types: Support for both user memories (personalization) and agent memories (self-improvement), enabling agents that learn and evolve.
Flexible Querying: Choose natural language search for RAG or GraphQL for analytics—same data, different access patterns.
Custom Domain Ontologies: Define your domain's entities and relationships once, and the system uses them to guide extraction across all content.
Next Steps
- Memory Management - Learn CRUD operations
- Document Processing - Upload and process documents
- Custom Schemas - Define domain ontologies
- Graph Generation - Control knowledge graph creation
- GraphQL Analysis - Query insights
- Quick Start - Get started in 15 minutes