Predictive Memory

Predictive memory is Papr's capability to anticipate what context you'll need next and pre-cache it for near-instant retrieval (<150ms when cached). It combines semantic search, graph traversal, and behavioral learning to continuously improve accuracy as you use the system.

Why It Matters

Traditional RAG systems get slower and less accurate as your data grows. Papr's predictive memory does the opposite: it gets smarter with scale.

  • #1 on Stanford's STaRK benchmark (91%+ accuracy)
  • <150ms retrieval when the prediction hits the cache
  • Improves with usage through behavioral scoring

What's Possible with Predictive Memory

Simple keyword search (BM25/SQLite):

# Query: "show me the user's notification preferences"
# Returns: Exact matches for "notification preferences"
# Misses: Related settings, previous changes, context from other conversations

Basic vector search:

# Query: "show me the user's notification preferences"
# Returns: Semantically similar text about notifications
# Misses: Relationships (User → Preferences → Settings), temporal context (what changed), procedural context (why they changed it)

Papr's Predictive Memory:

from papr_memory import Papr

client = Papr(x_api_key="your-api-key")

results = client.memory.search(
    query="show me the user's notification preferences",
    enable_agentic_graph=True
)

# Returns (ranked by predicted importance):
# 1. Current notification preference (email) - direct match
# 2. User changed from SMS to email on March 15 - temporal relationship
# 3. Reason: "SMS notifications were too disruptive during work hours" - causal context
# 4. Related setting: user set "Do Not Disturb" 9am-5pm - graph relationship
# 5. Previous support ticket about notifications - cross-session context
# All in <150ms (when cached)

Why the difference?

  • Vector similarity (60%): Finds semantic matches
  • Transition probability (30%): Predicts related context from usage patterns
  • Normalized hotness (20%): Surfaces frequently accessed items
  • Graph traversal: Follows real relationships (User → Preference → Setting → Ticket)

How It Works

Predictive memory uses a three-component scoring system to rank context by relevance:

predicted_importance = 0.6 × vector_similarity 
                     + 0.3 × transition_probability 
                     + 0.2 × normalized_hotness
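
Expressed as code, the ranking is a fixed-weight sum; here is a minimal sketch (the component values in the example are invented for illustration):

def predicted_importance(vector_similarity: float,
                         transition_probability: float,
                         normalized_hotness: float) -> float:
    """Weighted sum from the formula above."""
    return (0.6 * vector_similarity
            + 0.3 * transition_probability
            + 0.2 * normalized_hotness)

# Example: a strong semantic match that is also frequently accessed
score = predicted_importance(0.82, 0.40, 0.65)  # -> 0.742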

The Three Components

  1. Vector Similarity (60%): Semantic relevance to your goals/OKRs
    • Uses embedding similarity to your workspace objectives
    • Measures topical alignment with current work
  2. Transition Probability (30%): Contextual relevance
    • Built from 30 days of retrieval logs
    • Multi-step Markov predictions (3-step lookahead)
    • Exponential time decay (0.95 per day) keeps patterns fresh
  3. Normalized Hotness (20%): Access frequency
    • Log-normalized access counts (prevents outliers from dominating)
    • Research-backed: the same normalization used in BM25 and TF-IDF (see the sketch after this list)
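
A rough sketch of how components 2 and 3 could be computed, assuming per-day exponential decay of transition observations and log scaling of access counts as described above (the actual implementation is not shown in these docs):

import math

DECAY = 0.95  # per-day exponential time decay, per the description above

def decayed_transition_weight(observations: list[tuple[int, int]]) -> float:
    """Sum transition observations from the 30-day retrieval log,
    discounting each (count, age_in_days) pair by 0.95 per day of age."""
    return sum(count * DECAY ** age for count, age in observations)

def normalized_hotness(access_count: int, max_access_count: int) -> float:
    """Log-normalize access counts so outliers don't dominate,
    in the spirit of sublinear scaling in TF-IDF/BM25."""
    return math.log1p(access_count) / math.log1p(max_access_count)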

Why It Gets Better with Scale

As you use Papr:

  • More retrieval logs → Better transition matrix → Smarter predictions
  • More usage patterns → Higher cache hit rates → Lower latency
  • More connections → Richer graph → Better multi-hop context

This creates a positive feedback loop: more data = better predictions = faster retrieval.

At Query Time

Enable graph-aware retrieval for complex queries:

{
  "query": "Find recurring issues in enterprise onboarding conversations",
  "enable_agentic_graph": true,
  "max_memories": 20,
  "max_nodes": 15,
  "response_format": "toon"
}

Parameters:

  • enable_agentic_graph: Enables intelligent multi-hop traversal
  • max_memories: Limit direct matches (default: 20)
  • max_nodes: Limit graph expansion nodes (default: 15)
  • response_format: Set to "toon" for a 30-60% token reduction in LLM input
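
The same query through the Python SDK might look like the sketch below; query and enable_agentic_graph appear in the earlier example, while the remaining keyword names are assumed to mirror the REST fields above:

from papr_memory import Papr

client = Papr(x_api_key="your-api-key")

# Assumption: SDK keyword arguments mirror the REST parameters shown above
results = client.memory.search(
    query="Find recurring issues in enterprise onboarding conversations",
    enable_agentic_graph=True,  # multi-hop graph traversal
    max_memories=20,            # cap on direct matches
    max_nodes=15,               # cap on graph expansion nodes
)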

At Sync Time

Access predictive context for edge/local applications via the Sync API:

curl -X POST "https://memory.papr.ai/v1/sync/tiers" \
  -H "X-API-Key: $PAPR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "max_tier0": 300,
    "max_tier1": 1000,
    "include_embeddings": true
  }'

Response:

  • tier0: Predictive items (goals/OKRs + predicted context)
  • tier1: Hot memories (frequently accessed, citation-ranked)
  • transitions: Predicted next-context relationships
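
An illustrative response shape, using only the keys documented above; the per-item fields are assumptions, except id, which the SDK example below relies on:

{
  "tier0": [
    { "id": "mem_123", "content": "...", "embedding": [0.12, ...] }
  ],
  "tier1": [ ... ],
  "transitions": [ ... ]
}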

See Portability and Sync for details.

Python SDK

The Python SDK provides a clean interface for sync operations:

from papr_memory import Papr

client = Papr(x_api_key="your-api-key")

# Fetch predictive tiers for local caching
sync_response = client.sync.tiers(
    max_tier0=300,      # Predictive context
    max_tier1=1000,     # Hot memories
    include_embeddings=True
)

# Cache locally for <150ms retrieval (a plain dict stands in for your cache layer)
cache = {}
for memory in sync_response.tier0:
    cache[memory.id] = memory
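
On later requests, predicted context can then be served straight from the local cache; a trivial lookup (the memory ID here is a placeholder):

# Serve cached context locally, no network round trip
memory = cache.get("mem_123")  # placeholder ID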