Predictive Memory
Predictive memory is Papr's capability to anticipate what context you'll need next and pre-cache it for near-instant retrieval (<150ms when cached). It combines semantic search, graph traversal, and behavioral learning to continuously improve accuracy as you use the system.
Why It Matters
Traditional RAG systems get slower and less accurate as your data grows. Papr's predictive memory does the opposite: it gets smarter with scale.
- #1 on Stanford's STaRK benchmark (91%+ accuracy)
- <150ms retrieval on a cache hit
- Improves with usage through behavioral scoring
What's Possible with Predictive Memory
Simple keyword search (BM25/SQLite):
# Query: "show me the user's notification preferences"
# Returns: Exact matches for "notification preferences"
# Misses: Related settings, previous changes, context from other conversations
Basic vector search:
# Query: "show me the user's notification preferences"
# Returns: Semantically similar text about notifications
# Misses: Relationships (User → Preferences → Settings), temporal context (what changed), procedural context (why they changed it)
Papr's Predictive Memory:
client.memory.search(
query="show me the user's notification preferences",
enable_agentic_graph=True
)
# Returns (ranked by predicted importance):
# 1. Current notification preference (email) - direct match
# 2. User changed from SMS to email on March 15 - temporal relationship
# 3. Reason: "SMS notifications were too disruptive during work hours" - causal context
# 4. Related setting: user set "Do Not Disturb" 9am-5pm - graph relationship
# 5. Previous support ticket about notifications - cross-session context
# All in <150ms (when cached)
Why the difference?
- Vector similarity (60%): Finds semantic matches
- Transition probability (30%): Predicts related context from usage patterns
- Normalized hotness (20%): Surfaces frequently accessed items
- Graph traversal: Follows real relationships (User → Preference → Setting → Ticket)
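To make the graph-traversal component concrete, here is a minimal sketch of multi-hop expansion over stored relationships. The graph structure, node names, and expand helper are illustrative assumptions, not Papr's internal schema:

from collections import deque

# Toy relationship graph; node names and edges are illustrative only.
graph = {
    "user:alice": ["pref:notifications"],
    "pref:notifications": ["setting:do_not_disturb", "ticket:1234"],
    "setting:do_not_disturb": [],
    "ticket:1234": [],
}

def expand(start: str, max_hops: int = 2) -> list[str]:
    """Breadth-first multi-hop expansion from a matched node."""
    seen, frontier, results = {start}, deque([(start, 0)]), []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                results.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return results

print(expand("user:alice"))
# ['pref:notifications', 'setting:do_not_disturb', 'ticket:1234']

Starting from the direct match (the user), two hops already surface the Do Not Disturb setting and the support ticket from the example above.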
How It Works
Predictive memory uses a three-component scoring system to rank context by relevance:
predicted_importance = 0.6 × vector_similarity
+ 0.3 × transition_probability
+ 0.2 × normalized_hotness
The Three Components
Vector Similarity (60%): Semantic relevance to your goals/OKRs
- Uses embedding similarity to your workspace objectives
- Measures topical alignment with current work
Transition Probability (30%): Contextual relevance
- Built from 30 days of retrieval logs
- Multi-step Markov predictions (3-step lookahead)
- Exponential time decay (0.95 per day) keeps patterns fresh
Normalized Hotness (20%): Access frequency
- Log-normalized access counts (prevents outliers from dominating)
- Research-backed: the same log normalization used in BM25 and TF-IDF
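A minimal sketch of how the three components might combine, assuming the vector similarity is computed upstream, day-granularity exponential decay on transition counts, and log normalization for hotness. The helper functions here are illustrative, not Papr's internals:

import math

DECAY = 0.95  # per-day exponential decay on transition counts

def transition_probability(counts: dict[str, float], ages_days: dict[str, float], target: str) -> float:
    """Decayed share of logged transitions that led to `target`."""
    decayed = {k: c * DECAY ** ages_days[k] for k, c in counts.items()}
    total = sum(decayed.values())
    return decayed.get(target, 0.0) / total if total else 0.0

def normalized_hotness(access_count: int, max_count: int) -> float:
    """Log-normalized access frequency, so outliers can't dominate."""
    return math.log1p(access_count) / math.log1p(max_count) if max_count else 0.0

def predicted_importance(vector_similarity: float, transition_prob: float, hotness: float) -> float:
    return 0.6 * vector_similarity + 0.3 * transition_prob + 0.2 * hotness

# Example: a strong semantic match that users often navigate to next
score = predicted_importance(
    vector_similarity=0.82,
    transition_prob=transition_probability(
        counts={"mem:42": 12, "mem:7": 3},
        ages_days={"mem:42": 2.0, "mem:7": 20.0},
        target="mem:42",
    ),
    hotness=normalized_hotness(access_count=40, max_count=500),
)
print(round(score, 3))

The log1p normalization mirrors the BM25/TF-IDF-style damping noted above: heavily accessed items rise in rank but can't swamp the semantic and transition terms.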
Why It Gets Better with Scale
As you use Papr:
- More retrieval logs → Better transition matrix → Smarter predictions
- More usage patterns → Higher cache hit rates → Lower latency
- More connections → Richer graph → Better multi-hop context
This creates a positive feedback loop: more data = better predictions = faster retrieval.
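To illustrate the first link in that loop, here is a minimal sketch of folding a retrieval log into a row-normalized transition matrix and scoring the 3-step Markov lookahead with matrix powers. The log format and variable names are assumptions for illustration:

import numpy as np

# Illustrative: a retrieval log as an ordered sequence of memory indices
retrieval_log = [0, 1, 2, 0, 1, 3, 1, 2]
n = 4

# Count observed consecutive transitions, then row-normalize
counts = np.zeros((n, n))
for src, dst in zip(retrieval_log, retrieval_log[1:]):
    counts[src, dst] += 1
row_sums = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# 3-step lookahead: combined 1-, 2-, and 3-step transition scores
lookahead = P + P @ P + P @ P @ P
print(lookahead[0])  # predicted next-context scores from memory 0

Every additional retrieval adds counts to this matrix, which is what makes the predictions (and therefore the cache hit rate) improve with usage.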
At Query Time
Enable graph-aware retrieval for complex queries:
{
"query": "Find recurring issues in enterprise onboarding conversations",
"enable_agentic_graph": true,
"max_memories": 20,
"max_nodes": 15,
"response_format": "toon"
}
Parameters:
- enable_agentic_graph: Enables intelligent multi-hop traversal
- max_memories: Limit on direct matches (default: 20)
- max_nodes: Limit on graph expansion nodes (default: 15)
- response_format: toon gives 30-60% token reduction for LLM input
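The same request through the Python SDK might look like the following sketch. The keyword arguments are assumed to mirror the JSON fields above; only enable_agentic_graph appears in the earlier SDK example, so treat the rest as illustrative:

from papr_memory import Papr

client = Papr(x_api_key="your-api-key")

# Assumed to mirror the JSON request body above
results = client.memory.search(
    query="Find recurring issues in enterprise onboarding conversations",
    enable_agentic_graph=True,   # multi-hop graph traversal
    max_memories=20,             # cap on direct matches
    max_nodes=15,                # cap on graph expansion nodes
    response_format="toon",      # compact, LLM-friendly output
)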
At Sync Time
Access predictive context for edge/local applications via the Sync API:
curl -X POST "https://memory.papr.ai/v1/sync/tiers" \
-H "X-API-Key: $PAPR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"max_tier0": 300,
"max_tier1": 1000,
"include_embeddings": true
}'
Response:
- tier0: Predictive items (goals/OKRs + predicted context)
- tier1: Hot memories (frequently accessed, citation-ranked)
- transitions: Predicted next-context relationships
See Portability and Sync for details.
Python SDK
The Python SDK provides a clean interface for sync operations:
from papr_memory import Papr
client = Papr(x_api_key="your-api-key")
# Fetch predictive tiers for local caching
sync_response = client.sync.tiers(
max_tier0=300, # Predictive context
max_tier1=1000, # Hot memories
include_embeddings=True
)
# Cache locally for <150ms retrieval
for memory in sync_response.tier0:
    cache.store(memory.id, memory)
Related
- Portability and Sync - Full sync API documentation
- Capability Matrix - All PAPR capabilities
- Golden Paths - Recommended integration patterns
- Quickstart: Chat Memory - Get started fast