Documentation Audit Answers

Based on a deep dive into the memory project codebase:

✅ Verified Claims

1. Gzip Response Threshold: 1KB (1000 bytes)

  • Location: app_factory.py:401
  • Code: app.add_middleware(GZipMiddleware, minimum_size=1000)
  • Answer: Exactly 1000 bytes (1KB). Responses whose body is at least 1000 bytes are automatically gzipped.
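The middleware line above is the entire configuration; as a stdlib sketch of the size gate it applies (the helper name here is hypothetical, not part of the codebase):

```python
import gzip

GZIP_MINIMUM_SIZE = 1000  # bytes; mirrors minimum_size=1000 in app_factory.py

def maybe_gzip(body: bytes) -> tuple[bytes, bool]:
    """Compress the response body only if it meets the size threshold.

    Returns (payload, was_compressed). GZipMiddleware applies the same
    size gate before deciding whether to compress a response.
    """
    if len(body) >= GZIP_MINIMUM_SIZE:
        return gzip.compress(body), True
    return body, False
```

Bodies under the threshold pass through untouched, which avoids paying gzip overhead on tiny JSON responses.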

2. Auto-Compression Trigger: Every 15 Messages

  • Locations:
    • routers/v1/message_routes.py:211 - "Summaries are automatically generated every 15 messages"
    • routers/v1/message_routes.py:592 - "short_term: Last 15 messages compressed"
    • services/message_batch_analysis.py:87 - short_term field description
  • Answer: Compression (summarization) is triggered every 15 messages as part of the batch analysis pipeline.
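A minimal sketch of that cadence check (the real trigger lives in the batch analysis pipeline; this helper name is hypothetical):

```python
BATCH_SIZE = 15  # batch analysis cadence cited in message_routes.py

def should_run_batch_analysis(message_count: int) -> bool:
    """Fire summarization each time another full batch of 15 messages lands."""
    return message_count > 0 and message_count % BATCH_SIZE == 0
```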

3. Content Types Supported

  • User Confirmation: text, image, PDF, Word documents
  • Not Supported: video, audio
  • Action: Remove any docs claiming video/audio support

4. "Gets Better with Scale" - The Mechanism

  • Location: services/predictive/tier0_builder.py
  • Answer: Behavioral scoring improves with usage through predictive memory.

How It Works:

The predictive memory system uses a three-component scoring formula:

predicted_importance = 0.6 × vector_similarity 
                     + 0.3 × transition_probability 
                     + 0.2 × normalized_hotness
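Transcribed directly as a Python function (the function name is hypothetical; the weights are copied verbatim from the formula above and, as listed, sum to 1.1, so the score is not a strict weighted average):

```python
def predicted_importance(vector_similarity: float,
                         transition_probability: float,
                         normalized_hotness: float) -> float:
    """Three-component score from tier0_builder.py, as quoted above.

    Note: 0.6 + 0.3 + 0.2 = 1.1, so with all inputs at 1.0 the score
    exceeds 1.0. Weights are reproduced as documented, not rebalanced.
    """
    return (0.6 * vector_similarity
            + 0.3 * transition_probability
            + 0.2 * normalized_hotness)
```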

Why it gets better:

  1. More retrieval logs → Better transition matrix (Markov chain transitions)
  2. More usage patterns → Better multi-step probability predictions (3 steps ahead)
  3. More access frequency data → Better hotness normalization
  4. Temporal decay (0.95 exponential decay per day) keeps it fresh while learning from history
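Points 1 and 2 can be sketched with a first-order Markov chain estimated from consecutive retrievals (function names and the exactly-k-hops definition are illustrative assumptions, not the builder's actual API):

```python
from collections import defaultdict

def build_transition_matrix(retrieval_log: list[str]) -> dict[str, dict[str, float]]:
    """Estimate P(next_item | current_item) from consecutive retrieval events."""
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(retrieval_log, retrieval_log[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }

def multi_step_probability(matrix: dict[str, dict[str, float]],
                           start: str, target: str, steps: int = 3) -> float:
    """Probability of reaching `target` from `start` in exactly `steps` hops."""
    frontier = {start: 1.0}
    for _ in range(steps):
        nxt: dict[str, float] = defaultdict(float)
        for state, p in frontier.items():
            for succ, q in matrix.get(state, {}).items():
                nxt[succ] += p * q
        frontier = dict(nxt)
    return frontier.get(target, 0.0)
```

More retrieval logs mean more observed transitions, so the estimated probabilities converge toward real usage patterns; that is the "better with scale" mechanism in miniature.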

Research-backed:

  • Uses 30-day retrieval logs for transition matrix
  • Log1p normalization for hotness (standard in BM25, TF-IDF)
  • Multi-step Markov predictions (3-step lookahead)
  • Exponential time decay prevents stale patterns from dominating
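The log1p normalization and the 0.95/day decay can be sketched as follows (helper names are hypothetical):

```python
import math

DECAY_PER_DAY = 0.95  # exponential decay factor cited above

def normalized_hotness(access_counts: dict[str, float], item_id: str) -> float:
    """log1p-compress raw access counts, then scale into [0, 1] against
    the hottest item (log1p is the same damping trick BM25/TF-IDF use)."""
    if not access_counts:
        return 0.0
    max_heat = max(math.log1p(c) for c in access_counts.values())
    if max_heat == 0:
        return 0.0
    return math.log1p(access_counts.get(item_id, 0.0)) / max_heat

def decayed_count(raw_count: float, age_days: float) -> float:
    """Down-weight old accesses so stale patterns stop dominating."""
    return raw_count * DECAY_PER_DAY ** age_days
```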

Result: The more you use PAPR, the better it predicts what context you'll need next, leading to:

  • Higher cache hit rates → Lower latency (<150ms when cached)
  • Better predicted_importance scores → More relevant Tier 0 items in sync
  • Smarter context anticipation → Better STaRK benchmark performance

5. Predictive Caching

  • User Confirmation: Mostly internal, but exposed via:
    • /v1/sync/tiers endpoint (Tier 0 predictive items)
    • Python SDK implementation
  • Implementation: Tier0PredictiveBuilder class builds predictive Tier 0 from goals/OKRs + usage patterns
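A sketch of how such scores could feed Tier 0 selection (the cutoff `k` and the helper name are illustrative, not the builder's actual parameters):

```python
def select_tier0(scored_items: list[tuple[str, float]], k: int = 20) -> list[str]:
    """Rank candidates by predicted_importance (highest first) and keep
    the top k item IDs for the Tier 0 sync payload."""
    ranked = sorted(scored_items, key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in ranked[:k]]
```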

6. Data Isolation

  • User Confirmation: Default isolation per organization + namespace (when set)
  • Enterprise: Full database segregation available (contact sales)

7. 96% Token Reduction (Message Compression)

  • Location: routers/v1/message_routes.py:599
  • Mechanism: Hierarchical conversation summaries (short/medium/long-term)
  • Distinct from: TOON format (30-60% reduction in search responses)
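A hypothetical shape for that hierarchy (only the `short_term` meaning is confirmed by this audit; the other field descriptions are assumptions):

```python
from dataclasses import dataclass

@dataclass
class HierarchicalSummary:
    """Field names follow the short/medium/long-term split described in
    message_batch_analysis.py; the exact schema is an assumption here."""
    short_term: str   # last 15 messages (current batch), compressed
    medium_term: str  # rolling summary across recent batches (assumed)
    long_term: str    # whole-conversation digest (assumed)
```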

❌ Claims to Remove/Update

1. Cached Compression Latency: <50ms

  • Status: No source found in codebase
  • User Feedback: "unclear where that number came from"
  • Action: Remove claim or measure actual latency
  • Note: Compression endpoint has from_cache field but no latency measurement

2. Video/Audio Support

  • Status: Not currently supported
  • Action: Update docs to remove video/audio from supported content types
  • Supported: text, image, PDF, Word (.doc/.docx)

📝 Documentation Updates Needed

High Priority

  1. Add explanation for "gets better with scale" → Link to predictive memory and behavioral scoring
  2. Document sync tiers as the mechanism for exposing predictive caching
  3. Clarify content types - remove video/audio references
  4. Remove or verify cached compression latency claim (<50ms)
New Guides

  1. How Predictive Memory Works guide explaining:

    • Tier 0 builder scoring formula
    • Transition matrix and multi-step predictions
    • Why accuracy improves with usage
    • Time decay mechanisms
  2. Sync API Deep Dive explaining:

    • Tier 0 vs Tier 1 items
    • Predictive vs citation-based ranking
    • How to use sync tiers for edge/local sync

Code References for Documentation

Compression (96% reduction)

  • routers/v1/message_routes.py:574-620 - /sessions/{session_id}/compress endpoint
  • services/message_batch_analysis.py:83-98 - Hierarchical summaries structure

Predictive Memory

  • services/predictive/tier0_builder.py:40-404 - Full Tier0PredictiveBuilder implementation
  • services/predictive/tier0_builder.py:232-241 - Scoring formula with research citations

Caching

  • services/cache_utils.py:10-148 - TTLCache implementation (3-minute TTL for auth, 10-minute for embeddings)
  • app_factory.py:401 - GZipMiddleware with 1KB threshold
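A minimal sketch of the TTL-cache pattern (not the actual cache_utils.py implementation; the injectable clock is an addition for testability):

```python
import time

class TTLCache:
    """Time-bounded key/value cache: entries expire ttl_seconds after set().

    `clock` defaults to a monotonic clock and can be swapped out in tests
    so expiry can be exercised without sleeping.
    """

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict = {}

    def set(self, key, value):
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._store[key]  # evict lazily on read
            return default
        return value
```

With `ttl_seconds=180` this matches the 3-minute auth TTL noted above; embeddings would use 600.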

Auto-compression Trigger

  • routers/v1/message_routes.py:774 - "Uses the same smart batch analysis (every 15 messages)"
  • services/message_batch_analysis.py:87 - "Concise summary of the last 15 messages (current batch)"