Message Compression & Summarization

Intelligent conversation compression that reduces token usage by 96% while preserving critical context.

Overview

When working with long conversations, sending full chat history to LLMs becomes expensive and slow. Papr's message compression system automatically analyzes conversations and generates hierarchical summaries that capture all essential information in a fraction of the tokens.

Key Benefits:

  • 🚀 96% token reduction - 50,000 tokens → 2,000 tokens
  • <50ms response time - Cached summaries return instantly
  • 🧠 Intelligent extraction - Session intent, decisions, next steps, file tracking
  • 🔄 Automatic generation - Every 15 messages, no manual work
  • 📊 Three-tier summaries - Short/medium/long term context

How It Works

Automatic Compression (Every 15 Messages)

The system automatically triggers compression analysis when a session reaches message count thresholds:

Messages 1-15 accumulate → 15 messages reached → batch analysis → Groq LLM processing → extract summaries & fields → cache in Parse Server → create MessageSession node → ready for retrieval

What happens:

  1. User sends 15th message to session
  2. System triggers batch analysis in background
  3. Groq LLM analyzes all messages with structured output
  4. Three-tier summaries generated (short/medium/long)
  5. Enhanced fields extracted (intent, decisions, state, files, etc.)
  6. Everything cached in Parse Server ChatSession
  7. MessageSession node created in Neo4j with all fields
  8. Future compression requests return instantly from cache

Performance:

  • First 15 messages: Background processing (~3 seconds)
  • Subsequent calls: <50ms (from cache)
  • No repeated LLM calls - summaries are generated once per 15-message batch, then served from cache
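
No client code is needed to trigger any of this; the pipeline runs server-side. For intuition, the threshold behavior roughly follows the shape below (a minimal illustrative sketch with hypothetical function names, not Papr's actual implementation):

import threading

COMPRESSION_INTERVAL = 15  # summaries (re)generate every 15 messages

def trigger_batch_analysis(session_id: str) -> None:
    """Hypothetical stand-in for the server-side pipeline: Groq analysis,
    three-tier summaries, enhanced fields, then caching in Parse/Neo4j."""
    print(f"Analyzing session {session_id} in the background (~3 seconds)")

def on_message_stored(session_id: str, message_count: int) -> None:
    # Every 15th message kicks off background analysis; other messages just accumulate
    if message_count % COMPRESSION_INTERVAL == 0:
        threading.Thread(
            target=trigger_batch_analysis, args=(session_id,), daemon=True
        ).start()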

On-Demand Compression

Request compression any time using the compress endpoint:

GET /v1/messages/sessions/{sessionId}/compress

Flow:

API request → check Parse cache → if found, return summary in <50ms; if not found, return 404 and trigger background processing → the next call returns the summary

First call (no summary exists):

  • Returns 404 status
  • Triggers background summary generation
  • Takes ~3 seconds to complete
  • Next call returns the cached summary

Subsequent calls (summary exists):

  • Returns full summary instantly
  • ~50ms response time
  • Includes from_cache: true flag
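
In practice this means the first request for a brand-new session may need a short retry. A minimal sketch using the Python SDK (the string check on the exception is an assumption; match it to however your SDK version surfaces 404s):

import os
import time

from papr_memory import Papr

client = Papr(x_api_key=os.environ.get("PAPR_MEMORY_API_KEY"))

def get_compressed_context(session_id: str, retries: int = 3, delay: float = 3.0):
    """Fetch the compressed summary, retrying while background generation finishes."""
    for attempt in range(retries):
        try:
            return client.messages.compress_session(session_id=session_id)
        except Exception as e:
            # 404 means no summary yet - generation was just triggered in the background
            if "not found" in str(e).lower() and attempt < retries - 1:
                time.sleep(delay)  # background generation takes ~3 seconds
                continue
            raise

compressed = get_compressed_context("session_abc123")
print(compressed.summaries.short_term)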

Three-Tier Summary System

Papr generates three levels of summaries, each optimized for different use cases:

1. Short-Term Summary (Last 15 Messages)

Purpose: Immediate context and current task focus

Contains:

  • Most recent conversation state
  • Current problem or task being worked on
  • Immediate context needed for next interaction

Best for:

  • Quick context refresh
  • Continuing current task
  • Near-term decision making

Example:

"User is debugging JWT token refresh timing issues in their React authentication 
flow. The tokens are expiring correctly but the refresh mechanism isn't triggering 
at the right time. Currently checking the expiry calculation logic in useAuth.ts."

2. Medium-Term Summary (Last ~100 Messages)

Purpose: Recent session history and flow

Contains:

  • Multiple related tasks worked on recently
  • Patterns across recent work
  • Context switches and their reasons
  • Recent decisions and their outcomes

Best for:

  • Understanding session flow
  • Seeing how tasks relate
  • Identifying patterns

Example:

"Over the past 100 messages, user has been building a complete JWT authentication 
system for their React task app. Started with basic login/signup forms, then 
implemented token generation on backend, switched from localStorage to httpOnly 
cookies for security, and is now working on token refresh logic. Key challenge 
has been getting the timing right for automatic token refresh."

3. Long-Term Summary (Full Session)

Purpose: Complete conversation arc and project overview

Contains:

  • Entire project context from start
  • All major decisions and their reasoning
  • Full technical stack and architecture
  • Complete timeline of work
  • Big-picture project goals

Best for:

  • High-level project understanding
  • Project documentation
  • Context for new team members
  • Historical reference

Example:

"Full project: User is building a task management SaaS application with React 
frontend and Node.js backend. Started by asking about React best practices, then 
architected the application structure, implemented database schema with PostgreSQL, 
built REST API with Express, created authentication system with JWT tokens, and is 
now working on the authorization layer with role-based access control. Tech stack: 
React, TypeScript, Node.js, Express, PostgreSQL, JWT. Current phase: Security 
implementation."

Enhanced Fields

Beyond basic summaries, Papr extracts structured information that provides rich context:

Session Intent

What it is: The user's primary goal or what they're trying to accomplish

Format: 1-2 sentence clear statement

Example:

{
  "session_intent": "Build secure JWT authentication for task management app with httpOnly cookies and automatic token refresh"
}

Use for:

  • Understanding user's end goal
  • Keeping AI responses aligned with intent
  • Project documentation

Key Decisions

What it is: Important choices made during the conversation with reasoning

Format: Array of decision statements including "why"

Example:

{
  "key_decisions": [
    "Use httpOnly cookies instead of localStorage for JWT storage (XSS protection)",
    "Implement 5-minute refresh buffer before token expiry (prevent auth interruptions)",
    "Choose React Context over Redux for auth state (simpler for this use case)",
    "Set token expiry to 1 hour with 7-day refresh token (balance security and UX)"
  ]
}

Use for:

  • Understanding past choices
  • Avoiding decision flip-flops
  • Documenting architecture decisions
  • Onboarding new developers

Current State

What it is: What's working and what's not working right now

Format: String describing current status

Example:

{
  "current_state": "Working: JWT generation and validation, httpOnly cookie storage, login/logout flow. Not working: Token refresh timing is off by ~2 minutes, protected routes occasionally redirect to login incorrectly"
}

Use for:

  • Quick status check
  • Identifying blockers
  • Prioritizing next work
  • Bug tracking

Next Steps

What it is: Specific actionable items to work on next

Format: Array of 3-5 concrete action items

Example:

{
  "next_steps": [
    "Debug token expiry calculation in useAuth.ts (check timezone handling)",
    "Add loading states to prevent double-refresh attempts",
    "Implement ProtectedRoute component with proper redirect logic",
    "Write integration tests for token refresh flow",
    "Add error handling for network failures during refresh"
  ]
}

Use for:

  • Clear action plan
  • AI agent task planning
  • Project management
  • Avoiding paralysis
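
For example, the array drops straight into a simple task list for an agent or project board (the Task dataclass below is illustrative, not part of the SDK):

from dataclasses import dataclass

@dataclass
class Task:
    description: str
    done: bool = False

# In practice this comes from compressed.enhanced_fields.next_steps
next_steps = [
    "Debug token expiry calculation in useAuth.ts (check timezone handling)",
    "Add loading states to prevent double-refresh attempts",
    "Write integration tests for token refresh flow",
]

# Turn the extracted steps into an ordered plan
plan = [Task(description=step) for step in next_steps]
for i, task in enumerate(plan, start=1):
    print(f"{i}. {task.description}")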

Technical Details

What it is: Important technical specifics like endpoints, configs, error messages

Format: Array of technical facts

Example:

{
  "technical_details": [
    "Token expiry: 3600 seconds (1 hour)",
    "Refresh token expiry: 604800 seconds (7 days)",
    "Login endpoint: POST /api/auth/login",
    "Refresh endpoint: POST /api/auth/refresh",
    "Cookie name: 'auth_token'",
    "Cookie settings: httpOnly=true, secure=true, sameSite=strict",
    "Error code for expired tokens: ERR_TOKEN_EXPIRED",
    "Auth hook: useAuth() from src/hooks/useAuth.ts"
  ]
}

Use for:

  • Quick reference for configs
  • Debugging error messages
  • Documentation
  • New developer onboarding

Files Accessed

What it is: Complete tracking of file operations during conversation

Format: Object with arrays for read/modified/created/deleted

Example:

{
  "files_accessed": {
    "read": [
      "src/components/Login.tsx",
      "src/api/auth.ts",
      "src/utils/tokenValidation.ts"
    ],
    "modified": [
      {
        "path": "src/components/Login.tsx",
        "description": "Added form validation with Yup schema and error display"
      },
      {
        "path": "src/api/auth.ts",
        "description": "Updated token handling to use httpOnly cookies instead of localStorage"
      },
      {
        "path": "src/hooks/useAuth.ts",
        "description": "Fixed token refresh timing calculation"
      }
    ],
    "created": [
      "src/hooks/useAuth.ts",
      "src/utils/tokenStorage.ts",
      "src/components/ProtectedRoute.tsx"
    ],
    "deleted": [
      "src/utils/localStorageAuth.ts"
    ]
  }
}

Use for:

  • Understanding what was changed
  • Code review context
  • Git commit messages
  • Rollback decisions
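
As one concrete example from the list above, the tracked operations can be turned into a draft commit message (a sketch assuming dict-style access to files_accessed; adjust if your SDK returns objects):

def draft_commit_message(files_accessed: dict) -> str:
    """Build a rough commit message body from the tracked file operations."""
    lines = ["Summary of session changes:", ""]
    for entry in files_accessed.get("modified", []):
        lines.append(f"- Modify {entry['path']}: {entry['description']}")
    for path in files_accessed.get("created", []):
        lines.append(f"- Add {path}")
    for path in files_accessed.get("deleted", []):
        lines.append(f"- Remove {path}")
    return "\n".join(lines)

# Example using the structure shown above
files = {
    "modified": [
        {"path": "src/hooks/useAuth.ts", "description": "Fixed token refresh timing calculation"}
    ],
    "created": ["src/components/ProtectedRoute.tsx"],
    "deleted": ["src/utils/localStorageAuth.ts"],
}
print(draft_commit_message(files))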

Project Context

What it is: High-level project information and metadata

Format: Object with project details

Example:

{
  "project_context": {
    "project_name": "Task Management SaaS",
    "project_id": "proj_task_management",
    "project_path": "/Users/dev/projects/task-app",
    "tech_stack": ["React", "TypeScript", "Node.js", "Express", "PostgreSQL", "JWT"],
    "current_task": "Implementing JWT authentication with httpOnly cookies and token refresh",
    "git_repo": "https://github.com/user/task-app"
  }
}

Use for:

  • Project identification
  • Tech stack reference
  • Context switching between projects
  • Team collaboration

Response Format

Complete Response Structure

{
  "session_id": "session_abc123",
  "summaries": {
    "short_term": "User is debugging token refresh timing...",
    "medium_term": "Over the past 100 messages, user has built...",
    "long_term": "Full project: Task management SaaS with React...",
    "topics": ["React", "JWT", "Authentication", "TypeScript", "Security"],
    "last_updated": "2026-02-11T14:30:00Z"
  },
  "enhanced_fields": {
    "session_intent": "Build secure JWT authentication...",
    "key_decisions": [
      "Use httpOnly cookies for XSS protection",
      "Implement 5-minute refresh buffer"
    ],
    "current_state": "Working: JWT generation. Not working: Refresh timing",
    "next_steps": [
      "Debug expiry calculation",
      "Add loading states",
      "Write integration tests"
    ],
    "technical_details": [
      "Token expiry: 3600 seconds",
      "Refresh endpoint: POST /api/auth/refresh"
    ],
    "files_accessed": {
      "read": ["src/api/auth.ts"],
      "modified": [
        {"path": "src/hooks/useAuth.ts", "description": "Fixed timing"}
      ],
      "created": ["src/components/ProtectedRoute.tsx"],
      "deleted": []
    },
    "project_context": {
      "project_name": "Task Management SaaS",
      "tech_stack": ["React", "TypeScript", "Node.js"],
      "current_task": "JWT authentication implementation"
    }
  },
  "from_cache": true,
  "message_count": 47
}

Field Availability

Field                  | Always Present | Available After
session_id             | ✅ Yes         | First message
summaries.short_term   | ✅ Yes         | 15 messages
summaries.medium_term  | ✅ Yes         | 15 messages
summaries.long_term    | ✅ Yes         | 15 messages
summaries.topics       | ✅ Yes         | 15 messages
enhanced_fields.*      | ⚠️ Optional    | 15 messages
from_cache             | ✅ Yes         | Always
message_count          | ✅ Yes         | Always
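
Because enhanced_fields is optional until the 15-message threshold, guard against it being absent before building prompts. A small sketch (the getattr guards are an assumption about the SDK's object shape):

import os

from papr_memory import Papr

client = Papr(x_api_key=os.environ.get("PAPR_MEMORY_API_KEY"))
compressed = client.messages.compress_session(session_id="session_abc123")

# Summaries are always present once a summary exists; enhanced fields may not be
context_parts = [compressed.summaries.short_term]

enhanced = getattr(compressed, "enhanced_fields", None)
if enhanced is not None:
    if getattr(enhanced, "session_intent", None):
        context_parts.append(f"Goal: {enhanced.session_intent}")
    if getattr(enhanced, "current_state", None):
        context_parts.append(f"Status: {enhanced.current_state}")

context = "\n\n".join(context_parts)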

Usage Examples

Python

from papr_memory import Papr
import os

client = Papr(x_api_key=os.environ.get("PAPR_MEMORY_API_KEY"))

# Get compressed context
compressed = client.messages.compress_session(
    session_id="session_abc123"
)

# Check if from cache
if compressed.from_cache:
    print("✅ Near-instant response from cache")
else:
    print("🆕 Just generated")

# Access three-tier summaries
print(f"Recent: {compressed.summaries.short_term}")
print(f"Session: {compressed.summaries.medium_term}")
print(f"Full: {compressed.summaries.long_term}")

# Access enhanced fields
print(f"Goal: {compressed.enhanced_fields.session_intent}")
print(f"Status: {compressed.enhanced_fields.current_state}")
print(f"Next: {compressed.enhanced_fields.next_steps[0]}")

# Use in LLM prompt
system_prompt = f"""
You are helping with: {compressed.enhanced_fields.project_context.project_name}
Tech stack: {', '.join(compressed.enhanced_fields.project_context.tech_stack)}

Recent work:
{compressed.summaries.short_term}

Key decisions made:
{chr(10).join(f'- {d}' for d in compressed.enhanced_fields.key_decisions)}

Current status: {compressed.enhanced_fields.current_state}

Next steps:
{chr(10).join(f'{i+1}. {s}' for i, s in enumerate(compressed.enhanced_fields.next_steps))}
"""

# Feed to your LLM
response = your_llm.chat(
    system=system_prompt,
    user="How do I fix the token refresh timing issue?"
)

TypeScript

import Papr from '@papr/memory';

const client = new Papr({
  xAPIKey: process.env.PAPR_MEMORY_API_KEY
});

// Get compressed context
const compressed = await client.messages.compressSession({
  session_id: "session_abc123"
});

// Check if from cache
if (compressed.from_cache) {
  console.log('✅ Near-instant response from cache');
} else {
  console.log('🆕 Just generated');
}

// Access three-tier summaries
console.log(`Recent: ${compressed.summaries.short_term}`);
console.log(`Session: ${compressed.summaries.medium_term}`);
console.log(`Full: ${compressed.summaries.long_term}`);

// Access enhanced fields
console.log(`Goal: ${compressed.enhanced_fields.session_intent}`);
console.log(`Status: ${compressed.enhanced_fields.current_state}`);
console.log(`Next: ${compressed.enhanced_fields.next_steps[0]}`);

// Use in LLM prompt
const systemPrompt = `
You are helping with: ${compressed.enhanced_fields.project_context.project_name}
Tech stack: ${compressed.enhanced_fields.project_context.tech_stack.join(', ')}

Recent work:
${compressed.summaries.short_term}

Key decisions made:
${compressed.enhanced_fields.key_decisions.map(d => `- ${d}`).join('\n')}

Current status: ${compressed.enhanced_fields.current_state}

Next steps:
${compressed.enhanced_fields.next_steps.map((s, i) => `${i+1}. ${s}`).join('\n')}
`;

// Feed to your LLM
const response = await yourLLM.chat({
  system: systemPrompt,
  user: "How do I fix the token refresh timing issue?"
});

cURL

# Get compressed summary
curl -X GET "https://memory.papr.ai/v1/messages/sessions/session_abc123/compress" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "X-Client-Type: curl"

# Response (if summary exists)
{
  "session_id": "session_abc123",
  "summaries": { ... },
  "enhanced_fields": { ... },
  "from_cache": true,
  "message_count": 47
}

# Response (if no summary yet)
{
  "detail": "Session not found or no summary exists"
}
# (Wait a few seconds and try again - background processing triggered)

Token Optimization

Before Compression

Full conversation history for a 50-message session:

Message 1: User: "I'm building a React app..." (150 tokens)
Message 2: Assistant: "Great! Here's how..." (300 tokens)
Message 3: User: "How do I add JWT auth?" (80 tokens)
...
Message 50: Assistant: "To fix that, try..." (250 tokens)

TOTAL: ~50,000 tokens

Cost at $1/1M tokens: $0.05 per prompt
Latency: slow processing of the large context

After Compression

Compressed context using hierarchical summaries:

{
  "summaries": {
    "short_term": "User debugging token refresh..." (200 tokens),
    "medium_term": "Built complete JWT auth system..." (500 tokens),
    "long_term": "Task management SaaS project..." (800 tokens)
  },
  "enhanced_fields": {
    "session_intent": "..." (50 tokens),
    "key_decisions": [...] (200 tokens),
    "current_state": "..." (100 tokens),
    "next_steps": [...] (150 tokens)
  }
}

TOTAL: ~2,000 tokens

Cost at $1/1M tokens: $0.002 per prompt (96% savings!)
Latency: fast processing, instant retrieval
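
The arithmetic is easy to rerun for your own pricing and volumes; the numbers below reproduce the example above:

PRICE_PER_MILLION_TOKENS = 1.00  # $1 / 1M tokens, as in the example above

full_history_tokens = 50_000
compressed_tokens = 2_000

full_cost = full_history_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS      # $0.05
compressed_cost = compressed_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS  # $0.002
savings = 1 - compressed_tokens / full_history_tokens                       # 0.96

print(f"Full: ${full_cost:.3f}  Compressed: ${compressed_cost:.3f}  Savings: {savings:.0%}")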

Optimization Strategy

For short conversations (< 20 messages):

# Use full history
history = client.messages.get_history(session_id=session_id, limit=20)
context = "\n".join([f"{msg.role}: {msg.content}" for msg in history.messages])

For medium conversations (20-100 messages):

# Use short + medium summaries
compressed = client.messages.compress_session(session_id=session_id)
context = f"{compressed.summaries.medium_term}\n\nRecent:\n{compressed.summaries.short_term}"

For long conversations (100+ messages):

# Use all three summaries + enhanced fields
compressed = client.messages.compress_session(session_id=session_id)
context = f"""
Project: {compressed.enhanced_fields.project_context.project_name}
Goal: {compressed.enhanced_fields.session_intent}

Full context: {compressed.summaries.long_term}

Recent work: {compressed.summaries.short_term}

Current status: {compressed.enhanced_fields.current_state}
Next steps: {', '.join(compressed.enhanced_fields.next_steps[:3])}
"""

Compression Algorithm

LLM Processing Pipeline

Papr uses Groq's LLaMA-based models with structured output mode for reliable extraction:

15+ messages → batch analysis → Groq LLM with structured output → extract components → cache everything in Parse Server (ChatSession) and Neo4j (MessageSession node).

Extracted components:

  • 3-tier summaries
  • Session intent, key decisions, current state, next steps, technical details
  • File operations
  • Learning detection: user preferences, performance lessons, failed approaches

Structured Prompts

The system uses specialized prompts for each extraction task:

1. Summary Generation:

  • Analyze conversation flow
  • Identify main themes
  • Extract topics discussed
  • Generate three summary levels

2. File Tracking:

  • Detect file path mentions
  • Identify operation types (read/write/create/delete)
  • Extract modification descriptions
  • Track full paths

3. Learning Detection:

  • User preference patterns
  • Performance optimizations discovered
  • Failed approaches to avoid
  • Successful alternatives

4. Enhanced Fields:

  • Extract session intent (goal)
  • Identify key decisions with reasoning
  • Determine current state (working/not working)
  • List actionable next steps
  • Collect technical details
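
Structured output mode means the LLM is constrained to a schema rather than free text. The exact server-side schema isn't published, but it plausibly mirrors the response fields documented above; here is an illustrative Pydantic sketch (the model names and the use of Pydantic are assumptions):

from typing import List, Optional

from pydantic import BaseModel

class ModifiedFile(BaseModel):
    path: str
    description: str

class FilesAccessed(BaseModel):
    read: List[str] = []
    modified: List[ModifiedFile] = []
    created: List[str] = []
    deleted: List[str] = []

class SessionAnalysis(BaseModel):
    # Three-tier summaries
    short_term: str
    medium_term: str
    long_term: str
    topics: List[str] = []
    # Enhanced fields
    session_intent: Optional[str] = None
    key_decisions: List[str] = []
    current_state: Optional[str] = None
    next_steps: List[str] = []
    technical_details: List[str] = []
    files_accessed: Optional[FilesAccessed] = None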

Caching Strategy

First 15 Messages:

Message 1-14: Stored in Parse PostMessage
Message 15: Triggers batch analysis

Background processing (3 seconds):
  - Groq LLM analyzes all messages
  - Generates summaries and enhanced fields
  - Saves to Parse ChatSession
  - Creates Neo4j MessageSession node

Cached for instant retrieval

Messages 16-30:

New messages stored normally

Message 30: Triggers re-analysis
  - Updates existing summaries
  - Appends to medium_term context
  - Updates long_term overview
  - Refreshes enhanced fields

Cache updated

Performance Metrics:

  • Cold start (no cache): ~3 seconds background
  • Warm retrieval (cached): <50ms
  • Cache hit rate: >99% after first generation
  • No repeated LLM calls for same session state

Best Practices

1. Choose the Right Summary Level

# Quick context for immediate continuation
short = compressed.summaries.short_term

# Understanding recent work flow
medium = compressed.summaries.medium_term

# High-level project overview
long = compressed.summaries.long_term

# Comprehensive context (combine all)
full_context = f"{long}\n\nRecent:\n{short}"

2. Combine with Memory Search

Don't rely solely on compression - combine with memory search for best results:

# Get compressed session context
compressed = client.messages.compress_session(session_id=session_id)

# Search for specific relevant memories
memories = client.memory.search(
    query="JWT token refresh implementation details",
    external_user_id=user_id,
    max_memories=5
)

# Combine for LLM
context = f"""
Session context: {compressed.summaries.medium_term}

Relevant memories:
{chr(10).join([m.content for m in memories.data.memories])}

Current task: {compressed.enhanced_fields.current_state}
"""

3. Handle Missing Summaries

try:
    compressed = client.messages.compress_session(session_id=session_id)
    
    if compressed.from_cache:
        # Use cached summary
        context = compressed.summaries.short_term
    else:
        # Just generated, good to use
        context = compressed.summaries.short_term
        
except Exception as e:
    if "not found" in str(e).lower():
        # Summary doesn't exist yet (< 15 messages or still processing)
        # Fall back to recent history
        history = client.messages.get_history(session_id=session_id, limit=10)
        context = "\n".join([f"{msg.role}: {msg.content}" for msg in history.messages])
    else:
        raise

4. Check Cache Status

compressed = client.messages.compress_session(session_id=session_id)

if compressed.from_cache:
    print(f"✅ Near-instant retrieval from cache")
    print(f"📊 Based on {compressed.message_count} messages")
else:
    print(f"🆕 Just generated fresh summary")
    print(f"⏱️ Took ~3 seconds to process")

5. Use Enhanced Fields for Structured Data

# Don't parse summaries manually - use structured fields
compressed = client.messages.compress_session(session_id=session_id)

# ❌ DON'T parse from summary text
# intent = extract_intent_from_text(compressed.summaries.long_term)

# ✅ DO use structured fields
intent = compressed.enhanced_fields.session_intent
decisions = compressed.enhanced_fields.key_decisions
next_steps = compressed.enhanced_fields.next_steps

6. Refresh Strategically

Summaries are automatically updated every 15 messages. Don't manually refresh unnecessarily:

# ✅ GOOD: Fetch once per task
compressed = client.messages.compress_session(session_id=session_id)
# ... use for multiple LLM calls ...

# ❌ BAD: Fetching on every message
for message in new_messages:
    compressed = client.messages.compress_session(session_id=session_id)  # Wasteful!
    # ... use once ...

7. Leverage File Tracking

compressed = client.messages.compress_session(session_id=session_id)

# Show what files were touched
files = compressed.enhanced_fields.files_accessed

print(f"📖 Read: {len(files.read)} files")
print(f"✏️ Modified: {len(files.modified)} files")
print(f"🆕 Created: {len(files.created)} files")
print(f"🗑️ Deleted: {len(files.deleted)} files")

# Use for context
if files.modified:
    recent_changes = [f"{m['path']}: {m['description']}" for m in files.modified]
    context = f"Recently changed:\n" + "\n".join(recent_changes)

Error Handling

Session Not Found

try:
    compressed = client.messages.compress_session(session_id="invalid_session")
except Exception as e:
    if "not found" in str(e).lower():
        print("Session doesn't exist - check session ID")
    else:
        raise

No Summary Yet

try:
    compressed = client.messages.compress_session(session_id=session_id)
except Exception as e:
    if "no summary exists" in str(e).lower():
        print("Summary not generated yet - wait for 15 messages or try again in a few seconds")
        # Fall back to recent history
        history = client.messages.get_history(session_id=session_id, limit=10)
    else:
        raise

Authentication Error

try:
    compressed = client.messages.compress_session(session_id=session_id)
except Exception as e:
    if "authentication" in str(e).lower() or "401" in str(e):
        print("API key invalid or missing")
    else:
        raise

API Reference

Compress Session

Endpoint: GET /v1/messages/sessions/{sessionId}/compress

Parameters:

  • sessionId (path, required): Session identifier

Headers:

  • X-API-Key (required): Your Papr API key
  • X-Client-Type (optional): Client identifier

Response: CompressedSessionResponse

{
  session_id: string;
  summaries: {
    short_term: string;
    medium_term: string;
    long_term: string;
    topics: string[];
    last_updated: string;
  };
  enhanced_fields: {
    session_intent?: string;
    key_decisions: string[];
    current_state?: string;
    next_steps: string[];
    technical_details: string[];
    files_accessed?: {
      read: string[];
      modified: Array<{path: string, description: string}>;
      created: string[];
      deleted: string[];
    };
    project_context?: {
      project_name?: string;
      project_id?: string;
      project_path?: string;
      tech_stack: string[];
      current_task?: string;
      git_repo?: string;
    };
  };
  from_cache: boolean;
  message_count: number;
}

Status Codes:

  • 200 OK - Summary found and returned
  • 404 Not Found - Session doesn't exist or no summary yet
  • 401 Unauthorized - Invalid/missing API key
  • 500 Internal Server Error - Server error

Related Endpoints:

  • POST /v1/messages - Store message (triggers auto-compression at 15 messages)
  • GET /v1/messages/sessions/{sessionId} - Get full conversation history
  • GET /v1/messages/sessions/{sessionId}/status - Check session metadata

Next Steps