Message Compression & Summarization
Intelligent conversation compression that reduces token usage by up to 96% while preserving critical context.
Overview
When working with long conversations, sending full chat history to LLMs becomes expensive and slow. Papr's message compression system automatically analyzes conversations and generates hierarchical summaries that capture all essential information in a fraction of the tokens.
Key Benefits:
- 🚀 96% token reduction - 50,000 tokens → 2,000 tokens
- ⚡ <50ms response time - Cached summaries return instantly
- 🧠 Intelligent extraction - Session intent, decisions, next steps, file tracking
- 🔄 Automatic generation - Every 15 messages, no manual work
- 📊 Three-tier summaries - Short/medium/long term context
How It Works
Automatic Compression (Every 15 Messages)
The system automatically triggers compression analysis when a session reaches message count thresholds:
What happens:
- User sends 15th message to session
- System triggers batch analysis in background
- Groq LLM analyzes all messages with structured output
- Three-tier summaries generated (short/medium/long)
- Enhanced fields extracted (intent, decisions, state, files, etc.)
- Everything cached in Parse Server
- Parse ChatSession updated and MessageSession node created in Neo4j with all fields
- Future compression requests return instantly from cache
Performance:
- First 15 messages: Background processing (~3 seconds)
- Subsequent calls: <50ms (from cache)
- No repeated LLM calls - each summary is generated once, served from cache, and refreshed only at the next 15-message threshold
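The 15-message trigger described above amounts to a simple modulo check. A minimal sketch (the helper name is ours; the real trigger runs server-side in Papr's backend):

```python
COMPRESSION_INTERVAL = 15  # per the docs: analysis runs every 15 messages


def should_trigger_compression(message_count: int) -> bool:
    """Return True at message 15, 30, 45, ... (illustrative sketch of the
    server-side threshold check, not the actual implementation)."""
    return message_count > 0 and message_count % COMPRESSION_INTERVAL == 0
```

So message 15 triggers the first analysis, message 30 the first re-analysis, and so on.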
On-Demand Compression
Request compression any time using the compress endpoint:
GET /v1/messages/sessions/{sessionId}/compress

Flow:
First call (no summary exists):
- Returns 404 status
- Triggers background summary generation
- Takes ~3 seconds to complete
- Next call returns the cached summary
Subsequent calls (summary exists):
- Returns full summary instantly
- ~50ms response time
- Includes `from_cache: true` flag
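Since the first call can return 404 while the summary is generated in the background, a client can poll with a short delay. A minimal sketch, assuming any 404-style error surfaces as an exception (the retry helper is ours, not part of the SDK):

```python
import time


def compress_with_retry(fetch, retries=3, delay=3.0):
    """Call the compress endpoint, retrying on 404 while background summary
    generation (~3 seconds) completes. `fetch` is any zero-argument callable
    that raises on a 404 - e.g. a lambda wrapping the SDK call."""
    last_error = None
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as e:
            last_error = e
            if "404" in str(e) or "not found" in str(e).lower():
                time.sleep(delay)  # wait for background generation to finish
            else:
                raise  # unrelated error - don't retry
    raise last_error
```

Usage would look like `compress_with_retry(lambda: client.messages.compress_session(session_id=sid))`.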
Three-Tier Summary System
Papr generates three levels of summaries, each optimized for different use cases:
1. Short-Term Summary (Last 15 Messages)
Purpose: Immediate context and current task focus
Contains:
- Most recent conversation state
- Current problem or task being worked on
- Immediate context needed for next interaction
Best for:
- Quick context refresh
- Continuing current task
- Near-term decision making
Example:
"User is debugging JWT token refresh timing issues in their React authentication
flow. The tokens are expiring correctly but the refresh mechanism isn't triggering
at the right time. Currently checking the expiry calculation logic in useAuth.ts."

2. Medium-Term Summary (Last ~100 Messages)
Purpose: Recent session history and flow
Contains:
- Multiple related tasks worked on recently
- Patterns across recent work
- Context switches and their reasons
- Recent decisions and their outcomes
Best for:
- Understanding session flow
- Seeing how tasks relate
- Identifying patterns
Example:
"Over the past 100 messages, user has been building a complete JWT authentication
system for their React task app. Started with basic login/signup forms, then
implemented token generation on backend, switched from localStorage to httpOnly
cookies for security, and is now working on token refresh logic. Key challenge
has been getting the timing right for automatic token refresh."

3. Long-Term Summary (Full Session)
Purpose: Complete conversation arc and project overview
Contains:
- Entire project context from start
- All major decisions and their reasoning
- Full technical stack and architecture
- Complete timeline of work
- Big-picture project goals
Best for:
- High-level project understanding
- Project documentation
- Context for new team members
- Historical reference
Example:
"Full project: User is building a task management SaaS application with React
frontend and Node.js backend. Started by asking about React best practices, then
architected the application structure, implemented database schema with PostgreSQL,
built REST API with Express, created authentication system with JWT tokens, and is
now working on the authorization layer with role-based access control. Tech stack:
React, TypeScript, Node.js, Express, PostgreSQL, JWT. Current phase: Security
implementation."

Enhanced Fields
Beyond basic summaries, Papr extracts structured information that provides rich context:
Session Intent
What it is: The user's primary goal or what they're trying to accomplish
Format: 1-2 sentence clear statement
Example:
{
"session_intent": "Build secure JWT authentication for task management app with httpOnly cookies and automatic token refresh"
}

Use for:
- Understanding user's end goal
- Keeping AI responses aligned with intent
- Project documentation
Key Decisions
What it is: Important choices made during the conversation with reasoning
Format: Array of decision statements including "why"
Example:
{
"key_decisions": [
"Use httpOnly cookies instead of localStorage for JWT storage (XSS protection)",
"Implement 5-minute refresh buffer before token expiry (prevent auth interruptions)",
"Choose React Context over Redux for auth state (simpler for this use case)",
"Set token expiry to 1 hour with 7-day refresh token (balance security and UX)"
]
}

Use for:
- Understanding past choices
- Avoiding decision flip-flops
- Documenting architecture decisions
- Onboarding new developers
Current State
What it is: What's working and what's not working right now
Format: String describing current status
Example:
{
"current_state": "Working: JWT generation and validation, httpOnly cookie storage, login/logout flow. Not working: Token refresh timing is off by ~2 minutes, protected routes occasionally redirect to login incorrectly"
}

Use for:
- Quick status check
- Identifying blockers
- Prioritizing next work
- Bug tracking
Next Steps
What it is: Specific actionable items to work on next
Format: Array of 3-5 concrete action items
Example:
{
"next_steps": [
"Debug token expiry calculation in useAuth.ts (check timezone handling)",
"Add loading states to prevent double-refresh attempts",
"Implement ProtectedRoute component with proper redirect logic",
"Write integration tests for token refresh flow",
"Add error handling for network failures during refresh"
]
}

Use for:
- Clear action plan
- AI agent task planning
- Project management
- Avoiding paralysis
Technical Details
What it is: Important technical specifics like endpoints, configs, error messages
Format: Array of technical facts
Example:
{
"technical_details": [
"Token expiry: 3600 seconds (1 hour)",
"Refresh token expiry: 604800 seconds (7 days)",
"Login endpoint: POST /api/auth/login",
"Refresh endpoint: POST /api/auth/refresh",
"Cookie name: 'auth_token'",
"Cookie settings: httpOnly=true, secure=true, sameSite=strict",
"Error code for expired tokens: ERR_TOKEN_EXPIRED",
"Auth hook: useAuth() from src/hooks/useAuth.ts"
]
}

Use for:
- Quick reference for configs
- Debugging error messages
- Documentation
- New developer onboarding
Files Accessed
What it is: Complete tracking of file operations during conversation
Format: Object with arrays for read/modified/created/deleted
Example:
{
"files_accessed": {
"read": [
"src/components/Login.tsx",
"src/api/auth.ts",
"src/utils/tokenValidation.ts"
],
"modified": [
{
"path": "src/components/Login.tsx",
"description": "Added form validation with Yup schema and error display"
},
{
"path": "src/api/auth.ts",
"description": "Updated token handling to use httpOnly cookies instead of localStorage"
},
{
"path": "src/hooks/useAuth.ts",
"description": "Fixed token refresh timing calculation"
}
],
"created": [
"src/hooks/useAuth.ts",
"src/utils/tokenStorage.ts",
"src/components/ProtectedRoute.tsx"
],
"deleted": [
"src/utils/localStorageAuth.ts"
]
}
}

Use for:
- Understanding what was changed
- Code review context
- Git commit messages
- Rollback decisions
Project Context
What it is: High-level project information and metadata
Format: Object with project details
Example:
{
"project_context": {
"project_name": "Task Management SaaS",
"project_id": "proj_task_management",
"project_path": "/Users/dev/projects/task-app",
"tech_stack": ["React", "TypeScript", "Node.js", "Express", "PostgreSQL", "JWT"],
"current_task": "Implementing JWT authentication with httpOnly cookies and token refresh",
"git_repo": "https://github.com/user/task-app"
}
}

Use for:
- Project identification
- Tech stack reference
- Context switching between projects
- Team collaboration
Response Format
Complete Response Structure
{
"session_id": "session_abc123",
"summaries": {
"short_term": "User is debugging token refresh timing...",
"medium_term": "Over the past 100 messages, user has built...",
"long_term": "Full project: Task management SaaS with React...",
"topics": ["React", "JWT", "Authentication", "TypeScript", "Security"],
"last_updated": "2026-02-11T14:30:00Z"
},
"enhanced_fields": {
"session_intent": "Build secure JWT authentication...",
"key_decisions": [
"Use httpOnly cookies for XSS protection",
"Implement 5-minute refresh buffer"
],
"current_state": "Working: JWT generation. Not working: Refresh timing",
"next_steps": [
"Debug expiry calculation",
"Add loading states",
"Write integration tests"
],
"technical_details": [
"Token expiry: 3600 seconds",
"Refresh endpoint: POST /api/auth/refresh"
],
"files_accessed": {
"read": ["src/api/auth.ts"],
"modified": [
{"path": "src/hooks/useAuth.ts", "description": "Fixed timing"}
],
"created": ["src/components/ProtectedRoute.tsx"],
"deleted": []
},
"project_context": {
"project_name": "Task Management SaaS",
"tech_stack": ["React", "TypeScript", "Node.js"],
"current_task": "JWT authentication implementation"
}
},
"from_cache": true,
"message_count": 47
}

Field Availability
| Field | Always Present | Available After |
|---|---|---|
| `session_id` | ✅ Yes | First message |
| `summaries.short_term` | ✅ Yes | 15 messages |
| `summaries.medium_term` | ✅ Yes | 15 messages |
| `summaries.long_term` | ✅ Yes | 15 messages |
| `summaries.topics` | ✅ Yes | 15 messages |
| `enhanced_fields.*` | ⚠️ Optional | 15 messages |
| `from_cache` | ✅ Yes | Always |
| `message_count` | ✅ Yes | Always |
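Because enhanced_fields is optional (⚠️ in the table above), defensive access avoids KeyErrors on young sessions. A sketch over the raw response dict, using the field names from the response format above (the helper itself is illustrative):

```python
def build_context(resp: dict) -> str:
    """Assemble an LLM context string, tolerating missing optional fields."""
    parts = [resp["summaries"]["short_term"]]  # present once 15 messages exist
    enhanced = resp.get("enhanced_fields") or {}  # optional block
    if enhanced.get("session_intent"):
        parts.append(f"Goal: {enhanced['session_intent']}")
    if enhanced.get("current_state"):
        parts.append(f"Status: {enhanced['current_state']}")
    for step in enhanced.get("next_steps", [])[:3]:  # cap at 3 items
        parts.append(f"Next: {step}")
    return "\n".join(parts)
```

The same pattern applies to files_accessed and project_context, which are also optional.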
Usage Examples
Python
from papr_memory import Papr
import os
client = Papr(x_api_key=os.environ.get("PAPR_MEMORY_API_KEY"))
# Get compressed context
compressed = client.messages.compress_session(
session_id="session_abc123"
)
# Check if from cache
if compressed.from_cache:
    print("✅ Near-instant response from cache")
else:
    print("🆕 Just generated")
# Access three-tier summaries
print(f"Recent: {compressed.summaries.short_term}")
print(f"Session: {compressed.summaries.medium_term}")
print(f"Full: {compressed.summaries.long_term}")
# Access enhanced fields
print(f"Goal: {compressed.enhanced_fields.session_intent}")
print(f"Status: {compressed.enhanced_fields.current_state}")
print(f"Next: {compressed.enhanced_fields.next_steps[0]}")
# Use in LLM prompt
system_prompt = f"""
You are helping with: {compressed.enhanced_fields.project_context.project_name}
Tech stack: {', '.join(compressed.enhanced_fields.project_context.tech_stack)}
Recent work:
{compressed.summaries.short_term}
Key decisions made:
{chr(10).join(f'- {d}' for d in compressed.enhanced_fields.key_decisions)}
Current status: {compressed.enhanced_fields.current_state}
Next steps:
{chr(10).join(f'{i+1}. {s}' for i, s in enumerate(compressed.enhanced_fields.next_steps))}
"""
# Feed to your LLM
response = your_llm.chat(
system=system_prompt,
user="How do I fix the token refresh timing issue?"
)

TypeScript
import Papr from '@papr/memory';
const client = new Papr({
xAPIKey: process.env.PAPR_MEMORY_API_KEY
});
// Get compressed context
const compressed = await client.messages.compressSession({
session_id: "session_abc123"
});
// Check if from cache
if (compressed.from_cache) {
console.log('✅ Near-instant response from cache');
} else {
console.log('🆕 Just generated');
}
// Access three-tier summaries
console.log(`Recent: ${compressed.summaries.short_term}`);
console.log(`Session: ${compressed.summaries.medium_term}`);
console.log(`Full: ${compressed.summaries.long_term}`);
// Access enhanced fields
console.log(`Goal: ${compressed.enhanced_fields.session_intent}`);
console.log(`Status: ${compressed.enhanced_fields.current_state}`);
console.log(`Next: ${compressed.enhanced_fields.next_steps[0]}`);
// Use in LLM prompt
const systemPrompt = `
You are helping with: ${compressed.enhanced_fields.project_context.project_name}
Tech stack: ${compressed.enhanced_fields.project_context.tech_stack.join(', ')}
Recent work:
${compressed.summaries.short_term}
Key decisions made:
${compressed.enhanced_fields.key_decisions.map(d => `- ${d}`).join('\n')}
Current status: ${compressed.enhanced_fields.current_state}
Next steps:
${compressed.enhanced_fields.next_steps.map((s, i) => `${i+1}. ${s}`).join('\n')}
`;
// Feed to your LLM
const response = await yourLLM.chat({
system: systemPrompt,
user: "How do I fix the token refresh timing issue?"
});

cURL
# Get compressed summary
curl -X GET "https://memory.papr.ai/v1/messages/sessions/session_abc123/compress" \
-H "X-API-Key: YOUR_API_KEY" \
-H "X-Client-Type: curl"
# Response (if summary exists)
{
"session_id": "session_abc123",
"summaries": { ... },
"enhanced_fields": { ... },
"from_cache": true,
"message_count": 47
}
# Response (if no summary yet)
{
"detail": "Session not found or no summary exists"
}
# (Wait a few seconds and try again - background processing triggered)

Token Optimization
Before Compression
Full conversation history for a 50-message session:
Message 1: User: "I'm building a React app..." (150 tokens)
Message 2: Assistant: "Great! Here's how..." (300 tokens)
Message 3: User: "How do I add JWT auth?" (80 tokens)
...
Message 50: Assistant: "To fix that, try..." (250 tokens)
TOTAL: ~50,000 tokens

Cost at $1/1M tokens: $0.05 per prompt
Latency: Slow processing of large context
After Compression
Compressed context using hierarchical summaries:
{
"summaries": {
"short_term": "User debugging token refresh..." (200 tokens),
"medium_term": "Built complete JWT auth system..." (500 tokens),
"long_term": "Task management SaaS project..." (800 tokens)
},
"enhanced_fields": {
"session_intent": "..." (50 tokens),
"key_decisions": [...] (200 tokens),
"current_state": "..." (100 tokens),
"next_steps": [...] (150 tokens)
}
}
TOTAL: ~2,000 tokens

Cost at $1/1M tokens: $0.002 per prompt (96% savings!)
Latency: Fast processing, instant retrieval
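The savings quoted above follow directly from the token counts; a quick check of the arithmetic:

```python
full_tokens = 50_000       # raw 50-message history
compressed_tokens = 2_000  # three-tier summaries + enhanced fields
price_per_million = 1.00   # $1 per 1M tokens, as in the example above

reduction = 1 - compressed_tokens / full_tokens            # 0.96
cost_full = full_tokens / 1_000_000 * price_per_million         # $0.05
cost_compressed = compressed_tokens / 1_000_000 * price_per_million  # $0.002

print(f"{reduction:.0%} reduction")  # prints: 96% reduction
```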
Optimization Strategy
For short conversations (< 20 messages):
# Use full history
history = client.messages.get_history(session_id=session_id, limit=20)
context = "\n".join([f"{msg.role}: {msg.content}" for msg in history.messages])

For medium conversations (20-100 messages):
# Use short + medium summaries
compressed = client.messages.compress_session(session_id=session_id)
context = f"{compressed.summaries.medium_term}\n\nRecent:\n{compressed.summaries.short_term}"

For long conversations (100+ messages):
# Use all three summaries + enhanced fields
compressed = client.messages.compress_session(session_id=session_id)
context = f"""
Project: {compressed.enhanced_fields.project_context.project_name}
Goal: {compressed.enhanced_fields.session_intent}
Full context: {compressed.summaries.long_term}
Recent work: {compressed.summaries.short_term}
Current status: {compressed.enhanced_fields.current_state}
Next steps: {', '.join(compressed.enhanced_fields.next_steps[:3])}
"""

Compression Algorithm
LLM Processing Pipeline
Papr uses Groq's LLaMA-based models with structured output mode for reliable extraction:
Structured Prompts
The system uses specialized prompts for each extraction task:
1. Summary Generation:
- Analyze conversation flow
- Identify main themes
- Extract topics discussed
- Generate three summary levels
2. File Tracking:
- Detect file path mentions
- Identify operation types (read/write/create/delete)
- Extract modification descriptions
- Track full paths
3. Learning Detection:
- User preference patterns
- Performance optimizations discovered
- Failed approaches to avoid
- Successful alternatives
4. Enhanced Fields:
- Extract session intent (goal)
- Identify key decisions with reasoning
- Determine current state (working/not working)
- List actionable next steps
- Collect technical details
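Structured output means the extraction LLM is constrained to a schema rather than free text. A dataclass sketch of what such a schema might look like for the enhanced fields - the field names match the response format documented above, but the actual server-side schema is not published, so treat this as illustrative:

```python
from dataclasses import dataclass, field


@dataclass
class EnhancedFieldsSchema:
    """Shape the extraction LLM is asked to fill (illustrative only)."""
    session_intent: str = ""                                  # 1-2 sentence goal
    key_decisions: list[str] = field(default_factory=list)    # choices + "why"
    current_state: str = ""                                   # working / not working
    next_steps: list[str] = field(default_factory=list)       # 3-5 action items
    technical_details: list[str] = field(default_factory=list)


# Example of a filled-in extraction result
extracted = EnhancedFieldsSchema(
    session_intent="Build secure JWT authentication",
    key_decisions=["Use httpOnly cookies (XSS protection)"],
)
```

Constraining the model to a schema like this is what makes the fields reliably machine-readable instead of requiring text parsing.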
Caching Strategy
First 15 Messages:
Message 1-14: Stored in Parse PostMessage
Message 15: Triggers batch analysis
↓
Background processing (3 seconds):
- Groq LLM analyzes all messages
- Generates summaries and enhanced fields
- Saves to Parse ChatSession
- Creates Neo4j MessageSession node
↓
Cached for instant retrieval

Messages 16-30:
New messages stored normally
↓
Message 30: Triggers re-analysis
- Updates existing summaries
- Appends to medium_term context
- Updates long_term overview
- Refreshes enhanced fields
↓
Cache updated

Performance Metrics:
- Cold start (no cache): ~3 seconds background
- Warm retrieval (cached): <50ms
- Cache hit rate: >99% after first generation
- No repeated LLM calls for same session state
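The caching behavior above can be modeled as memoization keyed by session: generate once, then serve from cache with the from_cache flag flipped. A sketch using an in-memory dict as a stand-in for Parse Server:

```python
_summary_cache: dict[str, dict] = {}  # stand-in for Parse Server storage


def get_summary(session_id: str, generate) -> dict:
    """Return the cached summary, generating it only on the first call.
    `generate` stands in for the ~3s Groq batch analysis."""
    if session_id not in _summary_cache:
        _summary_cache[session_id] = generate()  # expensive LLM call, once
        return {**_summary_cache[session_id], "from_cache": False}
    return {**_summary_cache[session_id], "from_cache": True}
```

This is why the second and later calls return in ~50ms: no LLM work happens on the cached path.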
Best Practices
1. Choose the Right Summary Level
# Quick context for immediate continuation
short = compressed.summaries.short_term
# Understanding recent work flow
medium = compressed.summaries.medium_term
# High-level project overview
long = compressed.summaries.long_term
# Comprehensive context (combine all)
full_context = f"{long}\n\nRecent:\n{short}"

2. Combine with Memory Search
Don't rely solely on compression - combine with memory search for best results:
# Get compressed session context
compressed = client.messages.compress_session(session_id=session_id)
# Search for specific relevant memories
memories = client.memory.search(
query="JWT token refresh implementation details",
external_user_id=user_id,
max_memories=5
)
# Combine for LLM
context = f"""
Session context: {compressed.summaries.medium_term}
Relevant memories:
{chr(10).join([m.content for m in memories.data.memories])}
Current task: {compressed.enhanced_fields.current_state}
"""

3. Handle Missing Summaries
try:
    compressed = client.messages.compress_session(session_id=session_id)
    # Cached or freshly generated - either way the summary is ready to use
    context = compressed.summaries.short_term
except Exception as e:
    if "not found" in str(e).lower():
        # Summary doesn't exist yet (< 15 messages or still processing)
        # Fall back to recent history
        history = client.messages.get_history(session_id=session_id, limit=10)
        context = "\n".join([f"{msg.role}: {msg.content}" for msg in history.messages])
    else:
        raise

4. Check Cache Status
compressed = client.messages.compress_session(session_id=session_id)
if compressed.from_cache:
    print("✅ Near-instant retrieval from cache")
    print(f"📊 Based on {compressed.message_count} messages")
else:
    print("🆕 Just generated fresh summary")
    print("⏱️ Took ~3 seconds to process")

5. Use Enhanced Fields for Structured Data
# Don't parse summaries manually - use structured fields
compressed = client.messages.compress_session(session_id=session_id)
# ❌ DON'T parse from summary text
# intent = extract_intent_from_text(compressed.summaries.long_term)
# ✅ DO use structured fields
intent = compressed.enhanced_fields.session_intent
decisions = compressed.enhanced_fields.key_decisions
next_steps = compressed.enhanced_fields.next_steps

6. Refresh Strategically
Summaries are automatically updated every 15 messages. Don't manually refresh unnecessarily:
# ✅ GOOD: Fetch once per task
compressed = client.messages.compress_session(session_id=session_id)
# ... use for multiple LLM calls ...
# ❌ BAD: Fetching on every message
for message in new_messages:
    compressed = client.messages.compress_session(session_id=session_id)  # Wasteful!
    # ... use once ...

7. Leverage File Tracking
compressed = client.messages.compress_session(session_id=session_id)
# Show what files were touched
files = compressed.enhanced_fields.files_accessed
print(f"📖 Read: {len(files.read)} files")
print(f"✏️ Modified: {len(files.modified)} files")
print(f"🆕 Created: {len(files.created)} files")
print(f"🗑️ Deleted: {len(files.deleted)} files")
# Use for context
if files.modified:
    recent_changes = [f"{m['path']}: {m['description']}" for m in files.modified]
    context = "Recently changed:\n" + "\n".join(recent_changes)

Error Handling
Session Not Found
try:
    compressed = client.messages.compress_session(session_id="invalid_session")
except Exception as e:
    if "not found" in str(e).lower():
        print("Session doesn't exist - check session ID")
    else:
        raise

No Summary Yet
try:
    compressed = client.messages.compress_session(session_id=session_id)
except Exception as e:
    if "no summary exists" in str(e).lower():
        print("Summary not generated yet - wait for 15 messages or try again in a few seconds")
        # Fall back to recent history
        history = client.messages.get_history(session_id=session_id, limit=10)
    else:
        raise

Authentication Error
try:
    compressed = client.messages.compress_session(session_id=session_id)
except Exception as e:
    if "authentication" in str(e).lower() or "401" in str(e):
        print("API key invalid or missing")
    else:
        raise

API Reference
Compress Session
Endpoint: GET /v1/messages/sessions/{sessionId}/compress
Parameters:
- `sessionId` (path, required): Session identifier
Headers:
- `X-API-Key` (required): Your Papr API key
- `X-Client-Type` (optional): Client identifier
Response: CompressedSessionResponse
{
session_id: string;
summaries: {
short_term: string;
medium_term: string;
long_term: string;
topics: string[];
last_updated: string;
};
enhanced_fields: {
session_intent?: string;
key_decisions: string[];
current_state?: string;
next_steps: string[];
technical_details: string[];
files_accessed?: {
read: string[];
modified: Array<{path: string, description: string}>;
created: string[];
deleted: string[];
};
project_context?: {
project_name?: string;
project_id?: string;
project_path?: string;
tech_stack: string[];
current_task?: string;
git_repo?: string;
};
};
from_cache: boolean;
message_count: number;
}

Status Codes:
- 200 OK - Summary found and returned
- 404 Not Found - Session doesn't exist or no summary yet
- 401 Unauthorized - Invalid/missing API key
- 500 Internal Server Error - Server error
Related Endpoints:
- POST /v1/messages - Store message (triggers auto-compression at 15 messages)
- GET /v1/messages/sessions/{sessionId} - Get full conversation history
- GET /v1/messages/sessions/{sessionId}/status - Check session metadata
Next Steps
- Messages Management Guide - Full Messages API documentation
- Context Handling - Strategies for context management
- Chat History Tutorial - Build a chat app with compression
- API Reference - Complete API documentation