Message Compression & Summarization

Intelligent conversation compression that reduces token usage by 96% while preserving critical context.

Overview

When working with long conversations, sending full chat history to LLMs becomes expensive and slow. Papr's message compression system automatically analyzes conversations and generates hierarchical summaries that capture all essential information in a fraction of the tokens.

Key Benefits:

  • 🚀 96% token reduction - 50,000 tokens → 2,000 tokens
  • <50ms response time - Cached summaries return instantly
  • 🧠 Intelligent extraction - Session intent, decisions, next steps, file tracking
  • 🔄 Automatic generation - Every 15 messages, no manual work
  • 📊 Three-tier summaries - Short/medium/long term context

How It Works

Automatic Compression (Every 15 Messages)

The system automatically triggers compression analysis when a session reaches message count thresholds:

Messages 1-15 accumulate → 15 messages reached → batch analysis → Groq LLM processing → extract summaries & fields → cache in Parse Server → create MessageSession node → ready for retrieval

What happens:

  1. User sends 15th message to session
  2. System triggers batch analysis in background
  3. Groq LLM analyzes all messages with structured output
  4. Three-tier summaries generated (short/medium/long)
  5. Enhanced fields extracted (intent, decisions, state, files, etc.)
  6. Everything cached in Parse Server ChatSession
  7. MessageSession node created in Neo4j with all fields
  8. Future compression requests return instantly from cache

Performance:

  • First 15 messages: Background processing (~3 seconds)
  • Subsequent calls: <50ms (from cache)
  • No repeated LLM calls - summaries are generated once per 15-message batch, then served from cache
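
No client code is needed to trigger any of this; the pipeline runs server-side. For intuition, the threshold behavior roughly follows the shape below (a minimal illustrative sketch with hypothetical function names, not Papr's actual implementation):

import threading

COMPRESSION_INTERVAL = 15  # summaries (re)generate every 15 messages

def trigger_batch_analysis(session_id: str) -> None:
    """Hypothetical stand-in for the server-side pipeline: Groq analysis,
    three-tier summaries, enhanced fields, then caching in Parse/Neo4j."""
    print(f"Analyzing session {session_id} in the background (~3 seconds)")

def on_message_stored(session_id: str, message_count: int) -> None:
    # Every 15th message kicks off background analysis; other messages just accumulate
    if message_count % COMPRESSION_INTERVAL == 0:
        threading.Thread(
            target=trigger_batch_analysis, args=(session_id,), daemon=True
        ).start()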

On-Demand Compression

Request compression any time using the compress endpoint:

GET /v1/messages/sessions/{sessionId}/compress

Flow:

API request → check Parse cache → if found, return summary in <50ms; if not found, return 404 and trigger background processing → the next call returns the summary

First call (no summary exists):

  • Returns 404 status
  • Triggers background summary generation
  • Takes ~3 seconds to complete
  • Next call returns the cached summary

Subsequent calls (summary exists):

  • Returns full summary instantly
  • ~50ms response time
  • Includes from_cache: true flag
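
In practice this means the first request for a brand-new session may need a short retry. A minimal sketch using the Python SDK (the string check on the exception is an assumption; match it to however your SDK version surfaces 404s):

import os
import time

from papr_memory import Papr

client = Papr(x_api_key=os.environ.get("PAPR_MEMORY_API_KEY"))

def get_compressed_context(session_id: str, retries: int = 3, delay: float = 3.0):
    """Fetch the compressed summary, retrying while background generation finishes."""
    for attempt in range(retries):
        try:
            return client.messages.compress_session(session_id=session_id)
        except Exception as e:
            # 404 means no summary yet - generation was just triggered in the background
            if "not found" in str(e).lower() and attempt < retries - 1:
                time.sleep(delay)  # background generation takes ~3 seconds
                continue
            raise

compressed = get_compressed_context("session_abc123")
print(compressed.summaries.short_term)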

Three-Tier Summary System

Papr generates three levels of summaries, each optimized for different use cases:

1. Short-Term Summary (Last 15 Messages)

Purpose: Immediate context and current task focus

Contains:

  • Most recent conversation state
  • Current problem or task being worked on
  • Immediate context needed for next interaction

Best for:

  • Quick context refresh
  • Continuing current task
  • Near-term decision making

Example:

"User is debugging JWT token refresh timing issues in their React authentication 
flow. The tokens are expiring correctly but the refresh mechanism isn't triggering 
at the right time. Currently checking the expiry calculation logic in useAuth.ts."

2. Medium-Term Summary (Last ~100 Messages)

Purpose: Recent session history and flow

Contains:

  • Multiple related tasks worked on recently
  • Patterns across recent work
  • Context switches and their reasons
  • Recent decisions and their outcomes

Best for:

  • Understanding session flow
  • Seeing how tasks relate
  • Identifying patterns

Example:

"Over the past 100 messages, user has been building a complete JWT authentication 
system for their React task app. Started with basic login/signup forms, then 
implemented token generation on backend, switched from localStorage to httpOnly 
cookies for security, and is now working on token refresh logic. Key challenge 
has been getting the timing right for automatic token refresh."

3. Long-Term Summary (Full Session)

Purpose: Complete conversation arc and project overview

Contains:

  • Entire project context from start
  • All major decisions and their reasoning
  • Full technical stack and architecture
  • Complete timeline of work
  • Big-picture project goals

Best for:

  • High-level project understanding
  • Project documentation
  • Context for new team members
  • Historical reference

Example:

"Full project: User is building a task management SaaS application with React 
frontend and Node.js backend. Started by asking about React best practices, then 
architected the application structure, implemented database schema with PostgreSQL, 
built REST API with Express, created authentication system with JWT tokens, and is 
now working on the authorization layer with role-based access control. Tech stack: 
React, TypeScript, Node.js, Express, PostgreSQL, JWT. Current phase: Security 
implementation."

Enhanced Fields

Beyond basic summaries, Papr extracts structured information that provides rich context:

Session Intent

What it is: The user's primary goal or what they're trying to accomplish

Format: 1-2 sentence clear statement

Example:

{
  "session_intent": "Build secure JWT authentication for task management app with httpOnly cookies and automatic token refresh"
}

Use for:

  • Understanding user's end goal
  • Keeping AI responses aligned with intent
  • Project documentation

Key Decisions

What it is: Important choices made during the conversation with reasoning

Format: Array of decision statements including "why"

Example:

{
  "key_decisions": [
    "Use httpOnly cookies instead of localStorage for JWT storage (XSS protection)",
    "Implement 5-minute refresh buffer before token expiry (prevent auth interruptions)",
    "Choose React Context over Redux for auth state (simpler for this use case)",
    "Set token expiry to 1 hour with 7-day refresh token (balance security and UX)"
  ]
}

Use for:

  • Understanding past choices
  • Avoiding decision flip-flops
  • Documenting architecture decisions
  • Onboarding new developers

Current State

What it is: What's working and what's not working right now

Format: String describing current status

Example:

{
  "current_state": "Working: JWT generation and validation, httpOnly cookie storage, login/logout flow. Not working: Token refresh timing is off by ~2 minutes, protected routes occasionally redirect to login incorrectly"
}

Use for:

  • Quick status check
  • Identifying blockers
  • Prioritizing next work
  • Bug tracking

Next Steps

What it is: Specific actionable items to work on next

Format: Array of 3-5 concrete action items

Example:

{
  "next_steps": [
    "Debug token expiry calculation in useAuth.ts (check timezone handling)",
    "Add loading states to prevent double-refresh attempts",
    "Implement ProtectedRoute component with proper redirect logic",
    "Write integration tests for token refresh flow",
    "Add error handling for network failures during refresh"
  ]
}

Use for:

  • Clear action plan
  • AI agent task planning
  • Project management
  • Avoiding paralysis
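
For example, the array drops straight into a simple task list for an agent or project board (the Task dataclass below is illustrative, not part of the SDK):

from dataclasses import dataclass

@dataclass
class Task:
    description: str
    done: bool = False

# In practice this comes from compressed.enhanced_fields.next_steps
next_steps = [
    "Debug token expiry calculation in useAuth.ts (check timezone handling)",
    "Add loading states to prevent double-refresh attempts",
    "Write integration tests for token refresh flow",
]

# Turn the extracted steps into an ordered plan
plan = [Task(description=step) for step in next_steps]
for i, task in enumerate(plan, start=1):
    print(f"{i}. {task.description}")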

Technical Details

What it is: Important technical specifics like endpoints, configs, error messages

Format: Array of technical facts

Example:

{
  "technical_details": [
    "Token expiry: 3600 seconds (1 hour)",
    "Refresh token expiry: 604800 seconds (7 days)",
    "Login endpoint: POST /api/auth/login",
    "Refresh endpoint: POST /api/auth/refresh",
    "Cookie name: 'auth_token'",
    "Cookie settings: httpOnly=true, secure=true, sameSite=strict",
    "Error code for expired tokens: ERR_TOKEN_EXPIRED",
    "Auth hook: useAuth() from src/hooks/useAuth.ts"
  ]
}

Use for:

  • Quick reference for configs
  • Debugging error messages
  • Documentation
  • New developer onboarding

Files Accessed

What it is: Complete tracking of file operations during conversation

Format: Object with arrays for read/modified/created/deleted

Example:

{
  "files_accessed": {
    "read": [
      "src/components/Login.tsx",
      "src/api/auth.ts",
      "src/utils/tokenValidation.ts"
    ],
    "modified": [
      {
        "path": "src/components/Login.tsx",
        "description": "Added form validation with Yup schema and error display"
      },
      {
        "path": "src/api/auth.ts",
        "description": "Updated token handling to use httpOnly cookies instead of localStorage"
      },
      {
        "path": "src/hooks/useAuth.ts",
        "description": "Fixed token refresh timing calculation"
      }
    ],
    "created": [
      "src/hooks/useAuth.ts",
      "src/utils/tokenStorage.ts",
      "src/components/ProtectedRoute.tsx"
    ],
    "deleted": [
      "src/utils/localStorageAuth.ts"
    ]
  }
}

Use for:

  • Understanding what was changed
  • Code review context
  • Git commit messages
  • Rollback decisions
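
As one concrete example from the list above, the tracked operations can be turned into a draft commit message (a sketch assuming dict-style access to files_accessed; adjust if your SDK returns objects):

def draft_commit_message(files_accessed: dict) -> str:
    """Build a rough commit message body from the tracked file operations."""
    lines = ["Summary of session changes:", ""]
    for entry in files_accessed.get("modified", []):
        lines.append(f"- Modify {entry['path']}: {entry['description']}")
    for path in files_accessed.get("created", []):
        lines.append(f"- Add {path}")
    for path in files_accessed.get("deleted", []):
        lines.append(f"- Remove {path}")
    return "\n".join(lines)

# Example using the structure shown above
files = {
    "modified": [
        {"path": "src/hooks/useAuth.ts", "description": "Fixed token refresh timing calculation"}
    ],
    "created": ["src/components/ProtectedRoute.tsx"],
    "deleted": ["src/utils/localStorageAuth.ts"],
}
print(draft_commit_message(files))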

Project Context

What it is: High-level project information and metadata

Format: Object with project details

Example:

{
  "project_context": {
    "project_name": "Task Management SaaS",
    "project_id": "proj_task_management",
    "project_path": "/Users/dev/projects/task-app",
    "tech_stack": ["React", "TypeScript", "Node.js", "Express", "PostgreSQL", "JWT"],
    "current_task": "Implementing JWT authentication with httpOnly cookies and token refresh",
    "git_repo": "https://github.com/user/task-app"
  }
}

Use for:

  • Project identification
  • Tech stack reference
  • Context switching between projects
  • Team collaboration

Response Format

Complete Response Structure

{
  "session_id": "session_abc123",
  "summaries": {
    "short_term": "User is debugging token refresh timing...",
    "medium_term": "Over the past 100 messages, user has built...",
    "long_term": "Full project: Task management SaaS with React...",
    "topics": ["React", "JWT", "Authentication", "TypeScript", "Security"],
    "last_updated": "2026-02-11T14:30:00Z"
  },
  "enhanced_fields": {
    "session_intent": "Build secure JWT authentication...",
    "key_decisions": [
      "Use httpOnly cookies for XSS protection",
      "Implement 5-minute refresh buffer"
    ],
    "current_state": "Working: JWT generation. Not working: Refresh timing",
    "next_steps": [
      "Debug expiry calculation",
      "Add loading states",
      "Write integration tests"
    ],
    "technical_details": [
      "Token expiry: 3600 seconds",
      "Refresh endpoint: POST /api/auth/refresh"
    ],
    "files_accessed": {
      "read": ["src/api/auth.ts"],
      "modified": [
        {"path": "src/hooks/useAuth.ts", "description": "Fixed timing"}
      ],
      "created": ["src/components/ProtectedRoute.tsx"],
      "deleted": []
    },
    "project_context": {
      "project_name": "Task Management SaaS",
      "tech_stack": ["React", "TypeScript", "Node.js"],
      "current_task": "JWT authentication implementation"
    }
  },
  "from_cache": true,
  "message_count": 47
}

Field Availability

Field                  | Always Present | Available After
session_id             | ✅ Yes         | First message
summaries.short_term   | ✅ Yes         | 15 messages
summaries.medium_term  | ✅ Yes         | 15 messages
summaries.long_term    | ✅ Yes         | 15 messages
summaries.topics       | ✅ Yes         | 15 messages
enhanced_fields.*      | ⚠️ Optional    | 15 messages
from_cache             | ✅ Yes         | Always
message_count          | ✅ Yes         | Always
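
Because enhanced_fields is optional until the 15-message threshold, guard against it being absent before building prompts. A small sketch (the getattr guards are an assumption about the SDK's object shape):

import os

from papr_memory import Papr

client = Papr(x_api_key=os.environ.get("PAPR_MEMORY_API_KEY"))
compressed = client.messages.compress_session(session_id="session_abc123")

# Summaries are always present once a summary exists; enhanced fields may not be
context_parts = [compressed.summaries.short_term]

enhanced = getattr(compressed, "enhanced_fields", None)
if enhanced is not None:
    if getattr(enhanced, "session_intent", None):
        context_parts.append(f"Goal: {enhanced.session_intent}")
    if getattr(enhanced, "current_state", None):
        context_parts.append(f"Status: {enhanced.current_state}")

context = "\n\n".join(context_parts)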

Usage Examples

Python

from papr_memory import Papr
import os

client = Papr(x_api_key=os.environ.get("PAPR_MEMORY_API_KEY"))

# Get compressed context
compressed = client.messages.compress_session(
    session_id="session_abc123"
)

# Check if from cache
if compressed.from_cache:
    print("✅ Near-instant response from cache")
else:
    print("🆕 Just generated")

# Access three-tier summaries
print(f"Recent: {compressed.summaries.short_term}")
print(f"Session: {compressed.summaries.medium_term}")
print(f"Full: {compressed.summaries.long_term}")

# Access enhanced fields
print(f"Goal: {compressed.enhanced_fields.session_intent}")
print(f"Status: {compressed.enhanced_fields.current_state}")
print(f"Next: {compressed.enhanced_fields.next_steps[0]}")

# Use in LLM prompt
system_prompt = f"""
You are helping with: {compressed.enhanced_fields.project_context.project_name}
Tech stack: {', '.join(compressed.enhanced_fields.project_context.tech_stack)}

Recent work:
{compressed.summaries.short_term}

Key decisions made:
{chr(10).join(f'- {d}' for d in compressed.enhanced_fields.key_decisions)}

Current status: {compressed.enhanced_fields.current_state}

Next steps:
{chr(10).join(f'{i+1}. {s}' for i, s in enumerate(compressed.enhanced_fields.next_steps))}
"""

# Feed to your LLM
response = your_llm.chat(
    system=system_prompt,
    user="How do I fix the token refresh timing issue?"
)

TypeScript

import Papr from '@papr/memory';

const client = new Papr({
  xAPIKey: process.env.PAPR_MEMORY_API_KEY
});

// Get compressed context
const compressed = await client.messages.compressSession({
  session_id: "session_abc123"
});

// Check if from cache
if (compressed.from_cache) {
  console.log('✅ Near-instant response from cache');
} else {
  console.log('🆕 Just generated');
}

// Access three-tier summaries
console.log(`Recent: ${compressed.summaries.short_term}`);
console.log(`Session: ${compressed.summaries.medium_term}`);
console.log(`Full: ${compressed.summaries.long_term}`);

// Access enhanced fields
console.log(`Goal: ${compressed.enhanced_fields.session_intent}`);
console.log(`Status: ${compressed.enhanced_fields.current_state}`);
console.log(`Next: ${compressed.enhanced_fields.next_steps[0]}`);

// Use in LLM prompt
const systemPrompt = `
You are helping with: ${compressed.enhanced_fields.project_context.project_name}
Tech stack: ${compressed.enhanced_fields.project_context.tech_stack.join(', ')}

Recent work:
${compressed.summaries.short_term}

Key decisions made:
${compressed.enhanced_fields.key_decisions.map(d => `- ${d}`).join('\n')}

Current status: ${compressed.enhanced_fields.current_state}

Next steps:
${compressed.enhanced_fields.next_steps.map((s, i) => `${i+1}. ${s}`).join('\n')}
`;

// Feed to your LLM
const response = await yourLLM.chat({
  system: systemPrompt,
  user: "How do I fix the token refresh timing issue?"
});

cURL

# Get compressed summary
curl -X GET "https://memory.papr.ai/v1/messages/sessions/session_abc123/compress" \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "X-Client-Type: curl"

# Response (if summary exists)
{
  "session_id": "session_abc123",
  "summaries": { ... },
  "enhanced_fields": { ... },
  "from_cache": true,
  "message_count": 47
}

# Response (if no summary yet)
{
  "detail": "Session not found or no summary exists"
}
# (Wait a few seconds and try again - background processing triggered)

Token Optimization

Before Compression

Full conversation history for a 50-message session:

Message 1: User: "I'm building a React app..." (150 tokens)
Message 2: Assistant: "Great! Here's how..." (300 tokens)
Message 3: User: "How do I add JWT auth?" (80 tokens)
...
Message 50: Assistant: "To fix that, try..." (250 tokens)

TOTAL: ~50,000 tokens

Cost at $1/1M tokens: $0.05 per prompt
Latency: slow processing of the large context

After Compression

Compressed context using hierarchical summaries:

{
  "summaries": {
    "short_term": "User debugging token refresh..." (200 tokens),
    "medium_term": "Built complete JWT auth system..." (500 tokens),
    "long_term": "Task management SaaS project..." (800 tokens)
  },
  "enhanced_fields": {
    "session_intent": "..." (50 tokens),
    "key_decisions": [...] (200 tokens),
    "current_state": "..." (100 tokens),
    "next_steps": [...] (150 tokens)
  }
}

TOTAL: ~2,000 tokens

Cost at $1/1M tokens: $0.002 per prompt (96% savings!)
Latency: fast processing, instant retrieval
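
The arithmetic is easy to rerun for your own pricing and volumes; the numbers below reproduce the example above:

PRICE_PER_MILLION_TOKENS = 1.00  # $1 / 1M tokens, as in the example above

full_history_tokens = 50_000
compressed_tokens = 2_000

full_cost = full_history_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS      # $0.05
compressed_cost = compressed_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS  # $0.002
savings = 1 - compressed_tokens / full_history_tokens                       # 0.96

print(f"Full: ${full_cost:.3f}  Compressed: ${compressed_cost:.3f}  Savings: {savings:.0%}")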

Optimization Strategy

For short conversations (< 20 messages):

# Use full history
history = client.messages.get_history(session_id=session_id, limit=20)
context = "\n".join([f"{msg.role}: {msg.content}" for msg in history.messages])

For medium conversations (20-100 messages):

# Use short + medium summaries
compressed = client.messages.compress_session(session_id=session_id)
context = f"{compressed.summaries.medium_term}\n\nRecent:\n{compressed.summaries.short_term}"

For long conversations (100+ messages):

# Use all three summaries + enhanced fields
compressed = client.messages.compress_session(session_id=session_id)
context = f"""
Project: {compressed.enhanced_fields.project_context.project_name}
Goal: {compressed.enhanced_fields.session_intent}

Full context: {compressed.summaries.long_term}

Recent work: {compressed.summaries.short_term}

Current status: {compressed.enhanced_fields.current_state}
Next steps: {', '.join(compressed.enhanced_fields.next_steps[:3])}
"""

Compression Algorithm

LLM Processing Pipeline

Papr uses Groq's LLaMA-based models with structured output mode for reliable extraction:

15+ messages → batch analysis → Groq LLM with structured output → extract components → cache everything in Parse Server (ChatSession) and Neo4j (MessageSession node).

Extracted components:

  • 3-tier summaries
  • Session intent, key decisions, current state, next steps, technical details
  • File operations
  • Learning detection: user preferences, performance lessons, failed approaches

Structured Prompts

The system uses specialized prompts for each extraction task:

1. Summary Generation:

  • Analyze conversation flow
  • Identify main themes
  • Extract topics discussed
  • Generate three summary levels

2. File Tracking:

  • Detect file path mentions
  • Identify operation types (read/write/create/delete)
  • Extract modification descriptions
  • Track full paths

3. Learning Detection:

  • User preference patterns
  • Performance optimizations discovered
  • Failed approaches to avoid
  • Successful alternatives

4. Enhanced Fields:

  • Extract session intent (goal)
  • Identify key decisions with reasoning
  • Determine current state (working/not working)
  • List actionable next steps
  • Collect technical details
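
Structured output mode means the LLM is constrained to a schema rather than free text. The exact server-side schema isn't published, but it plausibly mirrors the response fields documented above; here is an illustrative Pydantic sketch (the model names and the use of Pydantic are assumptions):

from typing import List, Optional

from pydantic import BaseModel

class ModifiedFile(BaseModel):
    path: str
    description: str

class FilesAccessed(BaseModel):
    read: List[str] = []
    modified: List[ModifiedFile] = []
    created: List[str] = []
    deleted: List[str] = []

class SessionAnalysis(BaseModel):
    # Three-tier summaries
    short_term: str
    medium_term: str
    long_term: str
    topics: List[str] = []
    # Enhanced fields
    session_intent: Optional[str] = None
    key_decisions: List[str] = []
    current_state: Optional[str] = None
    next_steps: List[str] = []
    technical_details: List[str] = []
    files_accessed: Optional[FilesAccessed] = None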

Caching Strategy

First 15 Messages:

Message 1-14: Stored in Parse PostMessage
Message 15: Triggers batch analysis

Background processing (3 seconds):
  - Groq LLM analyzes all messages
  - Generates summaries and enhanced fields
  - Saves to Parse ChatSession
  - Creates Neo4j MessageSession node

Cached for instant retrieval

Messages 16-30:

New messages stored normally

Message 30: Triggers re-analysis
  - Updates existing summaries
  - Appends to medium_term context
  - Updates long_term overview
  - Refreshes enhanced fields

Cache updated

Performance Metrics:

  • Cold start (no cache): ~3 seconds background
  • Warm retrieval (cached): <50ms
  • Cache hit rate: >99% after first generation
  • No repeated LLM calls for same session state

Best Practices

1. Choose the Right Summary Level

# Quick context for immediate continuation
short = compressed.summaries.short_term

# Understanding recent work flow
medium = compressed.summaries.medium_term

# High-level project overview
long = compressed.summaries.long_term

# Comprehensive context (combine all)
full_context = f"{long}\n\nRecent:\n{short}"

2. Combine with Memory Search

Don't rely solely on compression - combine with memory search for best results:

# Get compressed session context
compressed = client.messages.compress_session(session_id=session_id)

# Search for specific relevant memories
memories = client.memory.search(
    query="JWT token refresh implementation details",
    external_user_id=user_id,
    max_memories=5
)

# Combine for LLM
context = f"""
Session context: {compressed.summaries.medium_term}

Relevant memories:
{chr(10).join([m.content for m in memories.data.memories])}

Current task: {compressed.enhanced_fields.current_state}
"""

3. Handle Missing Summaries

try:
    compressed = client.messages.compress_session(session_id=session_id)
    
    if compressed.from_cache:
        # Use cached summary
        context = compressed.summaries.short_term
    else:
        # Just generated, good to use
        context = compressed.summaries.short_term
        
except Exception as e:
    if "not found" in str(e).lower():
        # Summary doesn't exist yet (< 15 messages or still processing)
        # Fall back to recent history
        history = client.messages.get_history(session_id=session_id, limit=10)
        context = "\n".join([f"{msg.role}: {msg.content}" for msg in history.messages])
    else:
        raise

4. Check Cache Status

compressed = client.messages.compress_session(session_id=session_id)

if compressed.from_cache:
    print(f"✅ Near-instant retrieval from cache")
    print(f"📊 Based on {compressed.message_count} messages")
else:
    print(f"🆕 Just generated fresh summary")
    print(f"⏱️ Took ~3 seconds to process")

5. Use Enhanced Fields for Structured Data

# Don't parse summaries manually - use structured fields
compressed = client.messages.compress_session(session_id=session_id)

# ❌ DON'T parse from summary text
# intent = extract_intent_from_text(compressed.summaries.long_term)

# ✅ DO use structured fields
intent = compressed.enhanced_fields.session_intent
decisions = compressed.enhanced_fields.key_decisions
next_steps = compressed.enhanced_fields.next_steps

6. Refresh Strategically

Summaries are automatically updated every 15 messages. Don't manually refresh unnecessarily:

# ✅ GOOD: Fetch once per task
compressed = client.messages.compress_session(session_id=session_id)
# ... use for multiple LLM calls ...

# ❌ BAD: Fetching on every message
for message in new_messages:
    compressed = client.messages.compress_session(session_id=session_id)  # Wasteful!
    # ... use once ...

7. Leverage File Tracking

compressed = client.messages.compress_session(session_id=session_id)

# Show what files were touched
files = compressed.enhanced_fields.files_accessed

print(f"📖 Read: {len(files.read)} files")
print(f"✏️ Modified: {len(files.modified)} files")
print(f"🆕 Created: {len(files.created)} files")
print(f"🗑️ Deleted: {len(files.deleted)} files")

# Use for context
if files.modified:
    recent_changes = [f"{m['path']}: {m['description']}" for m in files.modified]
    context = f"Recently changed:\n" + "\n".join(recent_changes)

Error Handling

Session Not Found

try:
    compressed = client.messages.compress_session(session_id="invalid_session")
except Exception as e:
    if "not found" in str(e).lower():
        print("Session doesn't exist - check session ID")
    else:
        raise

No Summary Yet

try:
    compressed = client.messages.compress_session(session_id=session_id)
except Exception as e:
    if "no summary exists" in str(e).lower():
        print("Summary not generated yet - wait for 15 messages or try again in a few seconds")
        # Fall back to recent history
        history = client.messages.get_history(session_id=session_id, limit=10)
    else:
        raise

Authentication Error

try:
    compressed = client.messages.compress_session(session_id=session_id)
except Exception as e:
    if "authentication" in str(e).lower() or "401" in str(e):
        print("API key invalid or missing")
    else:
        raise

API Reference

Compress Session

Endpoint: GET /v1/messages/sessions/{sessionId}/compress

Parameters:

  • sessionId (path, required): Session identifier

Headers:

  • X-API-Key (required): Your Papr API key
  • X-Client-Type (optional): Client identifier

Response: CompressedSessionResponse

{
  session_id: string;
  summaries: {
    short_term: string;
    medium_term: string;
    long_term: string;
    topics: string[];
    last_updated: string;
  };
  enhanced_fields: {
    session_intent?: string;
    key_decisions: string[];
    current_state?: string;
    next_steps: string[];
    technical_details: string[];
    files_accessed?: {
      read: string[];
      modified: Array<{path: string, description: string}>;
      created: string[];
      deleted: string[];
    };
    project_context?: {
      project_name?: string;
      project_id?: string;
      project_path?: string;
      tech_stack: string[];
      current_task?: string;
      git_repo?: string;
    };
  };
  from_cache: boolean;
  message_count: number;
}

Status Codes:

  • 200 OK - Summary found and returned
  • 404 Not Found - Session doesn't exist or no summary yet
  • 401 Unauthorized - Invalid/missing API key
  • 500 Internal Server Error - Server error

Related Endpoints:

  • POST /v1/messages - Store message (triggers auto-compression at 15 messages)
  • GET /v1/messages/sessions/{sessionId} - Get full conversation history
  • GET /v1/messages/sessions/{sessionId}/status - Check session metadata

Next Steps