Building a Self-Improving AI Agent

This tutorial demonstrates how to build an AI agent that learns and improves over time by storing its own reasoning, workflows, and learnings as memories.

What We'll Build

A customer service AI agent that:

  1. Documents its own successful workflows and strategies
  2. Learns from interactions and stores learnings as memories
  3. Retrieves its own past experiences to handle similar situations
  4. Evolves its capabilities based on what works
  5. Measures and improves its success metrics over time

The Self-Improvement Loop

┌─────────────────────────────────────────────────┐
│  1. Agent handles customer interaction          │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│  2. Agent documents reasoning + outcome         │
│     (stores as agent memory)                    │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│  3. Next interaction: Agent searches its own    │
│     memories for similar past experiences       │
└────────────────┬────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────────────────────┐
│  4. Agent applies learned strategies            │
│     Success rate improves over time             │
└─────────────────────────────────────────────────┘

Prerequisites

  • Papr Memory API key
  • Python 3.8+ or Node.js 16+
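
A quick environment check before starting. The pip package name below is an assumption based on the import path used throughout this tutorial; confirm it against the official install instructions.

# Assumed install command (verify the exact package name in the Papr docs):
#   pip install papr_memory
#
# Export your API key before running any example:
#   export PAPR_MEMORY_API_KEY="your-api-key"

import os

if not os.environ.get("PAPR_MEMORY_API_KEY"):
    raise RuntimeError("Set PAPR_MEMORY_API_KEY before running the examples")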

Step 1: Agent Documents Initial Workflow

The agent stores its initial approach to handling customer inquiries.

from papr_memory import Papr
import os
from datetime import datetime

client = Papr(x_api_key=os.environ.get("PAPR_MEMORY_API_KEY"))

# Agent documents its refund request workflow (version 1.0)
workflow_v1 = client.memory.add(
    content="""
    REFUND REQUEST WORKFLOW (v1.0)
    
    When customer requests refund:
    1. Verify customer account exists
    2. Check purchase date
    3. Apply refund policy:
       - Within 30 days: Full refund approved
       - After 30 days: Deny refund
    4. Process refund if approved
    
    Initial implementation - no exceptions handled.
    """,
    metadata={
        "role": "assistant",  # This is agent memory
        "category": "workflow",
        "workflow_type": "refund_request",
        "version": "1.0",
        "created_at": datetime.now().isoformat()
    }
)

print("✓ Agent documented initial workflow")

Step 2: Agent Documents Interaction Outcomes

After each interaction, the agent stores what happened and whether it was successful.

# Agent documents a successful resolution
success_case_1 = client.memory.add(
    content="""
    INTERACTION OUTCOME:
    
    Customer: Jane Smith
    Issue: Requested refund for defective product
    Purchase date: 15 days ago
    
    Resolution:
    - Applied standard 30-day refund policy
    - Customer satisfied with immediate refund
    - Product defect noted for quality team
    
    Customer satisfaction: 5/5
    Resolution time: 3 minutes
    """,
    metadata={
        "role": "assistant",
        "category": "outcome",
        "interaction_type": "refund_request",
        "success": True,
        "satisfaction_score": 5,
        "resolution_time_minutes": 3,
        "tags": ["refund", "defective_product", "success"]
    }
)

# Agent documents a problematic case
problem_case_1 = client.memory.add(
    content="""
    INTERACTION OUTCOME (ISSUE IDENTIFIED):
    
    Customer: Mike Johnson
    Issue: Requested refund for product bought 35 days ago
    Reason: Product broke after normal use
    
    Resolution:
    - Rigid 30-day policy applied - refund denied
    - Customer very frustrated and upset
    - Customer mentioned competitor has 60-day policy
    - Lost customer relationship
    
    Customer satisfaction: 1/5
    Resolution time: 8 minutes (escalated to supervisor)
    
    LEARNING: Strict 30-day cutoff causes friction for borderline cases.
    Need more flexibility for loyal customers or product defects.
    """,
    metadata={
        "role": "assistant",
        "category": "outcome",
        "interaction_type": "refund_request",
        "success": False,
        "satisfaction_score": 1,
        "resolution_time_minutes": 8,
        "tags": ["refund", "policy_issue", "customer_lost", "learning_opportunity"]
    }
)

print("✓ Agent documented 2 interaction outcomes")

Step 3: Agent Analyzes Its Own Performance

The agent queries its own memories to identify patterns and areas for improvement.

# Agent searches its own memories for refund-related learnings
learnings_response = client.memory.search(
    query="What have I learned about refund requests that didn't go well?",
    metadata_filter={"role": "assistant", "interaction_type": "refund_request"},
    max_memories=20
)

print(f"\nAgent found {len(learnings_response.data.memories)} relevant past experiences")

# Agent identifies the pattern
low_satisfaction_cases = [
    m for m in learnings_response.data.memories 
    if m.metadata.get('satisfaction_score', 5) < 3
]

print(f"Identified {len(low_satisfaction_cases)} low satisfaction cases")
print("\nCommon issues:")
for case in low_satisfaction_cases[:3]:
    print(f"- {case.content[:150]}...")

Step 4: Agent Improves Its Workflow

Based on these learnings, the agent creates an improved version of its workflow.

# Agent documents improved workflow (version 2.0)
workflow_v2 = client.memory.add(
    content="""
    REFUND REQUEST WORKFLOW (v2.0) - IMPROVED
    
    When customer requests refund:
    1. Verify customer account exists
    2. Check purchase date AND customer history
    3. Apply flexible refund policy:
       - Within 30 days: Automatic full refund
       - 30-60 days: Check customer history
         * Loyal customer (3+ purchases): Approve refund
         * Product defect reported: Approve refund
         * First-time buyer: Offer store credit
       - After 60 days: Escalate to supervisor with context
    4. For approvals, add personalized thank you message
    5. For denials, offer alternative solutions (store credit, discount on replacement)
    
    IMPROVEMENTS FROM V1.0:
    - Added customer history check
    - Flexible 30-60 day window based on circumstances
    - Always offer alternatives to maintain relationship
    - Reduced escalations by 40%
    - Increased satisfaction from 3.2/5 to 4.6/5
    
    Success rate: 87% (up from 65% in v1.0)
    """,
    metadata={
        "role": "assistant",
        "category": "workflow",
        "workflow_type": "refund_request",
        "version": "2.0",
        "previous_version_id": workflow_v1.data.id,
        "improvements": [
            "customer_history_check",
            "flexible_policy",
            "alternative_solutions"
        ],
        "success_rate": 0.87,
        "previous_success_rate": 0.65,
        "created_at": datetime.now().isoformat()
    }
)

print("✓ Agent documented improved workflow v2.0")
print("  Success rate improved from 65% to 87%")

Step 5: Agent Applies Learned Strategies

When handling new interactions, the agent retrieves its improved workflows.
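
The handler below calls a get_customer_history helper that is not part of the Papr SDK; here is a hypothetical stub so the example runs end to end (replace it with your real account lookup):

# Hypothetical stub - in production this would query your customer database/CRM
def get_customer_history(customer_id: str) -> dict:
    return {"purchase_count": 4, "account_age_days": 400}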

def handle_customer_refund_request(customer_id: str, purchase_date: str, issue: str):
    """
    Agent handles refund request by first checking its own learnings
    """
    
    # Agent retrieves its latest workflow
    workflow_response = client.memory.search(
        query="What is my current best workflow for handling refund requests?",
        metadata_filter={
            "role": "assistant",
            "workflow_type": "refund_request"
        },
        max_memories=5
    )
    
    # Get the highest version workflow
    latest_workflow = max(
        workflow_response.data.memories,
        key=lambda m: float(m.metadata.get('version', '0'))
    )
    
    print(f"\nAgent using workflow version {latest_workflow.metadata.get('version')}")
    print(f"Success rate: {latest_workflow.metadata.get('success_rate', 'N/A')}")
    
    # Agent also checks for similar past cases
    similar_cases = client.memory.search(
        query=f"Similar refund requests with issue: {issue}",
        metadata_filter={
            "role": "assistant",
            "interaction_type": "refund_request",
            "success": True
        },
        max_memories=5
    )
    
    print(f"\nAgent found {len(similar_cases.data.memories)} similar successful cases")
    
    # Agent applies learned strategy (simplified for this example)
    days_since_purchase = 35  # in production, compute this from purchase_date
    
    if days_since_purchase <= 30:
        decision = "approve_full_refund"
        reason = "Within 30-day window"
    elif days_since_purchase <= 60:
        # Agent learned to check customer history
        customer_history = get_customer_history(customer_id)  # Your function
        if customer_history['purchase_count'] >= 3:
            decision = "approve_full_refund"
            reason = "Loyal customer with 3+ purchases"
        elif "defect" in issue.lower():
            decision = "approve_full_refund"
            reason = "Product defect reported"
        else:
            decision = "offer_store_credit"
            reason = "First-time buyer, offering alternative"
    else:
        decision = "escalate_with_context"
        reason = "Beyond 60 days, needs supervisor review"
    
    # Agent documents this new outcome
    outcome = client.memory.add(
        content=f"""
        INTERACTION OUTCOME:
        
        Customer: {customer_id}
        Issue: {issue}
        Purchase date: {days_since_purchase} days ago
        
        Decision: {decision}
        Reasoning: {reason}
        Applied workflow: v{latest_workflow.metadata.get('version')}
        
        Used learned strategy: Flexible policy based on customer history
        """,
        metadata={
            "role": "assistant",
            "category": "outcome",
            "interaction_type": "refund_request",
            "workflow_version": latest_workflow.metadata.get('version'),
            "decision": decision,
            "tags": ["refund", issue, "workflow_applied"]
        }
    )
    
    return {
        "decision": decision,
        "reason": reason,
        "workflow_version": latest_workflow.metadata.get('version')
    }

# Example usage
result = handle_customer_refund_request(
    customer_id="cust_456",
    purchase_date="35_days_ago",
    issue="product stopped working after normal use"
)

print(f"\n✓ Decision: {result['decision']}")
print(f"  Reason: {result['reason']}")
print(f"  Using workflow: v{result['workflow_version']}")

Step 6: Agent Documents Domain-Specific Learnings

The agent can store specialized knowledge about different types of issues.

# Agent documents electronics-specific learnings
electronics_learning = client.memory.add(
    content="""
    DOMAIN LEARNING: Electronics Products
    
    Pattern identified across 50+ interactions:
    - Electronics failures within 90 days are usually manufacturing defects
    - Customer satisfaction highest when offering immediate replacement + refund
    - Capturing error codes/symptoms helps product team improve quality
    
    Recommended approach for electronics:
    1. Ask for error codes/symptoms (helps product team)
    2. Offer choice: replacement OR refund
    3. Expedite shipping on replacements
    4. Follow up in 1 week to ensure satisfaction
    
    This approach increased satisfaction from 3.8/5 to 4.7/5 for electronics
    Reduced repeat issues by 25%
    """,
    metadata={
        "role": "assistant",
        "category": "domain_learning",
        "domain": "electronics",
        "interaction_count": 50,
        "satisfaction_improvement": 0.9,
        "tags": ["electronics", "product_defects", "best_practices"]
    }
)

# Agent documents communication style learnings
communication_learning = client.memory.add(
    content="""
    COMMUNICATION LEARNING: Handling Frustrated Customers
    
    Pattern: Customers who escalate are often not angry about the issue itself,
    but about feeling unheard or treated inflexibly.
    
    Effective phrases discovered:
    - "I understand this is frustrating, let me see what I can do..."
    - "You're a valued customer - let me check our options..."
    - "I'd be frustrated too in your situation..."
    
    What NOT to say:
    - "That's our policy" (satisfaction drops 40%)
    - "There's nothing I can do" (immediate escalation)
    - Technical jargon without explanation
    
    When using empathetic language + flexible problem-solving:
    - 65% fewer escalations
    - 4.2/5 satisfaction even when issue not fully resolved
    """,
    metadata={
        "role": "assistant",
        "category": "communication_learning",
        "tags": ["communication", "de_escalation", "customer_psychology"]
    }
)

print("✓ Agent documented domain-specific learnings")

Step 7: Monitor Agent Improvement Over Time

Track how the agent's performance evolves.

# Query the agent's performance metrics over time
# (fields mirror the metadata keys used throughout this tutorial)
performance_query = client.graphql.query(
    query="""
    query AgentPerformance {
      workflows: memories(
        where: {
          metadata: { role: "assistant", category: "workflow" }
        }
        order_by: { created_at: asc }
      ) {
        metadata {
          version
          success_rate
        }
        created_at
      }

      outcomes: memories(
        where: {
          metadata: { role: "assistant", category: "outcome" }
        }
      ) {
        metadata {
          success
          satisfaction_score
          resolution_time_minutes
        }
        created_at
      }
    }
    """
)

print("\n=== Agent Performance Evolution ===")
for workflow in performance_query.data['workflows']:
    meta = workflow['metadata']
    print(f"\nWorkflow v{meta['version']}:")
    print(f"  Success rate: {meta.get('success_rate', 'N/A')}")
    print(f"  Created: {workflow['created_at']}")

# Calculate overall success rate and average satisfaction
outcomes = performance_query.data['outcomes']
if outcomes:
    avg_satisfaction = sum(
        o['metadata'].get('satisfaction_score', 0) for o in outcomes
    ) / len(outcomes)
    success_rate = sum(
        1 for o in outcomes if o['metadata'].get('success')
    ) / len(outcomes)

    print("\n=== Overall Metrics ===")
    print(f"Total interactions: {len(outcomes)}")
    print(f"Success rate: {success_rate * 100:.1f}%")
    print(f"Average satisfaction: {avg_satisfaction:.2f}/5")

Complete Self-Improving Agent Example

class SelfImprovingAgent:
    def __init__(self, api_key: str, agent_id: str):
        self.client = Papr(x_api_key=api_key)
        self.agent_id = agent_id
    
    def handle_interaction(self, customer_id: str, issue_type: str, details: dict):
        """Handle customer interaction using learned strategies"""
        
        # 1. Retrieve relevant learnings
        learnings = self.client.memory.search(
            query=f"How should I handle {issue_type}? What has worked well?",
            metadata_filter={"role": "assistant", "success": True},
            max_memories=5
        )
        
        print(f"Agent retrieving {len(learnings.data.memories)} relevant learnings...")
        
        # 2. Apply learned strategies (your business logic here)
        resolution = self.apply_strategy(issue_type, details, learnings.data.memories)
        
        # 3. Document this interaction
        outcome = self.client.memory.add(
            content=f"""
            Handled {issue_type} for customer {customer_id}
            Details: {details}
            Resolution: {resolution['action']}
            Used strategies: {', '.join(resolution['strategies_applied'])}
            """,
            metadata={
                "role": "assistant",
                "category": "outcome",
                "agent_id": self.agent_id,
                "interaction_type": issue_type,
                "success": resolution['success'],
                "satisfaction_score": resolution.get('satisfaction'),
                "strategies_applied": resolution['strategies_applied']
            }
        )
        
        return resolution
    
    def learn_from_feedback(self, interaction_id: str, feedback: dict):
        """Update learnings based on customer feedback"""
        
        # Store the feedback as learning
        self.client.memory.add(
            content=f"""
            Feedback received for interaction {interaction_id}:
            Satisfaction: {feedback['satisfaction']}/5
            Comments: {feedback.get('comments', 'None')}
            
            {'POSITIVE OUTCOME' if feedback['satisfaction'] >= 4 else 'NEEDS IMPROVEMENT'}
            """,
            metadata={
                "role": "assistant",
                "category": "feedback",
                "agent_id": self.agent_id,
                "interaction_id": interaction_id,
                "satisfaction": feedback['satisfaction']
            }
        )
    
    def generate_improvement_report(self):
        """Analyze performance and suggest improvements"""
        
        # Query recent outcomes
        recent_outcomes = self.client.memory.search(
            query="What were my recent interaction outcomes?",
            metadata_filter={
                "role": "assistant",
                "category": "outcome",
                "agent_id": self.agent_id
            },
            max_memories=100
        )
        
        # Identify patterns
        successful = [m for m in recent_outcomes.data.memories if m.metadata.get('success')]
        unsuccessful = [m for m in recent_outcomes.data.memories if not m.metadata.get('success')]
        
        report = {
            "total_interactions": len(recent_outcomes.data.memories),
            "success_rate": len(successful) / len(recent_outcomes.data.memories) if recent_outcomes.data.memories else 0,
            "areas_for_improvement": self.identify_improvement_areas(unsuccessful)
        }
        
        # Document the analysis
        self.client.memory.add(
            content=f"""
            SELF-ANALYSIS REPORT
            
            Period: Last 100 interactions
            Success rate: {report['success_rate'] * 100:.1f}%
            
            Areas identified for improvement:
            {chr(10).join(f"- {area}" for area in report['areas_for_improvement'])}
            
            Agent will focus on these areas in next iteration.
            """,
            metadata={
                "role": "assistant",
                "category": "self_analysis",
                "agent_id": self.agent_id,
                "success_rate": report['success_rate']
            }
        )
        
        return report
    
    def apply_strategy(self, issue_type: str, details: dict, learnings: list):
        """Your business logic to apply learned strategies"""
        # Implement based on your use case
        return {
            "action": "resolved",
            "success": True,
            "satisfaction": 5,
            "strategies_applied": ["empathy_first", "flexible_policy"]
        }
    
    def identify_improvement_areas(self, unsuccessful_cases: list):
        """Identify patterns in unsuccessful cases"""
        # Analyze and return areas for improvement
        return ["response_time", "policy_flexibility", "technical_explanations"]

# Usage
agent = SelfImprovingAgent(
    api_key=os.environ.get("PAPR_MEMORY_API_KEY"),
    agent_id="cs_agent_001"
)

# Handle interaction
result = agent.handle_interaction(
    customer_id="cust_789",
    issue_type="refund_request",
    details={"purchase_date": "35_days_ago", "reason": "product_defect"}
)

# Later, receive feedback
agent.learn_from_feedback(
    interaction_id="int_123",
    feedback={"satisfaction": 5, "comments": "Very helpful and understanding"}
)

# Generate improvement report
report = agent.generate_improvement_report()
print(f"\nAgent success rate: {report['success_rate'] * 100:.1f}%")
print(f"Identified {len(report['areas_for_improvement'])} areas for improvement")

Key Concepts

Agent Memories vs User Memories

  • User Memories: Information about users (preferences, history, context)
  • Agent Memories: Agent's own workflows, learnings, reasoning patterns
  • Both stored the same way, enabling agents to evolve independently
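
The distinction is only the role metadata. A minimal sketch using the same client.memory.add call as the steps above, assuming "user" is the counterpart role value:

# User memory: a fact about the customer
client.memory.add(
    content="Customer prefers email follow-ups over phone calls.",
    metadata={"role": "user", "customer_id": "cust_456"}
)

# Agent memory: the agent's own learned strategy
client.memory.add(
    content="Opening with an empathy statement reduces escalations.",
    metadata={"role": "assistant", "category": "communication_learning"}
)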

The Self-Improvement Cycle

  1. Document: Agent stores its reasoning and outcomes
  2. Analyze: Agent queries its own memories for patterns
  3. Learn: Agent identifies what works and what doesn't
  4. Improve: Agent updates its workflows based on learnings
  5. Apply: Agent uses improved strategies in future interactions
  6. Repeat: Continuous improvement over time
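
With the SelfImprovingAgent class from the complete example, this cycle reduces to a simple driver loop. Both get_next_interaction and REVIEW_INTERVAL are illustrative placeholders:

REVIEW_INTERVAL = 100  # hypothetical: re-analyze every 100 interactions

handled = 0
while True:
    interaction = get_next_interaction()  # your queue, webhook, etc.

    # Steps 1-2: handle the interaction and document the outcome
    agent.handle_interaction(
        customer_id=interaction["customer_id"],
        issue_type=interaction["issue_type"],
        details=interaction["details"],
    )

    # Steps 3-5: periodically analyze outcomes and update strategies
    handled += 1
    if handled % REVIEW_INTERVAL == 0:
        agent.generate_improvement_report()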

Measuring Improvement

Track these metrics in agent memories:

  • Success rate (% of resolved interactions)
  • Customer satisfaction scores
  • Resolution time
  • Escalation rate
  • Strategy effectiveness
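
Recording each metric as structured metadata on outcome memories (as in the steps above) keeps them queryable later; the escalated field is an illustrative addition:

# Example metadata shape for an outcome memory
outcome_metadata = {
    "role": "assistant",
    "category": "outcome",
    "success": True,                          # feeds the success-rate metric
    "satisfaction_score": 4,                  # 1-5 survey result
    "resolution_time_minutes": 6,
    "escalated": False,                       # illustrative: escalation rate
    "strategies_applied": ["empathy_first"],  # strategy effectiveness
}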

Real-World Benefits

Continuous Learning

  • Agent improves without manual retraining
  • Learns from every interaction
  • Adapts to changing customer needs

Institutional Knowledge

  • Successful strategies are preserved
  • Knowledge doesn't depend on individual agent instances
  • New agent instances can learn from experienced ones
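
For instance, a newly deployed instance can bootstrap from the workflows its predecessors documented. A minimal sketch reusing the search pattern from Step 5 (cs_agent_002 is hypothetical):

new_agent = SelfImprovingAgent(
    api_key=os.environ.get("PAPR_MEMORY_API_KEY"),
    agent_id="cs_agent_002"  # hypothetical fresh instance
)

# Inherit documented workflows instead of starting again at v1.0
inherited = new_agent.client.memory.search(
    query="What is the current best refund request workflow?",
    metadata_filter={"role": "assistant", "workflow_type": "refund_request"},
    max_memories=5
)
print(f"New instance inherits {len(inherited.data.memories)} documented workflows")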

Measurable Evolution

  • Clear metrics show improvement over time
  • Can identify which strategies work best
  • Data-driven optimization

Next Steps