Enterprise Customer Feedback Analysis

Bugs Fixed vs. Documentation Gaps

Analysis Date: February 12, 2026
Customer Type: Enterprise (Financial Services, Insurance, Consulting)
Timeframe: October 2025 - February 2026


Executive Summary

After analyzing enterprise customer conversations against the memory-opensource repository, we've categorized feedback into:

  • 6 Bugs (Now Fixed) - API/backend issues that were resolved
  • 7 Documentation Gaps - User confusion that needs doc clarification
  • 3 Feature Requests - Legitimate asks for new capabilities

✅ BUGS THAT WERE FIXED (Do NOT Need Doc Changes)

1. Namespace Filtering Not Working ❌ BUG - FIXED Feb 10, 2026

Issue: The namespace_id filter was returning results from other namespaces, including results with namespace_id = None

Evidence of Fix:

# Git commits:
8508bf9 Fix namespace scope filter (Feb 10, 2026)
beb4694 fix: move namespace_id filtering to primitive layer (Qdrant + MongoDB)
fce05e7 Merge pull request #15 from Papr-ai/fix/namespace-filter-primitive-layer
fc3c753 fix: move namespace_id filtering to primitive layer (Qdrant + MongoDB)

Root Cause: Namespace filtering was happening at wrong layer (application layer instead of database primitive layer), allowing cross-namespace leakage

Status: ✅ RESOLVED - No doc changes needed, this was pure backend bug
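
A minimal post-fix sanity check (hedged sketch; the exact placement of the `namespace_id` parameter in the SDK call is an assumption based on the bug report):

```python
# Hypothetical verification that namespace scoping is now enforced
response = client.search_memory(
    query="contract terms",
    external_user_id="alice@company.com",
    namespace_id="acme-prod",  # filtering now happens at the Qdrant/MongoDB primitive layer
)

# After the Feb 10 fix, results should contain only "acme-prod" memories;
# no cross-namespace items and no namespace_id = None entries should appear.
```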


2. Search Speed Issues (15-30 seconds) ❌ BUG - FIXED Jan-Feb 2026

Issue: Customers experienced 15-30 second search latency, and 3-5 seconds even in faster cases

Evidence of Fix:

# Git commits:
a573179 Fix Vertex AI dead connection causing search failures + add resilience
1ef311d Optimize Vertex AI + Qdrant search latency with connection keep-alive  
7248221 Fix Vertex AI 60s+ latency: replace gRPC SDK with REST API + credential caching
c1a9b7c Add Search Latency Analysis Document
067d34a Optimize Qdrant search and caching, add warmup, and improve usage tracking

Root Cause: Multiple issues:

  • Vertex AI connections dying and reconnecting (causing 60s+ delays)
  • gRPC SDK slowness vs REST API
  • Lack of connection keep-alive
  • Cold start issues

Status: ✅ RESOLVED - Connection, latency, and cold start issues fixed; no doc changes needed


3. schemas_used Returning None ❌ BUG - LIKELY FIXED

Issue: schemas_used field consistently returning None in search results

Evidence from Code:

# openapi.yaml:4722
schemas_used:
  anyOf:
  - items:
      type: string
    type: array
  - type: 'null'
  title: Schemas Used
  description: List of UserGraphSchema IDs used in this response

Root Cause: Likely related to schema not being properly registered or populated in metadata during graph generation

Status: ✅ LIKELY RESOLVED (related to namespace/schema registration fixes) - Minimal doc needed
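
A short sketch of how a client can check the field once it is populated (uses the SearchResponse/SearchResult shapes from openapi.yaml and the SDK calls shown elsewhere in this document):

```python
response = client.search_memory(query="quarterly targets", external_user_id="alice@company.com")

if response.data and response.data.schemas_used:
    print(f"Schemas used: {response.data.schemas_used}")  # list of UserGraphSchema IDs
else:
    # Prior to the fix this was always None; after the schema-registration fixes it
    # should only be empty when no custom schema applies to the namespace.
    print("No schemas reported for this search")
```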


4. Auto-Population of ACL Arrays Bug ❌ BUG - FIXED Feb 11, 2026

Issue: System was auto-populating namespace/org into ACL arrays incorrectly

Evidence of Fix:

# Git commits:
1513d51 fix: remove auto-population of namespace/org ACL arrays from scoping IDs (Feb 11)
a69238a fix: remove auto-population of namespace/org ACL arrays from scoping IDs

Root Cause: System was automatically adding namespace_id/organization_id to _read_access/_write_access arrays, causing confusion and incorrect access control

Status: ✅ RESOLVED - No doc changes needed


5. Groq Fallback Issue ❌ BUG - FIXED Jan 31, 2026

Issue: When agentic search was enabled and Groq was down, searches failed with a 404

Root Cause: No fallback when Groq API was unavailable

Status: ✅ RESOLVED - No doc changes needed, pure reliability bug


6. Document Upload with Schema Bug ❌ BUG - PARTIALLY FIXED

Issue: PDFs were not added to the graph when the only registered schema belonged to a different namespace

Root Cause: The system required a schema registered for the namespace before documents could be processed into the graph, but this requirement wasn't clearly documented

Status: ⚠️ PARTIALLY RESOLVED - This revealed the need for better schema prerequisite docs (see Documentation Gap #3 below)


📚 DOCUMENTATION GAPS (Need Immediate Attention)

1. external_user_id vs user_id - API Spec Clarification 🔴 HIGH PRIORITY

Issue: Developers confused about when to use user_id vs external_user_id

Key Clarification from API Team:

Memory server now resolves user_id internally, so developers don't need to worry about it. The new API spec focuses on external_user_id for all developer-facing operations.

Current API Spec (openapi.yaml):

# FeedbackRequest example shows external_user_id (line 3318)
example:
  external_user_id: dev_api_key_123  # ✅ Developer uses this
  user_id: abc123def456              # ⚠️ Optional, auto-resolved if not provided

Required Doc Updates:

  1. Add to guides/authentication.md:

    ## Understanding User IDs
    
    ### For Developers: Use `external_user_id`
    
    **In your application code, always use `external_user_id`:**
    
    - **What it is:** YOUR application's user identifier
    - **You provide:** Any string that identifies users in YOUR system
    - **Examples:** `"user_12345"`, `"alice@company.com"`, `"customer_abc"`
    - **Memory server handles:** Automatic resolution to internal `user_id`
    
    ```python
    # ✅ Correct - Use external_user_id
    client.add_memory(
        content="User prefers dark mode",
        external_user_id="alice@company.com"  # Your user ID
    )
    
    client.search_memory(
        query="preferences",
        external_user_id="alice@company.com"
    )
    
    # Document upload
    client.upload_document(
        file_path="report.pdf",
        external_user_id="alice@company.com"
    )
    ```

    What About user_id?

    • Internal only: Papr's internal user identifier (10-char Parse objectId)
    • Auto-resolved: Memory server resolves this from your API key + external_user_id
    • You don't need it: The API handles this mapping automatically
    • When visible: Only in response objects for debugging/tracking

    Quick Reference

    | Use Case | Parameter | Example |
    |----------|-----------|---------|
    | Add Memory | `external_user_id` | Your app user ID |
    | Search Memory | `external_user_id` | Your app user ID |
    | Upload Document | `external_user_id` | Your app user ID |
    | Submit Feedback | `external_user_id` | Your app user ID |
    | API Responses | `user_id` | Auto-included by server |

    Migration Note

    If you're upgrading from an older API version that used user_id:

    • Replace all user_id parameters with external_user_id
    • Use your application's user identifiers
    • Memory server handles the internal mapping
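
    A minimal before/after sketch of the rename (illustrative values only):

    ```python
    # Before (older API versions)
    client.search_memory(query="preferences", user_id="abc123def456")

    # After (current API)
    client.search_memory(query="preferences", external_user_id="alice@company.com")
    ```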
    
    
  2. Add to quickstart/ guides:

    • Update all examples to use external_user_id
    • Remove references to manually managing user_id

Priority: 🔴 HIGH - Affects all new integrations


2. Search Response Serialization 🟡 MEDIUM PRIORITY

Issue: Developers couldn't serialize search response objects to JSON

API Spec Check (openapi.yaml:4665):

SearchResponse:
  properties:
    code:
      type: integer
    status:
      type: string
    data:
      anyOf:
      - $ref: '#/components/schemas/SearchResult'
      - type: 'null'
    error:
      anyOf:
      - type: string
      - type: 'null'
    search_id:
      anyOf:
      - type: string
      - type: 'null'

SearchResult structure (line 4710):

SearchResult:
  properties:
    memories:
      items:
        $ref: '#/components/schemas/Memory'
      type: array
    nodes:
      items:
        $ref: '#/components/schemas/Node'
      type: array
    schemas_used:
      anyOf:
      - items:
          type: string
        type: array
      - type: 'null'

Required Doc Updates:

  1. Add to sdks/python.md:

    ## Working with Search Results
    
    ### Response Structure
    
    Search returns a `SearchResponse` Pydantic model with this structure:
    
    ```python
    {
      "code": 200,
      "status": "success",
      "data": {
        "memories": [...],        # List of Memory objects
        "nodes": [...],           # Graph nodes (if graph enabled)
        "schemas_used": [...]     # Schema IDs used (null if none)
      },
      "error": null,
      "search_id": "abc123"       # Query tracking ID
    }
    ```

    Converting to Dictionary/JSON

    # Search returns SearchResponse Pydantic model
    response = client.search_memory(query="example", external_user_id="alice@company.com")
    
    # Convert to dictionary using Pydantic v2 method
    response_dict = response.model_dump()
    
    # Convert to JSON string
    import json
    response_json = json.dumps(response.model_dump())
    
    # Access nested data
    if response.data:
        for memory in response.data.memories:
            print(memory.content)
            print(memory.model_dump())  # Each memory is also serializable

    Common Patterns

    # Extract just memory contents
    if response.data:
        contents = [m.content for m in response.data.memories]
    
    # Get memory IDs
    memory_ids = [m.objectId for m in response.data.memories]
    
    # Check which schemas were used
    if response.data and response.data.schemas_used:
        print(f"Schemas: {response.data.schemas_used}")
    
    # Handle errors
    if response.status == "error":
        print(f"Error: {response.error}")
    else:
        print(f"Found {len(response.data.memories)} memories")

    Troubleshooting

    Problem: .dict() not working
    Solution: Use .model_dump() (Pydantic v2 method)

    Problem: AttributeError on response
    Solution: Check SDK version >= 2.20.0: pip install --upgrade papr-memory

    
    

Priority: 🟡 MEDIUM - Affects data processing workflows


3. Schema Prerequisites for Document Upload 🔴 HIGH PRIORITY

Issue: Developers didn't understand schema needs to be registered for namespace BEFORE uploading documents

Key Clarification:

For documents with hierarchical_enabled=True, Papr automatically:

  • Breaks documents by hierarchy
  • Connects chunks to each other based on hierarchy
  • Links chunks by semantic and logical relationships

Developers don't need to add these to a schema - this is automatic.

What Developers SHOULD Use Schemas For:

  • Domain-specific entities (e.g., Customer, Transaction, Product)
  • Business logic relationships (e.g., PURCHASED, ASSIGNED_TO)
  • Required identifiers that must be extracted
  • Natural identifiers (like "name") rather than deterministic IDs (like "id") when extracting from unstructured data

Required Doc Updates:

  1. Add to guides/document-processing.md:

    ## Document Processing with Hierarchical Chunking
    
    ### Automatic Features (No Schema Required)
    
    When you upload documents with `hierarchical_enabled=True`, Papr automatically:
    
    **Breaks documents by hierarchy**
    - Sections, subsections, paragraphs
    - Preserves document structure
    
    **Connects chunks to each other**
    - Based on hierarchical relationships
    - Parent-child section links
    
    **Links by semantic similarity**
    - Related content across sections
    - Logical flow connections
    
    **You don't need a schema for this** - it's built-in.
    
    ### When to Use Custom Schemas
    
    Use schemas to extract **domain-specific entities**:
    
    ```python
    # Example: Financial document schema
    schema = {
      "nodes": [
        {
          "label": "Company",
          "properties": ["name", "ticker", "sector"],  # ✅ Unstructured identifiers
          "required": ["name"]  # Must be found
        },
        {
          "label": "Metric",
          "properties": ["metric_name", "value", "period"],  # Not "id"
          "required": ["metric_name", "value"]
        }
      ],
      "relationships": [
        {
          "type": "HAS_METRIC",
          "from": "Company",
          "to": "Metric"
        }
      ]
    }
    ```

    Schema Design for Unstructured Data

    Avoid: Deterministic IDs

    {
      "label": "Customer",
      "properties": ["id", "customer_number"]  // Bad for unstructured
    }

    Use: Natural identifiers

    {
      "label": "Customer",
      "properties": ["name", "email", "company_name"],  // Good for unstructured
      "required": ["name"]
    }

    Upload Workflow

    # Option 1: No schema (hierarchical only)
    response = client.upload_document(
        file_path="report.pdf",
        hierarchical_enabled=True,  # Auto hierarchy + connections
        external_user_id="alice@company.com"
    )
    
    # Option 2: With schema (hierarchy + domain entities)
    response = client.upload_document(
        file_path="financial_report.pdf",
        schema_id="financial-schema-v1",  # Extract companies, metrics
        hierarchical_enabled=True,         # Plus hierarchy
        external_user_id="alice@company.com"
    )

    Troubleshooting

    Problem: Entities not being extracted
    Solution: Check schema uses natural identifiers (name, title) not IDs

    Problem: Schema not found
    Solution: Verify schema registered for correct namespace

    
    
  2. Add to guides/custom-schemas.md:

    ## When to Register Custom Schemas
    
    ### Built-in Processing (No Schema Needed)
    
    - ✅ Document hierarchy and structure
    - ✅ Chunk-to-chunk relationships
    - ✅ Semantic similarity links
    - ✅ Basic entity extraction (dates, numbers, etc.)
    
    ### Custom Schema Use Cases
    
    Register schemas for:
    
    1. **Domain-specific entities**
       - Industry terminology (e.g., "Claim", "Policy" for insurance)
       - Business objects (e.g., "Customer", "Transaction")
    
    2. **Required extractions**
       - Must-have fields using `"required": ["field_name"]`
       - Validation that entities exist
    
    3. **Custom relationships**
       - Business logic connections
       - Domain-specific relationship types
    
    4. **Unique identifiers**
       - For unstructured data: Use natural identifiers
       - Examples: `"name"`, `"title"`, `"email"`, `"company_name"`
       - **Avoid:** `"id"`, `"customer_id"` (rarely extractable from unstructured text)
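
    A minimal hedged sketch of a schema node that follows this guidance (shape mirrors the financial-schema example above; the labels and fields are illustrative):

    ```python
    schema = {
        "nodes": [
            {
                "label": "Policy",
                "properties": ["name", "policyholder_email"],  # natural identifiers
                "required": ["name"]                            # must be extracted
            }
        ],
        "relationships": []
    }
    ```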

Priority: 🔴 HIGH - Critical for document processing


4. Memory Policies for Graph Control 🔴 HIGH PRIORITY

Issue: Developers don't understand when/how to use memory policies vs. schemas

Key Clarification:

Memory policies let you implement graph control patterns. See MEMORY_POLICY_USAGE_GUIDE.md in memory server for capabilities and when to use each.

Required Doc Updates:

  1. Add to guides/graph-control.md (new file):

    ## Memory Policies: Controlling Graph Generation
    
    ### Overview
    
    Memory policies control HOW memories are processed and added to your knowledge graph. They work alongside schemas to give you fine-grained control.
    
    ### Policy Modes
    
    | Mode | Description | Use Case |
    |------|-------------|----------|
    | `auto` | LLM extracts entities automatically | Unstructured data (documents, conversations) |
    | `manual` | You provide exact nodes/relationships | Structured data (databases, APIs) |
    
    ### Basic Usage
    
    ```python
    # Auto mode - LLM extracts entities
    response = client.add_memory(
        content="Meeting with Acme Corp about Q4 targets",
        external_user_id="alice@company.com",
        memory_policy={
            "mode": "auto",
            "schema_id": "business-schema-v1"  # Optional schema to guide extraction
        }
    )
    
    # Manual mode - You specify exact graph structure
    response = client.add_memory(
        content="Transaction record",
        external_user_id="alice@company.com",
        memory_policy={
            "mode": "manual",
            "nodes": [
                {
                    "id": "txn_1",
                    "label": "Transaction",
                    "properties": {"amount": 99.99, "date": "2026-01-15"}
                },
                {
                    "id": "prod_1",
                    "label": "Product",
                    "properties": {"name": "Premium Plan"}
                }
            ],
            "relationships": [
                {
                    "source_node_id": "txn_1",
                    "target_node_id": "prod_1",
                    "type": "PURCHASED"
                }
            ]
        }
    )
    ```

    Node Constraints (Advanced)

    Apply business rules to auto-extracted entities:

    response = client.add_memory(
        content="Assigned bug-123 to Alice, marked as urgent",
        external_user_id="alice@company.com",
        memory_policy={
            "mode": "auto",
            "schema_id": "project-schema",
            "node_constraints": [
                {
                    "node_type": "Task",
                    "when": {"priority": "urgent"},  # When to apply
                    "set": {"urgent": True},         # Force this property
                    "create": "auto"                 # Create if doesn't exist
                }
            ]
        }
    )

    Edge Constraints

    Control relationship creation:

    memory_policy={
        "mode": "auto",
        "edge_constraints": [
            {
                "relationship_type": "ASSIGNED_TO",
                "from_node_type": "Task",
                "to_node_type": "Person",
                "when": {"status": "active"},  # Only create for active tasks
                "required": True              # Must exist in content
            }
        ]
    }

    Common Patterns

    Pattern 1: Schema-Guided Extraction

    # Use schema to guide LLM extraction
    memory_policy={
        "mode": "auto",
        "schema_id": "your-schema-id"
    }

    Pattern 2: Force Properties

    # Always add project_id to Task nodes
    memory_policy={
        "mode": "auto",
        "node_constraints": [
            {
                "node_type": "Task",
                "set": {"project_id": "proj_123"},
                "create": "auto"
            }
        ]
    }

    Pattern 3: Prevent Node Creation

    # Never create new Customer nodes, only link to existing
    memory_policy={
        "mode": "auto",
        "node_constraints": [
            {
                "node_type": "Customer",
                "create": "never",  # Only link to existing
                "merge": ["last_interaction"]  # Update this field
            }
        ]
    }

    Pattern 4: Unique Identifiers for Unstructured Data

    # Use natural identifiers
    schema = {
        "nodes": [
            {
                "label": "Company",
                "properties": ["name"],  # Natural identifier
                "unique": ["name"]       # Merge on name match
            }
        ]
    }

    When to Use Each Approach

    | Scenario | Solution |
    |----------|----------|
    | Unstructured text, no special rules | `mode: auto` only |
    | Unstructured text + business rules | `mode: auto` + `node_constraints` |
    | Unstructured text + domain entities | `mode: auto` + `schema_id` |
    | Structured database records | `mode: manual` |
    | Mix of LLM extraction + exact data | `mode: auto` + `node_constraints` with `set` |

    Learn More

    
    
  2. Add to guides/node-constraints.md (new file):

    ## Node Constraints Reference
    
    Node constraints give you fine-grained control over how the LLM extracts and structures entities in your knowledge graph.
    
    ### Basic Structure
    
    ```python
    {
        "node_type": "Task",        # Which node type to control
        "when": {...},              # Optional: Conditions to match
        "create": "auto",           # Creation policy
        "set": {...},               # Force these properties
        "merge": ["field1"],        # Update these fields if exists
        "unique": ["field2"]        # Use these for matching
    }
    ```

    Creation Policies

    | Value | Behavior |
    |-------|----------|
    | `"auto"` | Create if doesn't exist (default) |
    | `"always"` | Always create new node |
    | `"never"` | Only link to existing nodes |

    Property Control

    set - Force Properties

    {
        "node_type": "Task",
        "set": {
            "project_id": "proj_123",
            "created_by": "system"
        }
    }
    # Result: Every Task node gets these properties, overriding LLM

    merge - Update on Existing

    {
        "node_type": "Customer",
        "create": "never",
        "merge": ["last_interaction", "total_purchases"]
    }
    # Result: Update these fields if Customer exists, never create new

    unique - Matching Strategy

    {
        "node_type": "Company",
        "unique": ["name"]
    }
    # Result: Merge nodes if name matches (case-insensitive)

    Conditional Application (when)

    {
        "node_type": "Task",
        "when": {
            "priority": "high",
            "status": "open"
        },
        "set": {
            "urgent": True,
            "escalated": True
        }
    }
    # Result: Only apply to Tasks with priority=high AND status=open

    Examples

    Example 1: Project Context

    # Always add project_id to tasks
    memory_policy={
        "mode": "auto",
        "node_constraints": [
            {
                "node_type": "Task",
                "set": {"project_id": current_project_id}
            }
        ]
    }

    Example 2: Reference Data

    # Never create Products, only link to existing catalog
    memory_policy={
        "mode": "auto",
        "node_constraints": [
            {
                "node_type": "Product",
                "create": "never",
                "unique": ["name", "sku"]
            }
        ]
    }

    Example 3: Unstructured Identifiers

    # Use natural identifiers for deduplication
    memory_policy={
        "mode": "auto",
        "schema_id": "customer-schema",
        "node_constraints": [
            {
                "node_type": "Customer",
                "unique": ["email"],      # Email is natural identifier
                "merge": ["last_seen"]    # Update timestamp
            }
        ]
    }

    Best Practices

    1. Use natural identifiers for unstructured data

      • "name", "email", "title"
      • "id", "customer_id" (not in unstructured text)
    2. Combine with schemas

      • Schema defines WHAT entities
      • Constraints define HOW to handle them
    3. Start simple

      • Begin with mode: auto only
      • Add constraints as needed for business rules
    
    

Priority: 🔴 HIGH - Critical for advanced use cases


5. Feedback Endpoints for Evaluation 🟡 MEDIUM PRIORITY

Issue: Developers don't know about feedback endpoints for improving retrieval

API Spec Check (openapi.yaml:3270):

FeedbackRequest:
  properties:
    search_id:
      type: string
      description: The search_id from SearchResponse
    feedbackData:
      $ref: '#/components/schemas/FeedbackData'
    external_user_id:
      type: string

Key Uses (from openapi.json description):

"The feedback is used to train and improve:

  • Router model tier predictions
  • Memory retrieval ranking
  • Answer generation quality
  • Agentic graph search performance"

Required Doc Updates:

  1. Add to guides/feedback-and-evaluation.md (new file):

    ## Feedback Endpoints: Improving Your Memory Retrieval
    
    ### Overview
    
    Papr's feedback system lets you improve search quality over time by collecting user feedback on search results.
    
    ### What Feedback Improves
    
    Your feedback trains and improves:
    - **Memory retrieval ranking** - Most relevant memories surface first
    - **Answer generation quality** - Better responses to queries
    - **Agentic graph search** - Smarter graph traversal
    - **Router model predictions** - Optimal retrieval strategy selection
    
    ### Basic Usage
    
    ```python
    # Step 1: Search returns a search_id
    response = client.search_memory(
        query="What are Q4 revenue targets?",
        external_user_id="alice@company.com"
    )
    search_id = response.search_id  # Save this!
    
    # Step 2: User provides feedback
    feedback = client.submit_feedback(
        search_id=search_id,
        external_user_id="alice@company.com",
        feedback_data={
            "feedbackType": "thumbs_up",
            "feedbackValue": "helpful",
            "feedbackScore": 1,
            "feedbackSource": "inline",
            "feedbackImpact": "positive"
        }
    )
    ```

    Feedback Types

    | Type | When to Use | Impact |
    |------|-------------|--------|
    | `thumbs_up` / `thumbs_down` | User approves/rejects results | High - direct quality signal |
    | `rating` | 1-5 star ratings | Medium - nuanced feedback |
    | `correction` | User edits/corrects answer | High - specific improvements |
    | `engagement` | Copy/save/share actions | Medium - implicit approval |
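
    A hedged sketch of a star-rating submission (field names follow the thumbs_up examples in this guide; the exact value conventions for `rating` are an assumption):

    ```python
    # Hypothetical 4-out-of-5 star rating on a search result
    client.submit_feedback(
        search_id=search_id,
        external_user_id="alice@company.com",
        feedback_data={
            "feedbackType": "rating",
            "feedbackScore": 4,           # assumed 1-5 scale
            "feedbackSource": "inline",
            "feedbackImpact": "positive"
        }
    )
    ```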

    Detailed Feedback Example

    # User finds specific memories helpful
    feedback = client.submit_feedback(
        search_id=search_id,
        external_user_id="alice@company.com",
        feedback_data={
            "feedbackType": "thumbs_up",
            "feedbackValue": "accurate",
            "feedbackScore": 1,
            "feedbackSource": "inline",
            "feedbackImpact": "positive",
            "feedbackText": "Exactly what I needed",
            
            # Specific memories that were helpful
            "citedMemoryIds": ["mem_123", "mem_456"],
            
            # Specific nodes that were relevant
            "citedNodeIds": ["node_789"]
        }
    )

    Evaluation Workflow

    # 1. Run evaluation queries
    eval_queries = [
        "What are our Q4 targets?",
        "Who is assigned to bug-123?",
        "What's the status of Project Alpha?"
    ]
    
    results = []
    for query in eval_queries:
        response = client.search_memory(
            query=query,
            external_user_id="eval_user"
        )
        
        # Manual evaluation: Are results relevant?
        is_relevant = evaluate_results(response.data.memories)
        
        # Submit feedback
        client.submit_feedback(
            search_id=response.search_id,
            external_user_id="eval_user",
            feedback_data={
                "feedbackType": "thumbs_up" if is_relevant else "thumbs_down",
                "feedbackScore": 1 if is_relevant else -1,
                "feedbackSource": "evaluation",
                "feedbackImpact": "positive" if is_relevant else "negative"
            }
        )
        
        results.append({
            "query": query,
            "search_id": response.search_id,
            "relevant": is_relevant
        })
    
    # 2. Track improvements over time
    accuracy = sum(1 for r in results if r["relevant"]) / len(results)
    print(f"Evaluation accuracy: {accuracy:.2%}")

    Integration Patterns

    Pattern 1: User Thumbs Up/Down

    # In your UI
    if user_clicked_thumbs_up:
        client.submit_feedback(
            search_id=search_id,
            external_user_id=current_user_id,
            feedback_data={
                "feedbackType": "thumbs_up",
                "feedbackScore": 1,
                "feedbackSource": "inline"
            }
        )

    Pattern 2: Implicit Engagement

    # Track user actions
    if user_copied_text or user_saved_result:
        client.submit_feedback(
            search_id=search_id,
            external_user_id=current_user_id,
            feedback_data={
                "feedbackType": "engagement",
                "feedbackValue": "saved" if user_saved_result else "copied",
                "feedbackScore": 1,
                "feedbackSource": "interaction",
                "feedbackImpact": "positive"
            }
        )

    Pattern 3: Automated Evaluation

    # Run nightly evals
    def run_evaluation_suite():
        for test_case in test_cases:
            response = client.search_memory(
                query=test_case["query"],
                external_user_id="eval_bot"
            )
            
            # Check if expected memories returned
            expected_ids = set(test_case["expected_memory_ids"])
            returned_ids = set([m.objectId for m in response.data.memories])
            
            is_correct = expected_ids.issubset(returned_ids)
            
            client.submit_feedback(
                search_id=response.search_id,
                external_user_id="eval_bot",
                feedback_data={
                    "feedbackType": "evaluation",
                    "feedbackScore": 1 if is_correct else -1,
                    "feedbackSource": "automated_test",
                    "citedMemoryIds": list(expected_ids) if is_correct else []
                }
            )

    Best Practices

    1. Always save search_id

      • Required for feedback submission
      • Track in your UI state
    2. Provide specific feedback

      • Include citedMemoryIds for helpful memories
      • Add feedbackText for context
    3. Mix feedback sources

      • User feedback (thumbs up/down)
      • Engagement signals (copy/save)
      • Automated evaluations
    4. Monitor over time

      • Track feedback metrics
      • Measure search quality improvements

    Learn More

    
    

Priority: 🟡 MEDIUM - Important for production quality


6. rank_results for Accuracy 🟢 LOW PRIORITY

Issue: Developers toggled rank_results but saw no difference

Key Clarification:

rank_results=True is for best accuracy but adds an extra reranking step, so there's more latency.

Required Doc Updates:

  1. Update guides/search-tuning.md:

    ## Search Parameters
    
    ### `rank_results` - Accuracy vs Speed Trade-off
    
    **What it does:** Applies additional reranking using a cross-encoder or LLM for maximum accuracy
    
    **Trade-off:**
    - ✅ **Best accuracy** - Results reordered by semantic relevance
    - ⚠️ **More latency** - Adds 200-500ms for reranking step
    
    ### When to Use
    
    ```python
    # For best accuracy (production search)
    response = client.search_memory(
        query="complex semantic query",
        rank_results=True,  # Maximum accuracy
        external_user_id="alice@company.com"
    )
    
    # For fastest speed (real-time chat)
    response = client.search_memory(
        query="quick lookup",
        rank_results=False,  # Skip reranking
        external_user_id="alice@company.com"
    )
    ```

    Performance Comparison

    | Configuration | Latency | Accuracy | Use Case |
    |---------------|---------|----------|----------|
    | `rank_results=False` | ~100-300ms | Good | Real-time chat, autocomplete |
    | `rank_results=True` | ~300-800ms | Best | Production search, critical queries |

    Default Behavior

    • Default: rank_results=False (optimized for speed)
    • Use rank_results=True when accuracy matters more than latency
    
    

Priority: 🟢 LOW - Performance optimization detail


🚀 FEATURE REQUESTS (Future Considerations)

1. Configurable Search LLM Models

Request: "For search, can we make it possible to use any LLM rather than the fixed default model?"

Status: Valid feature request - track for roadmap


2. Search Performance Benchmarks

Request: "It would be helpful to have search query benchmarks to evaluate performance as the number of nodes/graph size scales"

Status: Valid ask - consider publishing benchmarks


3. Native Document Deduplication

Request: "Is there a way to verify whether a document has already been processed?"

Status: Already planned for Q2 2026


Priority Documentation Updates

Immediate (This Week)

  1. ✅ external_user_id vs user_id clarification
  2. ✅ Schema prerequisites for document upload
  3. ✅ Memory policies and node constraints guide

High Priority (Next Sprint)

  1. ✅ Search response serialization examples
  2. ✅ Feedback endpoints for evaluation
  3. ✅ rank_results accuracy vs speed explanation

Files to Create

  1. guides/graph-control.md - Memory policies overview
  2. guides/node-constraints.md - Node constraints reference
  3. guides/feedback-and-evaluation.md - Feedback system guide

Files to Update

  1. guides/authentication.md - Add external_user_id section
  2. guides/document-processing.md - Add hierarchy + schema guidance
  3. guides/custom-schemas.md - Add unique identifiers section
  4. sdks/python.md - Add serialization examples
  5. sdks/typescript.md - Add serialization examples
  6. guides/search-tuning.md - Add rank_results section
  7. quickstart/* - Update all examples to use external_user_id

Total Effort Estimate

  • High Priority Docs: 6-8 hours
  • New Feature Guides: 4-6 hours
  • Total: ~10-14 hours of focused doc writing

Additional Documentation Gap: Structured Data Integration

7. Postgres/SQL to Papr - Connecting Structured + Unstructured Data 🔴 HIGH PRIORITY

Issue: Major use case not prominently featured in docs

Use Case:

Developers want to take structured data from Postgres/SQL databases and put it in Papr to connect it with unstructured data (documents, conversations, support tickets).

Why This Matters:

  • CRM data (customers, accounts, opportunities) + support conversations
  • Product catalog (SKUs, prices, inventory) + customer feedback
  • Transaction history (orders, payments) + chat logs
  • Employee records (HR data) + performance reviews
  • Ticket systems (Jira, Linear) + code commits + Slack discussions

Current State:

  • We have quickstart/structured-data-memory.md but it's minimal
  • overview/structured-data.md exists but doesn't emphasize the connection use case
  • Not featured prominently in main overview or use cases

Required Doc Updates:

  1. Update overview/index.md - expand the "Three Input Paths" section to four paths:

    ### Four Input Paths
    
    1. **Documents** (`POST /v1/document`) - Upload PDFs or Word docs. System analyzes and selectively creates memories.
    2. **Messages/Chat** - Send conversation history. System analyzes and extracts important information.
    3. **Structured Data** (`POST /v1/memory` with `mode: manual`) - Import from Postgres/SQL databases. Connect structured records with unstructured context.
    4. **Direct Memory** (`POST /v1/memory`) - Explicitly create memories with full control. Perfect for agent self-documentation.
  2. Create tutorials/postgres-to-papr.md (new file):

    # Connecting Postgres Data with Unstructured Context
    
    ## Overview
    
    One of Papr's most powerful use cases: **Connect your structured database records with unstructured data** (documents, conversations, support tickets).
    
    ### The Problem
    
    Your valuable data lives in silos:
    - **Postgres/SQL:** Customer records, transactions, product catalog
    - **Documents:** Contracts, proposals, technical specs
    - **Conversations:** Support tickets, Slack threads, chat logs
    - **Code:** GitHub commits, pull requests, code reviews
    
    Traditional approaches:
    - ❌ SQL joins only work within the database
    - ❌ Vector search only finds similar text
    - ❌ Manual linking is brittle and doesn't scale
    
    ### The Solution
    
    Papr's Memory Graph connects structured + unstructured data automatically. A Postgres customer record imported into the Memory Graph is automatically linked to:
    
    - Support tickets mentioning the customer
    - Sales conversations about their needs
    - Product docs they viewed
    - Feature requests they submitted
    
    ## Real-World Example: CRM + Support Integration
    
    ### Step 1: Import Structured Data from Postgres
    
    ```python
    import psycopg2
    from papr_memory import PaprMemory
    
    # Connect to your Postgres database
    conn = psycopg2.connect("dbname=crm user=postgres")
    cursor = conn.cursor()
    
    # Fetch customer records
    cursor.execute("""
        SELECT 
            customer_id,
            name,
            email,
            segment,
            arr,
            health_score
        FROM customers
        WHERE updated_at > NOW() - INTERVAL '1 day'
    """)
    
    # Initialize Papr client
    papr = PaprMemory(api_key="your-api-key")
    
    # Import each customer as a graph node
    for row in cursor.fetchall():
        customer_id, name, email, segment, arr, health_score = row
        
        response = papr.add_memory(
            content=f"Customer record: {name} ({email})",
            external_user_id="crm_sync_bot",
            memory_policy={
                "mode": "manual",  # Exact structure from database
                "nodes": [
                    {
                        "id": f"customer_{customer_id}",
                        "type": "Customer",
                        "properties": {
                            "name": name,
                            "email": email,
                            "segment": segment,
                            "arr": str(arr),
                            "health_score": str(health_score),
                            "source": "postgres_crm"
                        }
                    }
                ],
                "relationships": []
            }
        )
        print(f"✅ Imported customer: {name}")

    Step 2: Add Unstructured Support Conversations

    # Support tickets from Zendesk/Intercom
    support_tickets = [
        {
            "ticket_id": "ticket_123",
            "customer_email": "alice@acme.com",
            "subject": "Feature request: API rate limits",
            "description": "We need higher rate limits for our enterprise plan..."
        }
    ]
    
    for ticket in support_tickets:
        response = papr.add_memory(
            content=f"{ticket['subject']}\n\n{ticket['description']}",
            external_user_id="support_sync_bot",
            memory_policy={
                "mode": "auto",  # LLM extracts entities
                "schema_id": "support-schema",
                "node_constraints": [
                    {
                        "node_type": "Customer",
                        "create": "never",  # Only link to existing
                        "unique": ["email"]  # Match by email
                    }
                ]
            },
            metadata={
                "ticket_id": ticket["ticket_id"],
                "customer_email": ticket["customer_email"],
                "source": "zendesk"
            }
        )
        print(f"✅ Added support ticket: {ticket['ticket_id']}")

    Step 3: Query Connected Context

    Now you can ask questions that span structured + unstructured data:

    # Find all context about a customer
    response = papr.search_memory(
        query="What are all the issues and requests from Acme Corp?",
        external_user_id="sales_rep_alice",
        enable_agentic_graph=True,
        max_memories=20,
        max_nodes=10
    )
    
    # Returns:
    # - Customer node from Postgres (segment, ARR, health score)
    # - Support tickets mentioning them
    # - Sales conversations about their needs
    # - Product docs they viewed
    # - Feature requests they submitted

    Common Integration Patterns

    Pattern 1: E-commerce (Products + Reviews + Support)

    # Postgres: Product catalog
    papr.add_memory(
        content="Product record",
        memory_policy={
            "mode": "manual",
            "nodes": [{
                "id": f"product_{sku}",
                "type": "Product",
                "properties": {
                    "sku": sku,
                    "name": product_name,
                    "price": price,
                    "inventory": inventory_count
                }
            }]
        }
    )
    
    # Unstructured: Customer reviews
    papr.add_memory(
        content=review_text,  # "Great product but shipping was slow..."
        memory_policy={
            "mode": "auto",
            "node_constraints": [
                {"node_type": "Product", "create": "never", "unique": ["sku"]}
            ]
        }
    )
    
    # Query: "What are customers saying about product SKU-123?"
    # Returns: Product details + all reviews + support tickets

    Pattern 2: SaaS (Accounts + Usage + Conversations)

    # Postgres: Account usage metrics
    papr.add_memory(
        content="Usage metrics",
        memory_policy={
            "mode": "manual",
            "nodes": [{
                "id": f"account_{account_id}",
                "type": "Account",
                "properties": {
                    "name": account_name,
                    "plan": "enterprise",
                    "api_calls_30d": api_calls,
                    "active_users": active_users
                }
            }]
        }
    )
    
    # Unstructured: Sales conversations
    papr.add_memory(
        content=meeting_transcript,  # "Discussed upgrading to enterprise..."
        memory_policy={
            "mode": "auto",
            "node_constraints": [
                {"node_type": "Account", "create": "never", "unique": ["name"]}
            ]
        }
    )
    
    # Query: "Which enterprise accounts are at risk of churning?"
    # Returns: Usage data + conversation sentiment + support history

    Pattern 3: Engineering (Tickets + Code + Docs)

    # Postgres: Jira tickets
    papr.add_memory(
        content="Bug ticket",
        memory_policy={
            "mode": "manual",
            "nodes": [{
                "id": f"ticket_{ticket_id}",
                "type": "Ticket",
                "properties": {
                    "ticket_id": ticket_id,
                    "title": title,
                    "status": status,
                    "priority": priority,
                    "assignee": assignee
                }
            }]
        }
    )
    
    # Unstructured: GitHub commits
    papr.add_memory(
        content=commit_message,  # "Fix bug in auth flow, closes JIRA-123"
        memory_policy={
            "mode": "auto",
            "node_constraints": [
                {"node_type": "Ticket", "create": "never", "unique": ["ticket_id"]}
            ]
        }
    )
    
    # Query: "What code changes fixed the authentication bug?"
    # Returns: Ticket details + related commits + PR discussions

    Sync Strategies

    Strategy 1: Initial Bulk Import

    # One-time import of existing data
    def bulk_import_customers():
        cursor.execute("SELECT * FROM customers")
        for row in cursor.fetchall():
            import_customer_to_papr(row)

    Strategy 2: Incremental Updates

    # Sync only changed records
    def incremental_sync():
        cursor.execute("""
            SELECT * FROM customers 
            WHERE updated_at > %s
        """, (last_sync_timestamp,))
        
        for row in cursor.fetchall():
            update_customer_in_papr(row)

    Strategy 3: Real-time CDC (Change Data Capture)

    # Use Postgres logical replication + Debezium to stream changes to Kafka, then
    # push each change into Papr (sketch using kafka-python; Debezium envelope assumed)
    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "postgres.customers",
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )
    for message in consumer:
        payload = message.value.get("payload", message.value)  # tolerate unwrapped events
        if payload.get("op") in ("c", "u"):  # create / update
            sync_customer_to_papr(payload["after"])

    Best Practices

    1. Use Natural Identifiers for Linking

    # ✅ Good: Use business identifiers
    node_constraints=[
        {
            "node_type": "Customer",
            "unique": ["email"],  # Natural identifier
            "create": "never"
        }
    ]
    
    # ❌ Bad: Use database IDs
    node_constraints=[
        {
            "node_type": "Customer",
            "unique": ["customer_id"],  # Won't match unstructured mentions
            "create": "never"
        }
    ]

    2. Separate Structured (Manual) from Unstructured (Auto)

    # Structured data from database: mode="manual"
    papr.add_memory(
        content="Database record",
        memory_policy={"mode": "manual", "nodes": [...]}
    )
    
    # Unstructured data from conversations: mode="auto"
    papr.add_memory(
        content="Customer mentioned in email...",
        memory_policy={"mode": "auto", "node_constraints": [...]}
    )

    3. Add Source Metadata

    # Track where data came from
    papr.add_memory(
        content="...",
        metadata={
            "source": "postgres_crm",
            "table": "customers",
            "synced_at": datetime.now().isoformat()
        }
    )

    4. Handle Deletions

    # When record deleted in Postgres
    def handle_deletion(customer_id):
        # Option 1: Soft delete (mark as inactive)
        papr.add_memory(
            content="Customer deactivated",
            memory_policy={
                "mode": "manual",
                "nodes": [{
                    "id": f"customer_{customer_id}",
                    "type": "Customer",
                    "properties": {"status": "inactive"}
                }]
            }
        )
        
        # Option 2: Hard delete (remove from graph)
        # Use GraphQL mutation or memory deletion endpoint

    Performance Optimization

    Batch Imports

    # Use batch endpoint for bulk imports
    memories = []
    for row in cursor.fetchmany(100):
        memories.append({
            "content": f"Customer: {row['name']}",
            "memory_policy": {...}
        })
    
    # Batch import (faster than individual calls)
    papr.add_memory_batch(memories)

    Incremental Sync Schedule

    # Cron job: Sync every 15 minutes
    */15 * * * * python sync_postgres_to_papr.py
    
    # Or use Temporal/Airflow for orchestration

    Troubleshooting

    Problem: Entities Not Linking

    Symptom: Unstructured data creates new nodes instead of linking to existing

    Solution: Check unique identifiers match

    # Ensure email format matches exactly
    node_constraints=[
        {
            "node_type": "Customer",
            "unique": ["email"],  # Case-insensitive matching
            "create": "never"
        }
    ]

    Problem: Slow Bulk Imports

    Solution: Use batch endpoint and parallel processing

    from concurrent.futures import ThreadPoolExecutor
    
    with ThreadPoolExecutor(max_workers=10) as executor:
        executor.map(import_batch, batches)

    Learn More

    
    
  3. Update overview/use-cases.md: Add prominent row at top:

    | **Connect Postgres/SQL with Unstructured Data** | `/v1/memory` (manual mode) | `/v1/memory/search`, `/v1/graphql` | Import structured database records and automatically link with documents, conversations, and support tickets |
  4. Update overview/index.md: Add to "Three Input Paths" section (line 69):

    3. **Structured Data** (`POST /v1/memory` with `mode: manual`) - Import from Postgres/SQL. Connect database records with unstructured context.

Priority: 🔴 HIGH - Major use case not prominently featured


Conclusion

Key Insights:

  1. Most critical issues (namespace filtering, search speed) were backend bugs - now fixed
  2. Main confusion points: external_user_id usage, schema prerequisites, memory policies
  3. Feedback endpoints are underutilized - need better visibility
  4. Structured data integration (Postgres → Papr) is a major use case that needs more prominence

Impact: Addressing these 7 documentation gaps will prevent 80%+ of similar confusion in future enterprise integrations.