Documentation Review: Developer Experience Analysis

Executive Summary

Reviewed all changes from the apis/index.yaml update. Three major areas were updated:

  1. Session history order — Fixed newest-first behavior across all examples
  2. Memory lifecycle — Added async status, webhooks, holographic/graph-aware
  3. Graph-aware embeddings — New positioning guide with built-in schemas

What's Working Well ✅

1. Graph-aware embeddings guide

  • Clear concept: "reshapes vector space" metaphor
  • Concrete tables: Three built-in schemas with all 14 dimensions
  • API mapping table: Shows where each capability lives
  • Decision criteria: "When to enable" / "When to skip"

2. Session history fixes

  • Consistent: All Python/TypeScript examples now reversed() / .reverse()
  • Explicit callout: States "newest first" upfront
  • Complete coverage: Fixed in messages-management, compression, context-handling, tutorials

3. Memory lifecycle

  • Clear table: Poll vs webhook vs WebSocket
  • Status states: Listed all five states (queued → completed/failed)
  • Cross-references: Points to document-processing for webhook patterns

Critical DX Issues 🔴

Issue 1: Graph-aware embeddings — Dense opening

Problem:

Graph-aware embeddings are a Papr embedding mode that reshapes the vector space 
so it behaves like a structured graph of meaning, not only a flat semantic similarity.

This is accurate but intimidating. Developers scan the first paragraph to decide "is this for me?"

Fix: Add a problem-first opening before the mechanism:

# Graph-aware embeddings and domain schemas

**The problem:** Standard vector search ranks by "semantic closeness" but can't tell if two results 
are close for the **same reason**. A code snippet about "sorting arrays" and "sorting linked lists" 
might be semantically near but use different algorithms, data structures, and APIs.

**Graph-aware embeddings** solve this by encoding **structured domain dimensions** alongside your 
base vector—things like "programming language," "operation type," "temporal context," or custom 
fields you define. Search can then filter and boost by **topical alignment**, **domain-specific 
context**, and other axes beyond flat similarity.

**Implementation:** Uses Papr's holographic pipeline (`enable_holographic`, `holographic_config`, 
`/v1/holographic/*`). This guide covers **concepts and schemas**; API request/response shapes are 
in the [API reference](/apis/index.yaml).

Issue 2: Hz vs frequency — Terminology confusion

Problem:

  • Built-in schema tables show: Hz (0.1, 0.5, 2.0, 4.0...)
  • Custom schema API uses: frequency (with allowed values from spec)
  • Footnote says "Table types describe dimension; API uses lowercase" but doesn't clarify Hz mapping

Fix: Add a Hz explainer before custom schema section:

## About frequency bands (Hz values)

The built-in schemas show **Hz** values (0.1, 0.5, 2.0, etc.) representing the 14 standard 
**brain-inspired frequency bands** in holographic encoding. When registering a custom schema, 
your `frequency` field must use one of these allowed values (see `CustomFrequencyField` in the 
API reference for the exact enum).

**Rule of thumb:** 
- Lower Hz (0.1–2.0) → Categorical/Enum dimensions (language, domain)
- Mid Hz (4.0–14) → Descriptive/FreeText dimensions (intent, operation)
- Higher Hz (18–70) → List/MultiValue dimensions (APIs, entities)
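The rule of thumb above could ship with a tiny illustrative helper. A sketch, with the caveat that the specific band values and the `kind` names below are assumptions for illustration, not values from the Papr API (the real enum lives in `CustomFrequencyField`):

```python
# Illustrative only: a toy helper encoding the rule of thumb above.
# The exact Hz values and the "kind" names are ASSUMPTIONS for this
# sketch; check CustomFrequencyField in the API reference for the
# real allowed enum values.

def suggest_hz(kind: str) -> list[float]:
    """Return candidate Hz values for a dimension kind."""
    bands = {
        "categorical": [0.1, 0.5, 1.0, 2.0],  # Enum-like dimensions (language, domain)
        "descriptive": [4.0, 8.0, 14.0],      # FreeText dimensions (intent, operation)
        "multivalue":  [18.0, 40.0, 70.0],    # List dimensions (APIs, entities)
    }
    return bands[kind]

print(suggest_hz("categorical"))  # [0.1, 0.5, 1.0, 2.0]
```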

Issue 3: Missing "should I use this?" decision tree

Problem: The guide says "when to enable" but doesn't help developers self-assess if their problem fits.

Fix: Add a decision flowchart before "When to enable":

## Do you need graph-aware embeddings?

Follow this decision tree:

1. **Is your baseline search "good enough"?**  
   → Yes: Skip graph-aware. Standard semantic + agentic graph is sufficient.  
   → No: Continue ↓

2. **Can you describe the quality gap in domain terms?**  
   Examples: "Returns Python when I need JavaScript," "Mixes bug reports with feature requests"  
   → Yes: You have a domain mismatch → Continue ↓  
   → No: Your problem might be query quality or data sparsity, not embeddings

3. **Does a built-in schema match your domain?**  
   - **cosqa** → Code search (snippet ↔ natural language)  
   - **scifact** → Scientific claims ↔ evidence  
   - **general** → Mixed content domains  
   → Yes: Start with built-in → Measure → Tune  
   → No: Continue ↓

4. **Can you define 4–14 structured dimensions for your domain?**  
   Examples: contract_type, jurisdiction, ticket_priority, user_intent  
   → Yes: Register custom schema with Papr → Co-design recommended  
   → No: Graph-aware mode requires a schema; revisit when you can articulate dimensions

Issue 4: No concrete before/after example

Problem: No example showing what difference it makes. Developers need to see impact to justify complexity.

Fix: Add a comparison example after "Why it matters":

## Example: Code search with vs without graph-aware

**Query:** "How do I sort a list in Python?"

### Standard semantic search (without graph-aware):
Returns mixed results:
1. Python `list.sort()`
2. JavaScript `array.sort()` ❌ (different language)
3. Python sorting algorithms tutorial ⚠️ (conceptual, not code)
4. SQL `ORDER BY` ❌ (different domain)

### With graph-aware embeddings (cosqa schema):
Returns filtered results aligned on **language=Python** and **primary_operation=sorting**:
1. Python `list.sort()`
2. Python `sorted()` function ✅
3. Python custom sort with `key=`
4. Python `heapq.nsmallest()` for partial sorts ✅

**Why:** The schema encodes **programming_domain**, **language**, and **primary_operation** dimensions. 
H-COND scoring boosts results with high alignment on these fields.
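To make the mechanism concrete, the example could end with a toy scoring model. This is NOT the real H-COND algorithm (its internals aren't documented here); it is a simplified sketch of why dimension-aligned results outrank merely similar ones:

```python
# Toy model of alignment-boosted ranking. NOT the real H-COND scoring;
# it only illustrates why results aligned on schema dimensions
# (language, primary_operation) can outrank raw-similarity winners.

def score(base_similarity: float, alignment: dict[str, float],
          weights: dict[str, float]) -> float:
    boost = sum(weights[d] * alignment.get(d, 0.0) for d in weights)
    return base_similarity * (1.0 + boost)

weights = {"language": 0.5, "primary_operation": 0.5}

# Python list.sort(): lower raw similarity, but fully aligned
python_sort = score(0.80, {"language": 1.0, "primary_operation": 1.0}, weights)
# JavaScript array.sort(): higher raw similarity, wrong language
js_sort = score(0.85, {"language": 0.0, "primary_operation": 1.0}, weights)

print(python_sort > js_sort)  # True: alignment outweighs raw similarity
```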

Medium DX Issues 🟡

Issue 5: Session history — Missing "why reverse"

Problem: Examples show reversed() but don't explain when you need chronological vs newest-first display.

Fix: Add guidance in messages-management.md after the callout:

**When to reverse:**
- **Building LLM prompts**: Models expect chronological flow (oldest → newest)
- **Timeline displays**: Users expect conversation order
- **Transcript exports**: Standard format is oldest-first

**When to keep as-is (newest-first):**
- **Recent activity feeds**: "What happened recently?"
- **Pagination UX**: Load latest messages first, older on scroll
- **Admin dashboards**: Show most recent interactions
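The guidance above would land better with a two-line sketch next to it. Assuming the newest-first ordering stated in the callout (the message shape below is a simplified assumption):

```python
# Sketch: session history arrives newest-first (per the callout above),
# so reverse it before building an LLM prompt. Message shape simplified.

messages = [  # as returned by the API: newest first
    {"role": "assistant", "content": "Sure, here's the fix."},
    {"role": "user", "content": "Can you fix this bug?"},
]

# LLM prompts expect chronological flow (oldest -> newest)
chronological = list(reversed(messages))
prompt = "\n".join(f"{m['role']}: {m['content']}" for m in chronological)
print(prompt)
```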

Issue 6: Memory status — No full lifecycle example

Problem: The table shows what to use but not how or when in a real flow.

Fix: Add a complete example after the status table:

### Full lifecycle example

```python
# 1. Create memory with webhook
memory = client.memory.add(
    content="Q4 planning: Launch new API, hire 2 engineers, $200k budget",
    enable_holographic=True,
    frequency_schema_id="general",
    webhook_url="https://api.myapp.com/papr-webhook",
    webhook_secret="my_webhook_secret_key"
)

memory_id = memory.memory_id
print(f"Created: {memory_id}, initial status: quick_saved")

# 2. Poll status if you need it immediately (webhook takes ~seconds)
import time
for i in range(5):
    status = client.memory.get_status(memory_id)
    print(f"Poll {i+1}: {status.status}")
    if status.status == "completed":
        break
    time.sleep(2)

# 3. Webhook handler (receives POST when done)
# POST https://api.myapp.com/papr-webhook
# Headers: X-Webhook-Secret, X-Webhook-Signature (HMAC)
# Body: {"event": "memory.completed", "memory_id": "...", "status": "completed", "completed_at": "..."}
```

When to use each:

  • Webhook: Background processing, decoupled systems, batch workflows
  • Polling: Need result immediately in same request flow
  • WebSocket: Real-time UI updates, progress bars, live dashboards
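The webhook comments in the lifecycle example mention an HMAC `X-Webhook-Signature` header; a verification sketch would complete the picture. Note the signing scheme below (HMAC-SHA256 over the raw body, hex digest) is an assumption for illustration; confirm the actual algorithm in the document-processing webhook docs:

```python
import hmac
import hashlib

# Sketch of webhook signature verification. The signing scheme
# (HMAC-SHA256 over the raw body, hex digest) is an ASSUMPTION;
# confirm the exact algorithm in the document-processing webhook docs.

def verify_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)  # constant-time compare

body = b'{"event": "memory.completed", "memory_id": "abc", "status": "completed"}'
secret = "my_webhook_secret_key"
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(body, sig, secret))  # True
```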

Issue 7: Holographic terminology inconsistency

Problem: We use both "holographic" and "graph-aware"; the relationship is not always clear.

Fix: Add a terminology box at the top of graph-aware-embeddings.md:

> **Terminology:** We call this **graph-aware embeddings** (concept) implemented via 
> **holographic** APIs (technology). When you see `enable_holographic` or 
> `holographic_config`, that's the implementation layer. When we discuss "domain schemas" 
> and "structured dimensions," that's the conceptual layer this guide focuses on.
Minor DX Issues 🟢

Issue 8: Custom schema example — No response handling

Problem: Shows the POST request but not what comes back or how to use the returned schema_id.

Fix: Complete the example:

// Response
{
  "status": "success",
  "schema_id": "acme:legal_contracts:1.0.0",
  "domain": "legal_contracts",
  "num_frequencies": 4
}

Then show usage:

# Use the returned schema_id when adding memories
client.memory.add(
    content="Signed NDA with Acme Corp, jurisdiction: US, expires 2027-01-01",
    enable_holographic=True,
    frequency_schema_id="acme:legal_contracts:1.0.0"  # Use registered schema
)

# And in search
results = client.memory.search(
    query="Find all active NDAs in US jurisdiction",
    holographic_config={
        "enabled": True,
        "frequency_schema_id": "acme:legal_contracts:1.0.0",
        "frequency_filters": {
            "contract_type": 0.9,  # Must be 90%+ aligned on contract type
            "jurisdiction": 0.8    # Must be 80%+ aligned on jurisdiction
        }
    }
)

Issue 9: Changelog — "Graph-aware" not aligned with guide title

Problem: Changelog says "Graph-aware embeddings" but then references holographic endpoints.

Fix:

- **Graph-aware embeddings (holographic APIs)** — Documented as domain-tuned vector space 
  (built-in **cosqa**, **scifact**, **general**; custom schemas via `POST /v1/holographic/domains`). 
  Implementation uses `enable_holographic`, `holographic_config`, `/v1/holographic/*` endpoints. 
  See [Graph-aware embeddings guide](/guides/graph-aware-embeddings.md); API shapes in 
  [API reference](/apis/index.yaml).

Issue 10: Capability matrix — No quickstart path for graph-aware

Problem: Matrix shows graph-aware row but no "get started" link like other capabilities have.

Fix: Add a "Getting Started" column or quick example in the key fields:

| Domain-tuned (graph-aware) retrieval | ... | Start with `frequency_schema_id="general"`, then move to domain-specific (`cosqa`, `scifact`) or custom. See [Graph-aware guide](../guides/graph-aware-embeddings.md) for decision tree |

Structural Recommendations 📐

1. Add "Graph-aware quickstart" tutorial

Problem: No 5-minute path to see it working.

Recommendation: Create tutorials/graph-aware-search.md:

  • Use cosqa schema (code search)
  • Show baseline search returning mixed languages
  • Enable holographic with cosqa
  • Show filtered results
  • Explain the difference

2. Cross-link the graph-aware guide from related docs

Current: The graph-aware guide is standalone.

Recommendation: Link from:

  • guides/search-tuning.md → "If baseline ranking isn't precise enough..."
  • guides/retrieval.md → "For domain-specific..."
  • quickstart/index.md → "Advanced: Domain-tuned search"

3. Add troubleshooting section to graph-aware guide

Common issues:

  • "Results didn't improve" → Wrong schema / query still too broad
  • "frequency_schema_id not found" → Need to GET /v1/frequencies for valid ids
  • "Slower than baseline" → Some scoring methods are GPU-heavy (check docs)

Validation Checklist ✓

  • All code examples use correct API paths
  • Request/response shapes match OpenAPI spec
  • No broken internal links (validated with script)
  • Consistent terminology within each guide
  • Changelog accurately describes changes
  • Examples include error handling patterns

Priority Improvements

Do First (High Impact, Low Effort):

  1. Add problem-first opening to graph-aware guide
  2. Add Hz explainer before custom schema section
  3. Add terminology box clarifying holographic vs graph-aware
  4. Add "when to reverse" guidance for session history
  5. Complete custom schema example with response + usage

Do Next (High Impact, Medium Effort):

  1. Add decision tree flowchart to graph-aware guide
  2. Add before/after comparison example
  3. Add full lifecycle example to memory status section
  4. Cross-link graph-aware from search-tuning and retrieval

Nice to Have (Medium Impact, Higher Effort):

  1. Create graph-aware quickstart tutorial
  2. Add troubleshooting section
  3. Add "Getting Started" column to capability matrix

Final Assessment

Overall quality: B+ → A- (with recommended fixes)

Strengths:

  • Accurate and complete coverage of API changes
  • Consistent terminology within each doc
  • Good use of tables and structured information
  • Examples match OpenAPI spec

Opportunities:

  • Lead with problems, not mechanisms
  • Show impact (before/after examples)
  • Reduce cognitive load at decision points
  • Complete the "getting started → production" journey

Developer journey gaps:

  • "Should I use graph-aware?" → Needs decision tree
  • "What will it do for me?" → Needs comparison example
  • "How do I get started?" → Needs quickstart or inline example
  • "Something's not working" → Needs troubleshooting section