Documentation Review: Developer Experience Analysis

Executive Summary

Reviewed all changes from the apis/index.yaml update. Three major areas were updated:

  1. Session history order — Fixed newest-first behavior across all examples
  2. Memory lifecycle — Added async status, webhooks, holographic/graph-aware
  3. Graph-aware embeddings — New positioning guide with built-in schemas

What's Working Well ✅

1. Graph-aware embeddings guide

  • Clear concept: "reshapes vector space" metaphor
  • Concrete tables: Three built-in schemas with all 14 dimensions
  • API mapping table: Shows where each capability lives
  • Decision criteria: "When to enable" / "When to skip"

2. Session history fixes

  • Consistent: All Python/TypeScript examples now reversed() / .reverse()
  • Explicit callout: States "newest first" upfront
  • Complete coverage: Fixed in messages-management, compression, context-handling, tutorials

3. Memory lifecycle

  • Clear table: Poll vs webhook vs WebSocket
  • Status states: Listed all five states (queued → completed/failed)
  • Cross-references: Points to document-processing for webhook patterns

Critical DX Issues 🔴

Issue 1: Graph-aware embeddings — Dense opening

Problem:

Graph-aware embeddings are a Papr embedding mode that reshapes the vector space 
so it behaves like a structured graph of meaning, not only a flat semantic similarity.

This is accurate but intimidating. Developers scan the first paragraph to decide "is this for me?"

Fix: Add a problem-first opening before the mechanism:

# Graph-aware embeddings and domain schemas

**The problem:** Standard vector search ranks by "semantic closeness" but can't tell if two results 
are close for the **same reason**. A code snippet about "sorting arrays" and "sorting linked lists" 
might be semantically near but use different algorithms, data structures, and APIs.

**Graph-aware embeddings** solve this by encoding **structured domain dimensions** alongside your 
base vector—things like "programming language," "operation type," "temporal context," or custom 
fields you define. Search can then filter and boost by **topical alignment**, **domain-specific 
context**, and other axes beyond flat similarity.

**Implementation:** Uses Papr's holographic pipeline (`enable_holographic`, `holographic_config`, 
`/v1/holographic/*`). This guide covers **concepts and schemas**; API request/response shapes are 
in the [API reference](/apis/index.yaml).

Issue 2: Hz vs frequency — Terminology confusion

Problem:

  • Built-in schema tables show: Hz (0.1, 0.5, 2.0, 4.0...)
  • Custom schema API uses: frequency (with allowed values from spec)
  • Footnote says "Table types describe dimension; API uses lowercase" but doesn't clarify Hz mapping

Fix: Add a Hz explainer before custom schema section:

## About frequency bands (Hz values)

The built-in schemas show **Hz** values (0.1, 0.5, 2.0, etc.) representing the 14 standard 
**brain-inspired frequency bands** in holographic encoding. When registering a custom schema, 
your `frequency` field must use one of these allowed values (see `CustomFrequencyField` in the 
API reference for the exact enum).

**Rule of thumb:** 
- Lower Hz (0.1–2.0) → Categorical/Enum dimensions (language, domain)
- Mid Hz (4.0–14) → Descriptive/FreeText dimensions (intent, operation)
- Higher Hz (18–70) → List/MultiValue dimensions (APIs, entities)
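The rule of thumb above could ship with a tiny illustrative helper. A sketch, with the caveat that the specific band values and the `kind` names below are assumptions for illustration, not values from the Papr API (the real enum lives in `CustomFrequencyField`):

```python
# Illustrative only: a toy helper encoding the rule of thumb above.
# The exact Hz values and the "kind" names are ASSUMPTIONS for this
# sketch; check CustomFrequencyField in the API reference for the
# real allowed enum values.

def suggest_hz(kind: str) -> list[float]:
    """Return candidate Hz values for a dimension kind."""
    bands = {
        "categorical": [0.1, 0.5, 1.0, 2.0],  # Enum-like dimensions (language, domain)
        "descriptive": [4.0, 8.0, 14.0],      # FreeText dimensions (intent, operation)
        "multivalue":  [18.0, 40.0, 70.0],    # List dimensions (APIs, entities)
    }
    return bands[kind]

print(suggest_hz("categorical"))  # [0.1, 0.5, 1.0, 2.0]
```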

Issue 3: Missing "should I use this?" decision tree

Problem: The guide says "when to enable" but doesn't help developers self-assess if their problem fits.

Fix: Add a decision flowchart before "When to enable":

## Do you need graph-aware embeddings?

Follow this decision tree:

1. **Is your baseline search "good enough"?**  
   → Yes: Skip graph-aware. Standard semantic + agentic graph is sufficient.  
   → No: Continue ↓

2. **Can you describe the quality gap in domain terms?**  
   Examples: "Returns Python when I need JavaScript," "Mixes bug reports with feature requests"  
   → Yes: You have a domain mismatch → Continue ↓  
   → No: Your problem might be query quality or data sparsity, not embeddings

3. **Does a built-in schema match your domain?**  
   - **cosqa** → Code search (snippet ↔ natural language)  
   - **scifact** → Scientific claims ↔ evidence  
   - **general** → Mixed content domains  
   → Yes: Start with built-in → Measure → Tune  
   → No: Continue ↓

4. **Can you define 4–14 structured dimensions for your domain?**  
   Examples: contract_type, jurisdiction, ticket_priority, user_intent  
   → Yes: Register custom schema with Papr → Co-design recommended  
   → No: Graph-aware mode requires a schema; revisit when you can articulate dimensions

Issue 4: No concrete before/after example

Problem: No example showing what difference it makes. Developers need to see impact to justify complexity.

Fix: Add a comparison example after "Why it matters":

## Example: Code search with vs without graph-aware

**Query:** "How do I sort a list in Python?"

### Standard semantic search (without graph-aware):
Returns mixed results:
1. Python `list.sort()`
2. JavaScript `array.sort()` ❌ (different language)
3. Python sorting algorithms tutorial ⚠️ (conceptual, not code)
4. SQL `ORDER BY` ❌ (different domain)

### With graph-aware embeddings (cosqa schema):
Returns filtered results aligned on **language=Python** and **primary_operation=sorting**:
1. Python `list.sort()`
2. Python `sorted()` function ✅
3. Python custom sort with `key=`
4. Python `heapq.nsmallest()` for partial sorts ✅

**Why:** The schema encodes **programming_domain**, **language**, and **primary_operation** dimensions. 
H-COND scoring boosts results with high alignment on these fields.
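To make the mechanism concrete, the example could end with a toy scoring model. This is NOT the real H-COND algorithm (its internals aren't documented here); it is a simplified sketch of why dimension-aligned results outrank merely similar ones:

```python
# Toy model of alignment-boosted ranking. NOT the real H-COND scoring;
# it only illustrates why results aligned on schema dimensions
# (language, primary_operation) can outrank raw-similarity winners.

def score(base_similarity: float, alignment: dict[str, float],
          weights: dict[str, float]) -> float:
    boost = sum(weights[d] * alignment.get(d, 0.0) for d in weights)
    return base_similarity * (1.0 + boost)

weights = {"language": 0.5, "primary_operation": 0.5}

# Python list.sort(): lower raw similarity, but fully aligned
python_sort = score(0.80, {"language": 1.0, "primary_operation": 1.0}, weights)
# JavaScript array.sort(): higher raw similarity, wrong language
js_sort = score(0.85, {"language": 0.0, "primary_operation": 1.0}, weights)

print(python_sort > js_sort)  # True: alignment outweighs raw similarity
```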

Medium DX Issues 🟡

Issue 5: Session history — Missing "why reverse"

Problem: Examples show reversed() but don't explain when you need chronological vs newest-first display.

Fix: Add guidance in messages-management.md after the callout:

**When to reverse:**
- **Building LLM prompts**: Models expect chronological flow (oldest → newest)
- **Timeline displays**: Users expect conversation order
- **Transcript exports**: Standard format is oldest-first

**When to keep as-is (newest-first):**
- **Recent activity feeds**: "What happened recently?"
- **Pagination UX**: Load latest messages first, older on scroll
- **Admin dashboards**: Show most recent interactions
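The guidance above would land better with a two-line sketch next to it. Assuming the newest-first ordering stated in the callout (the message shape below is a simplified assumption):

```python
# Sketch: session history arrives newest-first (per the callout above),
# so reverse it before building an LLM prompt. Message shape simplified.

messages = [  # as returned by the API: newest first
    {"role": "assistant", "content": "Sure, here's the fix."},
    {"role": "user", "content": "Can you fix this bug?"},
]

# LLM prompts expect chronological flow (oldest -> newest)
chronological = list(reversed(messages))
prompt = "\n".join(f"{m['role']}: {m['content']}" for m in chronological)
print(prompt)
```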

Issue 6: Memory status — No full lifecycle example

Problem: The table shows what to use but not how or when in a real flow.

Fix: Add a complete example after the status table:

### Full lifecycle example

```python
# 1. Create memory with webhook
memory = client.memory.add(
    content="Q4 planning: Launch new API, hire 2 engineers, $200k budget",
    enable_holographic=True,
    frequency_schema_id="general",
    webhook_url="https://api.myapp.com/papr-webhook",
    webhook_secret="my_webhook_secret_key"
)

memory_id = memory.memory_id
print(f"Created: {memory_id}, initial status: quick_saved")

# 2. Poll status if you need it immediately (webhook takes ~seconds)
import time
for i in range(5):
    status = client.memory.get_status(memory_id)
    print(f"Poll {i+1}: {status.status}")
    if status.status == "completed":
        break
    time.sleep(2)

# 3. Webhook handler (receives POST when done)
# POST https://api.myapp.com/papr-webhook
# Headers: X-Webhook-Secret, X-Webhook-Signature (HMAC)
# Body: {"event": "memory.completed", "memory_id": "...", "status": "completed", "completed_at": "..."}
```

When to use each:

  • Webhook: Background processing, decoupled systems, batch workflows
  • Polling: Need result immediately in same request flow
  • WebSocket: Real-time UI updates, progress bars, live dashboards
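The webhook comments in the lifecycle example mention an HMAC `X-Webhook-Signature` header; a verification sketch would complete the picture. Note the signing scheme below (HMAC-SHA256 over the raw body, hex digest) is an assumption for illustration; confirm the actual algorithm in the document-processing webhook docs:

```python
import hmac
import hashlib

# Sketch of webhook signature verification. The signing scheme
# (HMAC-SHA256 over the raw body, hex digest) is an ASSUMPTION;
# confirm the exact algorithm in the document-processing webhook docs.

def verify_signature(raw_body: bytes, signature: str, secret: str) -> bool:
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)  # constant-time compare

body = b'{"event": "memory.completed", "memory_id": "abc", "status": "completed"}'
secret = "my_webhook_secret_key"
sig = hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(body, sig, secret))  # True
```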

Issue 7: Holographic terminology inconsistency

Problem: We use both "holographic" and "graph-aware"; the relationship is not always clear.

Fix: Add a terminology box at the top of graph-aware-embeddings.md:

> **Terminology:** We call this **graph-aware embeddings** (concept) implemented via 
> **holographic** APIs (technology). When you see `enable_holographic` or 
> `holographic_config`, that's the implementation layer. When we discuss "domain schemas" 
> and "structured dimensions," that's the conceptual layer this guide focuses on.
Minor DX Issues 🟢

Issue 8: Custom schema example — No response handling

Problem: Shows the POST request but not what comes back or how to use the returned schema_id.

Fix: Complete the example:

// Response
{
  "status": "success",
  "schema_id": "acme:legal_contracts:1.0.0",
  "domain": "legal_contracts",
  "num_frequencies": 4
}

Then show usage:

# Use the returned schema_id when adding memories
client.memory.add(
    content="Signed NDA with Acme Corp, jurisdiction: US, expires 2027-01-01",
    enable_holographic=True,
    frequency_schema_id="acme:legal_contracts:1.0.0"  # Use registered schema
)

# And in search
results = client.memory.search(
    query="Find all active NDAs in US jurisdiction",
    holographic_config={
        "enabled": True,
        "frequency_schema_id": "acme:legal_contracts:1.0.0",
        "frequency_filters": {
            "contract_type": 0.9,  # Must be 90%+ aligned on contract type
            "jurisdiction": 0.8    # Must be 80%+ aligned on jurisdiction
        }
    }
)

Issue 9: Changelog — "Graph-aware" not aligned with guide title

Problem: Changelog says "Graph-aware embeddings" but then references holographic endpoints.

Fix:

- **Graph-aware embeddings (holographic APIs)** — Documented as domain-tuned vector space 
  (built-in **cosqa**, **scifact**, **general**; custom schemas via `POST /v1/holographic/domains`). 
  Implementation uses `enable_holographic`, `holographic_config`, `/v1/holographic/*` endpoints. 
  See [Graph-aware embeddings guide](/guides/graph-aware-embeddings.md); API shapes in 
  [API reference](/apis/index.yaml).

Issue 10: Capability matrix — No quickstart path for graph-aware

Problem: Matrix shows graph-aware row but no "get started" link like other capabilities have.

Fix: Add a "Getting Started" column or quick example in the key fields:

| Domain-tuned (graph-aware) retrieval | ... | Start with `frequency_schema_id="general"`, then move to domain-specific (`cosqa`, `scifact`) or custom. See [Graph-aware guide](../guides/graph-aware-embeddings.md) for decision tree |

Structural Recommendations 📐

1. Add "Graph-aware quickstart" tutorial

Problem: No 5-minute path to see it working.

Recommendation: Create tutorials/graph-aware-search.md:

  • Use cosqa schema (code search)
  • Show baseline search returning mixed languages
  • Enable holographic with cosqa
  • Show filtered results
  • Explain the difference

2. Cross-link the graph-aware guide from related docs

Current: The graph-aware guide is standalone.

Recommendation: Link from:

  • guides/search-tuning.md → "If baseline ranking isn't precise enough..."
  • guides/retrieval.md → "For domain-specific..."
  • quickstart/index.md → "Advanced: Domain-tuned search"

3. Add troubleshooting section to graph-aware guide

Common issues:

  • "Results didn't improve" → Wrong schema / query still too broad
  • "frequency_schema_id not found" → Need to GET /v1/frequencies for valid ids
  • "Slower than baseline" → Some scoring methods are GPU-heavy (check docs)

Validation Checklist ✓

  • All code examples use correct API paths
  • Request/response shapes match OpenAPI spec
  • No broken internal links (validated with script)
  • Consistent terminology within each guide
  • Changelog accurately describes changes
  • Examples include error handling patterns

Priority Improvements

Do First (High Impact, Low Effort):

  1. Add problem-first opening to graph-aware guide
  2. Add Hz explainer before custom schema section
  3. Add terminology box clarifying holographic vs graph-aware
  4. Add "when to reverse" guidance for session history
  5. Complete custom schema example with response + usage

Do Next (High Impact, Medium Effort):

  1. Add decision tree flowchart to graph-aware guide
  2. Add before/after comparison example
  3. Add full lifecycle example to memory status section
  4. Cross-link graph-aware from search-tuning and retrieval

Nice to Have (Medium Impact, Higher Effort):

  1. Create graph-aware quickstart tutorial
  2. Add troubleshooting section
  3. Add "Getting Started" column to capability matrix

Final Assessment

Overall quality: B+ → A- (with recommended fixes)

Strengths:

  • Accurate and complete coverage of API changes
  • Consistent terminology within each doc
  • Good use of tables and structured information
  • Examples match OpenAPI spec

Opportunities:

  • Lead with problems, not mechanisms
  • Show impact (before/after examples)
  • Reduce cognitive load at decision points
  • Complete the "getting started → production" journey

Developer journey gaps:

  • "Should I use graph-aware?" → Needs decision tree
  • "What will it do for me?" → Needs comparison example
  • "How do I get started?" → Needs quickstart or inline example
  • "Something's not working" → Needs troubleshooting section