Osmosis Use Case Analysis: Papr Capability Assessment

Date: February 16, 2026
Developer: Osmosis Project
Use Case: Documentary knowledge governance with evidence-anchored claims


Executive Summary

Can the developer use Papr for their use case?

YES - with custom schema design, but they'll need to build governance logic on top.

Key Finding: Papr provides a strong foundation for the storage, retrieval, and relationship-tracking layer. However, the epistemological tracking (conflict detection, corroboration scoring, temporal versioning) requires custom application logic built on top of Papr's primitives.

Recommendation: Use Papr as the knowledge graph + retrieval engine, build Osmosis governance layer on top using custom schemas and GraphQL analytics.


Understanding the Osmosis System

Core Mission

Building a documentary knowledge governance system focused on:

  • Evidence-anchored claims - Assertions tied to specific source evidence
  • Cross-document validation - Detecting conflicts and corroboration
  • Structured knowledge - Not just retrieval, but epistemological tracking

Philosophical Approach

This is fundamentally about knowledge provenance and confidence:

  • What do we know?
  • How confident are we?
  • Where did it come from?
  • Does it conflict with other sources?
  • How has our understanding evolved over time?

The 5 Questions - Detailed Analysis

Question 1: Handling Conflicting Statements

"How does Papr handle situations where different documents assert conflicting statements or values?"

What Papr Provides ✅

Automatic Deduplication + Multi-Source Tracking:

When you define unique_identifiers in your schema:

schema = {
    "Claim": {
        "properties": {
            "subject": {"type": "string"},
            "predicate": {"type": "string"},
            "object": {"type": "string"}
        },
        "unique_identifiers": ["subject", "predicate", "object"]  # Triple-based dedup
    }
}

Papr automatically:

  1. Deduplicates claims with same (subject, predicate, object)
  2. Creates multiple EXTRACTED_FROM relationships to different sources
  3. Merges into ONE node instead of creating duplicates

Example - Same claim from 3 documents:

Document 1: "Product X costs $100"
Document 2: "Product X is priced at $100"  
Document 3: "Product X: $100"

→ Papr creates ONE Claim node:
   Claim {subject: "Product X", predicate: "costs", object: "$100"}
     ← EXTRACTED_FROM → Source: doc_1
     ← EXTRACTED_FROM → Source: doc_2
     ← EXTRACTED_FROM → Source: doc_3

Conflict Detection - Built into GraphQL:

# Query automatically shows:
# 1. All claims for same subject (grouped by Papr's dedup)
# 2. Count of sources per claim (via EXTRACTED_FROM relationships)
# 3. Different values (different object values)

query = """
query ConflictDetection($subject: String!) {
  claims(where: { subject: $subject }) {
    object  # The value
    sources {  # Auto-counted by Papr
      document_id
      version
      date
    }
  }
}
"""

result = await client.graphql.query(query, {"subject": "Product X"})

# Returns:
# Claim 1: object="$100", sources=[doc_1, doc_2, doc_3] (3 sources)
# Claim 2: object="$150", sources=[doc_4] (1 source)
# → CONFLICT DETECTED: 2 different values for same subject

Resolution Logic:

# Developer decides resolution strategy:
# - Count-based: "Pick claim with most sources" (3 > 1 → $100 wins)
# - Freshness: "Pick claim from most recent document"
# - Authority: "Pick claim from official source"
# - Combined: "Most sources from official docs"
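These strategies can be sketched as plain functions over the query results; the claim dicts below mirror the shape of the GraphQL output above and are illustrative only:

```python
# Hypothetical claim dicts shaped like the GraphQL results above
claims = [
    {"object": "$100", "sources": [
        {"document_id": "doc_1", "date": "2025-01-10", "authority": "official"},
        {"document_id": "doc_2", "date": "2025-02-01", "authority": "unofficial"},
        {"document_id": "doc_3", "date": "2025-03-05", "authority": "unofficial"},
    ]},
    {"object": "$150", "sources": [
        {"document_id": "doc_4", "date": "2025-04-20", "authority": "official"},
    ]},
]

def resolve_by_count(claims):
    """Count-based: the claim with the most sources wins."""
    return max(claims, key=lambda c: len(c["sources"]))

def resolve_by_freshness(claims):
    """Freshness: the claim whose newest source is most recent wins.
    ISO date strings compare correctly as strings."""
    return max(claims, key=lambda c: max(s["date"] for s in c["sources"]))

print(resolve_by_count(claims)["object"])      # $100 (3 sources > 1)
print(resolve_by_freshness(claims)["object"])  # $150 (doc_4 is newest)
```

Authority- and combined-strategy variants follow the same pattern: score each claim, take the max.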

What They Can Build:

# Custom schema for Claims
schema = client.schemas.create(
    name="Osmosis Claims Schema",
    node_types={
        "Claim": {
            "properties": {
                "statement": {"type": "string", "required": True},
                "subject": {"type": "string"},      # "Product X"
                "predicate": {"type": "string"},    # "costs"
                "object": {"type": "string"},       # "$100"
                "confidence": {"type": "float"},
                "extracted_at": {"type": "datetime"}
            },
            "unique_identifiers": ["statement"]
        },
        "Source": {
            "properties": {
                "document_id": {"type": "string", "required": True},
                "version": {"type": "string"},
                "page": {"type": "integer"},
                "section": {"type": "string"}
            }
        },
        "ConflictSet": {
            "properties": {
                "subject": {"type": "string"},
                "predicate": {"type": "string"},
                "detected_at": {"type": "datetime"},
                "resolution_status": {"type": "string", "enum_values": ["unresolved", "resolved", "ignored"]}
            }
        }
    },
    relationship_types={
        "EXTRACTED_FROM": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["Source"]
        },
        "CONFLICTS_WITH": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["Claim"]
        },
        "MEMBER_OF": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["ConflictSet"]
        }
    }
)

Conflict Detection Logic (Custom):

# 1. Extract claims from documents
client.memory.add(
    content="Product X costs $100 according to pricing sheet",
    memory_policy={
        "mode": "auto",
        "schema_id": osmosis_schema_id,
        "node_constraints": [
            {
                "node_type": "Claim",
                "set": {
                    "subject": "Product X",
                    "predicate": "costs",
                    "object": "$100",
                    "confidence": 0.95
                }
            }
        ]
    }
)

# 2. Periodic conflict detection service
async def detect_conflicts():
    # Query all claims about same subject+predicate
    query = """
    query FindPotentialConflicts($subject: String!, $predicate: String!) {
      claims(where: {
        subject: $subject,
        predicate: $predicate
      }) {
        id
        object
        confidence
        extracted_from {
          document_id
          version
        }
      }
    }
    """
    
    result = await client.graphql.query(query, {"subject": "Product X", "predicate": "costs"})
    
    # Find claims with different objects
    unique_values = set(claim['object'] for claim in result['claims'])
    
    if len(unique_values) > 1:
        # CONFLICT DETECTED
        conflict_set_id = create_conflict_set()
        for claim in result['claims']:
            link_claim_to_conflict(claim['id'], conflict_set_id)

Gap Analysis ⚠️

  • ✅ Papr automatically deduplicates the same claim across sources
  • ✅ Papr tracks all sources per claim via EXTRACTED_FROM relationships
  • ✅ GraphQL can count sources and compare differing values
  • ✅ Conflict identification is just: "same subject, different objects, >1 claim"
  • ⚠️ Resolution strategy is custom logic (count-based, freshness, authority)
  • ⚠️ ConflictSet nodes are optional (for tracking resolution workflow)

Verdict: Papr handles 80% automatically (dedup, multi-source tracking, querying). Developer adds 20% (resolution rules, workflow tracking).


Question 2: Contextual Metadata vs. Explicit Claims

"Does the graph model distinguish between contextual metadata and explicit claims?"

What Papr Provides ✅

Two Ways to Add Metadata:

1. Memory-level metadata (about the source/context):

client.memory.add(
    content="API rate limit is 1000 requests/hour",
    metadata={
        # Contextual metadata - NOT extracted, just stored
        "source_type": "documentation",
        "version": "v2.0",
        "last_updated": "2026-01-15",
        "authority": "official",
        "document_section": "rate-limits"
    },
    memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)

2. Node property overrides (forced onto extracted nodes):

client.memory.add(
    content="API rate limit is 1000 requests/hour",
    memory_policy={
        "mode": "auto",  # Still let LLM extract the claim
        "schema_id": osmosis_schema_id,
        "node_constraints": [
            {
                "node_type": "Claim",
                "set": {
                    # Metadata injected directly onto node
                    "document_version": "v2.0",
                    "source_authority": "official",
                    "extraction_date": "2026-01-15",
                    "reviewed": False,
                    # LLM extracts: subject, predicate, object
                    # Developer injects: version, authority, etc.
                }
            }
        ]
    }
)

Key Difference:

  • Memory metadata: Stored on memory object, queryable via memory search
  • Node properties: Stored on graph nodes, queryable via GraphQL, part of entity
  • Auto-extracted: LLM extracts from content (subject, predicate, object)
  • Property overrides: Developer forces specific values (version, authority, dates)

Schema-Level Distinction:

# They can model the distinction in their schema
node_types={
    "Claim": {  # First-class claim
        "properties": {
            "statement": {"type": "string"},
            "confidence": {"type": "float"},
            "claim_type": {"type": "string", "enum_values": ["fact", "opinion", "policy"]}
        }
    },
    "SourceMetadata": {  # Contextual information
        "properties": {
            "version": {"type": "string"},
            "authority_level": {"type": "string"},
            "publication_date": {"type": "datetime"}
        }
    }
}

relationship_types={
    "HAS_METADATA": {
        "allowed_source_types": ["Claim"],
        "allowed_target_types": ["SourceMetadata"]
    }
}

Gap Analysis ⚠️

  • ✅ Memory metadata: contextual info about the source
  • ✅ Property overrides: force metadata onto nodes
  • ✅ Auto-extraction: LLM extracts claims from content
  • ✅ Clear separation: metadata vs node properties vs extracted properties
  • ✅ Flexible control: choose what's auto-extracted vs injected

Verdict: Full control over metadata vs claims. Use memory.metadata for context, node_constraints.set for forced properties, auto mode for extracted claims.


Question 3: Versioned Knowledge / Temporal Evolution

"Is there a notion of versioned knowledge or temporal evolution across documents?"

What Papr Provides ✅

Timestamp Tracking:

  • Every memory has context.timestamp
  • Can filter by time ranges
  • Time-weighted search available

Two Version Approaches:

Approach 1: Version as node property (simpler):

# Add version directly on Claim nodes
client.memory.add(
    content="API v1.0 rate limit is 100/hour",
    memory_policy={
        "mode": "auto",
        "schema_id": osmosis_schema_id,
        "node_constraints": [
            {
                "node_type": "Claim",
                "set": {
                    "version": "v1.0",
                    "effective_from": "2024-01-01",
                    "effective_until": "2025-06-01"
                    # LLM extracts: subject, predicate, object
                }
            }
        ]
    }
)

# Query by version
query = """
query GetClaimsByVersion($subject: String!, $version: String!) {
  claims(where: { subject: $subject, version: $version }) {
    object
    effective_from
    effective_until
  }
}
"""

Approach 2: Separate Version nodes (more structured):

# Create explicit version chain
node_types={
    "KnowledgeVersion": {
        "properties": {
            "subject": {"type": "string"},
            "version": {"type": "string"},
            "effective_from": {"type": "datetime"},
            "effective_until": {"type": "datetime"}
        }
    }
}

relationship_types={
    "SUPERSEDES": {
        "allowed_source_types": ["KnowledgeVersion"],
        "allowed_target_types": ["KnowledgeVersion"]
    },
    "VERSION_OF": {
        "allowed_source_types": ["Claim"],
        "allowed_target_types": ["KnowledgeVersion"]
    }
}

# Store versioned claims
client.memory.add(
    content="API v1.0 says rate limit is 100/hour",
    memory_policy={
        "mode": "auto",
        "schema_id": osmosis_schema_id,
        "node_constraints": [
            {
                "node_type": "Claim",
                "set": {
                    "as_of_version": "v1.0",
                    "statement": "rate limit is 100/hour"
                }
            },
            {
                "node_type": "KnowledgeVersion",
                "set": {
                    "version": "v1.0",
                    "effective_from": "2024-01-01T00:00:00Z",
                    "effective_until": "2025-06-01T00:00:00Z"
                }
            }
        ]
    }
)

# Query version history
query = """
query GetVersionHistory($subject: String!) {
  versions: knowledge_versions(
    where: { subject: $subject }
    order_by: { effective_from: ASC }
  ) {
    version
    effective_from
    effective_until
    claims {
      statement
    }
    supersedes {
      version
    }
  }
}
"""

Temporal Queries:

# Point-in-time query (custom logic)
from datetime import datetime

async def get_knowledge_at_time(subject: str, timestamp: datetime):
    """Get what we knew about subject at specific time."""
    query = """
    query PointInTime($subject: String!, $timestamp: DateTime!) {
      claims(where: {
        subject: $subject,
        version: {
          effective_from: { $lte: $timestamp },
          effective_until: { $gte: $timestamp }
        }
      }) {
        statement
        version
        sources {
          document_id
        }
      }
    }
    """
    return await client.graphql.query(query, {
        "subject": subject,
        "timestamp": timestamp.isoformat()
    })

Gap Analysis ⚠️

  • ✅ Timestamps on all memories (automatic)
  • ✅ Can inject version via property overrides (node_constraints.set)
  • ✅ Can model versions as explicit nodes (more structured)
  • ✅ Can create version chains (SUPERSEDES relationships)
  • ✅ Time-based filtering in queries
  • ⚠️ Version semantics defined by developer (what version means)
  • ⚠️ Point-in-time queries via effective_from/effective_until filters

Verdict: Temporal tracking is straightforward via property injection. Two options: (1) version property on claims, or (2) separate version nodes with relationships.
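Once versions are fetched, the point-in-time semantics can also be computed client-side; a minimal sketch over illustrative version dicts (ISO date strings compare correctly as strings):

```python
# Illustrative version dicts mirroring the KnowledgeVersion node shape above
versions = [
    {"version": "v1.0", "value": "100/hour",
     "effective_from": "2024-01-01", "effective_until": "2025-06-01"},
    {"version": "v2.0", "value": "1000/hour",
     "effective_from": "2025-06-01", "effective_until": None},
]

def knowledge_at(versions, when):
    """Return the version whose effective window contains `when`, else None."""
    for v in versions:
        if v["effective_from"] <= when and (
            v["effective_until"] is None or when < v["effective_until"]
        ):
            return v
    return None

print(knowledge_at(versions, "2025-03-01")["value"])  # 100/hour
```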


Question 4: Source Evidence vs. Inferred Relationships

"Are graph relationships strictly tied to source evidence, or can they be inferred dynamically?"

What Papr Provides ✅

Two Graph Generation Modes:

1. Auto Mode (Extraction-based):

# LLM extracts relationships from content
client.memory.add(
    content="Jane Smith works at Acme Corp as CTO. Acme Corp is partnered with TechVentures.",
    memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)

# Papr automatically creates:
# - Jane Smith (Person node)
# - Acme Corp (Company node)
# - TechVentures (Company node)
# - WORKS_AT relationship (Jane -> Acme)
# - PARTNERED_WITH relationship (Acme -> TechVentures)

Each extracted relationship is tied to the memory (source):

(Memory) --EXTRACTED--> (Relationship) --CONNECTS--> (Node)

2. Manual Mode (Explicit):

# Developer specifies exact graph structure
client.memory.add(
    content="Explicit claim linking",
    memory_policy={
        "mode": "manual",
        "nodes": [
            {"id": "claim_1", "type": "Claim", "properties": {"statement": "X"}},
            {"id": "claim_2", "type": "Claim", "properties": {"statement": "Y"}}
        ],
        "relationships": [
            {
                "source": "claim_1",
                "target": "claim_2",
                "type": "SUPPORTS",
                "properties": {"confidence": 0.8, "inferred": False}
            }
        ]
    }
)

Custom Inference Layer:

# Developer can build inference rules
async def infer_transitive_relationships():
    """Example: If A→B and B→C, infer A→C"""
    
    # 1. Query existing relationships
    query = """
    query GetChains {
      relationships(where: { type: "RELATED_TO" }) {
        source { id }
        target { id }
        properties
      }
    }
    """
    
    result = await client.graphql.query(query)
    
    # 2. Find transitive patterns
    chains = find_transitive_chains(result['relationships'])
    
    # 3. Create inferred relationships
    for chain in chains:
        await client.memory.add(
            content=f"Inferred relationship: {chain['source']} -> {chain['target']}",
            memory_policy={
                "mode": "manual",
                "relationships": [
                    {
                        "source": chain['source'],
                        "target": chain['target'],
                        "type": "RELATED_TO",
                        "properties": {
                            "inferred": True,
                            "inference_rule": "transitive",
                            "confidence": 0.7
                        }
                    }
                ]
            }
        )
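The find_transitive_chains helper above is left undefined; a minimal single-hop sketch might look like this (pure Python, input shaped like the GraphQL result):

```python
def find_transitive_chains(relationships):
    """For each pair A->B, B->C, emit the inferred A->C (single hop only).

    `relationships` is a list of dicts shaped like the GraphQL result:
    {"source": {"id": ...}, "target": {"id": ...}}.
    """
    # Index: source id -> set of target ids
    targets = {}
    for rel in relationships:
        targets.setdefault(rel["source"]["id"], set()).add(rel["target"]["id"])

    chains = []
    for a, bs in targets.items():
        for b in bs:
            for c in targets.get(b, ()):
                if c != a and c not in bs:  # skip cycles and already-known edges
                    chains.append({"source": a, "target": c})
    return chains

rels = [
    {"source": {"id": "A"}, "target": {"id": "B"}},
    {"source": {"id": "B"}, "target": {"id": "C"}},
]
print(find_transitive_chains(rels))  # [{'source': 'A', 'target': 'C'}]
```

A production inference engine would iterate this to a fixed point and carry confidence scores along the chain.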

Relationship Provenance:

# Can track whether relationship is extracted or inferred
relationship_types={
    "SUPPORTS": {
        "properties": {
            "provenance_type": {
                "type": "string", 
                "enum_values": ["extracted", "inferred", "manual"]
            },
            "extraction_source": {"type": "string"},
            "inference_rule": {"type": "string"}
        }
    }
}

Gap Analysis ⚠️

  • ✅ Relationships are tied to source memories (provenance)
  • ✅ Can mark relationships as extracted vs inferred
  • ✅ Manual mode for explicit relationship creation
  • ❌ No built-in inference engine
  • ❌ No rule-based relationship derivation
  • ❌ Transitive/logical inference requires custom code

Verdict: Relationships are primarily extracted, not inferred. Developer can build inference layer using GraphQL queries + manual relationship creation.


Question 5: Single-Source vs. Multi-Source Corroboration

"Do you differentiate between single-source extracted claims and multi-source corroborated knowledge?"

What Papr Provides ✅

Entity Resolution Across Sources:

# Document 1
client.memory.add(
    content="Product X costs $100 (from pricing sheet)",
    memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)

# Document 2  
client.memory.add(
    content="Product X is priced at $100 (from sales deck)",
    memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)

# Document 3
client.memory.add(
    content="Product X: $100 (from website)",
    memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)

# Papr's entity resolution merges "Product X" into single node
# Can query which memories reference it

Corroboration is Built-In via Deduplication:

Papr automatically:

  1. Merges same claim from multiple sources → ONE node
  2. Creates EXTRACTED_FROM relationship for each source
  3. GraphQL can count these relationships

Simple Query to Get Corroboration:

# No custom service needed - just query!
query = """
query GetClaimCorroboration($subject: String!) {
  claims(where: { subject: $subject }) {
    object
    statement
    sources {  # Papr automatically populates this
      document_id
      version
      authority
      date
    }
  }
}
"""

result = await client.graphql.query(query, {"subject": "Product X price"})

# Analyze results
for claim in result['claims']:
    source_count = len(claim['sources'])
    official_count = sum(1 for s in claim['sources'] if s['authority'] == 'official')
    
    print(f"Claim: {claim['object']}")
    print(f"  Supported by {source_count} sources ({official_count} official)")
    
    # Apply resolution rules
    if source_count >= 3 and official_count >= 2:
        print(f"  ✅ High confidence (3+ sources, 2+ official)")
    elif source_count == 1:
        print(f"  ⚠️ Single source - needs verification")

Optional: Store Computed Score:

# If you want to cache the score on the node
await client.memory.add(
    content=f"Update corroboration score",
    memory_policy={
        "mode": "manual",
        "node_constraints": [{
            "node_type": "Claim",
            "search": {"properties": [{"name": "id", "mode": "exact", "value": claim_id}]},
            "set": {
                "source_count": source_count,
                "corroboration_score": source_count / 5.0
            }
        }]
    }
)

Key Point: Counting sources is a simple GraphQL query, not a complex background service. Papr's deduplication does the heavy lifting.
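One possible scoring formula, as a sketch (the weights and cap are arbitrary developer choices, not Papr behavior):

```python
def corroboration_score(sources, cap=5):
    """Weighted corroboration: official sources count double,
    and the weighted total is capped, then normalized to 0..1."""
    weight = sum(2 if s.get("authority") == "official" else 1 for s in sources)
    return min(weight, cap) / cap

# Illustrative source dicts shaped like the GraphQL results above
sources = [
    {"document_id": "doc_1", "authority": "official"},
    {"document_id": "doc_2", "authority": "unofficial"},
]
print(corroboration_score(sources))  # 0.6 -> (2 + 1) / 5
```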

Gap Analysis ⚠️

  • ✅ Entity merging across sources (automatic via unique_identifiers)
  • ✅ Multi-source relationships (EXTRACTED_FROM to each source)
  • ✅ GraphQL counts sources (via relationship traversal)
  • ✅ A simple query shows source count and details
  • ⚠️ Corroboration formula is the developer's choice (count? weighted? freshness?)
  • ⚠️ Caching the score is optional (compute on demand or store on the node)

Verdict: Multi-source tracking is automatic. Counting sources is a simple GraphQL query. Developer only chooses scoring formula (if needed).


┌───────────────────────────────────────────────────────────┐
│                 OSMOSIS APPLICATION LAYER                 │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Governance Services (Custom Logic)                 │  │
│  │  • Conflict Detection                               │  │
│  │  • Corroboration Scoring                            │  │
│  │  • Version History Management                       │  │
│  │  • Inference Engine (transitive, rule-based)        │  │
│  │  • Evidence Strength Calculation                    │  │
│  └─────────────────────────────────────────────────────┘  │
│                             ↕                             │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Osmosis Data Model (Custom Schema)                 │  │
│  │  • Claim nodes (statement, confidence, version)     │  │
│  │  • Source nodes (document, page, authority)         │  │
│  │  • ConflictSet nodes (subject, resolution_status)   │  │
│  │  • KnowledgeVersion nodes (version, timespan)       │  │
│  │  • EXTRACTED_FROM, CONFLICTS_WITH, SUPERSEDES       │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────┘
                              ↕
┌───────────────────────────────────────────────────────────┐
│                      PAPR MEMORY API                      │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Knowledge Graph Storage                            │  │
│  │  • Neo4j graph database                             │  │
│  │  • Custom schema support                            │  │
│  │  • Node/relationship creation                       │  │
│  │  • Entity resolution & merging                      │  │
│  └─────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  Hybrid Retrieval Engine                            │  │
│  │  • Vector embeddings (semantic search)              │  │
│  │  • Keyword search (BM25)                            │  │
│  │  • Graph traversal (multi-hop)                      │  │
│  └─────────────────────────────────────────────────────┘  │
│  ┌─────────────────────────────────────────────────────┐  │
│  │  GraphQL Query Layer                                │  │
│  │  • Complex relationship queries                     │  │
│  │  • Aggregations & analytics                         │  │
│  │  • Multi-hop traversals                             │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────┘

Why Can They Build This with Papr?

Core Strengths That Enable Osmosis

1. Custom Schema System

# Full control over domain ontology
schema = client.schemas.create(
    name="Osmosis",
    node_types={
        "Claim": {...},
        "Source": {...},
        "ConflictSet": {...},
        "Evidence": {...}
    },
    relationship_types={
        "EXTRACTED_FROM": {...},
        "CONFLICTS_WITH": {...},
        "SUPPORTS": {...},
        "REFUTES": {...}
    }
)

Why This Matters:

  • Define exactly what a "claim" is
  • Model provenance relationships explicitly
  • Track conflict resolution status
  • Store confidence/corroboration scores

2. Node Constraints

# Force properties onto nodes
memory_policy={
    "node_constraints": [
        {
            "node_type": "Claim",
            "set": {
                "confidence": 0.95,
                "extraction_method": "llm",
                "extracted_at": "2026-02-16T10:00:00Z",
                "document_version": "v2.1"
            }
        }
    ]
}

Why This Matters:

  • Every claim gets metadata automatically
  • Consistent property injection
  • Source attribution baked in
  • Version tracking per claim

3. GraphQL Analytics

# Complex queries for governance
query = """
query ConflictAnalysis($subject: String!) {
  conflict_sets(where: { subject: $subject, resolution_status: "unresolved" }) {
    subject
    predicate
    claims {
      statement
      confidence
      sources {
        document_id
        authority_level
        version
      }
    }
  }
}
"""

Why This Matters:

  • Find all conflicts for a subject
  • Compare source authority levels
  • Identify version discrepancies
  • Aggregate corroboration counts

4. Flexible Metadata

# Store any custom fields
metadata={
    "claim_type": "factual",
    "extraction_confidence": 0.92,
    "reviewed_by_human": False,
    "conflict_status": "none",
    "source_authority": "official",
    "as_of_version": "v2.1"
}

Why This Matters:

  • Track governance workflow state
  • Store review status
  • Flag conflicts
  • Version everything

5. Entity Resolution

# Same entity mentioned in multiple documents → one node
# "Product X", "product x", "ProductX" → merged automatically

Why This Matters:

  • Cross-document references work
  • Corroboration counting possible
  • Conflict detection feasible
  • No manual de-duplication needed

6. Provenance Built-In

Every memory has:

  • memory_id (unique)
  • created_at (timestamp)
  • source (where it came from)
  • metadata.document_id, metadata.page, etc.

Why This Matters:

  • Trace every claim to source
  • Audit trail automatic
  • Evidence chain preserved
  • Compliance-ready

What They Still Need to Build

1. Conflict Detection Service

# Periodic job (run hourly via cron, Celery beat, or similar)
async def detect_conflicts():
    # Query claims by subject+predicate
    # Find different values
    # Create ConflictSet nodes
    # Link claims to conflict sets
    pass

2. Corroboration Scoring Service

# Update scores based on source count
# (run every few hours via cron, Celery beat, or similar)
async def update_corroboration():
    # Count sources per claim
    # Compute confidence score
    # Update Claim.corroboration_score
    pass

3. Version History Service

# Track knowledge evolution
async def track_version_change(claim_id, new_version):
    # Create new KnowledgeVersion node
    # Link old version → new version (SUPERSEDES)
    # Update claims to reference new version
    pass

4. Inference Engine

# Apply logical rules
async def apply_inference_rules():
    # Transitive relationships
    # Contradiction detection
    # Support/refutation chains
    pass

5. Governance UI

# Visualize conflicts, history, evidence
# Allow manual conflict resolution
# Show corroboration strength
# Display version timelines

Concrete Example: Full Workflow

Step 1: Upload Documents

# Document 1 (v1.0 spec)
client.document.upload(
    file=open("api_spec_v1.pdf", "rb"),
    schema_id=osmosis_schema_id,
    metadata={"version": "v1.0", "authority": "official", "date": "2024-01-01"}
)

# Document 2 (v2.0 spec)
client.document.upload(
    file=open("api_spec_v2.pdf", "rb"),
    schema_id=osmosis_schema_id,
    metadata={"version": "v2.0", "authority": "official", "date": "2025-06-01"}
)

# Document 3 (blog post)
client.document.upload(
    file=open("blog_pricing.md", "rb"),
    schema_id=osmosis_schema_id,
    metadata={"version": "unknown", "authority": "unofficial", "date": "2025-12-01"}
)

Step 2: Extract Claims (Automatic)

Papr's extraction creates:

Claim: "Rate limit is 100/hour"
  ← EXTRACTED_FROM → Source: "api_spec_v1.pdf" (page 12)
  version: "v1.0"
  confidence: 0.95

Claim: "Rate limit is 1000/hour"
  ← EXTRACTED_FROM → Source: "api_spec_v2.pdf" (page 15)
  version: "v2.0"
  confidence: 0.98

Claim: "Rate limit is 1000/hour"
  ← EXTRACTED_FROM → Source: "blog_pricing.md" 
  version: "unknown"
  confidence: 0.75

Step 3: Conflict Detection (Simple Query)

# Query for all claims about "Rate limit"
query = """
query FindConflicts($subject: String!) {
  claims(where: { subject: $subject }) {
    object
    sources {
      document_id
      version
      authority
      date
    }
  }
}
"""

result = await client.graphql.query(query, {"subject": "Rate limit"})

# Analyze: different values = conflict
unique_values = set(claim['object'] for claim in result['claims'])

if len(unique_values) > 1:
    print(f"⚠️ CONFLICT: {unique_values}")
    # Returns: {"100/hour", "1000/hour"}
    
    for claim in result['claims']:
        print(f"  {claim['object']}: {len(claim['sources'])} sources")
    # Output:
    #   100/hour: 1 source (v1.0 spec)
    #   1000/hour: 2 sources (v2.0 spec + blog)

Step 4: Resolution Logic (Custom)

# Osmosis resolution strategy: "most recent official source wins"
async def resolve_conflict(conflict_set_id):
    # Query claims in conflict set
    query = """
    query ConflictClaims($conflict_id: ID!) {
      conflict_set(id: $conflict_id) {
        claims {
          id
          object
          sources {
            authority
            version
            date
          }
        }
      }
    }
    """
    
    result = await client.graphql.query(query, {"conflict_id": conflict_set_id})
    
    # Keep claims backed by at least one official source
    official_claims = [c for c in result['claims']
                       if any(s['authority'] == 'official' for s in c['sources'])]
    
    # Pick the claim whose most recent official source is newest
    winner = max(official_claims,
                 key=lambda c: max(s['date'] for s in c['sources']
                                   if s['authority'] == 'official'))
    
    # Update conflict set
    await update_conflict_set(
        conflict_set_id,
        resolution_status="resolved",
        winner_claim_id=winner['id'],
        resolution_rule="most_recent_official"
    )

Step 5: Apply Resolution Rules (Simple Logic)

# From Step 3, we have:
# Claim 1: "100/hour" - 1 source (v1.0, official)
# Claim 2: "1000/hour" - 2 sources (v2.0 official + blog unofficial)

# Resolution strategy: "Most sources from official documents"
claims_analyzed = []
for claim in result['claims']:
    official_sources = [s for s in claim['sources'] if s['authority'] == 'official']
    claims_analyzed.append({
        'value': claim['object'],
        'total_sources': len(claim['sources']),
        'official_sources': len(official_sources),
        'most_recent': max(s['date'] for s in claim['sources'])
    })

# Apply scoring
def resolution_score(claim):
    return (
        claim['official_sources'] * 10 +  # Official sources weighted high
        claim['total_sources'] * 2 +       # Total sources matter
        recency_weight(claim['most_recent'])  # Recent is better
    )

winner = max(claims_analyzed, key=resolution_score)
print(f"✅ Resolved: {winner['value']}")
# Output: "1000/hour" (v2.0 is more recent + has official source)

Step 6: Version Tracking (Custom)

# Create version timeline
await create_version_node(
    subject="Rate limit",
    version="v1.0",
    value="100/hour",
    effective_from="2024-01-01",
    effective_until="2025-06-01"
)

await create_version_node(
    subject="Rate limit",
    version="v2.0",
    value="1000/hour",
    effective_from="2025-06-01",
    effective_until=None  # Current
)

await create_supersedes_relationship(
    old_version="v1.0",
    new_version="v2.0"
)

Step 7: Query History (GraphQL)

# What did we know about rate limits in March 2025?
query = """
query HistoricalKnowledge($subject: String!, $date: DateTime!) {
  knowledge_versions(where: {
    subject: $subject,
    effective_from: { lte: $date },
    effective_until: { gte: $date }  # open-ended current versions (null) need an extra OR clause
  }) {
    version
    value
    claims {
      statement
      sources {
        document_id
      }
    }
  }
}
"""

historical = await client.graphql.query(query, {
    "subject": "Rate limit",
    "date": "2025-03-01T00:00:00Z"
})

# Returns: "100/hour" (v1.0 was effective until June 2025)
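The same point-in-time lookup can be expressed in plain Python against an in-memory version list, which is handy for unit-testing the temporal logic before wiring it to GraphQL. The field names mirror the hypothetical schema above; an `effective_until` of `None` marks the currently effective version:

```python
from datetime import datetime

def value_as_of(versions, subject, as_of):
    """Return the version of `subject` that was effective at `as_of`.

    `versions` is a list of dicts shaped like the knowledge_versions
    nodes above; effective_until=None means "still current".
    """
    for v in versions:
        if v["subject"] != subject:
            continue
        starts = datetime.fromisoformat(v["effective_from"])
        ends = v["effective_until"]
        if starts <= as_of and (ends is None or as_of < datetime.fromisoformat(ends)):
            return v
    return None

versions = [
    {"subject": "Rate limit", "version": "v1.0", "value": "100/hour",
     "effective_from": "2024-01-01", "effective_until": "2025-06-01"},
    {"subject": "Rate limit", "version": "v2.0", "value": "1000/hour",
     "effective_from": "2025-06-01", "effective_until": None},
]

print(value_as_of(versions, "Rate limit", datetime(2025, 3, 1))["value"])  # 100/hour
print(value_as_of(versions, "Rate limit", datetime(2025, 8, 1))["value"])  # 1000/hour
```

Treating the interval as half-open (`effective_from <= t < effective_until`) avoids a version boundary date matching two versions at once.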

Performance Considerations

What Papr Handles Efficiently ✅

  • Fast retrieval: <150ms (when cached)
  • Entity resolution: Automatic merging
  • Vector search: Semantic similarity
  • Graph traversal: Multi-hop relationships
  • Hybrid ranking: Keyword + semantic + graph

What Requires Optimization ⚠️

  • Conflict detection: Periodic batch jobs (don't run on every write)
  • Corroboration updates: Background service, not real-time
  • Version tracking: Compute on demand, cache results
  • Large-scale queries: Use pagination, limit result sets
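The "periodic batch job" pattern above can be sketched with a small asyncio scheduler. `detect_conflicts` here is a placeholder for the real GraphQL scan; the point is the loop structure: run on an interval, swallow per-run failures so one bad scan doesn't kill the service:

```python
import asyncio

scan_log = []

async def detect_conflicts():
    # Placeholder: the real job would run the GraphQL conflict scan
    # and create/update ConflictSet nodes. Here we just record the run.
    scan_log.append("scan")

async def run_periodically(job, interval_seconds, max_runs=None):
    """Run `job` every `interval_seconds`; log errors and keep going."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            await job()
        except Exception as exc:
            print(f"conflict scan failed: {exc!r}")  # don't crash the scheduler
        runs += 1
        await asyncio.sleep(interval_seconds)

# In production the interval might be minutes or hours;
# a short interval and max_runs here keep the demo fast.
asyncio.run(run_periodically(detect_conflicts, interval_seconds=0.01, max_runs=3))
print(len(scan_log))  # 3
```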

Cost Estimation

Papr Cloud Costs

  • Storage: ~$0.02/GB/month
  • API calls: ~$0.001 per search
  • Document processing: ~$0.05 per document

For 10,000 documents:

  • Storage: ~500 documents/GB → 20 GB → $0.40/month
  • Processing: 10,000 docs × $0.05 → $500 one-time
  • Search: 1M queries/month × $0.001 → $1,000/month

Custom Logic Costs

  • Compute for services: ~$50-100/month (AWS Lambda/Cloud Run)
  • Database for governance state: ~$20/month (PostgreSQL)

Total: ~$1,070-1,120/month at this production scale, plus the ~$500 one-time document-processing cost
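The estimate can be checked with a few lines of arithmetic. The unit prices are the assumed figures listed above, not published Papr pricing:

```python
DOCS = 10_000
DOCS_PER_GB = 500
STORAGE_PER_GB = 0.02        # $/GB/month (assumed)
PROCESSING_PER_DOC = 0.05    # $ one-time (assumed)
SEARCHES_PER_MONTH = 1_000_000
COST_PER_SEARCH = 0.001      # $ (assumed)
SERVICES = (50, 100)         # $/month, Lambda/Cloud Run range
GOVERNANCE_DB = 20           # $/month, PostgreSQL

storage_monthly = DOCS / DOCS_PER_GB * STORAGE_PER_GB  # 20 GB -> $0.40
search_monthly = SEARCHES_PER_MONTH * COST_PER_SEARCH  # $1,000
processing_once = DOCS * PROCESSING_PER_DOC            # $500, one-time

low = storage_monthly + search_monthly + SERVICES[0] + GOVERNANCE_DB
high = storage_monthly + search_monthly + SERVICES[1] + GOVERNANCE_DB
print(f"one-time: ${processing_once:,.0f}; monthly: ${low:,.2f}-${high:,.2f}")
```

Search volume dominates: at 1M queries/month it is over 90% of the recurring cost, so caching frequent queries is the highest-leverage optimization.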


Decision Framework

Use Papr If:

✅ Need knowledge graph + retrieval in one API
✅ Want entity resolution handled automatically
✅ Need hybrid search (semantic + keyword + graph)
✅ Schema flexibility is important
✅ GraphQL analytics valuable
✅ Prefer hosted/managed service

Build from Scratch If:

❌ Need built-in temporal graph (point-in-time snapshots)
❌ Require automatic inference engine
❌ Want conflict detection as core DB feature
❌ Need ACID transactions across graph operations
❌ Extreme performance requirements (microsecond latency)


Implementation Roadmap

Phase 1: Proof of Concept (2 weeks)

  1. Week 1: Design Osmosis schema in Papr

    • Define Claim, Source, ConflictSet node types
    • Define EXTRACTED_FROM, CONFLICTS_WITH relationships
    • Test schema creation API
  2. Week 2: Implement basic workflow

    • Upload 3 test documents with overlapping claims
    • Extract claims via Papr's auto mode
    • Write simple conflict detection script
    • Query results via GraphQL

Success Criteria: Can detect conflicting claims across 3 documents
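The "simple conflict detection script" from Week 2 can be as small as grouping extracted claims by (subject, predicate) and flagging any group that asserts more than one distinct object. The claim dicts follow the triple schema defined earlier; the document ids are illustrative:

```python
from collections import defaultdict

def find_conflicts(claims):
    """Group claims by (subject, predicate); any group asserting more
    than one distinct object is a candidate ConflictSet."""
    groups = defaultdict(list)
    for c in claims:
        groups[(c["subject"], c["predicate"])].append(c)
    return {
        key: group for key, group in groups.items()
        if len({c["object"] for c in group}) > 1
    }

claims = [
    {"subject": "Rate limit", "predicate": "is", "object": "100/hour",  "doc": "doc-a"},
    {"subject": "Rate limit", "predicate": "is", "object": "1000/hour", "doc": "doc-b"},
    {"subject": "Rate limit", "predicate": "is", "object": "1000/hour", "doc": "doc-c"},
]

for (subject, predicate), group in find_conflicts(claims).items():
    values = sorted({c["object"] for c in group})
    print(f"CONFLICT: {subject} {predicate} -> {values}")
# CONFLICT: Rate limit is -> ['100/hour', '1000/hour']
```

This only catches exact-string disagreements; near-duplicate objects ("1000/hour" vs "1,000 per hour") would need normalization or semantic matching on top.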

Phase 2: Core Services (4 weeks)

  1. Week 3-4: Build governance services

    • Conflict detection (batch job)
    • Corroboration scoring
    • Basic resolution logic
  2. Week 5-6: Add version tracking

    • Model KnowledgeVersion nodes
    • Track SUPERSEDES relationships
    • Implement temporal queries

Success Criteria: Full workflow works end-to-end
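One way to sketch the Week 3-4 corroboration scoring: count distinct supporting documents per claim, weighting official sources more heavily. The weights and the `sources` shape are assumptions for illustration, not a built-in Papr feature:

```python
def corroboration_score(sources, official_weight=2.0, unofficial_weight=1.0):
    """Score a claim by its distinct supporting documents; official
    sources count double. Duplicate document ids count once."""
    seen = set()
    score = 0.0
    for s in sources:
        if s["document_id"] in seen:
            continue
        seen.add(s["document_id"])
        score += official_weight if s["authority"] == "official" else unofficial_weight
    return score

sources = [
    {"document_id": "api-v2", "authority": "official"},
    {"document_id": "blog-1", "authority": "unofficial"},
    {"document_id": "api-v2", "authority": "official"},  # duplicate, ignored
]
print(corroboration_score(sources))  # 3.0
```

Deduplicating by document id matters: a claim extracted five times from one blog post should not outscore a claim confirmed by two independent documents.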

Phase 3: Production Hardening (4 weeks)

  1. Week 7-8: Performance optimization

    • Batch operations
    • Caching strategies
    • Query optimization
  2. Week 9-10: UI + monitoring

    • Conflict visualization dashboard
    • Evidence strength display
    • Governance metrics

Success Criteria: Production-ready system


Final Recommendation

YES, use Papr as the foundation layer.

Reasoning:

  1. 90% of infrastructure handled (storage, retrieval, graph, entity resolution)
  2. Schema system enables custom modeling (claims, conflicts, versions)
  3. GraphQL provides analytics power (complex queries, aggregations)
  4. Saves 6-12 months of development vs building from scratch
  5. Custom logic is straightforward (conflict detection, scoring, versioning)

Trade-offs:

  • Must build governance logic (but would need this even with custom DB)
  • No built-in temporal graph (but can model versions explicitly)
  • No automatic inference (but can add via custom services)

Bottom Line: Papr provides the hard parts (graph database, retrieval, entity resolution). The governance layer on top is domain-specific logic they'd build regardless of underlying database choice.


Next Step: Schedule architecture review call to walk through schema design and confirm approach.