Email Response: Osmosis + Papr Integration Analysis


To: Osmosis Developer

Thanks for the detailed questions! I've done a deep analysis of how Papr aligns with your Osmosis requirements. Here's what I found:


TL;DR: Yes, Papr can be your foundation layer

What Papr handles automatically:

  • ✅ Knowledge graph storage with custom schemas
  • ✅ Deduplication (same claim from 3 docs → 1 node with 3 sources)
  • ✅ Multi-source tracking (EXTRACTED_FROM relationships)
  • ✅ Source counting (GraphQL traverses relationships)
  • ✅ Provenance tracking (every claim → source)
  • ✅ Hybrid retrieval (semantic + keyword + graph)
  • ✅ Property injection (version, authority, dates)

What you'd build on top:

  • Resolution rules (count > 3? freshness? authority?)
  • Inference engine (transitive rules, logical derivation)
  • Optional: Workflow tracking (ConflictSet nodes, review status)

Bottom line: Papr gives you 90% of the infrastructure. The epistemological governance layer is custom logic you'd build regardless of the underlying database.


Your 5 Questions - Direct Answers

1. Conflicting Statements

Papr provides automatically:

  • Deduplication via unique_identifiers: Same (subject, predicate, object) → ONE node
  • Multi-source tracking: Multiple EXTRACTED_FROM relationships per claim
  • Built-in source counting: GraphQL returns all sources per claim

Example:

# Define deduplication in schema
schema = {
    "Claim": {
        "properties": {"subject": "...", "predicate": "...", "object": "..."},
        "unique_identifiers": ["subject", "predicate", "object"]  # Dedup key
    }
}

# Upload 3 documents saying "Product X costs $100"
# → Papr creates ONE Claim node with 3 EXTRACTED_FROM relationships

# Upload 1 document saying "Product X costs $150"  
# → Papr creates DIFFERENT Claim node (different object value)

# Query for conflicts (built into GraphQL)
query = """
query FindConflicts($subject: String!) {
  claims(where: { subject: $subject }) {
    object  # The value
    sources { document_id, version, date }  # Auto-counted!
  }
}
"""

# Returns:
# Claim 1: object="$100", sources=[doc1, doc2, doc3] (3 sources)
# Claim 2: object="$150", sources=[doc4] (1 source)
# → CONFLICT: same subject, different objects

You decide resolution:

  • Count-based: "Pick claim with most sources" (3 > 1)
  • Freshness: "Pick most recent document"
  • Authority: "Pick official source over unofficial"
  • Combined: "Most sources from official docs"
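The detection-plus-resolution loop above can be sketched in plain Python. This is a minimal sketch; the claim-record shape mirrors the example GraphQL response and is an assumption, not Papr's actual response schema:

```python
from collections import defaultdict

# Hypothetical claim records, shaped like the GraphQL response above
claims = [
    {"subject": "Product X", "predicate": "costs", "object": "$100",
     "sources": ["doc1", "doc2", "doc3"]},
    {"subject": "Product X", "predicate": "costs", "object": "$150",
     "sources": ["doc4"]},
]

def find_conflicts(claims):
    """Group claims by (subject, predicate); >1 distinct object = a conflict."""
    groups = defaultdict(list)
    for claim in claims:
        groups[(claim["subject"], claim["predicate"])].append(claim)
    return {key: group for key, group in groups.items()
            if len({c["object"] for c in group}) > 1}

def resolve_by_count(group):
    """Count-based rule from the list above: the claim with the most sources wins."""
    return max(group, key=lambda c: len(c["sources"]))

for group in find_conflicts(claims).values():
    winner = resolve_by_count(group)  # the "$100" claim: 3 sources beat 1
```

Swapping in a freshness or authority rule just means changing the `key` function passed to `max`.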

2. Contextual Metadata vs. Claims

Papr provides two mechanisms:

1. Memory metadata (contextual info about source):

client.memory.add(
    content="API rate limit is 1000/hour",
    metadata={
        "version": "v2.0",
        "authority": "official",
        "document_section": "rate-limits"
    }
)

2. Property overrides (forced onto extracted nodes):

client.memory.add(
    content="API rate limit is 1000/hour",
    memory_policy={
        "mode": "auto",  # LLM extracts claim
        "node_constraints": [{
            "node_type": "Claim",
            "set": {
                "version": "v2.0",  # Injected metadata
                "authority": "official",  # Injected metadata
                # LLM extracts: subject, predicate, object
            }
        }]
    }
)

Answer: Full control. Use memory metadata for context, node_constraints.set to inject properties onto nodes, and mode: "auto" for extracted claims.


3. Versioned Knowledge / Temporal Evolution

Papr provides:

  • Timestamps on all memories (automatic)
  • Property overrides to inject version info

Two approaches:

Option 1: Version as property (simpler):

client.memory.add(
    content="API v1.0 rate limit is 100/hour",
    memory_policy={
        "mode": "auto",
        "node_constraints": [{
            "node_type": "Claim",
            "set": {
                "version": "v1.0",
                "effective_from": "2024-01-01",
                "effective_until": "2025-06-01"
            }
        }]
    }
)

# Query by version
query = "claims(where: { subject: '...', version: 'v1.0' })"

Option 2: Separate version nodes (more structured):

# Create explicit KnowledgeVersion nodes
# Link via SUPERSEDES relationships
# Query version chains

Answer: Inject version via property overrides, OR create separate version nodes. Point-in-time queries via GraphQL filters.
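Under Option 1, a point-in-time query is just a window filter over the injected properties. A minimal sketch in plain Python (the record shape is an assumption; in practice you'd push this filter into the GraphQL query itself):

```python
from datetime import date

# Hypothetical claims carrying the injected version properties from Option 1
claims = [
    {"version": "v1.0", "object": "100/hour",
     "effective_from": "2024-01-01", "effective_until": "2025-06-01"},
    {"version": "v2.0", "object": "1000/hour",
     "effective_from": "2025-06-01", "effective_until": None},
]

def effective_at(claims, when):
    """Point-in-time filter: keep claims whose validity window covers `when`.

    An open-ended window (effective_until is None) means "still current".
    """
    w = date.fromisoformat(when)
    return [
        c for c in claims
        if date.fromisoformat(c["effective_from"]) <= w
        and (c["effective_until"] is None
             or w < date.fromisoformat(c["effective_until"]))
    ]

effective_at(claims, "2024-06-01")  # → the v1.0 claim
effective_at(claims, "2025-07-01")  # → the v2.0 claim
```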


4. Source Evidence vs. Inferred Relationships

Papr provides:

  • Two modes:
    • Auto mode: LLM extracts relationships from content (tied to source memory)
    • Manual mode: You specify exact graph structure
  • All extracted relationships link back to source

You build:

  • Inference engine queries existing relationships
  • Creates new relationships with inferred: true flag
  • Tracks inference rules (transitive, contradiction, etc.)

Example:

# Papr extracts: A --RELATED_TO--> B, B --RELATED_TO--> C
# Your inference engine creates: A --RELATED_TO--> C
# With properties: {inferred: true, rule: "transitive"}

Answer: Relationships are primarily extracted, not inferred. You can build an inference layer on top using GraphQL queries plus manual relationship creation.
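The transitive rule above can be sketched as a single inference pass over plain edge records (pure Python, no Papr calls; the edge shape is an assumption for illustration):

```python
def infer_transitive(edges):
    """One pass of the transitive rule: A→B and B→C imply A→C.

    New edges are flagged inferred=True so they stay distinguishable
    from relationships extracted directly from source documents.
    """
    known = {(e["from"], e["to"]) for e in edges}
    inferred = []
    for a in edges:
        for b in edges:
            if (a["to"] == b["from"]
                    and a["from"] != b["to"]
                    and (a["from"], b["to"]) not in known):
                inferred.append({"from": a["from"], "to": b["to"],
                                 "inferred": True, "rule": "transitive"})
    return inferred

edges = [{"from": "A", "to": "B"}, {"from": "B", "to": "C"}]
infer_transitive(edges)
# → [{"from": "A", "to": "C", "inferred": True, "rule": "transitive"}]
```

A full engine would re-run passes until no new edges appear (fixed point) and then write the inferred edges back via manual relationship creation.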


5. Single-Source vs. Multi-Source Corroboration

Papr provides automatically:

  • Deduplication merges same claim from multiple sources → ONE node
  • Multiple EXTRACTED_FROM relationships (one per source)
  • GraphQL traverses relationships and counts sources

Simple query:

query = """
query GetCorroboration($subject: String!) {
  claims(where: { subject: $subject }) {
    object
    sources {  # Papr populates this automatically
      document_id
      authority
    }
  }
}
"""

result = await client.graphql.query(query, {"subject": "rate limit"})

for claim in result['data']['claims']:
    source_count = len(claim['sources'])
    official_count = sum(1 for s in claim['sources'] if s['authority'] == 'official')
    
    # Apply your resolution rule
    if source_count >= 3 and official_count >= 2:
        confidence = "high"  # Well-corroborated
    elif source_count == 1:
        confidence = "needs_review"  # Single source - flag for review

Answer: Counting sources is a simple GraphQL query. Papr's deduplication handles multi-source tracking. You just decide the scoring formula (count? weighted? freshness?).


Concrete Schema Example for Osmosis

from papr_memory import Papr

client = Papr(x_api_key="...")

# Your Osmosis schema
schema = client.schemas.create(
    name="Osmosis Claims Schema",
    node_types={
        "Claim": {
            "properties": {
                "statement": {"type": "string", "required": True},
                "subject": {"type": "string"},
                "predicate": {"type": "string"},
                "object": {"type": "string"},
                "confidence": {"type": "float"},
                "source_count": {"type": "integer"},
                "corroboration_score": {"type": "float"},
                "version": {"type": "string"}
            }
        },
        "Source": {
            "properties": {
                "document_id": {"type": "string", "required": True},
                "version": {"type": "string"},
                "page": {"type": "integer"},
                "authority": {"type": "string", "enum_values": ["official", "unofficial"]},
                "publication_date": {"type": "datetime"}
            }
        },
        "ConflictSet": {
            "properties": {
                "subject": {"type": "string"},
                "predicate": {"type": "string"},
                "resolution_status": {"type": "string", "enum_values": ["unresolved", "resolved", "ignored"]},
                "resolution_rule": {"type": "string"}
            }
        },
        "KnowledgeVersion": {
            "properties": {
                "version": {"type": "string"},
                "effective_from": {"type": "datetime"},
                "effective_until": {"type": "datetime"},
                "superseded": {"type": "boolean"}
            }
        }
    },
    relationship_types={
        "EXTRACTED_FROM": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["Source"]
        },
        "CONFLICTS_WITH": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["Claim"]
        },
        "MEMBER_OF": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["ConflictSet"]
        },
        "SUPERSEDES": {
            "allowed_source_types": ["KnowledgeVersion"],
            "allowed_target_types": ["KnowledgeVersion"]
        },
        "VERSION_OF": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["KnowledgeVersion"]
        }
    }
)

# Upload document with extraction
client.document.upload(
    file=open("api_spec_v2.pdf", "rb"),
    schema_id=schema.data.id,
    metadata={"version": "v2.0", "authority": "official"}
)

# Query conflicts
conflicts = client.graphql.query("""
    query GetConflicts($subject: String!) {
      conflict_sets(where: { 
        subject: $subject, 
        resolution_status: "unresolved" 
      }) {
        subject
        predicate
        claims {
          statement
          confidence
          sources {
            document_id
            authority
            version
          }
        }
      }
    }
""", {"subject": "rate limit"})

Architecture Overview

┌─────────────────────────────────────────┐
│      OSMOSIS APPLICATION LAYER          │
│  • Conflict detection service           │
│  • Corroboration scoring                │
│  • Version history tracking             │
│  • Inference engine                     │
│  • Resolution rules                     │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│         PAPR MEMORY API                 │
│  • Knowledge graph (Neo4j)              │
│  • Custom schema support                │
│  • Entity resolution                    │
│  • Hybrid retrieval                     │
│  • GraphQL analytics                    │
│  • Provenance tracking                  │
└─────────────────────────────────────────┘

Papr handles: Storage, retrieval, entity merging, graph queries
You handle: Governance logic, conflict detection, scoring, versioning


Why This Works

1. Schema Flexibility

You define exactly what a "claim" is, what properties it has, how it relates to sources.

2. Node Constraints

Force metadata onto every extracted claim:

memory_policy={
    "node_constraints": [{
        "node_type": "Claim",
        "set": {
            "confidence": 0.95,
            "extracted_at": "2026-02-16T10:00:00Z",
            "version": "v2.1"
        }
    }]
}

3. GraphQL Power

Complex queries for governance:

query ConflictAnalysis($subject: String!) {
  conflict_sets(where: { subject: $subject }) {
    claims {
      statement
      sources {
        authority
        version
      }
    }
  }
}

4. Provenance Built-In

Every memory has source, created_at, metadata.document_id. Trace every claim back to evidence.

5. Entity Resolution

"Product X" mentioned in 5 documents → one node. Corroboration counting works automatically.


What You Still Build

Resolution Rules

# Simple logic - no complex service needed
async def resolve_conflict(claims):
    # Count-based: Pick claim with most sources
    # Freshness-based: Pick most recent
    # Authority-based: Pick official over unofficial
    # Combined: Your formula
    
    # recency_weight is a helper you'd define (e.g., decay by publication_date)
    winner = max(claims, key=lambda c:
        len(c['sources']) * 0.5 +  # Source count
        sum(1 for s in c['sources'] if s['authority'] == 'official') * 0.3 +  # Authority
        recency_weight(c['sources']) * 0.2  # Freshness
    )
    return winner

Version Tracking

async def track_version(claim_id, new_version):
    # Create KnowledgeVersion nodes
    # Link SUPERSEDES relationships
    ...
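Once SUPERSEDES links exist, reconstructing a version's history is a simple chain walk. A sketch over hypothetical in-memory records (the `supersedes` field name is an assumption mirroring the relationship type):

```python
# Hypothetical version records mirroring KnowledgeVersion nodes linked
# by SUPERSEDES relationships
versions = {
    "v1.0": {"supersedes": None},
    "v2.0": {"supersedes": "v1.0"},
    "v2.1": {"supersedes": "v2.0"},
}

def version_chain(versions, latest):
    """Walk SUPERSEDES links back from `latest` to the original version."""
    chain = []
    current = latest
    while current is not None:
        chain.append(current)
        current = versions[current]["supersedes"]
    return chain

version_chain(versions, "v2.1")  # → ["v2.1", "v2.0", "v1.0"]
```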

Inference Engine

async def apply_inference_rules():
    # Transitive: A→B, B→C ⇒ A→C
    # Contradiction: A→B, A→¬B ⇒ conflict
    ...

Time & Cost Estimates

Development Timeline

  • Phase 1 (2 weeks): POC with 3 test documents
  • Phase 2 (4 weeks): Core governance services
  • Phase 3 (4 weeks): Production hardening

Total: ~10 weeks to production (vs 6-12 months building from scratch)

Cost (10,000 documents)

  • Papr Cloud: ~$1,100/month (storage + API calls)
  • Custom services: ~$70/month (compute + database)
  • Total: ~$1,200/month

Recommended Next Steps

  1. Week 1: Quick prototype

    • Create Osmosis schema in Papr
    • Upload 3 test documents
    • Extract claims (auto mode)
    • Query via GraphQL
  2. Week 2: Validate approach

    • Write conflict detection script
    • Test corroboration counting
    • Verify provenance tracking
  3. Decision point: If POC works, commit to Papr as foundation


My Assessment

Papr is positioned as both:

  • Retrieval & memory layer (your threshold requirement)
  • Knowledge graph foundation for governance (90% of infrastructure)

What Papr is NOT:

  • ❌ Complete governance system out-of-box
  • ❌ Built-in conflict detection
  • ❌ Automatic temporal snapshots

But you'd need custom governance logic regardless of database choice. The question is whether you want to build:

  • Graph database + entity resolution + retrieval + schema system + provenance (Papr does this), OR
  • Just governance logic on top of existing foundation (Papr route)

Recommendation: Use Papr. Saves 6-12 months. Custom schema system enables exactly what you need.


Questions for You

  1. Scale: How many documents? How many cross-references?
  2. Latency: Real-time conflict detection, or batch jobs acceptable?
  3. Inference: How complex? (Simple transitive, or full first-order logic?)
  4. Version granularity: Document-level or statement-level versioning?
  5. POC timeline: Can you allocate 2 weeks for proof-of-concept?

Let's Talk

I've prepared a full technical analysis (50+ pages) covering:

  • Detailed answers to each question with code examples
  • Complete schema design for Osmosis
  • Full workflow (upload → extract → detect → resolve → query)
  • Performance considerations & optimization strategies
  • Decision framework & trade-off analysis

Ready to dive deeper? Let's schedule a call to walk through:

  1. Schema design review
  2. Governance services architecture
  3. Integration patterns
  4. Performance optimization

Book time here: [calendly link]

Or reply with questions and I'll follow up.


Best regards,
Amir
Papr Team

P.S. The fact that you're thinking about provenance, corroboration, and versioning tells me you're building something sophisticated. Papr's schema system was designed exactly for this kind of domain-specific knowledge modeling. Happy to help you nail the design.