Osmosis Use Case Analysis: Papr Capability Assessment
Date: February 16, 2026
Developer: Osmosis Project
Use Case: Documentary knowledge governance with evidence-anchored claims
Executive Summary
Can the developer use Papr for their use case?
YES - with custom schema design, but they'll need to build governance logic on top.
Key Finding: Papr provides a strong foundation for the storage, retrieval, and relationship-tracking layer. However, the epistemological tracking (conflict detection, corroboration scoring, temporal versioning) requires custom application logic built on top of Papr's primitives.
Recommendation: Use Papr as the knowledge graph + retrieval engine, build Osmosis governance layer on top using custom schemas and GraphQL analytics.
Understanding the Osmosis System
Core Mission
Building a documentary knowledge governance system focused on:
- Evidence-anchored claims - Assertions tied to specific source evidence
- Cross-document validation - Detecting conflicts and corroboration
- Structured knowledge - Not just retrieval, but epistemological tracking
Philosophical Approach
This is fundamentally about knowledge provenance and confidence:
- What do we know?
- How confident are we?
- Where did it come from?
- Does it conflict with other sources?
- How has our understanding evolved over time?
The 5 Questions - Detailed Analysis
Question 1: Handling Conflicting Statements
"How does Papr handle situations where different documents assert conflicting statements or values?"
What Papr Provides ✅
Automatic Deduplication + Multi-Source Tracking:
When you define unique_identifiers in your schema:
schema = {
"Claim": {
"properties": {
"subject": {"type": "string"},
"predicate": {"type": "string"},
"object": {"type": "string"}
},
"unique_identifiers": ["subject", "predicate", "object"] # Triple-based dedup
}
}
Papr automatically:
- Deduplicates claims with same (subject, predicate, object)
- Creates multiple EXTRACTED_FROM relationships to different sources
- Merges into ONE node instead of creating duplicates
Example - Same claim from 3 documents:
Document 1: "Product X costs $100"
Document 2: "Product X is priced at $100"
Document 3: "Product X: $100"
→ Papr creates ONE Claim node:
Claim {subject: "Product X", predicate: "costs", object: "$100"}
← EXTRACTED_FROM → Source: doc_1
← EXTRACTED_FROM → Source: doc_2
← EXTRACTED_FROM → Source: doc_3
Conflict Detection - Built into GraphQL:
# Query automatically shows:
# 1. All claims for same subject (grouped by Papr's dedup)
# 2. Count of sources per claim (via EXTRACTED_FROM relationships)
# 3. Different values (different object values)
query = """
query ConflictDetection($subject: String!) {
claims(where: { subject: $subject }) {
object # The value
sources { # Auto-counted by Papr
document_id
version
date
}
}
}
"""
result = await client.graphql.query(query, {"subject": "Product X"})
# Returns:
# Claim 1: object="$100", sources=[doc_1, doc_2, doc_3] (3 sources)
# Claim 2: object="$150", sources=[doc_4] (1 source)
# → CONFLICT DETECTED: 2 different values for same subject
Resolution Logic:
# Developer decides resolution strategy:
# - Count-based: "Pick claim with most sources" (3 > 1 → $100 wins)
# - Freshness: "Pick claim from most recent document"
# - Authority: "Pick claim from official source"
# - Combined: "Most sources from official docs"
What They Can Build:
# Custom schema for Claims
schema = client.schemas.create(
name="Osmosis Claims Schema",
node_types={
"Claim": {
"properties": {
"statement": {"type": "string", "required": True},
"subject": {"type": "string"}, # "Product X"
"predicate": {"type": "string"}, # "costs"
"object": {"type": "string"}, # "$100"
"confidence": {"type": "float"},
"extracted_at": {"type": "datetime"}
},
"unique_identifiers": ["statement"]
},
"Source": {
"properties": {
"document_id": {"type": "string", "required": True},
"version": {"type": "string"},
"page": {"type": "integer"},
"section": {"type": "string"}
}
},
"ConflictSet": {
"properties": {
"subject": {"type": "string"},
"predicate": {"type": "string"},
"detected_at": {"type": "datetime"},
"resolution_status": {"type": "string", "enum_values": ["unresolved", "resolved", "ignored"]}
}
}
},
relationship_types={
"EXTRACTED_FROM": {
"allowed_source_types": ["Claim"],
"allowed_target_types": ["Source"]
},
"CONFLICTS_WITH": {
"allowed_source_types": ["Claim"],
"allowed_target_types": ["Claim"]
},
"MEMBER_OF": {
"allowed_source_types": ["Claim"],
"allowed_target_types": ["ConflictSet"]
}
}
)
Conflict Detection Logic (Custom):
# 1. Extract claims from documents
client.memory.add(
content="Product X costs $100 according to pricing sheet",
memory_policy={
"mode": "auto",
"schema_id": osmosis_schema_id,
"node_constraints": [
{
"node_type": "Claim",
"set": {
"subject": "Product X",
"predicate": "costs",
"object": "$100",
"confidence": 0.95
}
}
]
}
)
# 2. Periodic conflict detection service
async def detect_conflicts():
# Query all claims about same subject+predicate
query = """
query FindPotentialConflicts($subject: String!, $predicate: String!) {
claims(where: {
subject: $subject,
predicate: $predicate
}) {
id
object
confidence
extracted_from {
document_id
version
}
}
}
"""
result = await client.graphql.query(query, {"subject": "Product X", "predicate": "costs"})
# Find claims with different objects
unique_values = set(claim['object'] for claim in result['claims'])
if len(unique_values) > 1:
# CONFLICT DETECTED
conflict_set_id = create_conflict_set()  # application-defined helper: creates a ConflictSet node
for claim in result['claims']:
link_claim_to_conflict(claim['id'], conflict_set_id)  # application-defined helper: adds MEMBER_OF edge
Gap Analysis ⚠️
- ✅ Papr automatically deduplicates same claim across sources
- ✅ Papr tracks all sources per claim via EXTRACTED_FROM relationships
- ✅ GraphQL can count sources and compare different values
- ✅ Conflict identification is just: "same subject, different objects, >1 claim"
- ⚠️ Resolution strategy is custom logic (count-based, freshness, authority)
- ⚠️ ConflictSet nodes optional (for tracking resolution workflow)
Verdict: Papr handles 80% automatically (dedup, multi-source tracking, querying). Developer adds 20% (resolution rules, workflow tracking).
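The detection rule above ("same subject and predicate, more than one distinct object") is simple enough to sketch as a pure-Python helper. The claim dictionaries mirror the GraphQL result shape used in this document; the function itself is illustrative, not part of Papr's API:

```python
from collections import defaultdict

def find_conflicts(claims):
    """Group claims by (subject, predicate); flag groups whose
    members assert more than one distinct object value."""
    groups = defaultdict(list)
    for claim in claims:
        groups[(claim["subject"], claim["predicate"])].append(claim)
    conflicts = {}
    for key, members in groups.items():
        values = {c["object"] for c in members}
        if len(values) > 1:  # same fact asserted with different values
            conflicts[key] = sorted(values)
    return conflicts

claims = [
    {"subject": "Product X", "predicate": "costs", "object": "$100"},
    {"subject": "Product X", "predicate": "costs", "object": "$150"},
    {"subject": "Product Y", "predicate": "costs", "object": "$50"},
]
print(find_conflicts(claims))
# → {('Product X', 'costs'): ['$100', '$150']}
```

The same grouping can run over GraphQL results in the periodic service shown earlier; only the fetch changes, not the logic.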
Question 2: Contextual Metadata vs. Explicit Claims
"Does the graph model distinguish between contextual metadata and explicit claims?"
What Papr Provides ✅
Two Ways to Add Metadata:
1. Memory-level metadata (about the source/context):
client.memory.add(
content="API rate limit is 1000 requests/hour",
metadata={
# Contextual metadata - NOT extracted, just stored
"source_type": "documentation",
"version": "v2.0",
"last_updated": "2026-01-15",
"authority": "official",
"document_section": "rate-limits"
},
memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)
2. Node property overrides (forced onto extracted nodes):
client.memory.add(
content="API rate limit is 1000 requests/hour",
memory_policy={
"mode": "auto", # Still let LLM extract the claim
"schema_id": osmosis_schema_id,
"node_constraints": [
{
"node_type": "Claim",
"set": {
# Metadata injected directly onto node
"document_version": "v2.0",
"source_authority": "official",
"extraction_date": "2026-01-15",
"reviewed": False,
# LLM extracts: subject, predicate, object
# Developer injects: version, authority, etc.
}
}
]
}
)
Key Difference:
- Memory metadata: Stored on memory object, queryable via memory search
- Node properties: Stored on graph nodes, queryable via GraphQL, part of entity
- Auto-extracted: LLM extracts from content (subject, predicate, object)
- Property overrides: Developer forces specific values (version, authority, dates)
Schema-Level Distinction:
# They can model the distinction in their schema
node_types={
"Claim": { # First-class claim
"properties": {
"statement": {"type": "string"},
"confidence": {"type": "float"},
"claim_type": {"type": "string", "enum_values": ["fact", "opinion", "policy"]}
}
},
"SourceMetadata": { # Contextual information
"properties": {
"version": {"type": "string"},
"authority_level": {"type": "string"},
"publication_date": {"type": "datetime"}
}
}
}
relationship_types={
"HAS_METADATA": {
"allowed_source_types": ["Claim"],
"allowed_target_types": ["SourceMetadata"]
}
}
Gap Analysis ⚠️
- ✅ Memory metadata: Contextual info about source
- ✅ Property overrides: Force metadata onto nodes
- ✅ Auto-extraction: LLM extracts claims from content
- ✅ Clear separation: metadata vs node properties vs extracted properties
- ✅ Flexible control: Choose what's auto-extracted vs injected
Verdict: Full control over metadata vs claims. Use memory.metadata for context, node_constraints.set for forced properties, auto mode for extracted claims.
Question 3: Versioned Knowledge / Temporal Evolution
"Is there a notion of versioned knowledge or temporal evolution across documents?"
What Papr Provides ✅
Timestamp Tracking:
- Every memory has context.timestamp
- Can filter by time ranges
- Time-weighted search available
Two Version Approaches:
Approach 1: Version as node property (simpler):
# Add version directly on Claim nodes
client.memory.add(
content="API v1.0 rate limit is 100/hour",
memory_policy={
"mode": "auto",
"schema_id": osmosis_schema_id,
"node_constraints": [
{
"node_type": "Claim",
"set": {
"version": "v1.0",
"effective_from": "2024-01-01",
"effective_until": "2025-06-01"
# LLM extracts: subject, predicate, object
}
}
]
}
)
# Query by version
query = """
query GetClaimsByVersion($subject: String!, $version: String!) {
claims(where: { subject: $subject, version: $version }) {
object
effective_from
effective_until
}
}
"""
Approach 2: Separate Version nodes (more structured):
# Create explicit version chain
node_types={
"KnowledgeVersion": {
"properties": {
"subject": {"type": "string"},
"version": {"type": "string"},
"effective_from": {"type": "datetime"},
"effective_until": {"type": "datetime"}
}
}
}
relationship_types={
"SUPERSEDES": {
"allowed_source_types": ["KnowledgeVersion"],
"allowed_target_types": ["KnowledgeVersion"]
},
"VERSION_OF": {
"allowed_source_types": ["Claim"],
"allowed_target_types": ["KnowledgeVersion"]
}
}
# Store versioned claims
client.memory.add(
content="API v1.0 says rate limit is 100/hour",
memory_policy={
"mode": "auto",
"schema_id": osmosis_schema_id,
"node_constraints": [
{
"node_type": "Claim",
"set": {
"as_of_version": "v1.0",
"statement": "rate limit is 100/hour"
}
},
{
"node_type": "KnowledgeVersion",
"set": {
"version": "v1.0",
"effective_from": "2024-01-01T00:00:00Z",
"effective_until": "2025-06-01T00:00:00Z"
}
}
]
}
)
# Query version history
query = """
query GetVersionHistory($subject: String!) {
versions: knowledge_versions(
where: { subject: $subject }
order_by: { effective_from: ASC }
) {
version
effective_from
effective_until
claims {
statement
}
supersedes {
version
}
}
}
"""
Temporal Queries:
# Point-in-time query (custom logic)
async def get_knowledge_at_time(subject: str, timestamp: datetime):
"""Get what we knew about subject at specific time."""
query = """
query PointInTime($subject: String!, $timestamp: DateTime!) {
claims(where: {
subject: $subject,
version: {
effective_from: { $lte: $timestamp },
effective_until: { $gte: $timestamp }
}
}) {
statement
version
sources {
document_id
}
}
}
"""
return await client.graphql.query(query, {
"subject": subject,
"timestamp": timestamp.isoformat()
})
Gap Analysis ⚠️
- ✅ Timestamps on all memories (automatic)
- ✅ Can inject version via property overrides (node_constraints.set)
- ✅ Can model versions as explicit nodes (more structured)
- ✅ Can create version chains (SUPERSEDES relationships)
- ✅ Time-based filtering in queries
- ⚠️ Version semantics defined by developer (what version means)
- ⚠️ Point-in-time queries via effective_from/effective_until filters
Verdict: Temporal tracking is straightforward via property injection. Two options: (1) version property on claims, or (2) separate version nodes with relationships.
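The point-in-time semantics used throughout this question (a claim is current when effective_from ≤ t ≤ effective_until, with an open end meaning "still current") can also be applied client-side after fetching. A self-contained sketch with assumed field names, not a Papr call:

```python
from datetime import datetime

def claims_at(claims, when):
    """Return claims whose validity window contains `when`.
    effective_until=None means the claim is still current."""
    def active(c):
        start = datetime.fromisoformat(c["effective_from"])
        if when < start:
            return False
        end = c["effective_until"]
        return end is None or when <= datetime.fromisoformat(end)
    return [c for c in claims if active(c)]

claims = [
    {"statement": "rate limit is 100/hour", "version": "v1.0",
     "effective_from": "2024-01-01", "effective_until": "2025-06-01"},
    {"statement": "rate limit is 1000/hour", "version": "v2.0",
     "effective_from": "2025-06-01", "effective_until": None},
]
print([c["version"] for c in claims_at(claims, datetime(2025, 3, 1))])
# → ['v1.0']
```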
Question 4: Source Evidence vs. Inferred Relationships
"Are graph relationships strictly tied to source evidence, or can they be inferred dynamically?"
What Papr Provides ✅
Two Graph Generation Modes:
1. Auto Mode (Extraction-based):
# LLM extracts relationships from content
client.memory.add(
content="Jane Smith works at Acme Corp as CTO. Acme Corp is partnered with TechVentures.",
memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)
# Papr automatically creates:
# - Jane Smith (Person node)
# - Acme Corp (Company node)
# - TechVentures (Company node)
# - WORKS_AT relationship (Jane -> Acme)
# - PARTNERED_WITH relationship (Acme -> TechVentures)
Each extracted relationship is tied to the memory (source):
(Memory) --EXTRACTED--> (Relationship) --CONNECTS--> (Node)
2. Manual Mode (Explicit):
# Developer specifies exact graph structure
client.memory.add(
content="Explicit claim linking",
memory_policy={
"mode": "manual",
"nodes": [
{"id": "claim_1", "type": "Claim", "properties": {"statement": "X"}},
{"id": "claim_2", "type": "Claim", "properties": {"statement": "Y"}}
],
"relationships": [
{
"source": "claim_1",
"target": "claim_2",
"type": "SUPPORTS",
"properties": {"confidence": 0.8, "inferred": False}
}
]
}
)
Custom Inference Layer:
# Developer can build inference rules
async def infer_transitive_relationships():
"""Example: If A→B and B→C, infer A→C"""
# 1. Query existing relationships
query = """
query GetChains {
relationships(where: { type: "RELATED_TO" }) {
source { id }
target { id }
properties
}
}
"""
result = await client.graphql.query(query)
# 2. Find transitive patterns
chains = find_transitive_chains(result['relationships'])
# 3. Create inferred relationships
for chain in chains:
await client.memory.add(
content=f"Inferred relationship: {chain['source']} -> {chain['target']}",
memory_policy={
"mode": "manual",
"relationships": [
{
"source": chain['source'],
"target": chain['target'],
"type": "RELATED_TO",
"properties": {
"inferred": True,
"inference_rule": "transitive",
"confidence": 0.7
}
}
]
}
)
Relationship Provenance:
# Can track whether relationship is extracted or inferred
relationship_types={
"SUPPORTS": {
"properties": {
"provenance_type": {
"type": "string",
"enum_values": ["extracted", "inferred", "manual"]
},
"extraction_source": {"type": "string"},
"inference_rule": {"type": "string"}
}
}
}
Gap Analysis ⚠️
- ✅ Relationships are tied to source memories (provenance)
- ✅ Can mark relationships as extracted vs inferred
- ✅ Manual mode for explicit relationship creation
- ❌ No built-in inference engine
- ❌ No rule-based relationship derivation
- ❌ Transitive/logical inference requires custom code
Verdict: Relationships are primarily extracted, not inferred. Developer can build inference layer using GraphQL queries + manual relationship creation.
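The `find_transitive_chains` helper referenced above was left undefined; the core of a transitive pass can be written as a small fixed-point loop over (source, target) pairs. This is plain Python with no Papr dependency; writing the derived pairs back (e.g. via manual mode, flagged inferred=True) is left to the caller:

```python
def infer_transitive(edges):
    """Given direct edges as (source, target) pairs, derive the pairs
    implied by transitivity (A→B, B→C ⇒ A→C) that are not direct edges.
    Self-pairs (cycles collapsing to A→A) are skipped."""
    direct = set(edges)
    inferred = set()
    frontier = set(direct)  # newly derived pairs from the last round
    while frontier:
        known = direct | inferred
        new = {(a, d)
               for (a, b) in known
               for (c, d) in frontier
               if b == c and a != d and (a, d) not in known}
        inferred |= new
        frontier = new
    return inferred

print(sorted(infer_transitive([("A", "B"), ("B", "C"), ("C", "D")])))
# → [('A', 'C'), ('A', 'D'), ('B', 'D')]
```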
Question 5: Single-Source vs. Multi-Source Corroboration
"Do you differentiate between single-source extracted claims and multi-source corroborated knowledge?"
What Papr Provides ✅
Entity Resolution Across Sources:
# Document 1
client.memory.add(
content="Product X costs $100 (from pricing sheet)",
memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)
# Document 2
client.memory.add(
content="Product X is priced at $100 (from sales deck)",
memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)
# Document 3
client.memory.add(
content="Product X: $100 (from website)",
memory_policy={"mode": "auto", "schema_id": osmosis_schema_id}
)
# Papr's entity resolution merges "Product X" into single node
# Can query which memories reference it
Corroboration is Built-In via Deduplication:
Papr automatically:
- Merges same claim from multiple sources → ONE node
- Creates EXTRACTED_FROM relationship for each source
- GraphQL can count these relationships
Simple Query to Get Corroboration:
# No custom service needed - just query!
query = """
query GetClaimCorroboration($subject: String!) {
claims(where: { subject: $subject }) {
object
statement
sources { # Papr automatically populates this
document_id
version
authority
date
}
}
}
"""
result = await client.graphql.query(query, {"subject": "Product X price"})
# Analyze results
for claim in result['claims']:
source_count = len(claim['sources'])
official_count = sum(1 for s in claim['sources'] if s['authority'] == 'official')
print(f"Claim: {claim['object']}")
print(f" Supported by {source_count} sources ({official_count} official)")
# Apply resolution rules
if source_count >= 3 and official_count >= 2:
print(f" ✅ High confidence (3+ sources, 2+ official)")
elif source_count == 1:
print(f" ⚠️ Single source - needs verification")
Optional: Store Computed Score:
# If you want to cache the score on the node
await client.memory.add(
content=f"Update corroboration score",
memory_policy={
"mode": "manual",
"node_constraints": [{
"node_type": "Claim",
"search": {"properties": [{"name": "id", "mode": "exact", "value": claim_id}]},
"set": {
"source_count": source_count,
"corroboration_score": source_count / 5.0
}
}]
}
)
Key Point: Counting sources is a simple GraphQL query, not a complex background service. Papr's deduplication does the heavy lifting.
Gap Analysis ⚠️
- ✅ Entity merging across sources (automatic via unique_identifiers)
- ✅ Multi-source relationships (EXTRACTED_FROM to each source)
- ✅ GraphQL counts sources (via relationship traversal)
- ✅ Simple query shows source count and details
- ⚠️ Corroboration formula is developer's choice (count? weighted? freshness?)
- ⚠️ Caching score optional (can compute on-demand or store on node)
Verdict: Multi-source tracking is automatic. Counting sources is a simple GraphQL query. Developer only chooses scoring formula (if needed).
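One possible scoring formula, which the verdict deliberately leaves to the developer: weight official sources above unofficial ones and normalize against a cap. The weights and cap here are illustrative assumptions, not recommendations:

```python
def corroboration_score(sources, official_weight=2.0, unofficial_weight=1.0, cap=5.0):
    """Weighted source count, normalized into [0, 1]."""
    raw = sum(official_weight if s.get("authority") == "official" else unofficial_weight
              for s in sources)
    return min(raw / cap, 1.0)

sources = [
    {"document_id": "doc_1", "authority": "official"},
    {"document_id": "doc_2", "authority": "official"},
    {"document_id": "doc_3", "authority": "unofficial"},
]
print(corroboration_score(sources))
# → 1.0  (2 + 2 + 1 = 5, capped at 5 → 5/5)
```

The resulting score can be cached on the Claim node via the manual-mode update shown above, or recomputed on demand.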
Recommended Architecture
┌─────────────────────────────────────────────────────────────┐
│ OSMOSIS APPLICATION LAYER │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Governance Services (Custom Logic) │ │
│ │ • Conflict Detection │ │
│ │ • Corroboration Scoring │ │
│ │ • Version History Management │ │
│ │ • Inference Engine (transitive, rule-based) │ │
│ │ • Evidence Strength Calculation │ │
│ └──────────────────────────────────────────────────────┘ │
│ ↕︎ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Osmosis Data Model (Custom Schema) │ │
│ │ • Claim nodes (statement, confidence, version) │ │
│ │ • Source nodes (document, page, authority) │ │
│ │ • ConflictSet nodes (subject, resolution_status) │ │
│ │ • KnowledgeVersion nodes (version, timespan) │ │
│ │ • EXTRACTED_FROM, CONFLICTS_WITH, SUPERSEDES │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
↕︎
┌─────────────────────────────────────────────────────────────┐
│ PAPR MEMORY API │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Knowledge Graph Storage │ │
│ │ • Neo4j graph database │ │
│ │ • Custom schema support │ │
│ │ • Node/relationship creation │ │
│ │ • Entity resolution & merging │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Hybrid Retrieval Engine │ │
│ │ • Vector embeddings (semantic search) │ │
│ │ • Keyword search (BM25) │ │
│ │ • Graph traversal (multi-hop) │ │
│ └──────────────────────────────────────────────────────┘ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ GraphQL Query Layer │ │
│ │ • Complex relationship queries │ │
│ │ • Aggregations & analytics │ │
│ │ • Multi-hop traversals │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
Why Can They Build This with Papr?
Core Strengths That Enable Osmosis
1. Custom Schema System ✅
# Full control over domain ontology
schema = client.schemas.create(
name="Osmosis",
node_types={
"Claim": {...},
"Source": {...},
"ConflictSet": {...},
"Evidence": {...}
},
relationship_types={
"EXTRACTED_FROM": {...},
"CONFLICTS_WITH": {...},
"SUPPORTS": {...},
"REFUTES": {...}
}
)
Why This Matters:
- Define exactly what a "claim" is
- Model provenance relationships explicitly
- Track conflict resolution status
- Store confidence/corroboration scores
2. Node Constraints ✅
# Force properties onto nodes
memory_policy={
"node_constraints": [
{
"node_type": "Claim",
"set": {
"confidence": 0.95,
"extraction_method": "llm",
"extracted_at": "2026-02-16T10:00:00Z",
"document_version": "v2.1"
}
}
]
}
Why This Matters:
- Every claim gets metadata automatically
- Consistent property injection
- Source attribution baked in
- Version tracking per claim
3. GraphQL Analytics ✅
# Complex queries for governance
query = """
query ConflictAnalysis($subject: String!) {
conflict_sets(where: { subject: $subject, resolution_status: "unresolved" }) {
subject
predicate
claims {
statement
confidence
sources {
document_id
authority_level
version
}
}
}
}
"""
Why This Matters:
- Find all conflicts for a subject
- Compare source authority levels
- Identify version discrepancies
- Aggregate corroboration counts
4. Flexible Metadata ✅
# Store any custom fields
metadata={
"claim_type": "factual",
"extraction_confidence": 0.92,
"reviewed_by_human": False,
"conflict_status": "none",
"source_authority": "official",
"as_of_version": "v2.1"
}
Why This Matters:
- Track governance workflow state
- Store review status
- Flag conflicts
- Version everything
5. Entity Resolution ✅
# Same entity mentioned in multiple documents → one node
# "Product X", "product x", "ProductX" → merged automatically
Why This Matters:
- Cross-document references work
- Corroboration counting possible
- Conflict detection feasible
- No manual de-duplication needed
6. Provenance Built-In ✅
Every memory has:
- memory_id (unique)
- created_at (timestamp)
- source (where it came from)
- metadata.document_id, metadata.page, etc.
Why This Matters:
- Trace every claim to source
- Audit trail automatic
- Evidence chain preserved
- Compliance-ready
What They Still Need to Build
1. Conflict Detection Service
# Periodic job
@schedule.every(1).hours
async def detect_conflicts():
# Query claims by subject+predicate
# Find different values
# Create ConflictSet nodes
# Link claims to conflict sets
pass
2. Corroboration Scoring Service
# Update scores based on source count
@schedule.every(6).hours
async def update_corroboration():
# Count sources per claim
# Compute confidence score
# Update Claim.corroboration_score
pass
3. Version History Service
# Track knowledge evolution
async def track_version_change(claim_id, new_version):
# Create new KnowledgeVersion node
# Link old version → new version (SUPERSEDES)
# Update claims to reference new version
pass
4. Inference Engine
# Apply logical rules
async def apply_inference_rules():
# Transitive relationships
# Contradiction detection
# Support/refutation chains
pass
5. Governance UI
# Visualize conflicts, history, evidence
# Allow manual conflict resolution
# Show corroboration strength
# Display version timelines
Concrete Example: Full Workflow
Step 1: Upload Documents
# Document 1 (v1.0 spec)
client.document.upload(
file=open("api_spec_v1.pdf", "rb"),
schema_id=osmosis_schema_id,
metadata={"version": "v1.0", "authority": "official", "date": "2024-01-01"}
)
# Document 2 (v2.0 spec)
client.document.upload(
file=open("api_spec_v2.pdf", "rb"),
schema_id=osmosis_schema_id,
metadata={"version": "v2.0", "authority": "official", "date": "2025-06-01"}
)
# Document 3 (blog post)
client.document.upload(
file=open("blog_pricing.md", "rb"),
schema_id=osmosis_schema_id,
metadata={"version": "unknown", "authority": "unofficial", "date": "2025-12-01"}
)
Step 2: Extract Claims (Automatic)
Papr's extraction creates:
Claim: "Rate limit is 100/hour"
← EXTRACTED_FROM → Source: "api_spec_v1.pdf" (page 12)
version: "v1.0"
confidence: 0.95
Claim: "Rate limit is 1000/hour"
← EXTRACTED_FROM → Source: "api_spec_v2.pdf" (page 15)
version: "v2.0"
confidence: 0.98
Claim: "Rate limit is 1000/hour"
← EXTRACTED_FROM → Source: "blog_pricing.md"
version: "unknown"
confidence: 0.75
Step 3: Conflict Detection (Simple Query)
# Query for all claims about "Rate limit"
query = """
query FindConflicts($subject: String!) {
claims(where: { subject: $subject }) {
object
sources {
document_id
version
authority
date
}
}
}
"""
result = await client.graphql.query(query, {"subject": "Rate limit"})
# Analyze: different values = conflict
unique_values = set(claim['object'] for claim in result['claims'])
if len(unique_values) > 1:
print(f"⚠️ CONFLICT: {unique_values}")
# Returns: {"100/hour", "1000/hour"}
for claim in result['claims']:
print(f" {claim['object']}: {len(claim['sources'])} sources")
# Output:
# 100/hour: 1 source (v1.0 spec)
# 1000/hour: 2 sources (v2.0 spec + blog)
Step 4: Resolution Logic (Custom)
# Osmosis resolution strategy: "most recent official source wins"
async def resolve_conflict(conflict_set_id):
# Query claims in conflict set
query = """
query ConflictClaims($conflict_id: ID!) {
conflict_set(id: $conflict_id) {
claims {
id
object
sources {
authority
version
date
}
}
}
}
"""
result = await client.graphql.query(query, {"conflict_id": conflict_set_id})
# Filter claims that have at least one official source
official_claims = [c for c in result['conflict_set']['claims']
if any(s['authority'] == 'official' for s in c['sources'])]
# Pick the claim whose most recent source is newest
winner = max(official_claims, key=lambda c: max(s['date'] for s in c['sources']))
# Update conflict set
await update_conflict_set(
conflict_set_id,
resolution_status="resolved",
winner_claim_id=winner['id'],
resolution_rule="most_recent_official"
)
Step 5: Apply Resolution Rules (Simple Logic)
# From Step 3, we have:
# Claim 1: "100/hour" - 1 source (v1.0, official)
# Claim 2: "1000/hour" - 2 sources (v2.0 official + blog unofficial)
# Resolution strategy: "Most sources from official documents"
claims_analyzed = []
for claim in result['claims']:
official_sources = [s for s in claim['sources'] if s['authority'] == 'official']
claims_analyzed.append({
'value': claim['object'],
'total_sources': len(claim['sources']),
'official_sources': len(official_sources),
'most_recent': max(s['date'] for s in claim['sources'])
})
# Apply scoring
def resolution_score(claim):
return (
claim['official_sources'] * 10 + # Official sources weighted high
claim['total_sources'] * 2 + # Total sources matter
recency_weight(claim['most_recent']) # Recent is better
)
winner = max(claims_analyzed, key=resolution_score)
print(f"✅ Resolved: {winner['value']}")
# Output: "1000/hour" (v2.0 is more recent + has official source)
Step 6: Version Tracking (Custom)
# Create version timeline
await create_version_node(
subject="Rate limit",
version="v1.0",
value="100/hour",
effective_from="2024-01-01",
effective_until="2025-06-01"
)
await create_version_node(
subject="Rate limit",
version="v2.0",
value="1000/hour",
effective_from="2025-06-01",
effective_until=None # Current
)
await create_supersedes_relationship(
old_version="v1.0",
new_version="v2.0"
)
Step 7: Query History (GraphQL)
# What did we know about rate limits in March 2025?
query = """
query HistoricalKnowledge($subject: String!, $date: DateTime!) {
knowledge_versions(where: {
subject: $subject,
effective_from: { $lte: $date },
effective_until: { $gte: $date }
}) {
version
value
claims {
statement
sources {
document_id
}
}
}
}
"""
historical = await client.graphql.query(query, {
"subject": "Rate limit",
"date": "2025-03-01T00:00:00Z"
})
# Returns: "100/hour" (v1.0 was effective until June 2025)
Performance Considerations
What Papr Handles Efficiently ✅
- Fast retrieval: <150ms (when cached)
- Entity resolution: Automatic merging
- Vector search: Semantic similarity
- Graph traversal: Multi-hop relationships
- Hybrid ranking: Keyword + semantic + graph
What Requires Optimization ⚠️
- Conflict detection: Periodic batch jobs (don't run on every write)
- Corroboration updates: Background service, not real-time
- Version tracking: Compute on demand, cache results
- Large-scale queries: Use pagination, limit result sets
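For the batch-job and pagination bullets, a generic chunking helper is often all the scheduler needs: scan subjects in fixed-size batches instead of one giant query. This sketch has no Papr dependency, and the subject list is made up:

```python
def batched(items, size):
    """Yield successive fixed-size chunks, e.g. subjects for a batch conflict scan."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

subjects = [f"subject_{n}" for n in range(7)]
print([len(chunk) for chunk in batched(subjects, 3)])
# → [3, 3, 1]
```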
Cost Estimation
Papr Cloud Costs
- Storage: ~$0.02/GB/month
- API calls: ~$0.001 per search
- Document processing: ~$0.05 per document
For 10,000 documents:
- Storage: 10,000 docs ÷ 500 docs/GB = 20 GB × $0.02 = $0.40/month
- Processing: 10,000 docs × $0.05 → $500 one-time
- Search: 1M queries/month × $0.001 → $1,000/month
Custom Logic Costs
- Compute for services: ~$50-100/month (AWS Lambda/Cloud Run)
- Database for governance state: ~$20/month (PostgreSQL)
Total: ~$1,070-1,120/month recurring at production scale (plus ~$500 one-time document processing)
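To make the arithmetic above reproducible, here is a small estimator using the unit prices quoted in this section; treat them as planning assumptions, not current list prices. One-time document processing is excluded since it is not a recurring cost:

```python
def estimate_monthly_cost(docs, queries_per_month,
                          docs_per_gb=500, storage_per_gb=0.02,
                          cost_per_search=0.001,
                          compute=75.0, governance_db=20.0):
    """Recurring monthly cost; compute/governance_db are midpoint estimates."""
    storage = (docs / docs_per_gb) * storage_per_gb
    search = queries_per_month * cost_per_search
    return {
        "storage": round(storage, 2),
        "search": round(search, 2),
        "services": compute + governance_db,
        "total": round(storage + search + compute + governance_db, 2),
    }

print(estimate_monthly_cost(docs=10_000, queries_per_month=1_000_000))
# → {'storage': 0.4, 'search': 1000.0, 'services': 95.0, 'total': 1095.4}
```

Search volume dominates, so the result is far more sensitive to queries_per_month than to document count.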
Decision Framework
Use Papr If:
✅ Need knowledge graph + retrieval in one API
✅ Want entity resolution handled automatically
✅ Need hybrid search (semantic + keyword + graph)
✅ Schema flexibility is important
✅ GraphQL analytics valuable
✅ Prefer hosted/managed service
Build from Scratch If:
❌ Need built-in temporal graph (point-in-time snapshots)
❌ Require automatic inference engine
❌ Want conflict detection as core DB feature
❌ Need ACID transactions across graph operations
❌ Extreme performance requirements (microsecond latency)
Recommended Next Steps
Phase 1: Proof of Concept (2 weeks)
Week 1: Design Osmosis schema in Papr
- Define Claim, Source, ConflictSet node types
- Define EXTRACTED_FROM, CONFLICTS_WITH relationships
- Test schema creation API
Week 2: Implement basic workflow
- Upload 3 test documents with overlapping claims
- Extract claims via Papr's auto mode
- Write simple conflict detection script
- Query results via GraphQL
Success Criteria: Can detect conflicting claims across 3 documents
Phase 2: Core Services (4 weeks)
Week 3-4: Build governance services
- Conflict detection (batch job)
- Corroboration scoring
- Basic resolution logic
Week 5-6: Add version tracking
- Model KnowledgeVersion nodes
- Track SUPERSEDES relationships
- Implement temporal queries
Success Criteria: Full workflow works end-to-end
Phase 3: Production Hardening (4 weeks)
Week 7-8: Performance optimization
- Batch operations
- Caching strategies
- Query optimization
Week 9-10: UI + monitoring
- Conflict visualization dashboard
- Evidence strength display
- Governance metrics
Success Criteria: Production-ready system
Final Recommendation
YES, use Papr as the foundation layer.
Reasoning:
- 90% of infrastructure handled (storage, retrieval, graph, entity resolution)
- Schema system enables custom modeling (claims, conflicts, versions)
- GraphQL provides analytics power (complex queries, aggregations)
- Saves 6-12 months of development vs building from scratch
- Custom logic is straightforward (conflict detection, scoring, versioning)
Trade-offs:
- Must build governance logic (but would need this even with custom DB)
- No built-in temporal graph (but can model versions explicitly)
- No automatic inference (but can add via custom services)
Bottom Line: Papr provides the hard parts (graph database, retrieval, entity resolution). The governance layer on top is domain-specific logic they'd build regardless of underlying database choice.
Next Step: Schedule architecture review call to walk through schema design and confirm approach.