Why Developers CAN Build Osmosis-Style Systems with Papr

Question: Why can't a developer just create a schema with a "Claim" node type that tracks claims and implements the governance features Osmosis needs?

Answer: They absolutely can. That's exactly what Papr's custom schema system is designed for.

The Key Insight

The developer's questions focus on what they need (conflict detection, versioning, corroboration). The first question should be where the data that enables those features is stored.

Papr provides the "where":

Knowledge graph for storing claims and relationships
Custom schemas for defining domain models
Node constraints for property injection
GraphQL for complex queries

The developer builds the "what":

Logic that uses those queries to detect conflicts
Services that compute corroboration scores
Rules that resolve contradictions

Concrete Example: Why It Works

1. Create Custom Schema

from datetime import datetime  # used for the timestamps in later steps

from papr_memory import Papr

client = Papr(x_api_key="your_api_key")

# Define EXACTLY what a Claim is in your domain
schema = client.schemas.create(
    name="Documentary Claims Tracking",
    description="Schema for evidence-based documentary knowledge",
    
    node_types={
        # First-class Claim node type
        "Claim": {
            "name": "Claim",
            "label": "Claim",
            "description": "A factual assertion extracted from sources",
            "properties": {
                # Core claim components (RDF-style triple)
                "subject": {
                    "type": "string",
                    "required": True,
                    "description": "What the claim is about"
                },
                "predicate": {
                    "type": "string", 
                    "required": True,
                    "description": "The relationship or property"
                },
                "object": {
                    "type": "string",
                    "required": True,
                    "description": "The value or target"
                },
                
                # Full statement for readability
                "statement": {
                    "type": "string",
                    "required": True,
                    "description": "Human-readable claim text"
                },
                
                # Confidence metrics
                "extraction_confidence": {
                    "type": "float",
                    "default": 0.0,
                    "description": "LLM confidence in extraction (0-1)"
                },
                "source_count": {
                    "type": "integer",
                    "default": 1,
                    "description": "Number of sources supporting this claim"
                },
                "corroboration_score": {
                    "type": "float",
                    "default": 0.0,
                    "description": "Multi-source corroboration strength (0-1)"
                },
                
                # Provenance
                "extracted_from_document": {
                    "type": "string",
                    "description": "Source document ID"
                },
                "extracted_from_page": {
                    "type": "integer",
                    "description": "Source page number"
                },
                "extracted_at": {
                    "type": "datetime",
                    "description": "When extraction occurred"
                },
                
                # Versioning
                "as_of_version": {
                    "type": "string",
                    "description": "Document/API version this claim comes from"
                },
                "temporal_scope": {
                    "type": "string",
                    "enum_values": ["historical", "current", "future"],
                    "description": "Temporal validity of claim"
                },
                
                # Governance
                "conflict_status": {
                    "type": "string",
                    "enum_values": ["none", "potential", "confirmed", "resolved"],
                    "default": "none"
                },
                "reviewed": {
                    "type": "boolean",
                    "default": False,
                    "description": "Human review status"
                },
                "authoritative": {
                    "type": "boolean",
                    "default": False,
                    "description": "Marked as authoritative/canonical"
                }
            },
            "required_properties": ["subject", "predicate", "object", "statement"],
            "unique_identifiers": ["subject", "predicate", "object"]  # Dedupe on triple
        },
        
        # Source tracking
        "DocumentSource": {
            "name": "DocumentSource",
            "label": "Document Source",
            "properties": {
                "document_id": {"type": "string", "required": True},
                "document_title": {"type": "string"},
                "version": {"type": "string"},
                "publication_date": {"type": "datetime"},
                "authority_level": {
                    "type": "string",
                    "enum_values": ["official", "draft", "unofficial", "third_party"]
                },
                "document_type": {
                    "type": "string",
                    "enum_values": ["specification", "contract", "report", "article", "email"]
                }
            },
            "unique_identifiers": ["document_id"]
        },
        
        # Conflict tracking
        "ConflictSet": {
            "name": "ConflictSet",
            "label": "Conflict Set",
            "description": "Group of claims that conflict with each other",
            "properties": {
                "subject": {"type": "string", "required": True},
                "predicate": {"type": "string", "required": True},
                "detected_at": {"type": "datetime"},
                "resolution_status": {
                    "type": "string",
                    "enum_values": ["unresolved", "resolved", "acknowledged", "ignored"]
                },
                "resolution_method": {
                    "type": "string",
                    "description": "How conflict was resolved (e.g., 'most_recent', 'highest_authority')"
                },
                "resolved_at": {"type": "datetime"},
                "resolved_by": {"type": "string"},
                "notes": {"type": "string"}
            }
        },
        
        # Version tracking
        "KnowledgeVersion": {
            "name": "KnowledgeVersion",
            "label": "Knowledge Version",
            "description": "Temporal version of knowledge about a subject",
            "properties": {
                "subject": {"type": "string", "required": True},
                "version": {"type": "string", "required": True},
                "effective_from": {"type": "datetime"},
                "effective_until": {"type": "datetime"},
                "superseded": {"type": "boolean", "default": False}
            }
        }
    },
    
    relationship_types={
        # Provenance relationships
        "EXTRACTED_FROM": {
            "name": "EXTRACTED_FROM",
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["DocumentSource"],
            "properties": {
                "page_number": {"type": "integer"},
                "section": {"type": "string"},
                "extraction_method": {
                    "type": "string",
                    "enum_values": ["llm", "manual", "structured"]
                }
            }
        },
        
        # Conflict relationships
        "CONFLICTS_WITH": {
            "name": "CONFLICTS_WITH",
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["Claim"],
            "properties": {
                "conflict_type": {
                    "type": "string",
                    "enum_values": ["direct_contradiction", "incompatible_values", "temporal_inconsistency"]
                },
                "detected_at": {"type": "datetime"}
            }
        },
        
        "MEMBER_OF": {
            "name": "MEMBER_OF",
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["ConflictSet"]
        },
        
        # Support relationships
        "SUPPORTS": {
            "name": "SUPPORTS",
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["Claim"],
            "properties": {
                "support_type": {
                    "type": "string",
                    "enum_values": ["corroborates", "provides_evidence", "logical_entailment"]
                },
                "confidence": {"type": "float"}
            }
        },
        
        "REFUTES": {
            "name": "REFUTES",
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["Claim"]
        },
        
        # Version relationships
        "VERSION_OF": {
            "name": "VERSION_OF",
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["KnowledgeVersion"]
        },
        
        "SUPERSEDES": {
            "name": "SUPERSEDES",
            "allowed_source_types": ["KnowledgeVersion"],
            "allowed_target_types": ["KnowledgeVersion"],
            "properties": {
                "change_type": {
                    "type": "string",
                    "enum_values": ["correction", "update", "deprecation", "clarification"]
                }
            }
        }
    }
)

print(f"Schema created: {schema.data.id}")
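Before sending a claim to the API, it can help to check it locally against the same property spec. The following is a minimal sketch, not part of the Papr SDK: validate_properties and CLAIM_PROPERTIES are hypothetical names, and the sketch assumes property specs use the "required" and "enum_values" keys exactly as in the schema above.

```python
from typing import Any

# A trimmed copy of the Claim property spec from the schema above.
CLAIM_PROPERTIES: dict[str, dict[str, Any]] = {
    "subject": {"type": "string", "required": True},
    "predicate": {"type": "string", "required": True},
    "object": {"type": "string", "required": True},
    "statement": {"type": "string", "required": True},
    "temporal_scope": {
        "type": "string",
        "enum_values": ["historical", "current", "future"],
    },
}


def validate_properties(spec: dict[str, dict[str, Any]], values: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the values pass."""
    problems = []
    for name, rules in spec.items():
        # Required properties must be present.
        if rules.get("required") and name not in values:
            problems.append(f"missing required property: {name}")
        # Enum properties must hold one of the allowed values.
        allowed = rules.get("enum_values")
        if allowed and name in values and values[name] not in allowed:
            problems.append(f"{name} must be one of {allowed}, got {values[name]!r}")
    # Flag properties the spec does not know about (likely typos).
    for name in values:
        if name not in spec:
            problems.append(f"unknown property: {name}")
    return problems
```

A check like this catches typos such as "confidence" instead of "extraction_confidence" before they reach the graph.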

2. Extract Claims from Documents

# Upload document - Papr extracts claims automatically
# (the context manager closes the file handle after the upload)
with open("api_specification_v2.pdf", "rb") as pdf_file:
    response = client.document.upload(
        file=pdf_file,
        schema_id=schema.data.id,
        metadata={
            "version": "v2.0",
            "authority": "official",
            "document_type": "specification"
        }
    )

# OR add claim explicitly with full control
response = client.memory.add(
    content="The API rate limit is 1000 requests per hour according to section 3.2 of the v2.0 specification",
    memory_policy={
        "mode": "auto",  # Let LLM extract entities
        "schema_id": schema.data.id,
        
        # Force specific properties on extracted Claim node
        "node_constraints": [
            {
                "node_type": "Claim",
                "set": {
                    "subject": "API rate limit",
                    "predicate": "is",
                    "object": "1000 requests per hour",
                    "statement": "The API rate limit is 1000 requests per hour",
                    "as_of_version": "v2.0",
                    "extracted_from_document": "api_spec_v2",
                    "extracted_from_page": 12,
                    "extracted_at": datetime.now().isoformat(),
                    "extraction_confidence": 0.95,
                    "temporal_scope": "current"
                }
            },
            {
                "node_type": "DocumentSource",
                "set": {
                    "document_id": "api_spec_v2",
                    "version": "v2.0",
                    "authority_level": "official",
                    "document_type": "specification"
                }
            }
        ]
    }
)

What just happened:

Claim node created with ALL the properties you need for governance
DocumentSource node created (or linked if exists)
EXTRACTED_FROM relationship automatically created
Full provenance chain established
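Writing the node_constraints list by hand for every claim is repetitive, so a small builder can keep the Claim and DocumentSource entries consistent. This is a sketch under assumptions: ClaimRecord and build_node_constraints are hypothetical helpers, and the payload shape ("node_type" plus "set") is copied from the example above rather than from a verified API reference.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class ClaimRecord:
    """Hypothetical in-memory form of one extracted claim."""
    subject: str
    predicate: str
    object: str
    document_id: str
    version: str
    page: int
    confidence: float


def build_node_constraints(claim: ClaimRecord, extracted_at: datetime) -> list[dict[str, Any]]:
    """Build the node_constraints list in the shape used in the example above."""
    return [
        {
            "node_type": "Claim",
            "set": {
                "subject": claim.subject,
                "predicate": claim.predicate,
                "object": claim.object,
                "statement": f"{claim.subject} {claim.predicate} {claim.object}",
                "as_of_version": claim.version,
                "extracted_from_document": claim.document_id,
                "extracted_from_page": claim.page,
                "extracted_at": extracted_at.isoformat(),
                "extraction_confidence": claim.confidence,
            },
        },
        {
            # The same document_id appears on both nodes so they stay linked.
            "node_type": "DocumentSource",
            "set": {"document_id": claim.document_id, "version": claim.version},
        },
    ]


record = ClaimRecord(
    "API rate limit", "is", "1000 requests per hour", "api_spec_v2", "v2.0", 12, 0.95
)
constraints = build_node_constraints(record, datetime.now(timezone.utc))
```

The resulting list can be passed as the "node_constraints" value of memory_policy.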

3. Detect Conflicts with GraphQL

# Query all claims about same subject+predicate
conflict_query = """
query FindPotentialConflicts($subject: String!, $predicate: String!) {
  claims(where: {
    subject: $subject,
    predicate: $predicate
  }) {
    id
    statement
    object
    as_of_version
    extraction_confidence
    extracted_from {
      document_id
      version
      authority_level
    }
  }
}
"""

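# Execute query (the awaits in this walkthrough assume an async client
# and an enclosing async function)
subject = "API rate limit"
predicate = "is"
result = await client.graphql.query(
    query=conflict_query,
    variables={
        "subject": subject,
        "predicate": predicate
    }
)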

# Analyze results
claims = result['data']['claims']
unique_values = set(claim['object'] for claim in claims)

if len(unique_values) > 1:
    print(f"⚠️ CONFLICT DETECTED: {len(unique_values)} different values")
    for claim in claims:
        # Versions are stored with their "v" prefix (e.g. "v2.0"), so none is added here
        print(f"  - '{claim['object']}' from {claim['extracted_from']['document_id']} ({claim['extracted_from']['version']})")
    
    # Create conflict set
    conflict_response = await client.memory.add(
        content=f"Conflict detected for '{subject}' {predicate}",
        memory_policy={
            "mode": "manual",
            "nodes": [
                {
                    "id": "conflict_set_1",
                    "type": "ConflictSet",
                    "properties": {
                        "subject": subject,
                        "predicate": predicate,
                        "detected_at": datetime.now().isoformat(),
                        "resolution_status": "unresolved"
                    }
                }
            ],
            "relationships": [
                {
                    "source": claim['id'],
                    "target": "conflict_set_1",
                    "type": "MEMBER_OF"
                }
                for claim in claims
            ]
        }
    )

Why this works:

Your schema defines EXACTLY what properties each Claim has
GraphQL lets you query by those properties
You can group by (subject, predicate) to find conflicts
ConflictSet nodes track resolution status
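The grouping step can be separated from the API calls and run over any list of claim records. The sketch below assumes claims arrive as plain dictionaries with "subject", "predicate", and "object" keys, as in the query result above; find_conflicts is a hypothetical helper name.

```python
from collections import defaultdict


def find_conflicts(claims: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group claims by (subject, predicate) and keep groups with more than one distinct object."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for claim in claims:
        groups[(claim["subject"], claim["predicate"])].append(claim)
    return {
        key: members
        for key, members in groups.items()
        if len({m["object"] for m in members}) > 1
    }


claims = [
    {"id": "c1", "subject": "API rate limit", "predicate": "is", "object": "100 requests per hour"},
    {"id": "c2", "subject": "API rate limit", "predicate": "is", "object": "1000 requests per hour"},
    {"id": "c3", "subject": "API timeout", "predicate": "is", "object": "30 seconds"},
]
conflicts = find_conflicts(claims)
for (subject, predicate), members in conflicts.items():
    print(subject, predicate, [m["id"] for m in members])
```

Each key in the returned dictionary corresponds to one ConflictSet node, and each member to one MEMBER_OF relationship.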

4. Compute Corroboration Scores

# Service that runs periodically
async def update_corroboration_scores():
    """Count how many sources support each claim."""
    
    # Get all claims with their sources
    query = """
    query GetClaimSources {
      claims {
        id
        subject
        predicate
        object
        extracted_from {
          document_id
          authority_level
        }
      }
    }
    """
    
    result = await client.graphql.query(query=query)
    
    # Group claims by (subject, predicate, object) triple
    claim_groups = {}
    for claim in result['data']['claims']:
        key = (claim['subject'], claim['predicate'], claim['object'])
        if key not in claim_groups:
            claim_groups[key] = []
        claim_groups[key].append(claim)
    
    # Update each claim's corroboration score
    for (subject, predicate, obj), claims in claim_groups.items():
        source_count = len(set(c['extracted_from']['document_id'] for c in claims))
        
        # Weight by authority level, counting each official document once
        official_sources = len({
            c['extracted_from']['document_id']
            for c in claims
            if c['extracted_from']['authority_level'] == 'official'
        })
        
        # Compute score (example formula)
        corroboration_score = min((source_count * 0.2) + (official_sources * 0.3), 1.0)
        
        # Update all claims in this group
        for claim in claims:
            await client.memory.add(
                content=f"Corroboration update for claim {claim['id']}",
                memory_policy={
                    "mode": "manual",
                    "node_constraints": [
                        {
                            "node_type": "Claim",
                            "search": {
                                "properties": [
                                    {"name": "id", "mode": "exact", "value": claim['id']}
                                ]
                            },
                            "set": {
                                "source_count": source_count,
                                "corroboration_score": corroboration_score
                            }
                        }
                    ]
                }
            )
        
        print(f"✅ Updated corroboration for '{subject} {predicate} {obj}': {source_count} sources, score={corroboration_score}")

Why this works:

You stored source_count and corroboration_score in your Claim schema
GraphQL lets you query all claims with their sources
You compute the score using whatever formula makes sense
Node constraints let you update specific claims by ID
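The example formula is easier to test when isolated as a pure function. The sketch below uses the same illustrative weights (0.2 per distinct source, 0.3 per distinct official source, capped at 1.0); corroboration_score is a hypothetical name, and the weights are an example rather than a recommendation. It counts each document once, so a claim extracted twice from the same document does not inflate the score.

```python
def corroboration_score(sources: list[dict]) -> float:
    """Score a claim from its sources: 0.2 per distinct document, plus 0.3 per
    distinct official document, capped at 1.0."""
    # Deduplicate by document_id so repeated extractions count once.
    by_document = {s["document_id"]: s["authority_level"] for s in sources}
    distinct = len(by_document)
    official = sum(1 for level in by_document.values() if level == "official")
    return min(distinct * 0.2 + official * 0.3, 1.0)


sources = [
    {"document_id": "api_spec_v2", "authority_level": "official"},
    {"document_id": "blog_post", "authority_level": "third_party"},
    {"document_id": "api_spec_v2", "authority_level": "official"},  # duplicate extraction
]
score = corroboration_score(sources)
```

With two distinct documents, one of them official, the score works out to 2 × 0.2 + 1 × 0.3 = 0.7.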

5. Track Version History

# Create version nodes as knowledge evolves
async def track_version_change(subject, old_version, new_version, change_type):
    """Track that knowledge changed from one version to another."""
    
    response = await client.memory.add(
        content=f"Version transition: {subject} changed from {old_version} to {new_version}",
        memory_policy={
            "mode": "manual",
            "nodes": [
                {
                    "id": f"version_{old_version}",
                    "type": "KnowledgeVersion",
                    "properties": {
                        "subject": subject,
                        "version": old_version,
                        "effective_until": datetime.now().isoformat(),
                        "superseded": True
                    }
                },
                {
                    "id": f"version_{new_version}",
                    "type": "KnowledgeVersion",
                    "properties": {
                        "subject": subject,
                        "version": new_version,
                        "effective_from": datetime.now().isoformat(),
                        "superseded": False
                    }
                }
            ],
            "relationships": [
                {
                    "source": f"version_{new_version}",
                    "target": f"version_{old_version}",
                    "type": "SUPERSEDES",
                    "properties": {
                        "change_type": change_type
                    }
                }
            ]
        }
    )

# Query version history
version_history_query = """
query GetVersionHistory($subject: String!) {
  knowledge_versions(
    where: { subject: $subject }
    order_by: { effective_from: ASC }
  ) {
    version
    effective_from
    effective_until
    superseded
    supersedes {
      version
    }
    claims {
      statement
      object
    }
  }
}
"""

history = await client.graphql.query(
    query=version_history_query,
    variables={"subject": "API rate limit"}
)

# Result:
# v1.0: "100 requests/hour" (2024-01-01 to 2025-06-01) [superseded]
#   └─> superseded by v2.0
# v2.0: "1000 requests/hour" (2025-06-01 to present) [current]

Why this works:

KnowledgeVersion nodes explicitly model temporal scope
SUPERSEDES relationships create version chains
effective_from/effective_until enable point-in-time queries
Claims link to versions via VERSION_OF relationship
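Once the SUPERSEDES edges are fetched, ordering a version chain is a small graph walk. The sketch below assumes the edges have been collected into a {newer: older} dictionary and that each subject has a single linear chain; order_versions is a hypothetical helper.

```python
def order_versions(supersedes: dict[str, str]) -> list[str]:
    """Return versions oldest-first from a {newer: older} map of SUPERSEDES edges.

    Raises ValueError if the edges do not form one linear chain.
    """
    older_to_newer = {old: new for new, old in supersedes.items()}
    if len(older_to_newer) != len(supersedes):
        raise ValueError("a version is superseded by more than one newer version")
    # The oldest version is superseded but supersedes nothing itself.
    roots = set(supersedes.values()) - set(supersedes)
    if len(roots) != 1:
        raise ValueError("expected exactly one oldest version")
    chain = [roots.pop()]
    while chain[-1] in older_to_newer:
        chain.append(older_to_newer[chain[-1]])
    return chain


history = order_versions({"v2.0": "v1.0", "v3.0": "v2.0"})
```

Rejecting branches and cycles up front keeps the walk guaranteed to terminate.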

6. Point-in-Time Queries

# What did we know about X on date Y?
async def get_knowledge_at_time(subject: str, timestamp: datetime):
    """Get what we knew about subject at specific time."""
    
    query = """
    query PointInTimeKnowledge($subject: String!, $timestamp: DateTime!) {
      knowledge_versions(where: {
        subject: $subject,
        # Comparison operator names are assumed here; GraphQL input field
        # names cannot start with "$"
        effective_from: { lte: $timestamp },
        effective_until: { gte: $timestamp }
      }) {
        version
        claims {
          statement
          object
          corroboration_score
          sources {
            document_id
            authority_level
          }
        }
      }
    }
    """
    
    result = await client.graphql.query(
        query=query,
        variables={
            "subject": subject,
            "timestamp": timestamp.isoformat()
        }
    )
    
    return result['data']['knowledge_versions']

# Example usage
march_2025_knowledge = await get_knowledge_at_time(
    subject="API rate limit",
    timestamp=datetime(2025, 3, 1)
)

print(f"In March 2025, we knew: {march_2025_knowledge[0]['claims'][0]['statement']}")
# Output: "In March 2025, we knew: The API rate limit is 100 requests per hour"
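One edge case deserves attention: the current version has no effective_until (track_version_change leaves it unset for the new version), so a filter that requires effective_until to be on or after the timestamp may not match it, assuming unset values are stored as null. A minimal sketch of the selection logic with an open-ended upper bound follows; version_at is a hypothetical helper, and the dates mirror the example result in step 5.

```python
from datetime import datetime
from typing import Optional


def version_at(versions: list[dict], moment: datetime) -> Optional[dict]:
    """Return the version whose [effective_from, effective_until) interval contains
    moment. An effective_until of None means the version is still current."""
    for version in versions:
        starts = version["effective_from"]
        ends = version.get("effective_until")
        if starts <= moment and (ends is None or moment < ends):
            return version
    return None


versions = [
    {"version": "v1.0", "effective_from": datetime(2024, 1, 1), "effective_until": datetime(2025, 6, 1)},
    {"version": "v2.0", "effective_from": datetime(2025, 6, 1), "effective_until": None},
]
march_version = version_at(versions, datetime(2025, 3, 1))
```

The half-open interval means a timestamp exactly on a transition belongs to the newer version, so no instant matches two versions.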

Why The Developer Can Build This

1. Full Schema Control ✅

# You define EXACTLY what properties claims have
"Claim": {
    "properties": {
        "subject": ...,
        "predicate": ...,
        "object": ...,
        "extraction_confidence": ...,
        "source_count": ...,
        # ANY properties you need for governance
    }
}

2. Property Injection ✅

# Force metadata onto every extracted node
"node_constraints": [{
    "node_type": "Claim",
    "set": {
        "extracted_at": datetime.now().isoformat(),
        "as_of_version": "v2.0",
        "reviewed": False
        # Whatever you need
    }
}]

3. Relationship Modeling ✅

# Model ANY relationship type
"relationship_types": {
    "CONFLICTS_WITH": {...},
    "SUPPORTS": {...},
    "SUPERSEDES": {...}
    # Define what makes sense for your domain
}

4. GraphQL Queries ✅

# Complex analytics queries
query = """
query ConflictAnalysis {
  conflict_sets(where: { resolution_status: "unresolved" }) {
    claims {
      object
      sources {
        authority_level
      }
    }
  }
}
"""

5. Provenance Built-In ✅

Every memory automatically has:

memory_id
created_at
source (document/message that created it)
Link to original content

6. Entity Resolution ✅

"API rate limit" mentioned in 10 documents → Papr merges into one node automatically

What Makes This Different from DIY

If You Build from Scratch:

# You'd need to implement:
- Graph database (Neo4j setup, indexes, backups)
- Vector embeddings (model selection, embedding generation)
- Hybrid search (BM25 + vector + graph ranking)
- Entity resolution (fuzzy matching, deduplication)
- Schema validation (custom code)
- GraphQL server (schema generation, resolvers)
- API layer (authentication, rate limiting)
- Multi-tenancy (data isolation)

# Time: 6-12 months
# Cost: $200k+ in engineering

With Papr:

# You implement:
- Schema design (your domain model)
- Governance services (conflict detection, scoring)
- Resolution rules (your business logic)

# Time: 8-10 weeks
# Cost: ~$1,200/month + developer time

The Crucial Insight

Papr isn't trying to be a complete governance system.

It's providing the primitives you need to build one:

┌─────────────────────────────────────┐
│  YOUR GOVERNANCE LOGIC              │  ← You build this
│  • Conflict detection               │
│  • Corroboration scoring            │
│  • Resolution rules                 │
└─────────────────────────────────────┘
               ↕
┌─────────────────────────────────────┐
│  PAPR PRIMITIVES                    │  ← We provide this
│  • Knowledge graph                  │
│  • Custom schemas                   │
│  • Property injection               │
│  • GraphQL queries                  │
│  • Entity resolution                │
│  • Provenance tracking              │
└─────────────────────────────────────┘

The governance layer is domain-specific. Every system has different:

Conflict resolution strategies
Authority hierarchies
Versioning semantics
Corroboration formulas

You wouldn't want Papr to hard-code these. You want flexibility to model your domain.
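Because these choices vary by domain, resolution rules fit naturally into a small strategy table. The sketch below is illustrative only: the strategy names come from the resolution_method examples in the ConflictSet schema, while AUTHORITY_RANK, the ordering it encodes, and the flattened claim dictionaries are assumptions.

```python
from datetime import datetime
from typing import Callable

# Assumed ranking of the authority_level enum values; adjust to your domain.
AUTHORITY_RANK = {"official": 3, "draft": 2, "unofficial": 1, "third_party": 0}


def highest_authority(claims: list[dict]) -> dict:
    """Prefer the claim whose source has the highest authority level."""
    return max(claims, key=lambda c: AUTHORITY_RANK[c["authority_level"]])


def most_recent(claims: list[dict]) -> dict:
    """Prefer the claim from the most recently published source."""
    return max(claims, key=lambda c: c["publication_date"])


STRATEGIES: dict[str, Callable[[list[dict]], dict]] = {
    "highest_authority": highest_authority,
    "most_recent": most_recent,
}


def resolve(claims: list[dict], method: str) -> dict:
    """Pick the winning claim from a conflict set using the named strategy."""
    if method not in STRATEGIES:
        raise ValueError(f"unknown resolution method: {method}")
    return STRATEGIES[method](claims)


conflicting = [
    {"object": "100 requests per hour", "authority_level": "official",
     "publication_date": datetime(2024, 1, 1)},
    {"object": "500 requests per hour", "authority_level": "third_party",
     "publication_date": datetime(2025, 9, 1)},
]
winner = resolve(conflicting, "highest_authority")
```

The same conflict set resolves differently under "most_recent", which is why the chosen method is worth recording on the ConflictSet node alongside the result.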

Why the Schema System Is Powerful

Compare to other databases:

PostgreSQL:

CREATE TABLE claims (
  id UUID PRIMARY KEY,
  subject TEXT,
  predicate TEXT,
  object TEXT
  -- But no automatic entity resolution
  -- No graph relationships
  -- No semantic search
);

Neo4j (bare):

CREATE (c:Claim {subject: "...", predicate: "..."})
// But no schema validation
// No automatic extraction
// No semantic search
// No hybrid retrieval

Papr:

schema = client.schemas.create(
    node_types={"Claim": {...}},
    relationship_types={"CONFLICTS_WITH": {...}}
)
# Gets you:
# ✅ Schema validation
# ✅ Entity resolution
# ✅ Semantic search
# ✅ Graph relationships
# ✅ Hybrid retrieval
# ✅ GraphQL queries
# All in one API

Summary

Q: Why can't they just create a Claim schema?

A: They absolutely can. That's the point.

Papr's custom schema system lets you define EXACTLY what your domain needs:

What node types exist (Claim, Source, ConflictSet, etc.)
What properties they have (confidence, version, authority, etc.)
What relationships connect them (EXTRACTED_FROM, CONFLICTS_WITH, etc.)
How to match/dedupe (unique_identifiers, node_constraints)

Then you get:

Storage (graph database)
Retrieval (hybrid search)
Analytics (GraphQL)
Provenance (automatic tracking)
Entity resolution (auto-merging)

All automatically.

What you still build:

Services that USE those queries for conflict detection
Logic that COMPUTES corroboration scores
Rules that RESOLVE contradictions
UI that DISPLAYS governance status

But that's domain logic you'd write regardless of underlying database.

The question isn't "can you do this with Papr?"

The question is "do you want to build the database layer yourself, or focus on governance logic?"

Papr = focus on governance. DIY = build database first, then governance.

Time saved: 6-12 months.

That's why they can (and should) build Osmosis on Papr.
