Why Developers CAN Build Osmosis-Style Systems with Papr
Question: Why can't a developer just create a schema with a "Claim" node type that tracks claims and implements the governance features Osmosis needs?
Answer: They absolutely can. That's exactly what Papr's custom schema system is designed for.
The Key Insight
The developer's questions suggest they're thinking about what they need (conflict detection, versioning, corroboration) when they should first consider where to store the data that enables those features.
Papr provides the "where":
- Knowledge graph for storing claims and relationships
- Custom schemas for defining domain models
- Node constraints for property injection
- GraphQL for complex queries
The developer builds the "what":
- Logic that uses those queries to detect conflicts
- Services that compute corroboration scores
- Rules that resolve contradictions
Concrete Example: Why It Works
1. Create Custom Schema
from datetime import datetime  # used by the timestamp fields in later steps
from papr_memory import Papr
client = Papr(x_api_key="your_api_key")
# Define EXACTLY what a Claim is in your domain
schema = client.schemas.create(
name="Documentary Claims Tracking",
description="Schema for evidence-based documentary knowledge",
node_types={
# First-class Claim node type
"Claim": {
"name": "Claim",
"label": "Claim",
"description": "A factual assertion extracted from sources",
"properties": {
# Core claim components (RDF-style triple)
"subject": {
"type": "string",
"required": True,
"description": "What the claim is about"
},
"predicate": {
"type": "string",
"required": True,
"description": "The relationship or property"
},
"object": {
"type": "string",
"required": True,
"description": "The value or target"
},
# Full statement for readability
"statement": {
"type": "string",
"required": True,
"description": "Human-readable claim text"
},
# Confidence metrics
"extraction_confidence": {
"type": "float",
"default": 0.0,
"description": "LLM confidence in extraction (0-1)"
},
"source_count": {
"type": "integer",
"default": 1,
"description": "Number of sources supporting this claim"
},
"corroboration_score": {
"type": "float",
"default": 0.0,
"description": "Multi-source corroboration strength (0-1)"
},
# Provenance
"extracted_from_document": {
"type": "string",
"description": "Source document ID"
},
"extracted_from_page": {
"type": "integer",
"description": "Source page number"
},
"extracted_at": {
"type": "datetime",
"description": "When extraction occurred"
},
# Versioning
"as_of_version": {
"type": "string",
"description": "Document/API version this claim comes from"
},
"temporal_scope": {
"type": "string",
"enum_values": ["historical", "current", "future"],
"description": "Temporal validity of claim"
},
# Governance
"conflict_status": {
"type": "string",
"enum_values": ["none", "potential", "confirmed", "resolved"],
"default": "none"
},
"reviewed": {
"type": "boolean",
"default": False,
"description": "Human review status"
},
"authoritative": {
"type": "boolean",
"default": False,
"description": "Marked as authoritative/canonical"
}
},
"required_properties": ["subject", "predicate", "object", "statement"],
"unique_identifiers": ["subject", "predicate", "object"] # Dedupe on triple
},
# Source tracking
"DocumentSource": {
"name": "DocumentSource",
"label": "Document Source",
"properties": {
"document_id": {"type": "string", "required": True},
"document_title": {"type": "string"},
"version": {"type": "string"},
"publication_date": {"type": "datetime"},
"authority_level": {
"type": "string",
"enum_values": ["official", "draft", "unofficial", "third_party"]
},
"document_type": {
"type": "string",
"enum_values": ["specification", "contract", "report", "article", "email"]
}
},
"unique_identifiers": ["document_id"]
},
# Conflict tracking
"ConflictSet": {
"name": "ConflictSet",
"label": "Conflict Set",
"description": "Group of claims that conflict with each other",
"properties": {
"subject": {"type": "string", "required": True},
"predicate": {"type": "string", "required": True},
"detected_at": {"type": "datetime"},
"resolution_status": {
"type": "string",
"enum_values": ["unresolved", "resolved", "acknowledged", "ignored"]
},
"resolution_method": {
"type": "string",
"description": "How conflict was resolved (e.g., 'most_recent', 'highest_authority')"
},
"resolved_at": {"type": "datetime"},
"resolved_by": {"type": "string"},
"notes": {"type": "string"}
}
},
# Version tracking
"KnowledgeVersion": {
"name": "KnowledgeVersion",
"label": "Knowledge Version",
"description": "Temporal version of knowledge about a subject",
"properties": {
"subject": {"type": "string", "required": True},
"version": {"type": "string", "required": True},
"effective_from": {"type": "datetime"},
"effective_until": {"type": "datetime"},
"superseded": {"type": "boolean", "default": False}
}
}
},
relationship_types={
# Provenance relationships
"EXTRACTED_FROM": {
"name": "EXTRACTED_FROM",
"allowed_source_types": ["Claim"],
"allowed_target_types": ["DocumentSource"],
"properties": {
"page_number": {"type": "integer"},
"section": {"type": "string"},
"extraction_method": {
"type": "string",
"enum_values": ["llm", "manual", "structured"]
}
}
},
# Conflict relationships
"CONFLICTS_WITH": {
"name": "CONFLICTS_WITH",
"allowed_source_types": ["Claim"],
"allowed_target_types": ["Claim"],
"properties": {
"conflict_type": {
"type": "string",
"enum_values": ["direct_contradiction", "incompatible_values", "temporal_inconsistency"]
},
"detected_at": {"type": "datetime"}
}
},
"MEMBER_OF": {
"name": "MEMBER_OF",
"allowed_source_types": ["Claim"],
"allowed_target_types": ["ConflictSet"]
},
# Support relationships
"SUPPORTS": {
"name": "SUPPORTS",
"allowed_source_types": ["Claim"],
"allowed_target_types": ["Claim"],
"properties": {
"support_type": {
"type": "string",
"enum_values": ["corroborates", "provides_evidence", "logical_entailment"]
},
"confidence": {"type": "float"}
}
},
"REFUTES": {
"name": "REFUTES",
"allowed_source_types": ["Claim"],
"allowed_target_types": ["Claim"]
},
# Version relationships
"VERSION_OF": {
"name": "VERSION_OF",
"allowed_source_types": ["Claim"],
"allowed_target_types": ["KnowledgeVersion"]
},
"SUPERSEDES": {
"name": "SUPERSEDES",
"allowed_source_types": ["KnowledgeVersion"],
"allowed_target_types": ["KnowledgeVersion"],
"properties": {
"change_type": {
"type": "string",
"enum_values": ["correction", "update", "deprecation", "clarification"]
}
}
}
}
)
print(f"Schema created: {schema.data.id}")
2. Extract Claims from Documents
# Upload document - Papr extracts claims automatically
response = client.document.upload(
file=open("api_specification_v2.pdf", "rb"),
schema_id=schema.data.id,
metadata={
"version": "v2.0",
"authority": "official",
"document_type": "specification"
}
)
# OR add claim explicitly with full control
response = client.memory.add(
content="The API rate limit is 1000 requests per hour according to section 3.2 of the v2.0 specification",
memory_policy={
"mode": "auto", # Let LLM extract entities
"schema_id": schema.data.id,
# Force specific properties on extracted Claim node
"node_constraints": [
{
"node_type": "Claim",
"set": {
"subject": "API rate limit",
"predicate": "is",
"object": "1000 requests per hour",
"statement": "The API rate limit is 1000 requests per hour",
"as_of_version": "v2.0",
"extracted_from_document": "api_spec_v2",
"extracted_from_page": 12,
"extracted_at": datetime.now().isoformat(),
"extraction_confidence": 0.95,
"temporal_scope": "current"
}
},
{
"node_type": "DocumentSource",
"set": {
"document_id": "api_spec_v2",
"version": "v2.0",
"authority_level": "official",
"document_type": "specification"
}
}
]
}
)
What just happened:
- Claim node created with ALL the properties you need for governance
- DocumentSource node created (or linked if exists)
- EXTRACTED_FROM relationship automatically created
- Full provenance chain established
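The node_constraints entry used above can be assembled by a small helper so that every claim carries the same provenance fields. This is a minimal sketch that only builds the dictionary shape shown in this section; build_claim_constraint and its parameters are hypothetical names introduced for illustration and are not part of the Papr SDK.

```python
from datetime import datetime, timezone


def build_claim_constraint(subject, predicate, obj, *, document_id, version,
                           page=None, confidence=0.0, now=None):
    """Return one node_constraints entry for a Claim (hypothetical helper)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Allow the caller to inject a timestamp so the output is reproducible.
    now = now or datetime.now(timezone.utc)
    props = {
        "subject": subject,
        "predicate": predicate,
        "object": obj,
        "statement": f"{subject} {predicate} {obj}",
        "as_of_version": version,
        "extracted_from_document": document_id,
        "extracted_at": now.isoformat(),
        "extraction_confidence": confidence,
    }
    if page is not None:
        props["extracted_from_page"] = page
    return {"node_type": "Claim", "set": props}


constraint = build_claim_constraint(
    "API rate limit", "is", "1000 requests per hour",
    document_id="api_spec_v2", version="v2.0", page=12, confidence=0.95,
)
print(constraint["set"]["statement"])
```

The resulting dictionary can be placed in the "node_constraints" list of a memory_policy like the one above.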
3. Detect Conflicts with GraphQL
# Query all claims about same subject+predicate
conflict_query = """
query FindPotentialConflicts($subject: String!, $predicate: String!) {
claims(where: {
subject: $subject,
predicate: $predicate
}) {
id
statement
object
as_of_version
extraction_confidence
extracted_from {
document_id
version
authority_level
}
}
}
"""
# Execute query (call this from inside an async function, since it uses await)
subject, predicate = "API rate limit", "is"
result = await client.graphql.query(
    query=conflict_query,
    variables={
        "subject": subject,
        "predicate": predicate
    }
)
# Analyze results
claims = result['data']['claims']
unique_values = {claim['object'] for claim in claims}
if len(unique_values) > 1:
    print(f"⚠️ CONFLICT DETECTED: {len(unique_values)} different values")
    for claim in claims:
        # version values already carry the "v" prefix (e.g. "v2.0")
        print(f" - '{claim['object']}' from {claim['extracted_from']['document_id']} ({claim['extracted_from']['version']})")
    # Create conflict set
    conflict_response = await client.memory.add(
        content=f"Conflict detected for '{subject}' {predicate}",
        memory_policy={
            "mode": "manual",
            "nodes": [
                {
                    "id": "conflict_set_1",
                    "type": "ConflictSet",
                    "properties": {
                        "subject": subject,
                        "predicate": predicate,
                        "detected_at": datetime.now().isoformat(),
                        "resolution_status": "unresolved"
                    }
                }
            ],
            "relationships": [
                {
                    "source": claim['id'],
                    "target": "conflict_set_1",
                    "type": "MEMBER_OF"
                }
                for claim in claims
            ]
        }
    )
Why this works:
- Your schema defines EXACTLY what properties each Claim has
- GraphQL lets you query by those properties
- You can group by (subject, predicate) to find conflicts
- ConflictSet nodes track resolution status
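The grouping step described above can be tried without any API calls. The sketch below assumes claims have already been fetched into plain dictionaries with the subject, predicate and object fields from the schema; find_conflicts is a hypothetical helper name, and the sample data is made up for illustration.

```python
from collections import defaultdict


def find_conflicts(claims):
    """Group claims by (subject, predicate); report groups with more than one distinct object."""
    groups = defaultdict(list)
    for claim in claims:
        groups[(claim["subject"], claim["predicate"])].append(claim)

    conflicts = {}
    for key, members in groups.items():
        values = {m["object"] for m in members}
        if len(values) > 1:
            conflicts[key] = sorted(values)
    return conflicts


# Illustrative data only
sample_claims = [
    {"subject": "API rate limit", "predicate": "is", "object": "100 requests per hour"},
    {"subject": "API rate limit", "predicate": "is", "object": "1000 requests per hour"},
    {"subject": "API timeout", "predicate": "is", "object": "30 seconds"},
]

for (subject, predicate), values in find_conflicts(sample_claims).items():
    print(f"{subject} {predicate}: {values}")
```

Two values for the same (subject, predicate) are only a potential conflict: they may come from different versions, which is why the schema separates "potential" from "confirmed" in conflict_status.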
4. Compute Corroboration Scores
# Service that runs periodically
async def update_corroboration_scores():
    """Count how many sources support each claim."""
    # Get all claims with their sources
    query = """
    query GetClaimSources {
      claims {
        id
        subject
        predicate
        object
        extracted_from {
          document_id
          authority_level
        }
      }
    }
    """
    result = await client.graphql.query(query=query)
    # Group claims by (subject, predicate, object) triple
    claim_groups = {}
    for claim in result['data']['claims']:
        key = (claim['subject'], claim['predicate'], claim['object'])
        claim_groups.setdefault(key, []).append(claim)
    # Update each claim's corroboration score
    for (subject, predicate, obj), claims in claim_groups.items():
        source_count = len({c['extracted_from']['document_id'] for c in claims})
        # Weight by authority level
        official_sources = sum(1 for c in claims if c['extracted_from']['authority_level'] == 'official')
        # Compute score (example formula)
        corroboration_score = min((source_count * 0.2) + (official_sources * 0.3), 1.0)
        # Update all claims in this group
        for claim in claims:
            await client.memory.add(
                content=f"Corroboration update for claim {claim['id']}",
                memory_policy={
                    "mode": "manual",
                    "node_constraints": [
                        {
                            "node_type": "Claim",
                            "search": {
                                "properties": [
                                    {"name": "id", "mode": "exact", "value": claim['id']}
                                ]
                            },
                            "set": {
                                "source_count": source_count,
                                "corroboration_score": corroboration_score
                            }
                        }
                    ]
                }
            )
        print(f"✅ Updated corroboration for '{subject} {predicate} {obj}': {source_count} sources, score={corroboration_score}")
Why this works:
- You stored source_count and corroboration_score in your Claim schema
- GraphQL lets you query all claims with their sources
- You compute the score using whatever formula makes sense
- Node constraints let you update specific claims by ID
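The example formula (0.2 per distinct source plus 0.3 per official source, capped at 1.0) can be isolated as a pure function and unit-tested on its own. This is a sketch under two assumptions: each claim dictionary carries an extracted_from entry shaped like the query result above, and official sources are counted per distinct document rather than per claim. The function name and weights are illustrative.

```python
def corroboration_score(claims, per_source=0.2, per_official=0.3):
    """Return (source_count, score) for claims that assert the same triple."""
    # Map each distinct document to its authority level so that a document
    # mentioned by several claims is only counted once.
    docs = {}
    for claim in claims:
        source = claim["extracted_from"]
        docs[source["document_id"]] = source["authority_level"]

    source_count = len(docs)
    official = sum(1 for level in docs.values() if level == "official")
    score = min(source_count * per_source + official * per_official, 1.0)
    return source_count, round(score, 2)


# Illustrative data: one official spec (seen twice) and one third-party article
same_triple = [
    {"extracted_from": {"document_id": "api_spec_v2", "authority_level": "official"}},
    {"extracted_from": {"document_id": "api_spec_v2", "authority_level": "official"}},
    {"extracted_from": {"document_id": "blog_post", "authority_level": "third_party"}},
]
print(corroboration_score(same_triple))  # 2 sources, 1 official: 2*0.2 + 1*0.3
```

Keeping the formula separate from the update loop makes it easy to swap in a different weighting without touching the code that writes scores back to the graph.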
5. Track Version History
# Create version nodes as knowledge evolves
async def track_version_change(subject, old_version, new_version, change_type):
    """Track that knowledge changed from one version to another."""
    # Use one timestamp so the old version ends exactly when the new one begins
    now = datetime.now().isoformat()
    response = await client.memory.add(
        content=f"Version transition: {subject} changed from {old_version} to {new_version}",
        memory_policy={
            "mode": "manual",
            "nodes": [
                {
                    "id": f"version_{old_version}",
                    "type": "KnowledgeVersion",
                    "properties": {
                        "subject": subject,
                        "version": old_version,
                        "effective_until": now,
                        "superseded": True
                    }
                },
                {
                    "id": f"version_{new_version}",
                    "type": "KnowledgeVersion",
                    "properties": {
                        "subject": subject,
                        "version": new_version,
                        "effective_from": now,
                        "superseded": False
                    }
                }
            ],
            "relationships": [
                {
                    "source": f"version_{new_version}",
                    "target": f"version_{old_version}",
                    "type": "SUPERSEDES",
                    "properties": {
                        "change_type": change_type
                    }
                }
            ]
        }
    )
    return response
# Query version history
version_history_query = """
query GetVersionHistory($subject: String!) {
knowledge_versions(
where: { subject: $subject }
order_by: { effective_from: ASC }
) {
version
effective_from
effective_until
superseded
supersedes {
version
}
claims {
statement
object
}
}
}
"""
history = await client.graphql.query(
query=version_history_query,
variables={"subject": "API rate limit"}
)
# Result:
# v1.0: "100 requests/hour" (2024-01-01 to 2025-06-01) [superseded]
# └─> superseded by v2.0
# v2.0: "1000 requests/hour" (2025-06-01 to present) [current]
Why this works:
- KnowledgeVersion nodes explicitly model temporal scope
- SUPERSEDES relationships create version chains
- effective_from/effective_until enable point-in-time queries
- Claims link to versions via VERSION_OF relationship
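To make the effective_from/effective_until semantics concrete, the sketch below selects the version that was in force at a given moment from records that have already been fetched. It assumes ISO-8601 strings for both fields and treats a missing effective_until as "still current"; version_at is a hypothetical helper, and the sample history mirrors the illustrative result shown above.

```python
from datetime import datetime


def version_at(versions, when):
    """Return the version whose [effective_from, effective_until) window contains `when`."""
    for version in versions:
        start = datetime.fromisoformat(version["effective_from"])
        end = version.get("effective_until")
        # An open-ended window (no effective_until) means the version is still current.
        if start <= when and (end is None or when < datetime.fromisoformat(end)):
            return version
    return None


history = [
    {"version": "v1.0", "effective_from": "2024-01-01T00:00:00",
     "effective_until": "2025-06-01T00:00:00"},
    {"version": "v2.0", "effective_from": "2025-06-01T00:00:00",
     "effective_until": None},
]

match = version_at(history, datetime(2025, 3, 1))
print(match["version"] if match else "no version in force")
```

Using a half-open window means the instant of a transition belongs to the new version only, so two versions never match the same timestamp.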
6. Point-in-Time Queries
# What did we know about X on date Y?
async def get_knowledge_at_time(subject: str, timestamp: datetime):
    """Get what we knew about subject at specific time."""
    # Note: a version that is still current (no effective_until) may not match
    # the upper-bound filter below; handle that case separately if you need it.
    query = """
    query PointInTimeKnowledge($subject: String!, $timestamp: DateTime!) {
      knowledge_versions(where: {
        subject: $subject,
        effective_from: { $lte: $timestamp },
        effective_until: { $gte: $timestamp }
      }) {
        version
        claims {
          statement
          object
          corroboration_score
          sources {
            document_id
            authority_level
          }
        }
      }
    }
    """
    result = await client.graphql.query(
        query=query,
        variables={
            "subject": subject,
            "timestamp": timestamp.isoformat()
        }
    )
    return result['data']['knowledge_versions']
# Example usage
march_2025_knowledge = await get_knowledge_at_time(
    subject="API rate limit",
    timestamp=datetime(2025, 3, 1)
)
print(f"In March 2025, we knew: {march_2025_knowledge[0]['claims'][0]['statement']}")
# Output: "In March 2025, we knew: The API rate limit is 100 requests per hour"
Why The Developer Can Build This
1. Full Schema Control ✅
# You define EXACTLY what properties claims have
"Claim": {
    "properties": {
        "subject": ...,
        "predicate": ...,
        "object": ...,
        "extraction_confidence": ...,
        "source_count": ...,
        # ANY properties you need for governance
    }
}
2. Property Injection ✅
# Force metadata onto every extracted node
"node_constraints": [{
    "node_type": "Claim",
    "set": {
        "extracted_at": datetime.now().isoformat(),
        "as_of_version": "v2.0",
        "reviewed": False
        # Whatever you need
    }
}]
3. Relationship Modeling ✅
# Model ANY relationship type
"relationship_types": {
"CONFLICTS_WITH": {...},
"SUPPORTS": {...},
"SUPERSEDES": {...}
# Define what makes sense for your domain
}
4. GraphQL Queries ✅
# Complex analytics queries
query = """
query ConflictAnalysis {
conflict_sets(where: { resolution_status: "unresolved" }) {
claims {
object
sources {
authority_level
}
}
}
}
"""
5. Provenance Built-In ✅
Every memory automatically has:
- memory_id
- created_at
- source (document/message that created it)
- Link to original content
6. Entity Resolution ✅
"API rate limit" mentioned in 10 documents → Papr merges into one node automatically
What Makes This Different from DIY
If You Build from Scratch:
# You'd need to implement:
- Graph database (Neo4j setup, indexes, backups)
- Vector embeddings (model selection, embedding generation)
- Hybrid search (BM25 + vector + graph ranking)
- Entity resolution (fuzzy matching, deduplication)
- Schema validation (custom code)
- GraphQL server (schema generation, resolvers)
- API layer (authentication, rate limiting)
- Multi-tenancy (data isolation)
# Time: 6-12 months
# Cost: $200k+ in engineering
With Papr:
# You implement:
- Schema design (your domain model)
- Governance services (conflict detection, scoring)
- Resolution rules (your business logic)
# Time: 8-10 weeks
# Cost: ~$1,200/month + developer time
The Crucial Insight
Papr isn't trying to be a complete governance system.
It's providing the primitives you need to build one:
┌─────────────────────────────────────┐
│ YOUR GOVERNANCE LOGIC │ ← You build this
│ • Conflict detection │
│ • Corroboration scoring │
│ • Resolution rules │
└─────────────────────────────────────┘
↕
┌─────────────────────────────────────┐
│ PAPR PRIMITIVES │ ← We provide this
│ • Knowledge graph │
│ • Custom schemas │
│ • Property injection │
│ • GraphQL queries │
│ • Entity resolution │
│ • Provenance tracking │
└─────────────────────────────────────┘
The governance layer is domain-specific. Every system has different:
- Conflict resolution strategies
- Authority hierarchies
- Versioning semantics
- Corroboration formulas
You wouldn't want Papr to hard-code these. You want flexibility to model your domain.
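Because resolution strategies differ per domain, one way to keep them swappable is a small registry keyed by the resolution_method values mentioned in the ConflictSet schema ('most_recent', 'highest_authority'). The sketch below is an illustration under assumptions: the authority ranking is invented for the example, each claim dictionary is assumed to carry its source's authority_level and publication_date directly, and none of these helper names come from Papr.

```python
from datetime import datetime

# Assumed ordering of the authority_level enum; adjust to your own hierarchy.
AUTHORITY_RANK = {"official": 3, "draft": 2, "unofficial": 1, "third_party": 0}


def most_recent(claims):
    """Prefer the claim whose source was published last."""
    return max(claims, key=lambda c: datetime.fromisoformat(c["publication_date"]))


def highest_authority(claims):
    """Prefer the claim whose source ranks highest in AUTHORITY_RANK."""
    return max(claims, key=lambda c: AUTHORITY_RANK[c["authority_level"]])


STRATEGIES = {"most_recent": most_recent, "highest_authority": highest_authority}


def resolve(claims, method):
    """Apply one strategy and return properties suitable for a ConflictSet update."""
    winner = STRATEGIES[method](claims)
    return {
        "resolution_method": method,
        "resolution_status": "resolved",
        "winner": winner["object"],
    }


# Illustrative conflict: an older official spec versus a newer draft
conflicting = [
    {"object": "100 requests per hour", "authority_level": "official",
     "publication_date": "2024-01-01T00:00:00"},
    {"object": "1000 requests per hour", "authority_level": "draft",
     "publication_date": "2025-06-01T00:00:00"},
]

for method in STRATEGIES:
    print(method, "->", resolve(conflicting, method)["winner"])
```

The two strategies pick different winners for the same conflict, which is the reason this choice belongs in your governance layer rather than in the storage layer.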
Why the Schema System Is Powerful
Compare to other databases:
PostgreSQL:
CREATE TABLE claims (
id UUID PRIMARY KEY,
subject TEXT,
predicate TEXT,
object TEXT
-- But no automatic entity resolution
-- No graph relationships
-- No semantic search
);
Neo4j (bare):
CREATE (c:Claim {subject: "...", predicate: "..."})
// But no schema validation
// No automatic extraction
// No semantic search
// No hybrid retrieval
Papr:
schema = client.schemas.create(
node_types={"Claim": {...}},
relationship_types={"CONFLICTS_WITH": {...}}
)
# Gets you:
# ✅ Schema validation
# ✅ Entity resolution
# ✅ Semantic search
# ✅ Graph relationships
# ✅ Hybrid retrieval
# ✅ GraphQL queries
# All in one API
Summary
Q: Why can't they just create a Claim schema?
A: They absolutely can. That's the point.
Papr's custom schema system lets you define EXACTLY what your domain needs:
- What node types exist (Claim, Source, ConflictSet, etc.)
- What properties they have (confidence, version, authority, etc.)
- What relationships connect them (EXTRACTED_FROM, CONFLICTS_WITH, etc.)
- How to match/dedupe (unique_identifiers, node_constraints)
Then you get:
- Storage (graph database)
- Retrieval (hybrid search)
- Analytics (GraphQL)
- Provenance (automatic tracking)
- Entity resolution (auto-merging)
All automatically.
What you still build:
- Services that USE those queries for conflict detection
- Logic that COMPUTES corroboration scores
- Rules that RESOLVE contradictions
- UI that DISPLAYS governance status
But that's domain logic you'd write regardless of the underlying database.
The question isn't "can you do this with Papr?"
The question is "do you want to build the database layer yourself, or focus on governance logic?"
Papr = focus on governance. DIY = build database first, then governance.
Time saved: 6-12 months.
That's why they can (and should) build Osmosis on Papr.