Email Response: Osmosis + Papr Integration Analysis
To: Osmosis Developer
Thanks for the detailed questions! I've done a deep analysis of how Papr aligns with your Osmosis requirements. Here's what I found:
TL;DR: Yes, Papr can be your foundation layer
What Papr handles automatically:
- ✅ Knowledge graph storage with custom schemas
- ✅ Deduplication (same claim from 3 docs → 1 node with 3 sources)
- ✅ Multi-source tracking (EXTRACTED_FROM relationships)
- ✅ Source counting (GraphQL traverses relationships)
- ✅ Provenance tracking (every claim → source)
- ✅ Hybrid retrieval (semantic + keyword + graph)
- ✅ Property injection (version, authority, dates)
What you'd build on top:
- Resolution rules (count > 3? freshness? authority?)
- Inference engine (transitive rules, logical derivation)
- Optional: Workflow tracking (ConflictSet nodes, review status)
Bottom line: Papr gives you 90% of the infrastructure. The epistemological governance layer is custom logic you'd build regardless of underlying database.
Your 5 Questions - Direct Answers
1. Conflicting Statements
Papr provides automatically:
- Deduplication via unique_identifiers: Same (subject, predicate, object) → ONE node
- Multi-source tracking: Multiple EXTRACTED_FROM relationships per claim
- Built-in source counting: GraphQL returns all sources per claim
Example:
```python
# Define deduplication in schema
schema = {
    "Claim": {
        "properties": {"subject": "...", "predicate": "...", "object": "..."},
        "unique_identifiers": ["subject", "predicate", "object"]  # Dedup key
    }
}

# Upload 3 documents saying "Product X costs $100"
# → Papr creates ONE Claim node with 3 EXTRACTED_FROM relationships

# Upload 1 document saying "Product X costs $150"
# → Papr creates a DIFFERENT Claim node (different object value)

# Query for conflicts (built into GraphQL)
query = """
query FindConflicts($subject: String!) {
  claims(where: { subject: $subject }) {
    object                                  # The value
    sources { document_id, version, date }  # Auto-counted!
  }
}
"""

# Returns:
# Claim 1: object="$100", sources=[doc1, doc2, doc3] (3 sources)
# Claim 2: object="$150", sources=[doc4] (1 source)
# → CONFLICT: same subject, different objects
```
You decide resolution:
- Count-based: "Pick claim with most sources" (3 > 1)
- Freshness: "Pick most recent document"
- Authority: "Pick official source over unofficial"
- Combined: "Most sources from official docs"
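For illustration, the count → authority → freshness ordering above could be a few lines of plain Python over the query result. The field names (`sources`, `authority`, `date`) mirror the sketch above and are assumptions about the payload shape, not a fixed Papr contract:

```python
from datetime import date

def resolve(claims):
    """Pick a winner by source count, then official-source count, then freshness.

    `claims` mirrors the GraphQL result shape sketched above; ties on count
    fall through to authority, then to the newest source date.
    """
    def rank(claim):
        sources = claim["sources"]
        official = sum(1 for s in sources if s["authority"] == "official")
        newest = max(s["date"] for s in sources)
        return (len(sources), official, newest)
    return max(claims, key=rank)

claims = [
    {"object": "$100", "sources": [
        {"authority": "official",   "date": date(2024, 1, 10)},
        {"authority": "official",   "date": date(2024, 2, 1)},
        {"authority": "unofficial", "date": date(2024, 3, 5)},
    ]},
    {"object": "$150", "sources": [
        {"authority": "unofficial", "date": date(2024, 6, 1)},
    ]},
]
assert resolve(claims)["object"] == "$100"  # 3 sources beat 1
```

Swapping the tuple order in `rank` changes the policy (e.g. freshness-first) without touching anything else.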
2. Contextual Metadata vs. Claims
Papr provides two mechanisms:
1. Memory metadata (contextual info about source):
```python
client.memory.add(
    content="API rate limit is 1000/hour",
    metadata={
        "version": "v2.0",
        "authority": "official",
        "document_section": "rate-limits"
    }
)
```
2. Property overrides (forced onto extracted nodes):
```python
client.memory.add(
    content="API rate limit is 1000/hour",
    memory_policy={
        "mode": "auto",  # LLM extracts the claim
        "node_constraints": [{
            "node_type": "Claim",
            "set": {
                "version": "v2.0",       # Injected metadata
                "authority": "official"  # Injected metadata
                # LLM extracts: subject, predicate, object
            }
        }]
    }
)
```
Answer: Full control. Use memory.metadata for context, node_constraints.set to inject properties onto nodes, and mode: "auto" for extracted claims.
3. Versioned Knowledge / Temporal Evolution
Papr provides:
- Timestamps on all memories (automatic)
- Property overrides to inject version info
Two approaches:
Option 1: Version as property (simpler):
```python
client.memory.add(
    content="API v1.0 rate limit is 100/hour",
    memory_policy={
        "mode": "auto",
        "node_constraints": [{
            "node_type": "Claim",
            "set": {
                "version": "v1.0",
                "effective_from": "2024-01-01",
                "effective_until": "2025-06-01"
            }
        }]
    }
)

# Query by version
query = "claims(where: { subject: '...', version: 'v1.0' })"
```
Option 2: Separate version nodes (more structured):
```python
# Create explicit KnowledgeVersion nodes
# Link them via SUPERSEDES relationships
# Query version chains
```
Answer: Inject version via property overrides, OR create separate version nodes. Point-in-time queries via GraphQL filters.
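Whichever option you pick, the point-in-time lookup itself is simple. Here is a minimal sketch in plain Python, assuming version records shaped like the `effective_from`/`effective_until` properties above (an open-ended `effective_until` meaning "current"):

```python
from datetime import date

def version_at(versions, when):
    """Return the version record effective at `when`, or None.

    `versions` mirrors the KnowledgeVersion properties from Option 1;
    effective_until=None means the version is still current.
    """
    for v in versions:
        until = v["effective_until"] or date.max
        if v["effective_from"] <= when < until:
            return v
    return None

versions = [
    {"version": "v1.0", "effective_from": date(2024, 1, 1),
     "effective_until": date(2025, 6, 1)},
    {"version": "v2.0", "effective_from": date(2025, 6, 1),
     "effective_until": None},
]
assert version_at(versions, date(2024, 3, 15))["version"] == "v1.0"
assert version_at(versions, date(2025, 7, 1))["version"] == "v2.0"
```

The same half-open interval logic works whether the records come from a GraphQL filter or a SUPERSEDES chain walk.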
4. Source Evidence vs. Inferred Relationships
Papr provides:
- Two modes:
  - Auto mode: LLM extracts relationships from content (tied to the source memory)
  - Manual mode: You specify the exact graph structure
- All extracted relationships link back to their source
You build:
- An inference engine that queries existing relationships
- New relationships created with an inferred: true flag
- Inference-rule tracking (transitive, contradiction, etc.)
Example:
```python
# Papr extracts: A --RELATED_TO--> B, B --RELATED_TO--> C
# Your inference engine creates: A --RELATED_TO--> C
# With properties: {inferred: true, rule: "transitive"}
```
Answer: Relationships are primarily extracted, not inferred. You can build an inference layer using GraphQL plus manual relationship creation.
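A minimal transitive-rule pass over exported relationship pairs might look like the following. This is a sketch of the inference layer you would own, not a Papr feature; it runs to a fixed point so chains longer than two hops are also derived:

```python
def transitive_closure(edges):
    """Derive A->C from A->B and B->C, returning only the new (inferred) pairs.

    `edges` is a set of (source, target) pairs for one relationship type
    (RELATED_TO in the example above). Repeats until no new pair appears.
    """
    known = set(edges)
    inferred = set()
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for b2, c in list(known):
                if b == b2 and a != c and (a, c) not in known:
                    known.add((a, c))
                    inferred.add((a, c))
                    changed = True
    return inferred

assert transitive_closure({("A", "B"), ("B", "C")}) == {("A", "C")}
```

Each pair returned would then be written back as a manual relationship carrying `{inferred: true, rule: "transitive"}`, per the example above.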
5. Single-Source vs. Multi-Source Corroboration
Papr provides automatically:
- Deduplication merges same claim from multiple sources → ONE node
- Multiple EXTRACTED_FROM relationships (one per source)
- GraphQL traverses relationships and counts sources
Simple query:
```python
query = """
query GetCorroboration($subject: String!) {
  claims(where: { subject: $subject }) {
    object
    sources {          # Papr populates this automatically
      document_id
      authority
    }
  }
}
"""

result = await client.graphql.query(query, {"subject": "rate limit"})
for claim in result['data']['claims']:
    source_count = len(claim['sources'])
    official_count = sum(1 for s in claim['sources'] if s['authority'] == 'official')
    # Apply your resolution rule
    if source_count >= 3 and official_count >= 2:
        claim['confidence'] = 'high'
    elif source_count == 1:
        claim['confidence'] = 'needs_review'  # Single source - flag for review
```
Answer: Counting sources is a simple GraphQL query. Papr's deduplication handles multi-source tracking. You just decide the scoring formula (count? weighted? freshness?).
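If you prefer a graded score over hard thresholds, a weighted formula over count, official share, and freshness can run on the same `sources` payload. The weights, field names, and reference date below are all illustrative, not part of Papr's API:

```python
from datetime import date

def corroboration_score(sources, today=date(2026, 2, 16)):
    """Weighted corroboration in [0, 1]: source count, official share, recency.

    Count saturates at 3 sources; freshness decays linearly over one year.
    `sources` mirrors the query result shape above.
    """
    count = len(sources)
    official_share = sum(1 for s in sources if s["authority"] == "official") / count
    newest = max(s["date"] for s in sources)
    freshness = max(0.0, 1.0 - (today - newest).days / 365)
    return 0.5 * min(count / 3, 1.0) + 0.3 * official_share + 0.2 * freshness

sources = [
    {"authority": "official",   "date": date(2026, 1, 1)},
    {"authority": "official",   "date": date(2025, 12, 1)},
    {"authority": "unofficial", "date": date(2025, 6, 1)},
]
score = corroboration_score(sources)
assert 0.0 <= score <= 1.0
```

The result could be written back onto the Claim node's `corroboration_score` property from the schema below.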
Concrete Schema Example for Osmosis
```python
from papr_memory import Papr

client = Papr(x_api_key="...")

# Your Osmosis schema
schema = client.schemas.create(
    name="Osmosis Claims Schema",
    node_types={
        "Claim": {
            "properties": {
                "statement": {"type": "string", "required": True},
                "subject": {"type": "string"},
                "predicate": {"type": "string"},
                "object": {"type": "string"},
                "confidence": {"type": "float"},
                "source_count": {"type": "integer"},
                "corroboration_score": {"type": "float"},
                "version": {"type": "string"}
            }
        },
        "Source": {
            "properties": {
                "document_id": {"type": "string", "required": True},
                "version": {"type": "string"},
                "page": {"type": "integer"},
                "authority": {"type": "string", "enum_values": ["official", "unofficial"]},
                "publication_date": {"type": "datetime"}
            }
        },
        "ConflictSet": {
            "properties": {
                "subject": {"type": "string"},
                "predicate": {"type": "string"},
                "resolution_status": {"type": "string", "enum_values": ["unresolved", "resolved", "ignored"]},
                "resolution_rule": {"type": "string"}
            }
        },
        "KnowledgeVersion": {
            "properties": {
                "version": {"type": "string"},
                "effective_from": {"type": "datetime"},
                "effective_until": {"type": "datetime"},
                "superseded": {"type": "boolean"}
            }
        }
    },
    relationship_types={
        "EXTRACTED_FROM": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["Source"]
        },
        "CONFLICTS_WITH": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["Claim"]
        },
        "MEMBER_OF": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["ConflictSet"]
        },
        "SUPERSEDES": {
            "allowed_source_types": ["KnowledgeVersion"],
            "allowed_target_types": ["KnowledgeVersion"]
        },
        "VERSION_OF": {
            "allowed_source_types": ["Claim"],
            "allowed_target_types": ["KnowledgeVersion"]
        }
    }
)

# Upload document with extraction
client.document.upload(
    file=open("api_spec_v2.pdf", "rb"),
    schema_id=schema.data.id,
    metadata={"version": "v2.0", "authority": "official"}
)

# Query conflicts
conflicts = client.graphql.query("""
query GetConflicts($subject: String!) {
  conflict_sets(where: {
    subject: $subject,
    resolution_status: "unresolved"
  }) {
    subject
    predicate
    claims {
      statement
      confidence
      sources {
        document_id
        authority
        version
      }
    }
  }
}
""", {"subject": "rate limit"})
```
Architecture Overview
```
┌─────────────────────────────────────────┐
│ OSMOSIS APPLICATION LAYER               │
│ • Conflict detection service            │
│ • Corroboration scoring                 │
│ • Version history tracking              │
│ • Inference engine                      │
│ • Resolution rules                      │
└─────────────────────────────────────────┘
                    ↕
┌─────────────────────────────────────────┐
│ PAPR MEMORY API                         │
│ • Knowledge graph (Neo4j)               │
│ • Custom schema support                 │
│ • Entity resolution                     │
│ • Hybrid retrieval                      │
│ • GraphQL analytics                     │
│ • Provenance tracking                   │
└─────────────────────────────────────────┘
```
Papr handles: Storage, retrieval, entity merging, graph queries
You handle: Governance logic, conflict detection, scoring, versioning
Why This Works
1. Schema Flexibility
You define exactly what a "claim" is, what properties it has, how it relates to sources.
2. Node Constraints
Force metadata onto every extracted claim:
```python
memory_policy={
    "node_constraints": [{
        "node_type": "Claim",
        "set": {
            "confidence": 0.95,
            "extracted_at": "2026-02-16T10:00:00Z",
            "version": "v2.1"
        }
    }]
}
```
3. GraphQL Power
Complex queries for governance:
```graphql
query ConflictAnalysis($subject: String!) {
  conflict_sets(where: { subject: $subject }) {
    claims {
      statement
      sources {
        authority
        version
      }
    }
  }
}
```
4. Provenance Built-In
Every memory has source, created_at, metadata.document_id. Trace every claim back to evidence.
5. Entity Resolution
"Product X" mentioned in 5 documents → one node. Corroboration counting works automatically.
What You Still Build
Resolution Rules
```python
# Simple logic - no complex service needed
async def resolve_conflict(claims):
    # Count-based: pick the claim with the most sources
    # Freshness-based: pick the most recent
    # Authority-based: pick official over unofficial
    # Combined: your formula (recency_weight is a helper you define)
    winner = max(claims, key=lambda c:
        len(c['sources']) * 0.5 +                                             # Source count
        sum(1 for s in c['sources'] if s['authority'] == 'official') * 0.3 +  # Authority
        recency_weight(c['sources']) * 0.2                                    # Freshness
    )
    return winner
```
Version Tracking
```python
async def track_version(claim_id, new_version):
    # Create KnowledgeVersion nodes
    # Link SUPERSEDES relationships
    ...
```
Inference Engine
```python
async def apply_inference_rules():
    # Transitive: A→B, B→C ⇒ A→C
    # Contradiction: A→B, A→¬B ⇒ conflict
    ...
```
Time & Cost Estimates
Development Timeline
- Phase 1 (2 weeks): POC with 3 test documents
- Phase 2 (4 weeks): Core governance services
- Phase 3 (4 weeks): Production hardening
Total: ~10 weeks to production (vs 6-12 months building from scratch)
Cost (10,000 documents)
- Papr Cloud: ~$1,100/month (storage + API calls)
- Custom services: ~$70/month (compute + database)
- Total: ~$1,200/month
Recommended Next Steps
Week 1: Quick prototype
- Create Osmosis schema in Papr
- Upload 3 test documents
- Extract claims (auto mode)
- Query via GraphQL
Week 2: Validate approach
- Write conflict detection script
- Test corroboration counting
- Verify provenance tracking
Decision point: If POC works, commit to Papr as foundation
My Assessment
Papr is positioned as both:
- ✅ Retrieval & memory layer (your threshold requirement)
- ✅ Knowledge graph foundation for governance (90% of infrastructure)
What Papr is NOT:
- ❌ Complete governance system out-of-box
- ❌ Built-in conflict detection
- ❌ Automatic temporal snapshots
But you'd need custom governance logic regardless of database choice. The question is whether you also want to build:
- The graph database + entity resolution + retrieval + schema system + provenance yourself (from-scratch route), OR
- Just the governance logic on top of an existing foundation (Papr route)
Recommendation: Use Papr. Saves 6-12 months. Custom schema system enables exactly what you need.
Questions for You
- Scale: How many documents? How many cross-references?
- Latency: Real-time conflict detection, or batch jobs acceptable?
- Inference: How complex? (Simple transitive, or full first-order logic?)
- Version granularity: Document-level or statement-level versioning?
- POC timeline: Can you allocate 2 weeks for proof-of-concept?
Let's Talk
I've prepared a full technical analysis (50+ pages) covering:
- Detailed answers to each question with code examples
- Complete schema design for Osmosis
- Full workflow (upload → extract → detect → resolve → query)
- Performance considerations & optimization strategies
- Decision framework & trade-off analysis
Ready to dive deeper? Let's schedule a call to walk through:
- Schema design review
- Governance services architecture
- Integration patterns
- Performance optimization
Book time here: [calendly link]
Or reply with questions and I'll follow up.
Best regards,
Amir
Papr Team
P.S. The fact that you're thinking about provenance, corroboration, and versioning tells me you're building something sophisticated. Papr's schema system was designed exactly for this kind of domain-specific knowledge modeling. Happy to help you nail the design.