Osmosis Analysis - Updated Summary
Date: February 16, 2026
Status: Analysis updated based on accurate Papr capabilities
Key Corrections Made
The initial analysis underestimated how much Papr handles automatically. Here are the corrections:
1. Conflicting Statements - Much More Automatic
Initial Assessment (Incorrect)
- ❌ Claimed: "No automatic conflict detection"
- ❌ Claimed: "Developer must build conflict detection service"
- ❌ Implied: Complex background services needed
Corrected Assessment (Accurate)
✅ Papr handles automatically:
- Deduplication: Define unique_identifiers: ["subject", "predicate", "object"] in schema
- Multi-source tracking: Same claim from 3 documents → ONE node with 3 EXTRACTED_FROM relationships
- Source counting: GraphQL automatically traverses relationships
✅ Conflict detection is just a query:
query = """
query FindConflicts($subject: String!) {
claims(where: { subject: $subject }) {
object # Different values = conflict
sources { document_id, version, authority } # Count automatically
}
}
"""
# Returns all claims for subject, grouped by Papr's dedup
# If multiple different 'object' values → conflict detected
✅ Developer only adds:
- Resolution rules (count > 3? most recent? official sources?)
- Optional: ConflictSet nodes for workflow tracking
Complexity reduction: From "complex service" to "simple query + resolution logic"
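The resolution rules listed above can be sketched in a few lines of Python. This is a minimal illustration, not Papr's API: the claim/source dict shapes mirror the GraphQL result shown earlier, and the weights and sample data are invented for the example.

```python
# Minimal resolution sketch: pick a winner among conflicting claims.
# Assumes each claim dict has an 'object' value and a 'sources' list,
# as in the GraphQL result above; thresholds and data are illustrative.
from datetime import date

def resolve(claims):
    def score(claim):
        sources = claim["sources"]
        official = sum(1 for s in sources if s["authority"] == "official")
        most_recent = max(date.fromisoformat(s["date"]) for s in sources)
        # Rank by corroboration count, then official backing, then freshness
        return (len(sources), official, most_recent)
    return max(claims, key=score)

claims = [
    {"object": "1000/hour", "sources": [
        {"authority": "official", "date": "2025-01-01"},
        {"authority": "blog", "date": "2025-03-01"},
    ]},
    {"object": "500/hour", "sources": [
        {"authority": "email", "date": "2024-06-01"},
    ]},
]
print(resolve(claims)["object"])  # "1000/hour": more sources, official backing
```

The tuple ordering encodes the rule priority; swapping the tuple elements changes which rule wins ties.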
2. Contextual Metadata vs Claims - Clearer Mechanisms
Initial Assessment (Vague)
- ⚠️ Said: "Distinction possible but must be explicitly modeled"
- ⚠️ Unclear about mechanisms
Corrected Assessment (Specific)
✅ Two clear mechanisms:
1. Memory metadata (about the source):
metadata={
"version": "v2.0",
"authority": "official",
"document_type": "specification"
}
2. Property overrides (injected onto nodes):
memory_policy={
"node_constraints": [{
"node_type": "Claim",
"set": {
"version": "v2.0", # Forced onto node
"authority": "official" # Forced onto node
# LLM still extracts: subject, predicate, object
}
}]
}
Key insight: Metadata (about the source), extracted claims (produced by the LLM), and injected properties (forced by policy) are three distinct, well-defined mechanisms.
3. Versioned Knowledge - Simpler Than Described
Initial Assessment (Overcomplicated)
- ⚠️ Emphasized: "Must model versions as separate nodes"
- ⚠️ Made it seem complex
Corrected Assessment (Two Simple Options)
✅ Option 1: Version as property (simplest):
memory_policy={
"node_constraints": [{
"node_type": "Claim",
"set": {
"version": "v2.0",
"effective_from": "2025-01-01",
"effective_until": None # Current
}
}]
}
# Query by version
query = "claims(where: { subject: '...', version: 'v2.0' })"
✅ Option 2: Separate version nodes (for complex version chains):
# Create KnowledgeVersion nodes
# Link with SUPERSEDES relationships
# Query version history
Key insight: Property injection via node_constraints makes version tracking trivial. Separate nodes are only needed for complex version genealogy.
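A sketch of what Option 2's version walk could look like. The KnowledgeVersion nodes and SUPERSEDES links follow the comments above; the in-memory dicts stand in for what a GraphQL traversal would return, so this is an assumption about shape, not Papr's actual response format.

```python
# Sketch of Option 2: walking a SUPERSEDES chain of KnowledgeVersion nodes.
# The dicts below stand in for a GraphQL traversal result.
versions = {
    "v2.0": {"supersedes": "v1.1", "effective_from": "2025-01-01"},
    "v1.1": {"supersedes": "v1.0", "effective_from": "2024-06-01"},
    "v1.0": {"supersedes": None, "effective_from": "2024-01-01"},
}

def version_history(latest):
    """Follow SUPERSEDES links from the current version back to the first."""
    chain, current = [], latest
    while current is not None:
        chain.append(current)
        current = versions[current]["supersedes"]
    return chain

print(version_history("v2.0"))  # ['v2.0', 'v1.1', 'v1.0']
```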
4. Multi-Source Corroboration - Built-In, Not Custom
Initial Assessment (Underestimated)
- ❌ Claimed: "No automatic source counting"
- ❌ Claimed: "Developer must build corroboration tracking"
- ❌ Suggested: Background service to update scores
Corrected Assessment (Much Simpler)
✅ Papr handles automatically:
- Deduplication merges same claim → ONE node
- Each source gets EXTRACTED_FROM relationship
- GraphQL traverses relationships and returns all sources
✅ Corroboration is just counting:
query = """
query GetCorroboration($subject: String!) {
claims(where: { subject: $subject }) {
object
sources { # Papr populates this automatically
document_id
version
authority
date
}
}
}
"""
# In your code:
for claim in result['claims']:
    source_count = len(claim['sources'])
    official_count = sum(1 for s in claim['sources'] if s['authority'] == 'official')
    # Apply resolution rule
    if source_count >= 3 and official_count >= 2:
        confidence = "high"
✅ Optional: Cache score on node:
# Only if you want to avoid recounting
memory_policy={
"node_constraints": [{
"node_type": "Claim",
"set": {"source_count": 3, "corroboration_score": 0.6}
}]
}
Key insight: Counting sources is a simple len() over the GraphQL result. No background service needed.
Updated Architecture Assessment
What Papr Handles (Revised Up to 90%)
Automatic (No Code Needed):
- ✅ Knowledge graph storage
- ✅ Deduplication (same claim from multiple sources)
- ✅ Multi-source tracking (EXTRACTED_FROM relationships)
- ✅ Source counting (GraphQL traversal)
- ✅ Entity resolution
- ✅ Provenance (automatic)
Developer Controls (via Config):
- ✅ Schema design (what properties claims have)
- ✅ unique_identifiers (what makes claims identical)
- ✅ Property injection (version, authority, etc.)
- ✅ GraphQL queries (conflict detection, analysis)
What Developer Builds (Revised Down to 10%)
Simple Logic:
- Resolution rules (which claim wins in conflict?)
- Inference engine (if A→B and B→C, then A→C)
- Optional: Workflow tracking (ConflictSet nodes, review status)
No longer needed:
- ❌ Conflict detection service (just a query)
- ❌ Source counting service (GraphQL does it)
- ❌ Deduplication logic (Papr handles it)
- ❌ Complex background jobs (most things are queries)
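Of the "Simple Logic" items above, the inference engine is the only one without an example elsewhere in this document. A minimal sketch of the transitive rule (if A→B and B→C, then A→C), using naive fixpoint iteration, which is fine for small claim graphs; the triple shape mirrors the Claim (subject, predicate, object) schema:

```python
# Sketch of the transitive inference rule: derive (A, p, C)
# from (A, p, B) and (B, p, C). Triples mirror the Claim shape.
def infer_transitive(triples, predicate="implies"):
    facts = set(triples)
    changed = True
    while changed:  # repeat until no new facts are derived (fixpoint)
        changed = False
        for (a, p1, b) in list(facts):
            for (b2, p2, c) in list(facts):
                if p1 == p2 == predicate and b == b2:
                    if (a, predicate, c) not in facts:
                        facts.add((a, predicate, c))
                        changed = True
    return facts

facts = infer_transitive([("A", "implies", "B"), ("B", "implies", "C")])
print(("A", "implies", "C") in facts)  # True
```

For large graphs you would replace this with semi-naive evaluation or a Datalog engine, but the point stands: it is domain logic, not infrastructure.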
Revised Complexity Assessment
Initial Estimate
- Papr: 70% of infrastructure
- Developer: 30% custom services
Corrected Estimate
- Papr: 90% of infrastructure (+ automatic dedup, source counting, conflict identification)
- Developer: 10% domain logic (resolution rules, inference, optional workflow)
Impact on Timeline
Initial Estimate
- Phase 1 (POC): 2 weeks
- Phase 2 (Services): 4 weeks
- Phase 3 (Production): 4 weeks
- Total: 10 weeks
Revised Estimate
- Phase 1 (POC): 1 week (simpler than expected)
- Phase 2 (Resolution): 2 weeks (just rules, no services)
- Phase 3 (Production): 2 weeks (less to harden)
- Total: 5 weeks ← 50% reduction
Why? Because conflict detection, source counting, and corroboration are built-in queries, not custom services.
Updated Recommendation
Strength of Recommendation: Even Stronger
Before: "Yes, use Papr - saves 6-12 months"
Now: "Absolutely yes - saves 6-12 months AND the custom logic is trivial"
Reasoning:
- Deduplication is automatic - Define unique_identifiers in schema, done
- Source counting is automatic - GraphQL returns sources, just count them
- Conflict detection is automatic - Query by subject, check if multiple values
- Resolution is simple logic - Just max() with your scoring function
- Version tracking is property injection - Add version via node_constraints
What seemed complex (background services) is actually simple (GraphQL queries + basic logic).
Key Messages for Developer
Message 1: Deduplication Handles Most of It
"When you define unique_identifiers on your Claim node type, Papr automatically:
- Merges same claim from multiple documents into ONE node
- Creates EXTRACTED_FROM relationship to each source
- Makes source counting a simple GraphQL query
Conflict detection becomes: 'Are there multiple Claims with different objects for the same subject?'"
Message 2: Property Injection is Powerful
"Use node_constraints.set to inject metadata onto extracted nodes:
- version: "v2.0"
- authority: "official"
- extraction_date: "2026-01-15"
No need for separate metadata nodes in most cases."
Message 3: Resolution is Just Logic
"Conflict resolution is:
# official_count() and recency() are your own scoring helpers
winner = max(claims, key=lambda c:
    len(c['sources']) * weight_count +
    official_count(c['sources']) * weight_authority +
    recency(c['sources']) * weight_freshness
)
That's it. No complex service needed."
Example: Complete Workflow (Simplified)
Step 1: Define Schema (One Time)
schema = client.schemas.create(
name="Osmosis",
node_types={
"Claim": {
"properties": {
"subject": {"type": "string"},
"predicate": {"type": "string"},
"object": {"type": "string"},
"version": {"type": "string"},
"authority": {"type": "string"}
},
"unique_identifiers": ["subject", "predicate", "object"] # ← Dedup key
}
}
)
Step 2: Upload Documents (Automatic)
# Upload 3 documents mentioning "API rate limit is 1000/hour"
for doc in ["spec_v2.pdf", "blog_post.md", "email_thread.txt"]:
client.document.upload(
file=open(doc, "rb"),
schema_id=schema.id,
metadata={"authority": get_authority(doc)}
)
# Papr automatically:
# - Extracts claim: {subject: "API rate limit", predicate: "is", object: "1000/hour"}
# - Deduplicates to ONE Claim node
# - Creates 3 EXTRACTED_FROM relationships
Step 3: Query for Conflicts (Simple)
query = """
query FindConflicts($subject: String!) {
claims(where: { subject: $subject }) {
object
sources { document_id, authority }
}
}
"""
result = await client.graphql.query(query, {"subject": "API rate limit"})
# Check for different values
values = set(c['object'] for c in result['claims'])
if len(values) > 1:
    print(f"CONFLICT: {values}")
Step 4: Resolve (Simple Logic)
# Apply resolution rule
def resolution_score(claim):
    official = sum(1 for s in claim['sources'] if s['authority'] == 'official')
    return len(claim['sources']) * 2 + official * 5

for claim in result['claims']:
    print(f"{claim['object']}: score={resolution_score(claim)}")
winner = max(result['claims'], key=resolution_score)
print(f"Winner: {winner['object']}")
Total complexity: ~50 lines of code. No background services. No complex workflows.
Bottom Line
The initial analysis was conservative about what Papr provides. The reality is:
Papr handles 90% automatically:
- Deduplication via unique_identifiers
- Multi-source tracking via relationships
- Source counting via GraphQL
- Property injection via node_constraints
Developer adds 10% as simple logic:
- Resolution rules (scoring function)
- Optional: Workflow tracking
- Optional: Inference rules
Timeline reduced from 10 weeks to 5 weeks.
The case for using Papr is even stronger than initially assessed.
Files Updated
- OSMOSIS-USE-CASE-ANALYSIS.md - Full technical analysis
- OSMOSIS-EMAIL-RESPONSE.md - Email to developer
- OSMOSIS-WHY-IT-WORKS.md - Deep dive on "why schema works"
- OSMOSIS-UPDATED-SUMMARY.md - This document (corrections)
All documents now accurately reflect Papr's automatic capabilities.