Custom Knowledge Graph Schemas
Define your domain ontology to guide how Papr analyzes content and extracts entities.
Overview
Custom schemas allow you to define the structure of your knowledge graph by specifying:
- What types of entities exist in your domain (node types)
- What properties those entities have
- How entities relate to each other (relationship types)
- Validation rules and constraints
When you upload documents or add memories, the system uses your schema to guide entity extraction and ensure consistent graph structure.
Why Custom Schemas
Domain-Specific Extraction
Guide the system to extract entities specific to your domain:
- Legal: Contracts, Parties, Clauses, Obligations, Deadlines
- Medical: Patients, Diagnoses, Treatments, Medications, Procedures
- Code: Functions, Classes, Variables, Dependencies, Bugs
- E-commerce: Products, Customers, Orders, Payments, Reviews
- CRM: Companies, Contacts, Opportunities, Interactions
Consistent Property Definitions
Ensure all entities of the same type have consistent properties across your entire knowledge graph.
Control Entity Resolution
Choose between semantic similarity matching (for open-ended values) or exact matching (for controlled vocabularies) when deduplicating entities.
Automatic Indexing
Required properties are automatically indexed in Neo4j for fast query performance.
How Schemas Work
- You define: Create a schema specifying node types, properties, and relationships
- You upload/add: Upload documents or add memories to the system
- System analyzes: Content is analyzed to identify relevant information
- Schema guides: Your schema determines which entities and relationships are extracted
- Predictive models build: The knowledge graph is built following your structure
- Schema ensures consistency: The same structure is applied across all content
Schema Components
Node Types
Node types define entities in your domain. Each node type has:
- name: Unique identifier (must match pattern ^[A-Za-z][A-Za-z0-9_]*$)
- label: Display label for the node type
- description: Optional description for documentation
- properties: Object defining all properties for this node type
- required_properties: List of properties that must be present
- unique_identifiers: Properties used for entity deduplication
- color: Optional color for visualization (hex code)
- icon: Optional icon name for visualization
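The naming rule and the link between required_properties and properties can be checked locally before a schema is submitted. The sketch below is a standalone helper, not part of the Papr SDK: the name `check_node_type` is hypothetical, and it assumes a node type is a plain dictionary with the fields listed above.

```python
import re

# Pattern for node type names, copied from the field list above.
NODE_NAME_PATTERN = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def check_node_type(node_type: dict) -> list[str]:
    """Return problems found in one node type definition (hypothetical helper)."""
    problems = []
    name = node_type.get("name", "")
    if not NODE_NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid node type name: {name!r}")

    # Every entry in required_properties should name a defined property.
    defined = set(node_type.get("properties", {}))
    for prop in node_type.get("required_properties", []):
        if prop not in defined:
            problems.append(f"required_properties references undefined property: {prop!r}")
    return problems


product = {
    "name": "Product",
    "properties": {"name": {"type": "string"}, "sku": {"type": "string"}},
    "required_properties": ["name", "price"],  # "price" is not defined above
}
print(check_node_type(product))
```

Returning a list of messages rather than raising on the first problem lets you see every issue in one pass.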
Properties
Properties define attributes of nodes and relationships:
- type: Data type (string, integer, float, boolean, datetime, array, object)
- required: Whether the property must be present (boolean)
- default: Default value if not provided
- description: LLM-friendly description guiding extraction
- min_length/max_length: For strings
- min_value/max_value: For numbers
- enum_values: List of allowed values (max 10)
- pattern: Regex pattern for validation
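As a rough illustration of these constraints, the following sketch checks a single property definition locally. It is not part of the Papr SDK; `check_property` is a hypothetical helper, and the two checks marked as assumptions (default must be an allowed enum value, min_value must not exceed max_value) are common-sense guesses rather than documented server behavior.

```python
ALLOWED_TYPES = {"string", "integer", "float", "boolean", "datetime", "array", "object"}
MAX_ENUM_VALUES = 10  # documented limit for enum_values


def check_property(name: str, spec: dict) -> list[str]:
    """Return problems found in one property definition (hypothetical helper)."""
    problems = []
    if spec.get("type") not in ALLOWED_TYPES:
        problems.append(f"{name}: unknown type {spec.get('type')!r}")

    enum_values = spec.get("enum_values")
    if enum_values is not None:
        if len(enum_values) > MAX_ENUM_VALUES:
            problems.append(f"{name}: {len(enum_values)} enum values exceeds {MAX_ENUM_VALUES}")
        # Assumption: a default should be one of the allowed enum values.
        if "default" in spec and spec["default"] not in enum_values:
            problems.append(f"{name}: default {spec['default']!r} is not in enum_values")

    low, high = spec.get("min_value"), spec.get("max_value")
    # Assumption: when both bounds are given, min_value should not exceed max_value.
    if low is not None and high is not None and low > high:
        problems.append(f"{name}: min_value {low} is greater than max_value {high}")
    return problems


rating = {"type": "integer", "min_value": 1, "max_value": 5}
tier = {"type": "string", "enum_values": ["bronze", "silver", "gold"], "default": "platinum"}
print(check_property("rating", rating))
print(check_property("tier", tier))
```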
Relationship Types
Relationship types define how entities connect:
- name: Unique identifier (must match pattern ^[A-Z][A-Z0-9_]*$)
- label: Display label
- description: Optional description
- allowed_source_types: List of node types that can be the source
- allowed_target_types: List of node types that can be the target
- properties: Optional properties for the relationship
- cardinality: one-to-one, one-to-many, or many-to-many (default)
- color: Optional color for visualization
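A relationship type can be sanity-checked the same way. This is a minimal local sketch with a hypothetical helper name (`check_relationship`), not an SDK function. It assumes that every source and target type must be a node type defined in the same schema, which the field list implies but does not state outright.

```python
import re

# Pattern and cardinality values copied from the field list above.
REL_NAME_PATTERN = re.compile(r"^[A-Z][A-Z0-9_]*$")
CARDINALITIES = {"one-to-one", "one-to-many", "many-to-many"}


def check_relationship(rel: dict, node_type_names: set[str]) -> list[str]:
    """Return problems found in one relationship type (hypothetical helper)."""
    problems = []
    name = rel.get("name", "")
    if not REL_NAME_PATTERN.fullmatch(name):
        problems.append(f"invalid relationship name: {name!r}")

    cardinality = rel.get("cardinality", "many-to-many")  # documented default
    if cardinality not in CARDINALITIES:
        problems.append(f"{name}: unknown cardinality {cardinality!r}")

    # Assumption: endpoints must name node types defined in the same schema.
    for field in ("allowed_source_types", "allowed_target_types"):
        for type_name in rel.get(field, []):
            if type_name not in node_type_names:
                problems.append(f"{name}: {field} references undefined node type {type_name!r}")
    return problems


purchased = {
    "name": "PURCHASED",
    "allowed_source_types": ["Customer"],
    "allowed_target_types": ["Product"],
}
# Only "Product" is defined here, so the "Customer" source is reported.
print(check_relationship(purchased, {"Product"}))
```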
Complete E-commerce Example
Here's a complete schema for an e-commerce domain:
from papr_memory import Papr
import os
client = Papr(x_api_key=os.environ.get("PAPR_MEMORY_API_KEY"))
schema = client.schemas.create(
    name="E-commerce Schema",
    description="Product catalog and customer relationships for e-commerce operations",
    version="1.0.0",
    node_types={
        "Product": {
            "name": "Product",
            "label": "Product",
            "description": "E-commerce product with pricing and inventory",
            "properties": {
                "name": {
                    "type": "string",
                    "required": True,
                    "description": "Product name, typically 2-4 words like 'iPhone 15 Pro' or 'Nike Running Shoes'"
                },
                "price": {
                    "type": "float",
                    "required": True,
                    "description": "Price in USD as decimal number (e.g., 999.99, 29.95)",
                    "min_value": 0
                },
                "category": {
                    "type": "string",
                    "required": True,
                    "description": "Main product category - choose the most appropriate category for this item",
                    "enum_values": ["electronics", "clothing", "books", "home", "sports"]
                },
                "condition": {
                    "type": "string",
                    "required": False,
                    "description": "Physical condition of the product - use 'new' for brand new items, 'like_new' for barely used",
                    "enum_values": ["new", "like_new", "good", "fair", "poor"],
                    "default": "new"
                },
                "in_stock": {
                    "type": "boolean",
                    "required": True,
                    "description": "Availability status - true if currently available for purchase, false if out of stock"
                },
                "sku": {
                    "type": "string",
                    "required": True,
                    "description": "Stock keeping unit - exact alphanumeric code for inventory tracking",
                    "enum_values": ["SKU-001", "SKU-002", "SKU-003", "SKU-004", "SKU-005"]
                },
                "description": {
                    "type": "string",
                    "required": False,
                    "description": "Detailed product description",
                    "max_length": 1000
                }
            },
            "required_properties": ["name", "price", "category", "in_stock", "sku"],
            "unique_identifiers": ["name", "sku"],  # name: semantic, sku: exact
            "color": "#e74c3c"
        },
        "Customer": {
            "name": "Customer",
            "label": "Customer",
            "description": "Customer with purchase history and loyalty tier",
            "properties": {
                "name": {
                    "type": "string",
                    "required": True,
                    "description": "Customer full name"
                },
                "email": {
                    "type": "string",
                    "required": True,
                    "description": "Customer email address for contact and identification"
                },
                "tier": {
                    "type": "string",
                    "required": False,
                    "description": "Customer loyalty tier based on purchase history",
                    "enum_values": ["bronze", "silver", "gold"],
                    "default": "bronze"
                },
                "join_date": {
                    "type": "datetime",
                    "required": False,
                    "description": "Date when customer created account"
                }
            },
            "required_properties": ["name", "email"],
            "unique_identifiers": ["email"],
            "color": "#3498db"
        },
        "Review": {
            "name": "Review",
            "label": "Review",
            "description": "Product review with rating and text",
            "properties": {
                "rating": {
                    "type": "integer",
                    "required": True,
                    "description": "Star rating from 1 to 5",
                    "min_value": 1,
                    "max_value": 5
                },
                "text": {
                    "type": "string",
                    "required": False,
                    "description": "Review text content",
                    "max_length": 2000
                },
                "verified_purchase": {
                    "type": "boolean",
                    "required": False,
                    "description": "Whether this review is from a verified purchase",
                    "default": False
                },
                "review_date": {
                    "type": "datetime",
                    "required": True,
                    "description": "Date when review was posted"
                }
            },
            "required_properties": ["rating", "review_date"],
            "unique_identifiers": [],
            "color": "#f39c12"
        }
    },
    relationship_types={
        "PURCHASED": {
            "name": "PURCHASED",
            "label": "Purchased",
            "description": "Customer purchased a product",
            "allowed_source_types": ["Customer"],
            "allowed_target_types": ["Product"],
            "properties": {
                "date": {
                    "type": "datetime",
                    "required": True,
                    "description": "Purchase date"
                },
                "amount": {
                    "type": "float",
                    "required": True,
                    "description": "Purchase amount in USD"
                },
                "quantity": {
                    "type": "integer",
                    "required": False,
                    "description": "Number of items purchased",
                    "default": 1
                }
            },
            "cardinality": "many-to-many",
            "color": "#2ecc71"
        },
        "REVIEWED": {
            "name": "REVIEWED",
            "label": "Reviewed",
            "description": "Customer wrote a review",
            "allowed_source_types": ["Customer"],
            "allowed_target_types": ["Review"],
            "cardinality": "one-to-many",
            "color": "#9b59b6"
        },
        "REVIEW_OF": {
            "name": "REVIEW_OF",
            "label": "Review Of",
            "description": "Review is about a product",
            "allowed_source_types": ["Review"],
            "allowed_target_types": ["Product"],
            "cardinality": "many-to-many",
            "color": "#95a5a6"
        }
    }
)
print(f"Schema created with ID: {schema.data.id}")
TypeScript Example
import Papr from '@papr/memory';
const client = new Papr({
  xAPIKey: process.env.PAPR_MEMORY_API_KEY
});
const schema = await client.schemas.create({
  name: "E-commerce Schema",
  description: "Product catalog and customer relationships",
  version: "1.0.0",
  node_types: {
    Product: {
      name: "Product",
      label: "Product",
      properties: {
        name: {
          type: "string",
          required: true,
          description: "Product name, typically 2-4 words"
        },
        price: {
          type: "float",
          required: true,
          description: "Price in USD as decimal number"
        },
        category: {
          type: "string",
          required: true,
          enum_values: ["electronics", "clothing", "books", "home", "sports"]
        },
        in_stock: {
          type: "boolean",
          required: true
        }
      },
      required_properties: ["name", "price", "category", "in_stock"],
      unique_identifiers: ["name"]
    }
  },
  relationship_types: {
    PURCHASED: {
      name: "PURCHASED",
      allowed_source_types: ["Customer"], // assumes a Customer node type is also defined in node_types
      allowed_target_types: ["Product"]
    }
  }
});
console.log(`Schema created with ID: ${schema.data.id}`);
Key Concepts
LLM-Friendly Descriptions
Write detailed property descriptions that guide the LLM on expected formats and usage:
Good examples:
{
    "name": {
        "description": "Product name, typically 2-4 words like 'iPhone 15 Pro' or 'Nike Running Shoes'"
    },
    "price": {
        "description": "Price in USD as decimal number (e.g., 999.99, 29.95)"
    },
    "status": {
        "description": "use 'new' for brand new items, 'like_new' for barely used, 'good' for normal wear"
    }
}
Poor examples:
{
    "name": {"description": "Name"},      # Too vague
    "price": {"description": "Price"},    # No guidance on format
    "status": {"description": "Status"}   # No explanation of values
}
Enum Values
Use enums to restrict property values to a predefined list (max 10 values).
When to use enums:
- Limited, well-defined options (≤10 values)
- Controlled vocabularies: "active/inactive", "high/medium/low"
- Status codes, priority levels, categories
- When you want exact matching
When to avoid enums:
- Open-ended text fields: names, titles, descriptions
- Large sets of options (>10): countries, cities
- When you want semantic similarity matching
- Dynamic or frequently changing value sets
Example:
{
    "priority": {
        "type": "string",
        "enum_values": ["low", "medium", "high", "critical"],
        "description": "Task priority level"
    },
    "status": {
        "type": "string",
        "enum_values": ["draft", "active", "completed", "archived"],
        "description": "Current status of the item"
    }
}
Entity Resolution: Semantic vs Exact Matching
Properties in unique_identifiers are used for entity deduplication:
Without enum_values (Semantic Similarity):
- Uses semantic matching to identify similar entities
- Merges "Apple Inc" and "Apple Inc." as the same entity
- Merges "John Smith" and "J. Smith" if context suggests same person
- Best for open-ended values like company names, person names
With enum_values (Exact Matching):
- Only entities with exactly matching enum values are merged
- "SKU-001" only matches "SKU-001", not "SKU-002"
- Best for controlled identifiers like status codes, SKUs, categories
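The rule above reduces to a single question per unique identifier: does the property declare enum_values? The sketch below makes that explicit for a node type dictionary. `matching_strategies` is a hypothetical local helper for reviewing a schema, not an SDK call, and it only mirrors the rule as described here.

```python
def matching_strategies(node_type: dict) -> dict[str, str]:
    """Map each unique identifier to 'exact' or 'semantic' (hypothetical helper)."""
    properties = node_type.get("properties", {})
    strategies = {}
    for prop in node_type.get("unique_identifiers", []):
        # A non-empty enum_values list means exact matching, otherwise semantic.
        has_enum = bool(properties.get(prop, {}).get("enum_values"))
        strategies[prop] = "exact" if has_enum else "semantic"
    return strategies


product = {
    "properties": {
        "name": {"type": "string"},
        "sku": {"type": "string", "enum_values": ["SKU-001", "SKU-002", "SKU-003"]},
    },
    "unique_identifiers": ["name", "sku"],
}
print(matching_strategies(product))
```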
Example:
{
    "Product": {
        "properties": {
            "name": {
                "type": "string",
                "required": True
                # No enum_values = semantic matching
            },
            "sku": {
                "type": "string",
                "required": True,
                "enum_values": ["SKU-001", "SKU-002", "SKU-003"]
                # With enum_values = exact matching
            }
        },
        "unique_identifiers": ["name", "sku"]
        # name uses semantic similarity
        # sku uses exact matching
    }
}
Schema Lifecycle
Schemas go through a lifecycle:
- Draft - Schema is being developed, not used in production
- Active - Schema is used for memory extraction and graph generation
- Deprecated - Schema is marked for removal, but existing data remains
- Archived - Schema is no longer used, preserved for historical data
# Create schema in draft mode
schema = client.schemas.create(
    name="My Schema",
    status="draft",
    # ... rest of schema
)
# Activate when ready
client.schemas.activate(schema.data.id, activate=True)
# Later, deprecate
client.schemas.update(schema.data.id, {"status": "deprecated"})
Using Schemas with Documents
Once you've created a schema, use it when uploading documents:
# Upload document with custom schema
response = client.document.upload(
    file=open("product_catalog.pdf", "rb"),
    schema_id=schema.data.id,
    simple_schema_mode=True,  # Recommended: system + one custom schema
    hierarchical_enabled=True
)
The system will use your schema to guide entity and relationship extraction from the document.
Using Schemas with Memory
Use schemas when adding memories directly:
# Add memory with graph generation using schema
response = client.memory.add(
    content="Customer Jane Doe purchased iPhone 15 Pro for $999 on 2024-03-15",
    graph_generation={
        "mode": "auto",
        "auto": {
            "schema_id": schema.data.id,
            "simple_schema_mode": True
        }
    }
)
Managing Schemas
List All Schemas
schemas = client.schemas.list()
for schema in schemas.data:
    print(f"{schema.name} ({schema.status})")
Get Specific Schema
schema = client.schemas.retrieve(schema_id)
print(schema.data.name)
print(schema.data.node_types)
Update Schema
updated = client.schemas.update(
    schema_id,
    {
        "description": "Updated description",
        "node_types": {
            # Add or modify node types
        }
    }
)
Delete Schema
# Soft delete (archives the schema)
client.schemas.delete(schema_id)
Activate/Deactivate
# Activate schema for use
client.schemas.activate(schema_id, activate=True)
# Deactivate schema
client.schemas.activate(schema_id, activate=False)
Best Practices
1. Start Simple, Iterate
Begin with a basic schema covering your core entities:
{
    "node_types": {
        "Customer": { /* minimal properties */ },
        "Product": { /* minimal properties */ }
    }
}
Add complexity as you understand your use case better.
2. Write Clear Descriptions
Every property should have a clear, LLM-friendly description:
{
    "contract_value": {
        "type": "float",
        "description": "Total contract value in USD, including all fees and charges. Format as decimal (e.g., 50000.00)"
    }
}
3. Use Simple Schema Mode in Production
response = client.document.upload(
    file=file,
    schema_id="your_schema",
    simple_schema_mode=True  # System + one custom schema = consistency
)
This ensures consistency between document processing and direct memory creation.
4. Limit Node Types (≤15 per schema)
Too many node types make extraction less accurate. Focus on your most important entities.
5. Limit Relationship Types (≤20 per schema)
Keep relationships meaningful and avoid over-specification.
6. Use Enums Sparingly (≤10 values)
Only use enums for truly controlled vocabularies. Open-ended fields should not have enums.
7. Mark Properties as Required Thoughtfully
Only mark properties as required if they're truly essential. Missing required properties can cause extraction failures.
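The size guidance in practices 4 and 5 is easy to check before submitting a schema. The following is a small local sketch, assuming the schema is held as a plain dictionary with node_types and relationship_types keys. `lint_schema_size` is a hypothetical helper, and the limits are the suggested values from this page rather than limits read from the API.

```python
MAX_NODE_TYPES = 15          # practice 4
MAX_RELATIONSHIP_TYPES = 20  # practice 5


def lint_schema_size(schema: dict) -> list[str]:
    """Return warnings when a schema exceeds the suggested size limits (hypothetical helper)."""
    warnings = []
    node_count = len(schema.get("node_types", {}))
    rel_count = len(schema.get("relationship_types", {}))
    if node_count > MAX_NODE_TYPES:
        warnings.append(f"{node_count} node types (suggested limit {MAX_NODE_TYPES})")
    if rel_count > MAX_RELATIONSHIP_TYPES:
        warnings.append(f"{rel_count} relationship types (suggested limit {MAX_RELATIONSHIP_TYPES})")
    return warnings


# A deliberately oversized schema: 16 placeholder node types, 2 relationship types.
oversized = {
    "node_types": {f"Type{i}": {} for i in range(16)},
    "relationship_types": {"PURCHASED": {}, "REVIEWED": {}},
}
print(lint_schema_size(oversized))
```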
Common Domain Examples
Legal Domain
{
    "node_types": {
        "Contract": {
            "properties": {
                "title": {"type": "string", "required": True},
                "type": {
                    "type": "string",
                    "enum_values": ["service", "employment", "nda", "partnership"]
                },
                "effective_date": {"type": "datetime"},
                "expiration_date": {"type": "datetime"}
            }
        },
        "Party": {
            "properties": {
                "name": {"type": "string", "required": True},
                "role": {
                    "type": "string",
                    "enum_values": ["client", "vendor", "employee", "partner"]
                }
            }
        },
        "Obligation": {
            "properties": {
                "description": {"type": "string", "required": True},
                "deadline": {"type": "datetime"},
                "status": {
                    "type": "string",
                    "enum_values": ["pending", "completed", "overdue"]
                }
            }
        }
    }
}
Medical Domain
{
    "node_types": {
        "Patient": {
            "properties": {
                "name": {"type": "string", "required": True},
                "dob": {"type": "datetime"},
                "medical_record_number": {"type": "string"}
            }
        },
        "Diagnosis": {
            "properties": {
                "icd_code": {"type": "string"},
                "description": {"type": "string", "required": True},
                "severity": {
                    "type": "string",
                    "enum_values": ["mild", "moderate", "severe", "critical"]
                }
            }
        },
        "Treatment": {
            "properties": {
                "name": {"type": "string", "required": True},
                "start_date": {"type": "datetime"},
                "duration_days": {"type": "integer"}
            }
        }
    }
}
Code Repository Domain
{
    "node_types": {
        "Function": {
            "properties": {
                "name": {"type": "string", "required": True},
                "language": {
                    "type": "string",
                    "enum_values": ["python", "javascript", "typescript", "java"]
                },
                "description": {"type": "string"},
                "complexity": {
                    "type": "string",
                    "enum_values": ["low", "medium", "high"]
                }
            }
        },
        "Class": {
            "properties": {
                "name": {"type": "string", "required": True},
                "file_path": {"type": "string"}
            }
        },
        "Bug": {
            "properties": {
                "title": {"type": "string", "required": True},
                "severity": {
                    "type": "string",
                    "enum_values": ["low", "medium", "high", "critical"]
                },
                "status": {
                    "type": "string",
                    "enum_values": ["open", "in_progress", "resolved", "closed"]
                }
            }
        }
    }
}
Troubleshooting
Schema Validation Errors
If schema creation fails validation:
- Check node type names match pattern ^[A-Za-z][A-Za-z0-9_]*$
- Check relationship type names match pattern ^[A-Z][A-Z0-9_]*$
- Verify enum_values has ≤10 items
- Ensure required_properties reference existing properties
Extraction Not Finding Entities
- Add more detailed, LLM-friendly property descriptions
- Verify property types match expected data
- Check if required properties are too strict
- Try manual graph generation mode for debugging
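The first item in the list above can be partly automated by flagging properties whose descriptions are missing or very short. This is a rough heuristic sketch, not SDK functionality: `find_vague_descriptions` is a hypothetical helper, and the four-word threshold is an arbitrary assumption to tune for your domain.

```python
MIN_DESCRIPTION_WORDS = 4  # assumed threshold, not a Papr rule


def find_vague_descriptions(node_types: dict) -> list[str]:
    """Return 'NodeType.property' names whose description is missing or too short."""
    flagged = []
    for type_name, node_type in node_types.items():
        for prop_name, spec in node_type.get("properties", {}).items():
            words = spec.get("description", "").split()
            if len(words) < MIN_DESCRIPTION_WORDS:
                flagged.append(f"{type_name}.{prop_name}")
    return flagged


node_types = {
    "Product": {
        "properties": {
            "name": {"type": "string", "description": "Name"},
            "price": {
                "type": "float",
                "description": "Price in USD as decimal number (e.g., 999.99, 29.95)",
            },
            "sku": {"type": "string"},  # no description at all
        }
    }
}
print(find_vague_descriptions(node_types))
```

A word count cannot judge whether a description is actually helpful, so treat the output as a list of candidates to review by hand.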
Too Many Duplicate Entities
- Add more unique_identifiers
- Use enums for controlled values that should match exactly
- Consider property overrides in document upload
Entities Not Merging
- Check if unique_identifiers are set correctly
- For semantic matching, remove enum_values
- For exact matching, add enum_values
Next Steps
- Graph Generation - Control how graphs are built
- Document Processing - Use schemas with documents
- GraphQL Analysis - Query your custom schemas
- API Reference - Complete schema endpoints