Graph-aware embeddings and domain schemas
Terminology: We call this graph-aware embeddings (concept) implemented via holographic APIs (technology). When you see enable_holographic or holographic_config, that's the implementation layer. When we discuss "domain schemas" and "structured dimensions," that's the conceptual layer this guide focuses on.
The problem
Standard vector search ranks by "semantic closeness" but can't tell if two results are close for the same reason.
A code snippet about "sorting arrays" and one about "sorting linked lists" might be semantically near, but they use different algorithms, data structures, and APIs. The scientific claims "aspirin reduces heart attack risk" and "aspirin causes stomach bleeding" are both about aspirin, but one supports use and the other warns against it.
The solution
Graph-aware embeddings solve this by encoding structured domain dimensions alongside your base vector—things like programming language, operation type, evidence strength, temporal context, or custom fields you define. Search can then filter and boost by these dimensions, not just flat similarity.
At write time, Papr extracts the registered domain dimensions (topic, time, intent, entities, and other fields you define) and encodes them into fourteen frequency bands. Retrieval then becomes sensitive to topical alignment, temporal and situational context, and any other axes your frequency schema defines—beyond raw semantic recall.
Implementation: Uses Papr's holographic pipeline (enable_holographic, holographic_config, /v1/holographic/*). This guide covers concepts and schemas; API request/response shapes are in the API reference (HolographicConfig, CreateDomainRequest, CustomFrequencyField, /v1/frequencies, /v1/holographic/domains).
Example: Code search with vs without graph-aware
Query: "How do I sort a list in Python?"
Standard semantic search (without graph-aware):
Returns mixed results:
- Python list.sort() ✅
- JavaScript array.sort() ❌ (different language)
- Python sorting algorithms tutorial ⚠️ (conceptual, not code)
- SQL ORDER BY ❌ (different domain)
With graph-aware embeddings (cosqa schema):
Returns filtered results aligned on language=Python and primary_operation=sorting:
- Python list.sort() ✅
- Python sorted() function ✅
- Python custom sort with key= ✅
- Python heapq.nsmallest() for partial sorts ✅
Why: The schema encodes programming_domain, language, and primary_operation dimensions. H-COND scoring boosts results with high alignment on these fields.
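For this query, a graph-aware search configuration might look like the following sketch. The filter names come from the cosqa table in the next section; the threshold values here are illustrative assumptions, not tuned recommendations.

```python
# Hypothetical sketch of a holographic_config for the built-in cosqa schema.
# The 0.9 / 0.8 thresholds are illustrative assumptions, not recommendations.
config = {
    "enabled": True,
    "frequency_schema_id": "cosqa",
    "frequency_filters": {
        "language": 0.9,           # require strong alignment on language
        "primary_operation": 0.8,  # require alignment on the operation (sorting)
    },
}

# With the papr_memory SDK, this would be passed to search:
# results = client.memory.search(
#     query="How do I sort a list in Python?",
#     holographic_config=config,
# )
```

Raising or lowering these thresholds trades recall against precision per dimension; start loose and tighten while measuring ranking quality.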
Why it matters
| Plain similarity | Graph-aware embeddings |
|---|---|
| "Close in embedding space" | "Close and aligned on the same domain dimensions (e.g. same kind of code operation, same type of scientific claim, same topic lane)" |
| Weak on nuance (time, role, domain jargon) | Tunable via schema: what to extract and how hard each dimension should matter (weights) |
| One-size-fits-all | Built-in domains for common cases, or custom schemas for your vocabulary |
Choosing the right frequency schema—built-in or custom—is the single biggest lever for retrieval quality when you enable this mode.
Built-in schemas
Three schemas ship by default. Each defines fourteen frequency dimensions tuned to its domain. Use their ids or shortnames (for example cosqa, scifact, general) with frequency_schema_id on writes and in holographic_config on search. Inspect live definitions with GET /v1/frequencies.
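A minimal sketch for inspecting live schema definitions over HTTP, using the X-API-Key header auth shown in the registration example later in this guide. The base URL is a placeholder assumption; substitute your actual API host, and see the API reference for the response shape.

```python
from typing import Optional

def frequencies_request(base_url: str, api_key: str,
                        schema_id: Optional[str] = None):
    """Build the URL and headers for GET /v1/frequencies
    (or GET /v1/frequencies/{frequency_schema_id} when schema_id is given)."""
    path = "/v1/frequencies" + (f"/{schema_id}" if schema_id else "")
    return base_url.rstrip("/") + path, {"X-API-Key": api_key}

# Usage with any HTTP client, e.g. requests (base URL is a placeholder):
# url, headers = frequencies_request("https://YOUR_API_HOST", "YOUR_KEY", "cosqa")
# schema = requests.get(url, headers=headers).json()
```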
cosqa — Code search
Optimized for matching code snippets to natural language queries. Reported on the CosQA benchmark: +5.5% NDCG@10 vs. cross-encoder baseline.
| Hz | Dimension | Type | Weight |
|---|---|---|---|
| 0.1 | programming_domain | FreeText | 0.6 |
| 0.5 | language | Enum | 0.8 |
| 1.0 | query_intent | FreeText | 1.0 |
| 2.0 | primary_operation | FreeText | 1.0 |
| 4.0 | key_apis | MultiValue | 1.0 |
| 6.0 | data_types_used | MultiValue | 0.8 |
| 8.0 | operation_verbs | MultiValue | 0.7 |
| 10 | secondary_apis | MultiValue | 0.5 |
| 14 | return_behavior | FreeText | 0.8 |
| 18 | input_output_signature | FreeText | 0.7 |
| 22 | code_pattern | FreeText | 0.8 |
| 26 | error_handling_pattern | FreeText | 0.5 |
| 30 | design_paradigm | FreeText | 0.6 |
| 34 | algorithm_technique | FreeText | 0.7 |
scifact — Scientific claims
Optimized for matching scientific claims to evidence passages. Reported on the SciFact benchmark: +36% NDCG@10 vs. baseline.
| Hz | Dimension | Type | Weight |
|---|---|---|---|
| 0.1 | scientific_field | Enum | 0.7 |
| 0.5 | claim_type | FreeText | 0.9 |
| 1.0 | methodology | FreeText | 1.0 |
| 2.0 | evidence_strength | Enum | 0.8 |
| 4.0 | biological_entities | MultiValue | 1.0 |
| 6.0 | chemical_compounds | MultiValue | 0.8 |
| 8.0 | study_design | FreeText | 0.7 |
| 10 | outcome_measures | MultiValue | 0.9 |
| 14 | population_context | FreeText | 0.6 |
| 18 | statistical_methods | FreeText | 0.7 |
| 22 | causal_mechanism | FreeText | 0.8 |
| 26 | temporal_context | FreeText | 0.5 |
| 30 | contradiction_signals | FreeText | 0.6 |
| 34 | confidence_level | FreeText | 0.7 |
general — General purpose
Domain-agnostic schema for mixed content. Use as a starting point for evaluation before specializing.
| Hz | Dimension | Type | Weight |
|---|---|---|---|
| 0.1 | topic_category | FreeText | 0.7 |
| 0.5 | content_type | Enum | 0.6 |
| 1.0 | primary_subject | FreeText | 1.0 |
| 2.0 | intent | FreeText | 1.0 |
| 4.0 | key_entities | MultiValue | 0.9 |
| 6.0 | domain_terms | MultiValue | 0.8 |
| 8.0 | action_verbs | MultiValue | 0.7 |
| 10 | specificity | Enum | 0.6 |
| 14 | audience_level | Enum | 0.5 |
| 18 | sentiment | Enum | 0.4 |
| 22 | temporal_relevance | FreeText | 0.5 |
| 26 | geographic_context | FreeText | 0.4 |
| 30 | source_type | Enum | 0.5 |
| 34 | complexity | Enum | 0.6 |
Table types (FreeText, Enum, MultiValue) describe the dimension; the API uses lowercase types on custom fields such as free_text, enum, and multi_value_text—see below.
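Opting a memory into one of these built-in schemas happens at write time via two parameters. This small helper is a sketch (not part of the SDK) that just packages the kwargs shown in the SDK example later in this guide:

```python
# Sketch: the two write-time knobs for built-in schemas. This helper is
# illustrative, not an SDK function; it only bundles the documented kwargs.
def holographic_write_kwargs(schema_id: str = "general") -> dict:
    """Swap "general" for "cosqa" or "scifact" to use a specialized schema."""
    return {
        "enable_holographic": True,
        "frequency_schema_id": schema_id,
    }

# With the papr_memory SDK this would be:
# client.memory.add(content="...", **holographic_write_kwargs("cosqa"))
```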
About frequency bands (Hz values)
The built-in schemas show Hz values (0.1, 0.5, 2.0, etc.) representing the 14 standard brain-inspired frequency bands in holographic encoding. When registering a custom schema, each field's frequency must use one of these allowed band values—see CustomFrequencyField in the API reference for the exact enum (e.g., 0.1, 0.5, 2.0, 4.0, 6.0, 10.0, 12.0, 18.0, 19.0, 24.0, 30.0, 40.0, 50.0, 70.0).
Rule of thumb:
- Lower Hz (0.1–2.0) → Categorical/Enum dimensions (language, domain, contract type)
- Mid Hz (4.0–14) → Descriptive/FreeText dimensions (intent, operation, methodology)
- Higher Hz (18–70) → List/MultiValue dimensions (APIs, entities, key terms)
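A client-side sanity check can catch band mistakes before you call the registration endpoint. This is a sketch assuming the example band values quoted above; confirm the authoritative enum against CustomFrequencyField in the API reference before relying on it.

```python
# Example band values as quoted in this guide ("e.g." in the text) — an
# assumption; confirm against the CustomFrequencyField enum in the reference.
ALLOWED_BANDS = {0.1, 0.5, 2.0, 4.0, 6.0, 10.0, 12.0,
                 18.0, 19.0, 24.0, 30.0, 40.0, 50.0, 70.0}

def validate_fields(fields: list) -> list:
    """Return a list of problems with a candidate custom-schema field list."""
    problems = []
    if len(fields) > 14:
        problems.append("more than fourteen fields (one per frequency band)")
    seen = set()
    for f in fields:
        if f["frequency"] not in ALLOWED_BANDS:
            problems.append(f"{f['name']}: frequency {f['frequency']} is not an allowed band")
        elif f["frequency"] in seen:
            problems.append(f"{f['name']}: duplicate band {f['frequency']}")
        seen.add(f["frequency"])
    return problems

candidate = [
    {"frequency": 0.1, "name": "contract_type"},
    {"frequency": 3.0, "name": "jurisdiction"},  # 3.0 is not a band
]
print(validate_fields(candidate))
```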
Registering a custom domain schema
Define a schema for your own domain with POST /v1/holographic/domains. Custom schemas are scoped to your API key and map up to fourteen fields onto the standard frequency bands. The reference defines allowed frequency values and field types for CreateDomainRequest / CustomFrequencyField.
```http
POST /v1/holographic/domains
X-API-Key: YOUR_KEY
Content-Type: application/json

{
  "name": "acme:legal_contracts:1.0.0",
  "description": "Legal contract analysis",
  "fields": [
    { "frequency": 0.1, "name": "contract_type", "type": "enum", "values": ["MSA", "NDA", "SOW"], "weight": 0.8 },
    { "frequency": 0.5, "name": "jurisdiction", "type": "enum", "values": ["US", "EU", "UK"], "weight": 0.7 },
    { "frequency": 2.0, "name": "primary_obligation", "type": "free_text", "weight": 1.0 },
    { "frequency": 4.0, "name": "parties_involved", "type": "multi_value_text", "weight": 0.9 }
  ]
}
```

Response:

```json
{
  "status": "success",
  "schema_id": "acme:legal_contracts:1.0.0",
  "domain": "legal_contracts",
  "num_frequencies": 4
}
```

Using the registered schema:
```python
from papr_memory import Papr

client = Papr(x_api_key="YOUR_KEY")

# 1. Add memories with the custom schema
memory = client.memory.add(
    content="Signed NDA with Acme Corp, jurisdiction: US, expires 2027-01-01",
    enable_holographic=True,
    frequency_schema_id="acme:legal_contracts:1.0.0",
)

# 2. Search with domain-specific filtering
results = client.memory.search(
    query="Find all active NDAs in US jurisdiction",
    holographic_config={
        "enabled": True,
        "frequency_schema_id": "acme:legal_contracts:1.0.0",
        "frequency_filters": {
            "contract_type": 0.9,  # Must be 90%+ aligned on contract type
            "jurisdiction": 0.8,   # Must be 80%+ aligned on jurisdiction
        },
    },
)
```

Schema design principles
- Enum for categorical dimensions (language, contract type)—often on lower Hz bands.
- Free text for descriptive dimensions (intent, operation)—mid bands.
- Multi-value for lists (APIs, entities, compounds)—higher bands.
- Weight what discriminates: if language matters most, weight it 1.0; if a dimension is mostly noise, weight it down (for example 0.3).
- At most fourteen fields—one per frequency band in the holographic encoding.
Custom schema co-design with Papr is usually the fastest path to production-quality, domain-aware retrieval. The schema is what makes graph-aware search tuned rather than generic.
How this maps to the API
| Goal | Where |
|---|---|
| List schemas and aliases | GET /v1/frequencies, GET /v1/frequencies/{frequency_schema_id} |
| Register a custom domain | POST /v1/holographic/domains |
| List domains | GET /v1/holographic/domains |
| Index memories with graph-aware vectors | enable_holographic, frequency_schema_id on POST /v1/memory and PUT /v1/memory/{memory_id} |
| Search with H-COND / filters | holographic_config on POST /v1/memory/search |
| BYOE base embedding | POST /v1/holographic/transform (and /batch) |
| Rerank only | POST /v1/holographic/rerank |
| Inspect extracted metadata | POST /v1/holographic/metadata |
H-COND (holographic conditional scoring) uses phase alignment between query and candidate metadata so ranking respects your schema dimensions, not only cosine distance. See HolographicConfig in the reference for scoring methods, filters, and tuning.
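To build intuition for why this changes ranking, here is a toy score that is emphatically NOT Papr's H-COND algorithm—just an illustration of the principle that weighted per-dimension agreement can demote a candidate that is merely close in embedding space:

```python
# Toy illustration only — not Papr's actual H-COND scoring. It combines a
# base similarity with weighted agreement on schema dimensions, so a
# candidate that aligns on the schema outranks one with higher raw cosine.
def toy_conditional_score(base_sim, query_dims, cand_dims, weights):
    total_w = sum(weights.values())
    aligned_w = sum(
        w for dim, w in weights.items()
        if query_dims.get(dim) == cand_dims.get(dim)
    )
    return base_sim * (aligned_w / total_w)

weights = {"language": 0.8, "primary_operation": 1.0}
query = {"language": "python", "primary_operation": "sorting"}
py_sort = {"language": "python", "primary_operation": "sorting"}      # fully aligned
js_sort = {"language": "javascript", "primary_operation": "sorting"}  # wrong language

# js_sort has the higher raw similarity (0.82) but loses after conditioning:
print(toy_conditional_score(0.80, query, py_sort, weights))
print(toy_conditional_score(0.82, query, js_sort, weights))
```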
Do you need graph-aware embeddings?
Follow this decision tree:
1. Is your baseline search "good enough"?
→ Yes: Skip graph-aware. Standard semantic + agentic graph is sufficient.
→ No: Continue ↓
2. Can you describe the quality gap in domain terms?
Examples: "Returns Python when I need JavaScript," "Mixes bug reports with feature requests," "Can't distinguish claims vs counter-evidence"
→ Yes: You have a domain mismatch → Continue ↓
→ No: Your problem might be query quality or data sparsity, not embeddings
3. Does a built-in schema match your domain?
- cosqa → Code search (snippet ↔ natural language)
- scifact → Scientific claims ↔ evidence
- general → Mixed content domains
→ Yes: Start with built-in → Measure → Tune
→ No: Continue ↓
4. Can you define 4-14 structured dimensions for your domain?
Examples: contract_type, jurisdiction, ticket_priority, user_intent, evidence_strength
→ Yes: Register custom schema with Papr (co-design recommended)
→ No: Graph-aware mode requires a schema; revisit when you can articulate dimensions
When to enable
- You need topical, temporal, or domain-specific agreement—not just "semantically nearby."
- You can commit to a schema (built-in or custom) and measure ranking.
- You're willing to accept extra configuration complexity for quality gains.
When to skip
- Simple RAG or small corpora where baseline search is enough.
- You have not chosen a schema; graph-aware mode is schema-driven.
- You haven't measured a concrete relevance gap—add after baseline metrics, not by default.