Content Ingestion
This guide explains how to ingest different types of content into Papr Memory, including text, documents, and code snippets.
Overview
Papr Memory supports various content types to build a comprehensive memory system:
- Text-based memories (notes, conversations, JSON, etc.)
- Documents (PDF, HTML, TXT)
- Code snippets (with language detection)
Memory Types
Papr Memory supports the following memory types:
- text: Plain text content like notes, conversations, or meeting summaries (also supports JSON content)
- code_snippet: Programming code with language detection
- document: Document files such as PDF, HTML, or TXT
Text Memory Ingestion
The most basic form of memory is text. You can add text memories using the /v1/memory endpoint.
curl -X POST https://memory.papr.ai/v1/memory \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Client-Type: curl" \
  -d '{
    "content": "The product team discussed the new feature roadmap for Q3, focusing on user analytics and performance improvements.",
    "type": "text",
    "metadata": {
      "topics": "meeting, product, roadmap",
      "hierarchical_structures": "Company/Product/Roadmap",
      "createdAt": "2024-04-15",
      "sourceUrl": "https://meetings.example.com/123",
      "conversationId": "conv-123",
      "custom_field": "You can add any custom fields here"
    }
  }'
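The same request can be assembled from Python. The following is a minimal sketch using only the standard library; the helper name `build_memory_request` and the `X-Client-Type` value `python` are assumptions for illustration, not part of the documented API. It builds the request without sending it, so it can be inspected first.

```python
import json
import urllib.request

API_URL = "https://memory.papr.ai/v1/memory"


def build_memory_request(api_key, content, metadata=None, memory_type="text",
                         client_type="python"):
    """Build (but do not send) a POST request that adds one memory.

    Hypothetical helper: mirrors the curl example's URL, headers, and body.
    The client_type default is an assumption; the curl example sends "curl".
    """
    body = {"content": content, "type": memory_type}
    if metadata:
        body["metadata"] = metadata
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",
            "X-Client-Type": client_type,
        },
        method="POST",
    )


request = build_memory_request(
    "YOUR_API_KEY",
    "The product team discussed the new feature roadmap for Q3.",
    metadata={"topics": "meeting, product, roadmap"},
)
# To send it: urllib.request.urlopen(request)
```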
Document Ingestion
For larger content, you can upload documents such as PDFs, HTML files, or text files. Papr Memory will automatically process these documents and break them into appropriate memory chunks.
curl -X POST https://memory.papr.ai/v1/document \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -H "X-Client-Type: curl" \
  -F "file=@/path/to/document.pdf" \
  -F "metadata={\"topics\":\"report, financial\",\"hierarchical_structures\":\"Finance/Reports/Q2\",\"department\":\"Finance\"}"
Document Processing Status
After uploading a document, you can check its processing status:
curl -X GET https://memory.papr.ai/v1/document/status/upload_abc123 \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "X-Client-Type: curl"
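Because processing is asynchronous, a client usually polls this endpoint until the upload reaches a terminal state. The sketch below shows one way to structure that loop. The response schema is not shown in this guide, so the status fetcher and the `is_done` predicate are supplied by the caller; `poll_until`, `fetch_status`, and the `"status"` values in the usage lines are hypothetical.

```python
import time


def poll_until(fetch_status, is_done, interval_seconds=2.0, max_attempts=30,
               sleep=time.sleep):
    """Call fetch_status() until is_done(result) is true or attempts run out.

    fetch_status: zero-argument callable that performs the GET request and
                  returns the parsed response (its shape is up to the caller).
    is_done:      predicate deciding whether the response is terminal.
    Raises TimeoutError if no terminal response is seen in max_attempts calls.
    """
    for attempt in range(1, max_attempts + 1):
        result = fetch_status()
        if is_done(result):
            return result
        if attempt < max_attempts:
            sleep(interval_seconds)
    raise TimeoutError(f"no terminal status after {max_attempts} attempts")


# Usage with a stand-in fetcher; a real one would GET the status URL above.
# The "status" values here are assumptions, not documented API values.
fake_responses = iter([{"status": "processing"}, {"status": "completed"}])
final = poll_until(
    fetch_status=lambda: next(fake_responses),
    is_done=lambda r: r.get("status") != "processing",
    sleep=lambda _seconds: None,  # skip real waiting in this demo
)
```

Capping the number of attempts keeps a stuck upload from blocking the caller indefinitely.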
Code Snippet Memory
Capture code snippets with language detection:
curl -X POST https://memory.papr.ai/v1/memory \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Client-Type: curl" \
  -d '{
    "content": "def calculate_total(items):\n    return sum(item.price for item in items)",
    "type": "code_snippet",
    "metadata": {
      "language": "python",
      "topics": "code, pricing, utility",
      "hierarchical_structures": "Code/Python/Utils",
      "author": "Jane Smith",
      "project": "Billing System"
    }
  }'
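Hand-escaping newlines and quotes in source code is error-prone. As a sketch, a JSON encoder can do the escaping instead; `code_snippet_body` is a hypothetical helper name, and the fields it emits are the ones used in the request above.

```python
import json


def code_snippet_body(code, language, **extra_metadata):
    """Serialize a code_snippet memory body, letting json.dumps do the escaping."""
    metadata = {"language": language, **extra_metadata}
    return json.dumps({"content": code, "type": "code_snippet", "metadata": metadata})


source = "def calculate_total(items):\n    return sum(item.price for item in items)"
body = code_snippet_body(source, "python", topics="code, pricing, utility")
```

The resulting string can be passed as the request body; decoding it yields the original code with its line breaks and indentation intact.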
Searching Memories
Papr Memory combines vector and graph search automatically to provide the most relevant results. You can control how many memories and graph nodes are returned.
curl -X POST https://memory.papr.ai/v1/memory/search \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept-Encoding: gzip" \
  -H "X-Client-Type: curl" \
  -d '{
    "query": "What are the key points from our recent product planning?"
  }'
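The request above sends `Accept-Encoding: gzip`, so the response body may arrive compressed. curl only decompresses automatically when run with `--compressed`, and Python's `urllib` does not decompress at all, so a client may need to handle it. A minimal sketch, where `decode_json_body` is a hypothetical helper and the payload is a placeholder rather than the real search response schema:

```python
import gzip
import json


def decode_json_body(raw, content_encoding=None):
    """Decode a JSON response body, decompressing first if it is gzip-encoded.

    raw:              response body as bytes.
    content_encoding: value of the Content-Encoding response header, if any.
    """
    if content_encoding and content_encoding.strip().lower() == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Placeholder payload; the real search response schema is not shown here.
compressed = gzip.compress(json.dumps({"ok": True}).encode("utf-8"))
parsed = decode_json_body(compressed, "gzip")
```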
Content Size Limitations
Papr Memory enforces the following content size limits:
Memory Content Size Limits
- Text and Code Snippets: The default maximum size for individual memory content is 15,000 bytes (approximately 15 KB).
- Exceeding this limit will result in a 413 Payload Too Large error.
- If you need to store larger text, consider breaking it into smaller, logical chunks or using the document upload endpoint.
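One way to break larger text into logical chunks is to pack whole paragraphs into each chunk and fall back to a hard split only when a single paragraph is too large. The sketch below assumes the limit applies to the UTF-8 byte length of the content field, which this guide does not state explicitly; `chunk_text` is a hypothetical helper.

```python
def _split_oversized(paragraph, max_bytes):
    """Hard-split one paragraph into pieces of at most max_bytes UTF-8 bytes,
    never cutting through a multi-byte character."""
    pieces, buffer, size = [], "", 0
    for char in paragraph:
        char_size = len(char.encode("utf-8"))
        if buffer and size + char_size > max_bytes:
            pieces.append(buffer)
            buffer, size = "", 0
        buffer += char
        size += char_size
    pieces.append(buffer)
    return pieces


def chunk_text(text, max_bytes=15_000):
    """Split text on blank lines into chunks of at most max_bytes UTF-8 bytes."""
    chunks, current = [], ""
    for paragraph in text.split("\n\n"):
        candidate = paragraph if not current else current + "\n\n" + paragraph
        if len(candidate.encode("utf-8")) <= max_bytes:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # The paragraph did not fit next to the previous ones; it may also be
        # too large on its own, in which case it is split further.
        pieces = _split_oversized(paragraph, max_bytes)
        chunks.extend(pieces[:-1])
        current = pieces[-1]
    if current:
        chunks.append(current)
    return chunks


chunks = chunk_text("a" * 10 + "\n\n" + "b" * 10 + "\n\n" + "é" * 10, max_bytes=12)
```

Each chunk can then be posted as its own text memory. Measuring encoded bytes rather than characters matters for non-ASCII text, where one character can occupy several bytes.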
Document Upload Size Limits
- Document uploads: The maximum file size for document uploads depends on the file type:
  - PDF files: Up to 10MB
  - HTML files: Up to 5MB
  - TXT files: Up to 5MB
- Large documents will be automatically chunked during processing to optimize for search quality.
- For extremely large files, you may need to split them before uploading.
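A client can check a file against these limits before uploading and avoid a wasted request. This is a rough sketch: `upload_size_problem` is a hypothetical helper, the extension-to-type mapping is an assumption, and 1MB is assumed to mean 1024 * 1024 bytes since the guide does not say which definition applies.

```python
import os

# Limits from the list above; 1 MB is assumed to mean 1024 * 1024 bytes.
MB = 1024 * 1024
MAX_UPLOAD_BYTES = {".pdf": 10 * MB, ".html": 5 * MB, ".txt": 5 * MB}


def upload_size_problem(filename, size_bytes):
    """Return a human-readable problem string, or None if the file looks uploadable."""
    extension = os.path.splitext(filename)[1].lower()
    limit = MAX_UPLOAD_BYTES.get(extension)
    if limit is None:
        return f"unsupported file type: {extension or '(none)'}"
    if size_bytes > limit:
        return f"{filename} is {size_bytes} bytes; limit for {extension} is {limit}"
    return None
```

For a file on disk, call it as `upload_size_problem(path, os.path.getsize(path))`.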
Error Handling for Size Limits
If you attempt to upload content that exceeds the size limits, you'll receive an error response:
{
  "code": 413,
  "status": "error",
  "error": "Content size (16000 bytes) exceeds maximum limit of 15000 bytes.",
  "details": {
    "max_content_length": 15000
  }
}
For batch uploads, individual items that exceed the size limit will be reported in the errors array of the response.
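A client can read the limit from the error body rather than hard-coding it, for example to decide how finely to re-chunk content before retrying. The sketch below parses the response shown above; `max_length_from_error` is a hypothetical helper, and it returns None for any body that does not match that shape.

```python
import json


def max_length_from_error(response_text):
    """Return max_content_length from a 413 error body, or None if absent."""
    try:
        body = json.loads(response_text)
    except json.JSONDecodeError:
        return None
    if not isinstance(body, dict) or body.get("code") != 413:
        return None
    details = body.get("details")
    if not isinstance(details, dict):
        return None
    limit = details.get("max_content_length")
    return limit if isinstance(limit, int) else None


example = """{
  "code": 413,
  "status": "error",
  "error": "Content size (16000 bytes) exceeds maximum limit of 15000 bytes.",
  "details": {"max_content_length": 15000}
}"""
limit = max_length_from_error(example)
```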
Metadata Structure
Papr Memory allows flexible metadata to help organize and retrieve your memories effectively:
Standard Metadata Fields
- topics: String of topic labels to categorize the memory (comma-separated)
- hierarchical_structures: String representing hierarchical categorization (e.g., "Department/Team/Project")
- createdAt: When the content was created or relevant
- sourceUrl: Link to the original source
- conversationId: ID of the conversation the memory belongs to
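Since topics and hierarchical_structures are plain strings, application code that holds them as lists needs to join them before sending. A small sketch of that conversion, where `build_metadata` is a hypothetical helper rather than part of any Papr client library:

```python
def build_metadata(topics, hierarchy, **custom_fields):
    """Build a metadata dict in the string formats described above.

    topics:    iterable of labels, joined into one comma-separated string.
    hierarchy: iterable of path segments, joined with "/".
    Any extra keyword arguments are copied in as additional metadata fields.
    """
    metadata = {
        "topics": ", ".join(t.strip() for t in topics if t.strip()),
        "hierarchical_structures": "/".join(s.strip() for s in hierarchy if s.strip()),
    }
    metadata.update(custom_fields)
    return metadata


meta = build_metadata(
    ["meeting", "product", "roadmap"],
    ["Company", "Product", "Roadmap"],
    conversationId="conv-123",
)
```

Building the strings in one place helps keep topic labels and hierarchy paths consistent across memories.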
Custom Metadata Fields
You can add any custom fields to the metadata object to meet your specific needs:
"metadata": {
  "topics": "meeting, product",
  "hierarchical_structures": "Company/Product/Roadmap",
  "createdAt": "2024-04-15T10:00:00Z",
  "emoji tags": "📊,💡,📝",
  "emotion tags": "focused, productive",
  "department": "Engineering",
  "project_id": "PRJ-123",
  "customer_id": "CUST-456",
  "is_confidential": true,
  "related_ticket": "TICKET-789",
  "any_custom_field": "You can add any custom fields"
}
Best Practices
- Add rich metadata to your memories to improve search and organization.
- Use topics and hierarchical structures to create a consistent knowledge organization system.
- Use batch processing for large volumes of memories to reduce API calls.
- Handle document uploads asynchronously and poll for status rather than waiting for completion.
- Consider content size limits: text memories have a 15,000 byte limit by default.
- Include context when available to enhance the semantic understanding of your memories.
- Add relationships between memories using the relationships_json field to build more connected knowledge.
Troubleshooting
Issue | Solution |
---|---|
413 Payload Too Large | Break content into smaller chunks |
Slow document processing | Check document status endpoint for progress |
Missing metadata after retrieval | Ensure metadata fields use supported formats |
Low-quality embeddings | Provide more context or related information |
Next Steps
- Learn about Search Tuning
- Explore Batch Writes and Idempotency
- See the API Reference for detailed endpoint information