Content Ingestion

This guide explains how to ingest different types of content into Papr Memory, including text, documents, and code snippets.

Overview

Papr Memory supports various content types to build a comprehensive memory system:

  • Text-based memories (notes, conversations, JSON, etc.)
  • Documents (PDF, HTML, TXT)
  • Code snippets (with language detection)

Memory Types

Papr Memory supports the following memory types:

  • text - Plain text content like notes, conversations, or meeting summaries (also supports JSON content)
  • code_snippet - Programming code with language detection
  • document - Document files such as PDF, HTML, or TXT

Text Memory Ingestion

The most basic form of memory is text. You can add text memories using the /v1/memory endpoint.

curl -X POST https://memory.papr.ai/v1/memory \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Client-Type: curl" \
  -d '{
    "content": "The product team discussed the new feature roadmap for Q3, focusing on user analytics and performance improvements.",
    "type": "text",
    "metadata": {
      "topics": "meeting, product, roadmap",
      "hierarchical_structures": "Company/Product/Roadmap",
      "createdAt": "2024-04-15",
      "sourceUrl": "https://meetings.example.com/123",
      "conversationId": "conv-123",
      "custom_field": "You can add any custom fields here"
    }
  }'
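
The same request can be issued from code. Below is a minimal Python sketch that builds the payload from the curl example above; the actual network call (which needs the third-party `requests` package and a valid API key) is left commented out.

```python
import json

API_KEY = "YOUR_API_KEY"  # replace with your real key
BASE_URL = "https://memory.papr.ai"

def build_text_memory(content, **metadata):
    """Build the JSON payload for POST /v1/memory with type 'text'."""
    return {
        "content": content,
        "type": "text",
        "metadata": metadata,
    }

payload = build_text_memory(
    "The product team discussed the new feature roadmap for Q3.",
    topics="meeting, product, roadmap",
    hierarchical_structures="Company/Product/Roadmap",
)

# To send it:
# import requests
# resp = requests.post(
#     f"{BASE_URL}/v1/memory",
#     headers={"X-API-Key": API_KEY,
#              "Content-Type": "application/json",
#              "X-Client-Type": "python"},
#     data=json.dumps(payload),
# )

print(json.dumps(payload, indent=2))
```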

Document Ingestion

For larger content, you can upload documents such as PDFs, HTML files, or text files. Papr Memory will automatically process these documents and break them into appropriate memory chunks.

curl -X POST https://memory.papr.ai/v1/document \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -H "X-Client-Type: curl" \
  -F "file=@/path/to/document.pdf" \
  -F "metadata={\"topics\":\"report, financial\",\"hierarchical_structures\":\"Finance/Reports/Q2\",\"department\":\"Finance\"}"
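
Note that in the multipart request the metadata travels as a JSON-encoded string inside a form field, not as a nested object — an easy mistake to make when translating the curl example into code. A sketch of building those fields in Python (the upload itself, which needs `requests` and a valid key, is commented out):

```python
import json

metadata = {
    "topics": "report, financial",
    "hierarchical_structures": "Finance/Reports/Q2",
    "department": "Finance",
}

# The metadata must be serialized to a JSON string for the form field.
form_fields = {"metadata": json.dumps(metadata)}

# To upload:
# import requests
# with open("/path/to/document.pdf", "rb") as f:
#     resp = requests.post(
#         "https://memory.papr.ai/v1/document",
#         headers={"X-API-Key": "YOUR_API_KEY", "X-Client-Type": "python"},
#         files={"file": f},
#         data=form_fields,
#     )

# Round-trip check: the string decodes back to the original dict.
assert json.loads(form_fields["metadata"]) == metadata
```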

Document Processing Status

After uploading a document, you can check its processing status:

curl -X GET https://memory.papr.ai/v1/document/status/upload_abc123 \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "X-Client-Type: curl"
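
Since processing is asynchronous, clients typically poll this endpoint until the document is done. A minimal polling sketch follows; the fetcher is any callable that returns a status string, and the terminal values used here ("completed", "failed") are illustrative assumptions, not a documented contract.

```python
import time

def poll_document_status(fetch_status, upload_id,
                         interval_s=2.0, timeout_s=60.0):
    """Poll fetch_status(upload_id) until a terminal status or timeout.

    fetch_status would wrap GET /v1/document/status/{upload_id}; the
    terminal statuses below are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status(upload_id)
        if status in ("completed", "failed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"document {upload_id} still processing after {timeout_s}s")

# Usage with a stub that finishes on the second poll:
states = iter(["processing", "completed"])
result = poll_document_status(lambda _id: next(states),
                              "upload_abc123", interval_s=0.01)
print(result)  # completed
```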

Code Snippet Memory

Capture code snippets with language detection:

curl -X POST https://memory.papr.ai/v1/memory \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Client-Type: curl" \
  -d '{
    "content": "def calculate_total(items):\n    return sum(item.price for item in items)",
    "type": "code_snippet",
    "metadata": {
      "language": "python",
      "topics": "code, pricing, utility",
      "hierarchical_structures": "Code/Python/Utils",
      "author": "Jane Smith",
      "project": "Billing System"
    }
  }'

Searching Memories

Papr Memory automatically combines vector and graph search to return the most relevant results. You can control how many memories and graph nodes are returned.

curl -X POST https://memory.papr.ai/v1/memory/search \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept-Encoding: gzip" \
  -H "X-Client-Type: curl" \
  -d '{
    "query": "What are the key points from our recent product planning?"
  }'
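
The same search can be issued from Python. Only the `query` field is shown, since the parameter names for limiting memories and graph nodes are not listed here; the `Accept-Encoding: gzip` header asks for a compressed response, as in the curl example.

```python
import json

search_request = {
    "query": "What are the key points from our recent product planning?",
}
headers = {
    "X-API-Key": "YOUR_API_KEY",
    "Content-Type": "application/json",
    "Accept-Encoding": "gzip",  # request a compressed response
    "X-Client-Type": "python",
}

# To send it (requires the `requests` package and a valid API key):
# import requests
# resp = requests.post("https://memory.papr.ai/v1/memory/search",
#                      headers=headers, data=json.dumps(search_request))

print(json.dumps(search_request))
```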

Content Size Limitations

When working with Papr Memory, it's important to be aware of content size limitations:

Memory Content Size Limits

  • Text and Code Snippets: The default maximum size for individual memory content is 15,000 bytes (approximately 15KB).
  • Exceeding this limit will result in a 413 Payload Too Large error.
  • If you need to store larger text, consider breaking it into smaller, logical chunks or using the document upload endpoint.
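
When splitting client-side, the limit applies to the encoded byte length, so a naive character-count split can still overflow or cut a multi-byte character in half. A sketch of a UTF-8-safe chunker that respects the 15,000-byte cap (splitting on paragraph boundaries is left out for brevity):

```python
def chunk_utf8(text, max_bytes=15_000):
    """Split text into pieces whose UTF-8 encoding is <= max_bytes each,
    never cutting through a multi-byte character."""
    data = text.encode("utf-8")
    chunks, start = [], 0
    while start < len(data):
        end = min(start + max_bytes, len(data))
        # If the cut lands mid-character, back up to the start of that
        # character (UTF-8 continuation bytes look like 0b10xxxxxx).
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        chunks.append(data[start:end].decode("utf-8"))
        start = end
    return chunks

# Each Greek letter is 2 bytes, so a 7-byte cap fits 3 characters per chunk.
chunks = chunk_utf8("α" * 10, max_bytes=7)
print([len(c) for c in chunks])  # [3, 3, 3, 1]
```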

Document Upload Size Limits

  • Document uploads: The maximum file size for document uploads depends on the file type:
    • PDF files: Up to 10MB
    • HTML files: Up to 5MB
    • TXT files: Up to 5MB
  • Large documents will be automatically chunked during processing to optimize for search quality.
  • For extremely large files, you may need to split them before uploading.

Error Handling for Size Limits

If you attempt to upload content that exceeds the size limits, you'll receive an error response:

{
  "code": 413,
  "status": "error",
  "error": "Content size (16000 bytes) exceeds maximum limit of 15000 bytes.",
  "details": {
    "max_content_length": 15000
  }
}

For batch uploads, individual items that exceed the size limit will be reported in the errors array of the response.
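
In client code, the 413 case can be detected from the response body shown above. A minimal sketch (the response dict mirrors the example; resubmitting in smaller chunks is a suggested recovery, not a prescribed flow):

```python
error_response = {
    "code": 413,
    "status": "error",
    "error": "Content size (16000 bytes) exceeds maximum limit of 15000 bytes.",
    "details": {"max_content_length": 15000},
}

def is_payload_too_large(response):
    """Return the server-side size limit if this is a 413 error, else None."""
    if response.get("code") == 413 and response.get("status") == "error":
        return response.get("details", {}).get("max_content_length")
    return None

limit = is_payload_too_large(error_response)
print(limit)  # 15000
# A caller could now re-chunk the content to fit under `limit` and retry.
```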

Metadata Structure

Papr Memory allows flexible metadata to help organize and retrieve your memories effectively:

Standard Metadata Fields

  • topics: String of topic labels to categorize the memory (comma-separated)
  • hierarchical_structures: String representing hierarchical categorization (e.g., "Department/Team/Project")
  • createdAt: When the content was created or relevant
  • sourceUrl: Link to the original source
  • conversationId: ID of the conversation the memory belongs to

Custom Metadata Fields

You can add any custom fields to the metadata object to meet your specific needs:

"metadata": {
  "topics": "meeting, product",
  "hierarchical_structures": "Company/Product/Roadmap",
  "createdAt": "2024-04-15T10:00:00Z",
  "emoji tags": "📊,💡,📝",
  "emotion tags": "focused, productive",
  "department": "Engineering",
  "project_id": "PRJ-123",
  "customer_id": "CUST-456",
  "is_confidential": true,
  "related_ticket": "TICKET-789",
  "any_custom_field": "You can add any custom fields"
}

Best Practices

  1. Add rich metadata to your memories to improve search and organization.

  2. Use topics and hierarchical structures to create a consistent knowledge organization system.

  3. Use batch processing for large volumes of memories to reduce API calls.

  4. Handle document uploads asynchronously and poll for status rather than waiting for completion.

  5. Consider content size limits - text memories have a 15,000 byte limit by default.

  6. Include context when available to enhance the semantic understanding of your memories.

  7. Add relationships between memories using the relationships_json field to build more connected knowledge.

Troubleshooting

  • 413 Payload Too Large: break the content into smaller chunks.
  • Slow document processing: check the document status endpoint for progress.
  • Missing metadata after retrieval: ensure metadata fields use supported formats.
  • Low-quality embeddings: provide more context or related information.
