julee.repositories.minio.document
Minio implementation of DocumentRepository.
This module provides a Minio-based implementation of the DocumentRepository protocol that follows the Clean Architecture patterns defined in the Fun-Police Framework. It handles document storage with both metadata and content streams, ensuring idempotency and proper error handling.
The implementation separates document metadata (stored as JSON) from content (stored as content-addressable binary objects) in Minio, following the large payload handling pattern from the architectural guidelines.
Classes
Minio implementation of DocumentRepository using Minio for persistence.
Simple wrapper for raw document metadata JSON.
Module Contents
- class julee.repositories.minio.document.MinioDocumentRepository(client)
Bases: julee.domain.repositories.document.DocumentRepository, julee.repositories.minio.client.MinioRepositoryMixin
Minio implementation of DocumentRepository using Minio for persistence.
This implementation stores document metadata and content separately:
- Metadata: JSON objects in the “documents” bucket
- Content: Binary objects in the “documents-content” bucket
This separation allows for efficient metadata queries while supporting large content files without hitting Temporal’s 2MB payload limits.
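The split between metadata and content-addressable content can be illustrated with a minimal sketch. It is not the repository's actual code: the in-memory `buckets` dict stands in for Minio, the `content_key` and `name` metadata fields are hypothetical, and SHA-256 is assumed as the hashing scheme.

```python
import hashlib
import json

# Hypothetical in-memory stand-ins for the two Minio buckets.
buckets: dict[str, dict[str, bytes]] = {"documents": {}, "documents-content": {}}


def content_key(content: bytes) -> str:
    """Derive the object key from the bytes themselves (SHA-256 is an assumption)."""
    return hashlib.sha256(content).hexdigest()


def store(document_id: str, name: str, content: bytes) -> str:
    """Write content under its hash, then write metadata that points to it."""
    key = content_key(content)
    # Identical content maps to the same key, so writing it again is harmless.
    buckets["documents-content"][key] = content
    metadata = {"document_id": document_id, "name": name, "content_key": key}
    buckets["documents"][document_id] = json.dumps(metadata).encode("utf-8")
    return key


key_a = store("doc-1", "a.txt", b"hello")
key_b = store("doc-2", "b.txt", b"hello")
```

Because both documents carry the same bytes, they share one content object while keeping separate metadata records. Only the small metadata JSON (or a key) needs to travel through a workflow payload.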
- async get_many(document_ids)
Retrieve multiple documents by ID using batch operations.
- Parameters:
document_ids (list[str]) – List of unique document identifiers
- Returns:
Dict mapping document_id to Document (or None if not found)
- Return type:
dict[str, julee.domain.models.document.Document | None]
Note
This implementation optimizes by batch-fetching metadata first, then batch-fetching unique content streams, then splicing them together.
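The three steps in the note can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual implementation: `fetch_metadata`, `fetch_content`, the `content_key` field, and the in-memory `METADATA` and `CONTENT` stores are all hypothetical, and plain dicts stand in for Document objects.

```python
import asyncio
import json
from typing import Optional

# Hypothetical in-memory stand-ins for the metadata and content buckets.
METADATA = {
    "doc-1": json.dumps({"document_id": "doc-1", "content_key": "abc"}),
    "doc-2": json.dumps({"document_id": "doc-2", "content_key": "abc"}),
}
CONTENT = {"abc": b"shared bytes"}
content_reads: list[str] = []  # records each content fetch, for illustration


async def fetch_metadata(document_id: str) -> Optional[dict]:
    raw = METADATA.get(document_id)
    return json.loads(raw) if raw is not None else None


async def fetch_content(key: str) -> Optional[bytes]:
    content_reads.append(key)
    return CONTENT.get(key)


async def get_many(document_ids: list[str]) -> dict[str, Optional[dict]]:
    # Step 1: fetch all metadata concurrently.
    metas = await asyncio.gather(*(fetch_metadata(i) for i in document_ids))
    # Step 2: fetch each distinct content object exactly once.
    keys = sorted({m["content_key"] for m in metas if m is not None})
    fetched = await asyncio.gather(*(fetch_content(k) for k in keys))
    blobs = dict(zip(keys, fetched))
    # Step 3: splice content onto each metadata record; missing IDs map to None.
    result: dict[str, Optional[dict]] = {}
    for doc_id, meta in zip(document_ids, metas):
        if meta is None:
            result[doc_id] = None
        else:
            result[doc_id] = {**meta, "content": blobs[meta["content_key"]]}
    return result


docs = asyncio.run(get_many(["doc-1", "doc-2", "missing"]))
```

Deduplicating content keys before the second fetch means two documents that share content cost a single content read.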
- async list_all()
List all documents.
- Returns:
List of all documents, sorted by document_id
- Return type:
- async save(document)
Save a document with its content and metadata.
If the document has content_string set, it is converted to a ContentStream and stored. The content_string field should only be used for small content (a few KB) when saving from workflows or use cases. Call sites in activities should always use the content stream.
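A minimal sketch of the content_string conversion might look like the following. `DocumentSketch` and `normalize_content` are hypothetical names, `io.BytesIO` stands in for the real ContentStream type, and both the UTF-8 encoding and the precedence of an existing stream over content_string are assumptions.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO, Optional


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for the Document model; only the fields needed here."""

    document_id: str
    content: Optional[BinaryIO] = None
    content_string: Optional[str] = None


def normalize_content(document: DocumentSketch) -> BinaryIO:
    """Return a readable binary stream, converting content_string if needed."""
    if document.content is not None:
        # Assumed precedence: an existing stream is used as-is.
        return document.content
    if document.content_string is not None:
        # UTF-8 is an assumption; intended for small strings only.
        return io.BytesIO(document.content_string.encode("utf-8"))
    raise ValueError("document has neither content nor content_string")


small = DocumentSketch(document_id="doc-1", content_string="hello")
stream = normalize_content(small)
data = stream.read()
```

Keeping the string path limited to small payloads matters because the whole string is held in memory and serialized with the document, whereas a stream can be read incrementally.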