julee.domain.models.document

Document domain package for the Capture, Extract, Assemble, Publish (CEAP) workflow.

This package contains the Document domain object and its related functionality for the CEAP workflow system.

Document represents a complete document entity, including content and metadata, and provides a stream-like interface for efficient handling of both small and large documents.

Classes

Document

Complete document entity including content and metadata.

DocumentStatus

Status of a document through the Capture, Extract, Assemble, Publish pipeline.

Package Contents

class julee.domain.models.document.Document(/, **data)[source]

Bases: pydantic.BaseModel

Complete document entity including content and metadata.

This is the primary domain model that represents a complete document in the CEAP workflow system. Content is provided as a ContentStream for efficient handling of both small and large documents.

The content stream is excluded from JSON serialization; use separate content endpoints to stream binary data over HTTP.
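The reason for a stream-based content field can be illustrated with a minimal sketch that reads content in chunks to derive values such as size_bytes and content_multihash without holding the whole document in memory. This is not julee's implementation: io.BytesIO stands in for ContentStream, and summarize_stream, CHUNK_SIZE, the sha2-256 hash function, and the hex encoding of the multihash are all assumptions made for illustration.

```python
import hashlib
import io
from typing import BinaryIO

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size, not taken from julee


def summarize_stream(stream: BinaryIO) -> tuple[int, str]:
    """Read a binary stream in chunks; return (size in bytes, multihash as hex)."""
    digest = hashlib.sha256()
    size = 0
    while chunk := stream.read(CHUNK_SIZE):
        digest.update(chunk)
        size += len(chunk)
    # Multihash layout: <hash function code><digest length><digest>.
    # 0x12 is the registered code for sha2-256; 0x20 is its 32-byte digest length.
    multihash = bytes([0x12, 0x20]) + digest.digest()
    return size, multihash.hex()


# io.BytesIO is a stand-in for a ContentStream (assumption).
size_bytes, content_multihash = summarize_stream(io.BytesIO(b"hello world"))
```

Because only one chunk is in memory at a time, the same function would work for a small in-memory payload and for a large file opened in binary mode.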

classmethod content_multihash_must_not_be_empty(v)[source]
classmethod content_type_must_not_be_empty(v)[source]
classmethod document_id_must_not_be_empty(v)[source]
classmethod filename_must_not_be_empty(v)[source]
validate_content_fields(info)[source]

Ensure the document has at least one of content or content_bytes.
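The validation rules listed above can be sketched with a plain dataclass. This is a hypothetical stand-in, not the real pydantic model: DocumentSketch, the error messages, and the choice to treat whitespace-only strings as empty are assumptions, and only a subset of the fields is reproduced.

```python
from dataclasses import dataclass
from typing import BinaryIO, Optional


@dataclass
class DocumentSketch:
    """Hypothetical stand-in for Document; not the real pydantic model."""

    document_id: str
    original_filename: str
    content_type: str
    content: Optional[BinaryIO] = None
    content_bytes: Optional[bytes] = None

    def __post_init__(self) -> None:
        # Mirror the *_must_not_be_empty validators: reject blank strings.
        # Treating whitespace-only values as empty is an assumption.
        for name in ("document_id", "original_filename", "content_type"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")
        # Mirror validate_content_fields: at least one content source is required.
        if self.content is None and self.content_bytes is None:
            raise ValueError("document needs content or content_bytes")


# Valid: content is supplied as bytes.
doc = DocumentSketch("doc-1", "report.pdf", "application/pdf", content_bytes=b"%PDF-")
```

Constructing a DocumentSketch with neither content nor content_bytes raises ValueError in this sketch; the real model would be expected to surface a pydantic validation error instead.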

additional_metadata: dict[str, Any] = None
assembly_types: list[str] = None
content: julee.domain.models.custom_fields.content_stream.ContentStream | None = None
content_bytes: bytes | None = None
content_multihash: str = None
content_type: str
created_at: datetime.datetime | None = None
document_id: str
knowledge_service_id: str | None = None
original_filename: str
size_bytes: int = None
status: DocumentStatus
updated_at: datetime.datetime | None = None
class julee.domain.models.document.DocumentStatus[source]

Bases: str, enum.Enum

Status of a document through the Capture, Extract, Assemble, Publish pipeline.

ASSEMBLED = 'assembled'
ASSEMBLY_SPECIFICATION_IDENTIFIED = 'assembly_specification_identified'
CAPTURED = 'captured'
EXTRACTED = 'extracted'
FAILED = 'failed'
PUBLISHED = 'published'
REGISTERED = 'registered'
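Because DocumentStatus derives from both str and enum.Enum, its members compare equal to their string values and serialize as plain strings. The sketch below re-declares the enum locally, with members copied from the listing above, so that it runs without julee installed; the real class should be imported from julee.domain.models.document.

```python
import json
from enum import Enum


class DocumentStatus(str, Enum):
    """Local re-declaration for illustration; mirrors the members listed above."""

    ASSEMBLED = "assembled"
    ASSEMBLY_SPECIFICATION_IDENTIFIED = "assembly_specification_identified"
    CAPTURED = "captured"
    EXTRACTED = "extracted"
    FAILED = "failed"
    PUBLISHED = "published"
    REGISTERED = "registered"


# Look up a member from a stored string value.
status = DocumentStatus("captured")

# The str mixin makes members compare equal to their values.
is_plain_string_equal = DocumentStatus.PUBLISHED == "published"

# The str mixin also lets the standard json module serialize members directly.
payload = json.dumps({"status": DocumentStatus.FAILED})
```

Here status is DocumentStatus.CAPTURED, is_plain_string_equal is True, and payload is the JSON text {"status": "failed"}. Passing an unknown string to DocumentStatus(...) raises ValueError.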