
🗃️ Uploading Files

Adding content to Curiositi is straightforward. Upload files through the web interface, and the worker processes them in the background to make them searchable.

Supported document formats:

| Format | MIME Type | Notes |
|---|---|---|
| PDF | `application/pdf` | Full text extraction |
| Plain Text | `text/plain` | Direct content indexing |
| Markdown | `text/markdown` | Rendered text extracted |
| CSV | `text/csv` | Tabular data as text |
| HTML | `text/html` | Rendered text extracted |
| XML | `text/xml`, `application/xml` | Raw content extracted |
| JSON | `application/json` | Raw content extracted |

Supported image formats:

| Format | MIME Type | Notes |
|---|---|---|
| JPEG | `image/jpeg` | AI-generated description via vision model |
| PNG | `image/png` | AI-generated description via vision model |
| WebP | `image/webp` | AI-generated description via vision model |
| GIF | `image/gif` | AI-generated description via vision model |

Maximum file size: 50 MB

Images larger than 5 MB are considered “large” and handled accordingly during processing.
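Before sending a file, a client can reject uploads that would fail anyway. The sketch below is a hypothetical pre-check that mirrors the limits and MIME types listed above; `checkUpload` and the constants are illustrative names, not part of Curiositi, and it assumes "MB" means 1024 × 1024 bytes.

```typescript
// Hypothetical client-side pre-check mirroring the documented limits.
// Assumption: 1 MB = 1024 * 1024 bytes.
const MAX_FILE_BYTES = 50 * 1024 * 1024; // 50 MB upload limit
const LARGE_IMAGE_BYTES = 5 * 1024 * 1024; // images above 5 MB count as "large"

const DOCUMENT_TYPES = new Set([
  "application/pdf",
  "text/plain",
  "text/markdown",
  "text/csv",
  "text/html",
  "text/xml",
  "application/xml",
  "application/json",
]);

const IMAGE_TYPES = new Set(["image/jpeg", "image/png", "image/webp", "image/gif"]);

type UploadCheck =
  | { ok: true; kind: "document" | "image"; largeImage: boolean }
  | { ok: false; reason: string };

function checkUpload(file: { size: number; type: string }): UploadCheck {
  // Size limit applies to every file type
  if (file.size > MAX_FILE_BYTES) {
    return { ok: false, reason: "File exceeds the 50 MB limit" };
  }
  if (DOCUMENT_TYPES.has(file.type)) {
    return { ok: true, kind: "document", largeImage: false };
  }
  if (IMAGE_TYPES.has(file.type)) {
    // Flag images over the 5 MB threshold; how the worker treats them is not modeled here
    return { ok: true, kind: "image", largeImage: file.size > LARGE_IMAGE_BYTES };
  }
  return { ok: false, reason: `Unsupported MIME type: ${file.type}` };
}
```

A browser `File` object has both `size` and `type`, so it can be passed to `checkUpload` directly.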

  1. Navigate to a space or the main files view
  2. Use the upload interface to select files
  3. Files upload to S3 storage and metadata is saved to the database
  4. A processing job is dispatched via Upstash QStash
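Step 4 can be pictured as a single call to QStash's REST publish endpoint, which forwards the body to a destination URL. This is a sketch under assumptions: that `QSTASH_URL` holds the QStash base URL (such as `https://qstash.upstash.io`), that the job targets the worker's `POST /process-file` route at `WORKER_URL`, and that the payload is `{ fileId }`. `buildProcessJobRequest` is a hypothetical helper; the real app may use the `@upstash/qstash` SDK instead.

```typescript
// Hypothetical sketch of building a QStash publish request for the worker.
// Assumptions: QSTASH_URL is the QStash base URL and the worker expects { fileId }.
interface PublishRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildProcessJobRequest(
  qstashUrl: string,
  qstashToken: string,
  workerUrl: string,
  fileId: string,
): PublishRequest {
  // QStash delivers the body to the destination URL appended after /v2/publish/
  const destination = `${workerUrl.replace(/\/+$/, "")}/process-file`;
  return {
    url: `${qstashUrl.replace(/\/+$/, "")}/v2/publish/${destination}`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${qstashToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ fileId }), // hypothetical payload shape
    },
  };
}

// Usage, inside an async function:
// const { url, init } = buildProcessJobRequest(QSTASH_URL, QSTASH_TOKEN, WORKER_URL, fileId);
// await fetch(url, init);
```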
The full pipeline, from upload to search:

```
1. Upload
├─ File uploaded to S3 storage
├─ File metadata saved to PostgreSQL (status: "pending")
└─ Processing job dispatched via QStash
2. Worker Processing (POST /process-file)
├─ Worker downloads file from S3
├─ Content extracted based on file type:
│ ├─ Documents: text extraction
│ └─ Images: AI vision model generates description
├─ Text chunked (800 tokens per chunk, 100 token overlap)
├─ Vector embeddings generated (1536 dimensions)
├─ Chunks + embeddings stored in fileContents table
└─ File status updated to "completed"
3. Search Ready
└─ File content is now searchable via semantic search
```
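The chunking step splits extracted text into windows of 800 tokens, where each window repeats the last 100 tokens of the previous one so context isn't cut off at chunk boundaries. A minimal sketch of that sliding window, assuming text has already been tokenized into an array (the worker's actual tokenizer and chunker are not shown in this page, and `chunkTokens` is a hypothetical name):

```typescript
// Sketch of fixed-size chunking with overlap (defaults: 800 tokens, 100 overlap).
// Tokens are modeled as a generic array; tokenization itself is out of scope.
function chunkTokens<T>(tokens: T[], chunkSize = 800, overlap = 100): T[][] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const step = chunkSize - overlap; // each chunk starts 700 tokens after the previous one
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    // Stop once a chunk reaches the end; otherwise the next start could fall
    // inside the overlap and emit a redundant chunk
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}
```

With these defaults, a 2,000-token document produces three chunks starting at tokens 0, 700, and 1,400, and the last chunk holds the remaining 600 tokens.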

Each file has a status that tracks its processing state:

| Status | Meaning |
|---|---|
| `pending` | File uploaded, waiting for the worker to process it |
| `processing` | Worker is currently extracting and embedding content |
| `completed` | Processing finished, file is searchable |
| `failed` | Processing encountered an error |
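The four statuses form a small state machine. The sketch below types it in TypeScript; the transition map is an assumption inferred from the pipeline (in particular, this page does not say which status `file.process` sets when it re-enqueues a file), and `canTransition` and `isSearchable` are hypothetical helpers.

```typescript
// Hypothetical typing of the file status lifecycle.
type FileStatus = "pending" | "processing" | "completed" | "failed";

const allowedTransitions: Record<FileStatus, readonly FileStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  // Assumed: re-enqueueing via file.process puts a file back into "pending"
  completed: ["pending"],
  failed: ["pending"],
};

function canTransition(from: FileStatus, to: FileStatus): boolean {
  return allowedTransitions[from].includes(to);
}

// Only completed files are searchable
function isSearchable(status: FileStatus): boolean {
  return status === "completed";
}
```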

All file operations except uploading go through tRPC procedures rather than REST endpoints. The available procedures are:

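```ts
// Assumes `trpc` is the app's tRPC React client and this runs inside a React component.

// Get all files in the current workspace (paginated)
const files = trpc.file.getAllInOrg.useQuery({
  limit: 50,
  offset: 0,
});
// Get recent files
const recent = trpc.file.getRecent.useQuery({ limit: 10 });
// Get files not assigned to any space
const orphans = trpc.file.getOrphanFiles.useQuery();
// Get a single file by ID
const file = trpc.file.getById.useQuery({ fileId: "file-uuid" });
// Get a presigned S3 URL; useQuery returns a query object, so read the URL from `data`
const presigned = trpc.file.getPresignedUrl.useQuery({ fileId: "file-uuid" });
const url = presigned.data?.url;

// Declare mutation hooks at the top level of the component...
const deleteMutation = trpc.file.delete.useMutation();
const processMutation = trpc.file.process.useMutation();

// ...and call them from async event handlers
async function handleDelete() {
  // Deletes from database and S3 storage
  await deleteMutation.mutateAsync({ fileId: "file-uuid" });
}
async function handleReprocess() {
  // Re-enqueues the file for processing via QStash
  await processMutation.mutateAsync({ fileId: "file-uuid" });
}

// Hybrid search (filename + semantic)
const results = trpc.file.search.useQuery({
  query: "report",
  limit: 20,
});
```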

`file.search` combines traditional filename matching with semantic search over the embedded content, so a query can find files by name as well as by meaning.
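To make "hybrid" concrete, here is a toy ranking that adds a filename-match bonus to the cosine similarity between the query embedding and each candidate's embedding. The scoring formula, the `0.5` weight, and the `SearchCandidate` shape are all assumptions for illustration; this page does not describe how Curiositi actually combines the two signals.

```typescript
// Toy hybrid ranking: cosine similarity plus a filename-match bonus.
// Weights and data shapes are hypothetical.
interface SearchCandidate {
  fileName: string;
  embedding: number[]; // e.g. a 1536-dimensional chunk embedding
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for empty vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function hybridRank(
  query: string,
  queryEmbedding: number[],
  candidates: SearchCandidate[],
  limit = 20,
  filenameWeight = 0.5, // hypothetical bonus for a case-insensitive filename match
): SearchCandidate[] {
  const q = query.toLowerCase();
  return candidates
    .map((c) => ({
      c,
      score:
        cosineSimilarity(queryEmbedding, c.embedding) +
        (c.fileName.toLowerCase().includes(q) ? filenameWeight : 0),
    }))
    .sort((x, y) => y.score - x.score) // highest combined score first
    .slice(0, limit)
    .map(({ c }) => c);
}
```

Under this weighting, a strong semantic match can still outrank a file that only matches by name, which is one way a hybrid search broadens coverage beyond filename matching.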

The upload endpoint is a standard HTTP POST (not tRPC) at /api/upload. It handles multipart form data and stores the file in S3.
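A minimal sketch of calling the upload endpoint from the browser with `fetch` and `FormData`. The multipart field name `"file"` and the JSON response are assumptions; check the actual `/api/upload` handler for the field name and response shape it expects. `uploadFile` is a hypothetical helper.

```typescript
// Hypothetical helper for POSTing a file to /api/upload as multipart form data.
async function uploadFile(file: File, baseUrl = ""): Promise<unknown> {
  const form = new FormData();
  form.append("file", file); // assumed field name

  // Don't set Content-Type manually: fetch adds the multipart boundary itself
  const res = await fetch(`${baseUrl}/api/upload`, {
    method: "POST",
    body: form,
  });
  if (!res.ok) {
    throw new Error(`Upload failed with HTTP ${res.status}`);
  }
  return res.json(); // assumed JSON response body
}
```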

If files stay in `pending`:

  1. Check that the worker is running: `bun --filter @curiositi/worker dev`
  2. Verify that `QSTASH_TOKEN` and `QSTASH_URL` are set correctly in `.env`
  3. Verify that `WORKER_URL` points to your worker instance (default: `http://localhost:3040`)
  4. Check the worker logs for errors
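Steps 2 and 3 can be checked programmatically. This is a hypothetical helper that only inspects environment values (it does not contact QStash or the worker); it treats `WORKER_URL` as optional because the documented default is `http://localhost:3040`.

```typescript
// Hypothetical config check for the worker-related environment variables.
const REQUIRED_VARS = ["QSTASH_TOKEN", "QSTASH_URL"] as const;
const DEFAULT_WORKER_URL = "http://localhost:3040";

function checkWorkerConfig(env: Record<string, string | undefined>): string[] {
  const problems: string[] = [];
  for (const name of REQUIRED_VARS) {
    // Treat missing and whitespace-only values as unset
    if (!env[name]?.trim()) problems.push(`${name} is not set`);
  }
  const workerUrl = env.WORKER_URL ?? DEFAULT_WORKER_URL;
  try {
    new URL(workerUrl); // throws on malformed URLs
  } catch {
    problems.push(`WORKER_URL is not a valid URL: ${workerUrl}`);
  }
  return problems;
}

// Usage: console.log(checkWorkerConfig(process.env));
```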
If a file's status is `failed`:

  • The file may be corrupted or in an unsupported format
  • Check the worker logs for specific error messages
  • Try re-processing the file with the `file.process` tRPC mutation
  • Verify that the file type is in the supported MIME types list
  • Check that the file is under 50 MB