
🧱 Architecture

Curiositi is a monorepo built with Turborepo and Bun. This document explains the system architecture, data flow, and technical decisions.

flowchart TB
    subgraph Client["Client Layer"]
        Browser["Browser<br/>(React 19)"]
    end

    subgraph App["Application Layer"]
        Platform["Platform<br/>(TanStack Start)"]
        Worker["Worker<br/>(Hono on Bun)"]
        
        Platform -->|"QStash / bunqueue"| Worker
    end

    subgraph Data["Data Layer"]
        Postgres["PostgreSQL<br/>(Drizzle ORM)"]
        S3["S3 Storage<br/>(Files)"]
    end

    Browser --"HTTPS"--> Platform
    Platform --> Postgres
    Platform --> S3
    Worker --> S3
    Worker --> Postgres

Technology Stack:

  • Framework: React 19 + TanStack Start
  • Routing: TanStack Router (file-based)
  • Data Fetching: TanStack Query + tRPC v11
  • API: tRPC for type-safe client-server communication
  • Styling: Tailwind CSS v4 + shadcn/ui
  • Build Tool: Vite 7 with Nitro for SSR
  • Auth: Better Auth (Google OAuth + email/password)
  • Monitoring: Sentry

Key Directories:

apps/platform/src/
├── routes/ # TanStack Router route definitions
├── components/ # React components
│   └── ui/ # shadcn/ui components
├── pages/ # Page-level views
├── layouts/ # Layout components
├── integrations/ # tRPC client and server setup
├── hooks/ # React hooks
├── lib/ # Utilities (auth, upload, etc.)
├── middleware/ # Server middleware
└── env.ts # Environment variable validation

The worker is a Hono server running on Bun that processes uploaded files. It exposes the following endpoints:

Endpoint        Method  Description
/               GET     Root health check
/health         GET     Health status endpoint
/process-file   POST    File processing endpoint (invoked by QStash/bunqueue)

The POST /process-file endpoint is invoked by the platform via Upstash QStash or bunqueue (for local development).
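The dispatch described above can be sketched as a small pure function. This is a minimal illustration, not the code in packages/queue: the `QSTASH_TOKEN` check, the `fileId` payload field, and the function names are all assumptions made for the example.

```typescript
type QueueBackend = "qstash" | "bunqueue";

// Hypothetical job payload; the real payload shape is not documented here.
interface ProcessFileJob {
  fileId: string;
}

interface DispatchRequest {
  backend: QueueBackend;
  url: string;
  method: "POST";
  body: string;
}

type Env = Record<string, string | undefined>;

// Assumption: a configured QStash token selects the production backend,
// and everything else falls back to bunqueue for local development.
function selectBackend(env: Env): QueueBackend {
  return env["QSTASH_TOKEN"] ? "qstash" : "bunqueue";
}

// Build the request that the chosen backend would deliver to the worker.
function buildDispatch(env: Env, workerUrl: string, job: ProcessFileJob): DispatchRequest {
  const base = workerUrl.replace(/\/+$/, ""); // drop trailing slashes
  return {
    backend: selectBackend(env),
    url: `${base}/process-file`,
    method: "POST",
    body: JSON.stringify(job),
  };
}

const request = buildDispatch({}, "http://localhost:3040", { fileId: "file_123" });
console.log(request.backend, request.url);
```

Keeping backend selection separate from request construction means the platform code that dispatches a job does not need to know which queue is active.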

Technology Stack:

  • Framework: Hono
  • Runtime: Bun (port 3040)
  • AI: Vercel AI SDK (ai package) with OpenAI and Google providers
  • Queue: Upstash QStash (production) or bunqueue (local development)

Key Directories:

apps/worker/src/
├── processors/ # File type processors
│   ├── doc.ts # Document processor (PDF, text, etc.)
│   ├── image.ts # Image processor (JPEG, PNG, etc.)
│   └── types.ts # Processor type definitions
├── lib/
│   ├── chunk.ts # Text chunking (800 tokens, 100 overlap)
│   └── md.ts # Markdown utilities
├── process-file.ts # Main file processing logic
├── index.ts # Hono server entry point
└── env.ts # Environment variable validation
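To make the chunking parameters noted for lib/chunk.ts concrete (800 tokens with a 100-token overlap), here is a sliding-window sketch. It is not the contents of chunk.ts: the function names are hypothetical, and tokens are approximated by whitespace-separated words, which a real tokenizer would count differently.

```typescript
// Split a token list into windows of `size` tokens, where each window
// starts `size - overlap` tokens after the previous one.
function chunkTokens(tokens: string[], size = 800, overlap = 100): string[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window reached the end
  }
  return chunks;
}

// Assumption: whitespace splitting stands in for real tokenization.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  const tokens = text.split(/\s+/).filter(Boolean);
  return chunkTokens(tokens, size, overlap).map((chunk) => chunk.join(" "));
}

const words = Array.from({ length: 1500 }, (_, i) => `w${i}`);
console.log(chunkTokens(words).map((chunk) => chunk.length)); // prints [ 800, 800 ]
```

With 1500 tokens the second window starts at token 700, so the two chunks share tokens 700 through 799. The overlap keeps sentences that straddle a boundary retrievable from either chunk.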

Technology Stack:

  • Database: PostgreSQL 14+ with pgvector extension
  • ORM: Drizzle ORM
  • Migrations: Drizzle Kit

Key Tables:

Better Auth Tables:
├── user
├── session
├── account
├── organization
├── member
└── verification

Curiositi Tables:
├── spaces # Hierarchical spaces (parentSpaceId for nesting)
├── files # File metadata (name, type, size, S3 key, status)
├── fileContents # Extracted content chunks + vector embeddings (1536d)
└── filesInSpace # Many-to-many junction table
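The parentSpaceId column implies a tree of spaces. As a rough sketch of how nesting could be resolved in application code, the function below walks parent links to build a breadcrumb path. Only `parentSpaceId` comes from the table notes above; the `id` and `name` fields and the `spacePath` helper are assumptions for illustration.

```typescript
// Minimal, assumed shape of a row in the spaces table.
interface Space {
  id: string;
  name: string;
  parentSpaceId: string | null;
}

// Follow parent links from a space up to its root; return the path root-first.
function spacePath(spaces: Space[], spaceId: string): Space[] {
  const byId = new Map<string, Space>();
  for (const space of spaces) byId.set(space.id, space);

  const path: Space[] = [];
  const seen = new Set<string>();
  let current = byId.get(spaceId);
  while (current && !seen.has(current.id)) {
    seen.add(current.id); // guards against accidental cycles in the data
    path.unshift(current);
    current = current.parentSpaceId ? byId.get(current.parentSpaceId) : undefined;
  }
  return path;
}

const spaces: Space[] = [
  { id: "1", name: "Research", parentSpaceId: null },
  { id: "2", name: "Papers", parentSpaceId: "1" },
  { id: "3", name: "2024", parentSpaceId: "2" },
];
console.log(spacePath(spaces, "3").map((s) => s.name).join(" / "));
// prints "Research / Papers / 2024"
```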
packages/
├── db/ # Database schema and Drizzle config
│   └── src/
│       ├── schema.ts # All table definitions
│       ├── client.ts # Database client
│       └── index.ts # Exports
├── share/ # Shared utilities
│   └── src/
│       ├── ai/ # AI model definitions (OpenAI, Google)
│       ├── constants/ # MIME types, file size limits, allowed types
│       ├── fs/ # File system helpers
│       ├── logger/ # Structured logging utility
│       ├── schemas/ # Shared Zod schemas
│       └── types/ # Shared type definitions
├── api-handlers/ # Shared API handler logic
│   └── src/
│       ├── upload.ts # File upload handling
│       ├── file.ts # File operations
│       ├── space.ts # Space operations
│       ├── queue.ts # Queue job dispatching
│       └── response.ts # Response helpers
├── queue/ # Job queue wrapper (supports QStash and bunqueue)
└── tsconfig/ # Shared TypeScript configurations
flowchart LR
    subgraph Upload["1. User Upload"]
        Browser["Browser"] -->|"tRPC / api/upload"| Platform
        Platform -->|"File"| S3["S3 Storage"]
    end

    subgraph Store["2. Metadata Storage"]
        S3 -->|"File saved"| Platform
        Platform -->|"Save metadata"| DB[(PostgreSQL)]
        DB -->|"Status: pending"| Platform
        Platform -->|"Dispatch job"| Queue["QStash / bunqueue"]
    end

    subgraph Process["3. Background Processing"]
        Queue -->|"Invoke"| Worker["Worker"]
        Worker -->|"Download"| S3
        S3 -->|"File content"| Worker
        Worker -->|"Extract content"| Worker
        Worker -->|"Chunk text"| Worker
        Worker -->|"Generate embeddings"| Worker
        Worker -->|"Store"| DB
        Worker -->|"Status: completed"| DB
    end

    subgraph Ready["4. Search Ready"]
        DB -.->|"File searchable"| Search["Vector Search"]
    end
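The order of operations in the upload flow above can be walked through with an in-memory simulation. This is only a sketch of the sequence: `FakeInfra`, `handleUpload`, and `processNextJob` are hypothetical names, the maps and arrays stand in for S3, PostgreSQL, and the queue, paragraph splitting replaces the real extraction and chunking, and the embedding function is injected by the caller.

```typescript
type FileStatus = "pending" | "completed";

interface FileRecord {
  id: string;
  s3Key: string;
  status: FileStatus;
}

interface ChunkRecord {
  fileId: string;
  content: string;
  embedding: number[];
}

// In-memory stand-ins for S3, the files/fileContents tables, and the job queue.
class FakeInfra {
  objects = new Map<string, string>();
  files = new Map<string, FileRecord>();
  chunks: ChunkRecord[] = [];
  jobs: string[] = [];
}

// Steps 1 and 2: store the file, record metadata as pending, dispatch a job.
function handleUpload(infra: FakeInfra, id: string, content: string): void {
  const s3Key = `uploads/${id}`;
  infra.objects.set(s3Key, content);
  infra.files.set(id, { id, s3Key, status: "pending" });
  infra.jobs.push(id);
}

// Step 3: the worker takes one job, then downloads, chunks, embeds, and stores.
function processNextJob(infra: FakeInfra, embed: (text: string) => number[]): boolean {
  const fileId = infra.jobs.shift();
  if (fileId === undefined) return false;
  const file = infra.files.get(fileId);
  if (!file) return false;
  const content = infra.objects.get(file.s3Key) ?? "";
  const pieces = content.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  for (const piece of pieces) {
    infra.chunks.push({ fileId, content: piece, embedding: embed(piece) });
  }
  file.status = "completed"; // step 4: the file is now searchable
  return true;
}

const infra = new FakeInfra();
handleUpload(infra, "f1", "First paragraph.\n\nSecond paragraph.");
processNextJob(infra, (text) => [text.length]); // toy embedding, not a real model
console.log(infra.files.get("f1")?.status, infra.chunks.length);
// prints "completed 2"
```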
flowchart TB
    subgraph Step1["1. Query Received"]
        User["User enters query"] --> Platform
    end

    subgraph Step2["2. Embedding Generation"]
        Platform -->|"Query text"| AI["Embedding Model"]
        AI -->|"Query vector"| Platform
    end

    subgraph Step3["3. Vector Search"]
        Platform -->|"Vector"| DB[(pgvector)]
        DB -->|"Top matching chunks"| Platform
    end

    subgraph Step4["4. Result Aggregation"]
        Platform -->|"Group by file"| Results["Results to user"]
    end
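Steps 3 and 4 of the search flow can be illustrated in memory. In the real system the nearest-neighbor lookup runs inside PostgreSQL via pgvector, so the code below swaps in a brute-force cosine similarity scan purely to show the ranking and the group-by-file aggregation. The `Chunk` and `FileHit` shapes and the function names are assumptions, and the two-dimensional vectors are toy data rather than 1536-dimensional embeddings.

```typescript
interface Chunk {
  fileId: string;
  content: string;
  embedding: number[];
}

interface FileHit {
  fileId: string;
  bestScore: number;
  snippets: string[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new RangeError("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  const denom = Math.sqrt(normA * normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank all chunks against the query, keep the top K, then group by file.
function search(query: number[], chunks: Chunk[], topK = 5): FileHit[] {
  const top = chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);

  // `top` is sorted, so the first chunk seen for a file carries its best score.
  const byFile = new Map<string, FileHit>();
  for (const { chunk, score } of top) {
    const hit = byFile.get(chunk.fileId);
    if (hit) {
      hit.snippets.push(chunk.content);
    } else {
      byFile.set(chunk.fileId, { fileId: chunk.fileId, bestScore: score, snippets: [chunk.content] });
    }
  }
  return Array.from(byFile.values());
}

const corpus: Chunk[] = [
  { fileId: "a", content: "intro", embedding: [1, 0] },
  { fileId: "b", content: "unrelated", embedding: [0, 1] },
  { fileId: "a", content: "details", embedding: [0.9, 0.1] },
];
console.log(search([1, 0], corpus, 2));
```

Grouping after the top-K cut means a file with several strong chunks appears once, with all of its matching snippets attached.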
flowchart TB
    Login["1. Login Request<br/>(Email/Password or Google)"] --> Validate["Better Auth validates"]
    Validate --> Session["2. Session Created<br/>(PostgreSQL + Cookie)"]
    Session --> Workspace["3. Workspace Context<br/>(Select workspace)"]
    Workspace --> Authz["4. Authorization<br/>(Check membership)"]
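The final two steps of the flow above (workspace selection and membership check) might look like the following. This is a hedged sketch, not Better Auth's API: the `Session` and `Member` shapes, the mapping of a workspace to an `organizationId`, and the `authorize` function are all assumptions for illustration.

```typescript
// Assumed, simplified shapes of the session and member rows.
interface Session {
  userId: string;
  expiresAt: Date;
}

interface Member {
  userId: string;
  organizationId: string;
  role: string;
}

type AuthzResult =
  | { ok: true; role: string }
  | { ok: false; reason: "expired" | "not-a-member" };

// Reject expired sessions first, then require membership in the workspace.
function authorize(
  session: Session,
  workspaceId: string,
  members: Member[],
  now: Date = new Date(),
): AuthzResult {
  if (session.expiresAt.getTime() <= now.getTime()) {
    return { ok: false, reason: "expired" };
  }
  const membership = members.find(
    (m) => m.userId === session.userId && m.organizationId === workspaceId,
  );
  return membership
    ? { ok: true, role: membership.role }
    : { ok: false, reason: "not-a-member" };
}

const session: Session = { userId: "u1", expiresAt: new Date("2030-01-01T00:00:00Z") };
const members: Member[] = [{ userId: "u1", organizationId: "org1", role: "owner" }];
console.log(authorize(session, "org1", members, new Date("2025-01-01T00:00:00Z")));
```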
  • Code Sharing: Shared packages (db, share, api-handlers, queue) reduce duplication
  • Build Optimization: Turborepo caches and parallelizes builds
  • Unified Tooling: Single lint (Biome), format, and type-check across the repo
  • Dependency Management: Workspace protocol for internal dependencies

  • File-based Routing: Automatic route generation via TanStack Router
  • Type Safety: End-to-end TypeScript with tRPC integration
  • SSR Support: Built-in server-side rendering via Vite + Nitro
  • Developer Experience: Hot reload and excellent dev tools

  • Performance: Lightweight and fast, runs natively on Bun
  • Simplicity: One processing endpoint plus health checks, minimal framework overhead
  • TypeScript Native: Full type safety out of the box

  • Type Safety: Full TypeScript support with inferred types
  • SQL-like Syntax: Familiar query building, close to raw SQL
  • Migration Support: Drizzle Kit for schema generation and migrations
  • Performance: Minimal overhead compared to heavier ORMs

  • Native Integration: Vector storage directly in PostgreSQL
  • Similarity Search: Efficient nearest neighbor queries
  • No Additional Services: No need for a separate vector database
  • ACID Compliance: Transactional safety for embeddings alongside relational data

  • Structured Logging: Custom logger in packages/share/src/logger/
  • Log Levels: Debug, Info, Warn, Error
  • Sentry: Integrated into the platform for error tracking and performance monitoring
flowchart TB
    subgraph ci["ci.yml (push/PR to main)"]
        Format["Format check<br/>(Biome)"]
        Lint["Lint + Type Check<br/>(bun run check)"]
        Commit["Commit message<br/>(commitlint)"]
    end

    subgraph test["test.yml (push/PR to main)"]
        Test["Run tests<br/>(bun run test:coverage)"]
        Codecov["Upload coverage<br/>(Codecov)"]
    end

    subgraph release["release.yml (weekly, Friday 00:00 UTC)"]
        Changeset["Generate changesets"]
        Version["Version packages"]
        GitHub["Create releases/tags"]
    end
  • Platform: Deploys to Vercel (TanStack Start + Nitro)
  • Documentation (www): Deploys to Vercel (Starlight + Astro)
  • Worker: Runs on Bun
# Start all services
bun run dev
# Services available:
# - Platform: http://localhost:3030
# - Worker: http://localhost:3040

The project uses Bun's built-in test runner and Playwright for E2E tests:

# Run tests with coverage
bun run test
# Run with lcov coverage output
bun run test:coverage