🧱 Architecture
Curiositi is a monorepo built with Turborepo and Bun. This document explains the system architecture, data flow, and technical decisions.
High-Level Architecture
flowchart TB
subgraph Client["Client Layer"]
Browser["Browser<br/>(React 19)"]
end
subgraph App["Application Layer"]
Platform["Platform<br/>(TanStack Start)"]
Worker["Worker<br/>(Hono on Bun)"]
Platform -->|"QStash / bunqueue"| Worker
end
subgraph Data["Data Layer"]
Postgres["PostgreSQL<br/>(Drizzle ORM)"]
S3["S3 Storage<br/>(Files)"]
end
Browser --"HTTPS"--> Platform
Platform --> Postgres
Platform --> S3
Worker --> S3
Worker --> Postgres
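The Platform to Worker edge in the diagram above goes through either QStash or bunqueue. A minimal sketch of how a queue wrapper could hide that choice follows. Everything in it is an assumption for illustration: the `QueueDriver` interface, the `ProcessFileJob` shape, the `QUEUE_DRIVER` flag, and the in-memory fake driver are hypothetical and are not taken from the `packages/queue` source.

```typescript
// Hypothetical job shape; the real payload is not documented here.
interface ProcessFileJob {
  fileId: string;
}

// Hypothetical driver interface that a queue wrapper could expose.
interface QueueDriver {
  readonly name: "qstash" | "bunqueue";
  enqueue(job: ProcessFileJob): Promise<void>;
}

// In-memory stand-in used for both drivers so the sketch runs without
// network access. A real driver would call QStash or bunqueue here.
function createFakeDriver(
  name: QueueDriver["name"],
  sink: ProcessFileJob[],
): QueueDriver {
  return {
    name,
    async enqueue(job: ProcessFileJob): Promise<void> {
      sink.push(job);
    },
  };
}

// Assumption: the driver is picked from an environment-style flag,
// falling back to bunqueue for local development.
function selectDriver(
  env: { QUEUE_DRIVER?: string },
  sink: ProcessFileJob[],
): QueueDriver {
  const name = env.QUEUE_DRIVER === "qstash" ? "qstash" : "bunqueue";
  return createFakeDriver(name, sink);
}
```

Keeping the caller dependent only on `enqueue` means the platform code that dispatches a job would not need to know which backend is active.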
Component Overview
Platform (Web Application)
Technology Stack:
- Framework: React 19 + TanStack Start
- Routing: TanStack Router (file-based)
- Data Fetching: TanStack Query + tRPC v11
- API: tRPC for type-safe client-server communication
- Styling: Tailwind CSS v4 + shadcn/ui
- Build Tool: Vite 7 with Nitro for SSR
- Auth: Better Auth (Google OAuth + email/password)
- Monitoring: Sentry
Key Directories:
apps/platform/src/
├── routes/          # TanStack Router route definitions
├── components/      # React components
│   └── ui/          # shadcn/ui components
├── pages/           # Page-level views
├── layouts/         # Layout components
├── integrations/    # tRPC client and server setup
├── hooks/           # React hooks
├── lib/             # Utilities (auth, upload, etc.)
├── middleware/      # Server middleware
└── env.ts           # Environment variable validation

Worker (File Processing)
The worker is a Hono server running on Bun that processes uploaded files. It exposes the following endpoints:
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Root health check |
| /health | GET | Health status endpoint |
| /process-file | POST | File processing endpoint (invoked by QStash/bunqueue) |
The POST /process-file endpoint is invoked by the platform via Upstash QStash or bunqueue (for local development).
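The three routes above can be sketched as a small dispatch function. This is a framework-free illustration, not the worker's actual Hono code: the `handle` function, the `ProcessFilePayload` shape, the status codes, and the response bodies are all assumptions made for the example.

```typescript
type Method = "GET" | "POST";

interface WorkerReply {
  status: number;
  body: Record<string, unknown>;
}

// Hypothetical job payload; the real request body is not documented here.
interface ProcessFilePayload {
  fileId: string;
}

// Runtime guard so malformed queue messages are rejected early.
function isProcessFilePayload(value: unknown): value is ProcessFilePayload {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { fileId?: unknown }).fileId === "string"
  );
}

// Dispatch over the three documented routes.
function handle(method: Method, path: string, payload?: unknown): WorkerReply {
  if (method === "GET" && (path === "/" || path === "/health")) {
    return { status: 200, body: { status: "ok" } };
  }
  if (method === "POST" && path === "/process-file") {
    if (!isProcessFilePayload(payload)) {
      return { status: 400, body: { error: "invalid payload" } };
    }
    // A real worker would start processing here; this sketch only acknowledges.
    return { status: 202, body: { accepted: payload.fileId } };
  }
  return { status: 404, body: { error: "not found" } };
}
```

Validating the payload before any work starts keeps a bad queue message from reaching the file processors.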
Technology Stack:
- Framework: Hono
- Runtime: Bun (port 3040)
- AI: Vercel AI SDK (`ai` package) with OpenAI and Google providers
- Queue: Upstash QStash (production) or bunqueue (local development)
Key Directories:
apps/worker/src/
├── processors/      # File type processors
│   ├── doc.ts       # Document processor (PDF, text, etc.)
│   ├── image.ts     # Image processor (JPEG, PNG, etc.)
│   └── types.ts     # Processor type definitions
├── lib/
│   ├── chunk.ts     # Text chunking (800 tokens, 100 overlap)
│   └── md.ts        # Markdown utilities
├── process-file.ts  # Main file processing logic
├── index.ts         # Hono server entry point
└── env.ts           # Environment variable validation

Database (PostgreSQL + pgvector)
Technology Stack:
- Database: PostgreSQL 14+ with pgvector extension
- ORM: Drizzle ORM
- Migrations: Drizzle Kit
Key Tables:
Better Auth Tables:
├── user
├── session
├── account
├── organization
├── member
└── verification

Curiositi Tables:
├── spaces        # Hierarchical spaces (parentSpaceId for nesting)
├── files         # File metadata (name, type, size, S3 key, status)
├── fileContents  # Extracted content chunks + vector embeddings (1536d)
└── filesInSpace  # Many-to-many junction table

Shared Packages
packages/
├── db/                  # Database schema and Drizzle config
│   └── src/
│       ├── schema.ts    # All table definitions
│       ├── client.ts    # Database client
│       └── index.ts     # Exports
├── share/               # Shared utilities
│   └── src/
│       ├── ai/          # AI model definitions (OpenAI, Google)
│       ├── constants/   # MIME types, file size limits, allowed types
│       ├── fs/          # File system helpers
│       ├── logger/      # Structured logging utility
│       ├── schemas/     # Shared Zod schemas
│       └── types/       # Shared type definitions
├── api-handlers/        # Shared API handler logic
│   └── src/
│       ├── upload.ts    # File upload handling
│       ├── file.ts      # File operations
│       ├── space.ts     # Space operations
│       ├── queue.ts     # Queue job dispatching
│       └── response.ts  # Response helpers
├── queue/               # Job queue wrapper (supports QStash and bunqueue)
└── tsconfig/            # Shared TypeScript configurations

Data Flow
File Upload Flow
flowchart LR
subgraph Upload["1. User Upload"]
Browser["Browser"] -->|"tRPC / api/upload"| Platform
Platform -->|"File"| S3["S3 Storage"]
end
subgraph Store["2. Metadata Storage"]
S3 -->|"File saved"| Platform
Platform -->|"Save metadata"| DB[(PostgreSQL)]
DB -->|"Status: pending"| Platform
Platform -->|"Dispatch job"| Queue["QStash / bunqueue"]
end
subgraph Process["3. Background Processing"]
Queue -->|"Invoke"| Worker["Worker"]
Worker -->|"Download"| S3
S3 -->|"File content"| Worker
Worker -->|"Extract content"| Worker
Worker -->|"Chunk text"| Worker
Worker -->|"Generate embeddings"| Worker
Worker -->|"Store"| DB
Worker -->|"Status: completed"| DB
end
subgraph Ready["4. Search Ready"]
DB -.->|"File searchable"| Search["Vector Search"]
end
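The "Chunk text" step in the flow above, together with the 800-token size and 100-token overlap noted for `chunk.ts`, can be illustrated with a sliding window. This is a sketch under stated assumptions rather than the worker's actual implementation: whitespace-separated words stand in for model tokens, and `chunkTokens` and `chunkText` are hypothetical names.

```typescript
// Split a token list into windows of `size` tokens where each window
// shares `overlap` tokens with the previous one.
function chunkTokens(tokens: string[], size = 800, overlap = 100): string[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new RangeError("require size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    // Stop once a window reaches the end, so no trailing chunk
    // consists only of already-covered overlap.
    if (start + size >= tokens.length) break;
  }
  return chunks;
}

// Assumption: whitespace-separated words approximate tokens.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  const tokens = text.split(/\s+/).filter((t) => t.length > 0);
  return chunkTokens(tokens, size, overlap).map((c) => c.join(" "));
}
```

With the defaults, a 1,500-token input yields two chunks: tokens 0 to 799 and tokens 700 to 1,499. The overlap keeps a sentence that straddles a boundary visible in both neighbouring chunks.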
Search Flow
flowchart TB
subgraph Step1["1. Query Received"]
User["User enters query"] --> Platform
end
subgraph Step2["2. Embedding Generation"]
Platform -->|"Query text"| AI["Embedding Model"]
AI -->|"Query vector"| Platform
end
subgraph Step3["3. Vector Search"]
Platform -->|"Vector"| DB[(pgvector)]
DB -->|"Top matching chunks"| Platform
end
subgraph Step4["4. Result Aggregation"]
Platform -->|"Group by file"| Results["Results to user"]
end
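Steps 3 and 4 above can be sketched in memory: score each chunk against the query vector, keep the top matches, then group them by file. In Curiositi the nearest-neighbour search runs inside pgvector, so this is only an illustration of the idea. Cosine similarity is an assumed metric, and the `Chunk`, `FileHit`, and `search` names are hypothetical.

```typescript
interface Chunk {
  fileId: string;
  text: string;
  embedding: number[];
}

interface FileHit {
  fileId: string;
  bestScore: number;
  chunks: string[];
}

// Cosine similarity between two vectors of equal dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Rank chunks by similarity, keep the top K, then group by file.
function search(query: number[], chunks: Chunk[], topK: number): FileHit[] {
  const top = chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);

  // Map preserves insertion order, so files stay ordered by best score.
  const byFile = new Map<string, FileHit>();
  for (const { chunk, score } of top) {
    const hit = byFile.get(chunk.fileId);
    if (hit) {
      hit.chunks.push(chunk.text);
    } else {
      byFile.set(chunk.fileId, {
        fileId: chunk.fileId,
        bestScore: score,
        chunks: [chunk.text],
      });
    }
  }
  return Array.from(byFile.values());
}
```

Grouping after ranking means a file with several matching chunks appears once in the results, carrying all of its matching passages.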
Authentication Flow
flowchart TB
Login["1. Login Request<br/>(Email/Password or Google)"] --> Validate["Better Auth validates"]
Validate --> Session["2. Session Created<br/>(PostgreSQL + Cookie)"]
Session --> Workspace["3. Workspace Context<br/>(Select workspace)"]
Workspace --> Authz["4. Authorization<br/>(Check membership)"]
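The last two steps of this flow, selecting a workspace and checking membership, can be sketched as a pure function. This is not Better Auth's API. The `Session` and `Membership` shapes, the `authorize` function, and the mapping of a workspace to an organization ID are assumptions loosely based on the `session`, `organization`, and `member` tables listed earlier.

```typescript
// Hypothetical, simplified record shapes.
interface Session {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

interface Membership {
  userId: string;
  organizationId: string;
  role: string;
}

type AuthzResult =
  | { ok: true; role: string }
  | { ok: false; reason: "expired" | "not_a_member" };

// Allow a request only if the session is still valid and the user
// belongs to the selected workspace (assumed to be an organization).
function authorize(
  session: Session,
  organizationId: string,
  memberships: Membership[],
  now: number,
): AuthzResult {
  if (session.expiresAt <= now) {
    return { ok: false, reason: "expired" };
  }
  const membership = memberships.find(
    (m) => m.userId === session.userId && m.organizationId === organizationId,
  );
  if (!membership) {
    return { ok: false, reason: "not_a_member" };
  }
  return { ok: true, role: membership.role };
}
```

Returning a tagged result instead of throwing lets the caller map each failure reason to its own response, for example a redirect to login for an expired session.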
Technical Decisions
Why Turborepo?
- Code Sharing: Shared packages (db, share, api-handlers, queue) reduce duplication
- Build Optimization: Turborepo caches and parallelizes builds
- Unified Tooling: A single lint (Biome), format, and type-check setup across the repo
- Dependency Management: Workspace protocol for internal dependencies
Why TanStack Start?
- File-based Routing: Automatic route generation via TanStack Router
- Type Safety: End-to-end TypeScript with tRPC integration
- SSR Support: Built-in server-side rendering via Vite + Nitro
- Developer Experience: Hot reload and excellent dev tools
Why Hono for Worker?
- Performance: Lightweight and fast, runs natively on Bun
- Simplicity: One processing endpoint plus health checks, minimal framework overhead
- TypeScript Native: Full type safety out of the box
Why Drizzle ORM?
- Type Safety: Full TypeScript support with inferred types
- SQL-like Syntax: Familiar query building, close to raw SQL
- Migration Support: Drizzle Kit for schema generation and migrations
- Performance: Minimal overhead compared to heavier ORMs
Why pgvector?
- Native Integration: Vector storage directly in PostgreSQL
- Similarity Search: Efficient nearest neighbor queries
- No Additional Services: No need for a separate vector database
- ACID Compliance: Transactional safety for embeddings alongside relational data
Monitoring and Observability
Logging
- Structured Logging: Custom logger in `packages/share/src/logger/`
- Log Levels: Debug, Info, Warn, Error
Error Tracking
- Sentry: Integrated into the platform for error tracking and performance monitoring
GitHub Actions Workflows
flowchart TB
subgraph ci["ci.yml (push/PR to main)"]
Format["Format check<br/>(Biome)"]
Lint["Lint + Type Check<br/>(bun run check)"]
Commit["Commit message<br/>(commitlint)"]
end
subgraph test["test.yml (push/PR to main)"]
Test["Run tests<br/>(bun run test:coverage)"]
Codecov["Upload coverage<br/>(Codecov)"]
end
subgraph release["release.yml (weekly, Friday 00:00 UTC)"]
Changeset["Generate changesets"]
Version["Version packages"]
GitHub["Create releases/tags"]
end
Deployment
- Platform: Deploys to Vercel (TanStack Start + Nitro)
- Documentation (www): Deploys to Vercel (Starlight + Astro)
- Worker: Runs on Bun
Development
Local Development
# Start all services
bun run dev

# Services available:
# - Platform: http://localhost:3030
# - Worker: http://localhost:3040

Testing
The project uses Bun's built-in test runner and Playwright for E2E tests:
# Run tests with coverage
bun run test

# Run with lcov coverage output
bun run test:coverage

Next Steps
- Configuration: Environment variables and settings
- Core Concepts: Understand the data model
- Getting Started: Run Curiositi locally