🧱 Core Concepts

Understanding Curiositi’s core concepts will help you make the most of the platform.

Workspaces are the top-level containers in Curiositi. They represent teams or companies.

  • Multi-tenancy — Each workspace is completely isolated
  • Member Management — Invite users with different roles (owner, admin, member)
  • Session Scoping — Users select an active workspace, and all queries are scoped to it

```mermaid
mindmap
  root((Workspace))
    Members
      Owner
      Admin
      Members
    Spaces
      Marketing
      Engineering
      Finance
    Files
      document.pdf
      report.csv
      photo.png
```

Workspaces are stored as organization records in the database (via Better Auth):

| Field | Type | Description |
|---|---|---|
| id | text | Primary key |
| name | text | Workspace name |
| slug | text | URL-friendly identifier (unique) |
| logo | text | Optional logo URL |
| metadata | text | Optional metadata |
| createdAt | timestamp | Creation time |
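
The page does not say how workspace slugs are generated. As a hypothetical illustration of the "URL-friendly, unique" requirement, the sketch below derives a slug from a workspace name and appends a numeric suffix on collision. `slugify` and `uniqueSlug` are invented names, not Curiositi or Better Auth APIs.

```typescript
// Hypothetical helper: turn a workspace name into a URL-friendly slug.
function slugify(name: string): string {
  return name
    .normalize("NFKD") // split accented letters into base letter + combining mark
    .replace(/[\u0300-\u036f]/g, "") // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse runs of other characters into one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Hypothetical helper: slugs must be unique, so try -2, -3, ... until one is free.
function uniqueSlug(name: string, taken: ReadonlySet<string>): string {
  const base = slugify(name) || "workspace"; // fallback for names with no letters or digits
  let candidate = base;
  for (let n = 2; taken.has(candidate); n++) {
    candidate = `${base}-${n}`;
  }
  return candidate;
}
```

For example, `uniqueSlug("Acme Corp", new Set(["acme-corp"]))` returns `"acme-corp-2"`. Checking a set in application code only avoids the obvious collision; two concurrent requests could still pick the same candidate, so the uniqueness guarantee itself would have to come from the database.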

Spaces are Curiositi’s way of organizing content. They work like folders with hierarchical nesting.

Spaces can be nested using the parentSpaceId field:

```mermaid
mindmap
  root((Marketing))
    Campaigns
      Q1 2025
      Q2 2025
    Brand Assets
```
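
Each space record has these fields:

| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key (auto-generated) |
| name | text | Display name |
| description | text | Optional description |
| icon | text | Optional icon (e.g., emoji) |
| organizationId | text | Owning workspace |
| parentSpaceId | UUID / null | Parent space reference for nesting (null for a top-level space) |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |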
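
Because nesting is expressed only through parentSpaceId, finding a space's full location means walking those links upward. A minimal sketch, assuming the spaces are already loaded into memory; the `Space` type and both helper names are illustrative, not taken from the Curiositi codebase.

```typescript
interface Space {
  id: string;
  name: string;
  parentSpaceId: string | null; // null for a top-level space
}

// Walk parentSpaceId links from a space up to the root and return the
// names root-first. Nesting depth is unlimited, so a visited set guards
// against corrupted data that forms a cycle.
function spacePath(spaceId: string, spacesById: Map<string, Space>): string[] {
  const path: string[] = [];
  const visited = new Set<string>();
  let current = spacesById.get(spaceId);
  while (current) {
    if (visited.has(current.id)) {
      throw new Error(`Cycle detected at space ${current.id}`);
    }
    visited.add(current.id);
    path.push(current.name);
    current = current.parentSpaceId === null ? undefined : spacesById.get(current.parentSpaceId);
  }
  return path.reverse();
}

// Direct children of a space, or the top-level spaces when parentId is null.
function childSpaces(parentId: string | null, spaces: Space[]): Space[] {
  return spaces.filter((space) => space.parentSpaceId === parentId);
}
```

For the Marketing tree above, calling `spacePath` with the ID of the Q1 2025 space yields `["Marketing", "Campaigns", "Q1 2025"]`. Inside PostgreSQL the same walk could be done in a single query with a recursive common table expression instead of one lookup per level.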

How spaces compare with traditional folders:

| Feature | Traditional folders | Curiositi Spaces |
|---|---|---|
| Nesting | Limited depth | Unlimited hierarchy |
| Search | Filename only | Semantic search across all content |
| File location | Files in one folder | Files can be in multiple spaces |
| Organization scope | Per-user | Per-workspace |

Files are the core content in Curiositi. Each file goes through a processing pipeline from upload to searchable content.

```mermaid
stateDiagram-v2
    [*] --> Upload
    Upload --> Pending
    Pending --> Processing
    Processing --> Completed
    Processing --> Failed
    Completed --> [*]
    Failed --> [*]
```

| Status | Description |
|---|---|
| pending | File uploaded, waiting for worker to process |
| processing | Worker is extracting content and generating embeddings |
| completed | File is fully processed and searchable |
| failed | Processing encountered an error |
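
The state diagram can be encoded as a transition table so that any code updating a file's status rejects moves the diagram does not allow. A sketch under that assumption: the diagram shows no edges leaving completed or failed, so both are treated as terminal here; whether Curiositi supports retrying a failed file is not covered on this page. The helper names are hypothetical.

```typescript
type FileStatus = "pending" | "processing" | "completed" | "failed";

// Edges from the state diagram. completed and failed have no outgoing
// edges, so this sketch treats them as terminal.
const allowedTransitions: Record<FileStatus, readonly FileStatus[]> = {
  pending: ["processing"],
  processing: ["completed", "failed"],
  completed: [],
  failed: [],
};

function canTransition(from: FileStatus, to: FileStatus): boolean {
  return allowedTransitions[from].some((next) => next === to);
}

// Returns the new status, or throws if the diagram has no such edge.
function advance(from: FileStatus, to: FileStatus): FileStatus {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid status transition: ${from} -> ${to}`);
  }
  return to;
}

function isTerminal(status: FileStatus): boolean {
  return allowedTransitions[status].length === 0;
}
```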

Each file record has these fields:

| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key (auto-generated) |
| name | text | Original filename |
| path | text | S3 storage path |
| size | integer | File size in bytes |
| type | text | MIME type |
| organizationId | text | Owning workspace |
| uploadedById | text | User who uploaded the file |
| status | enum | pending, processing, completed, or failed |
| tags | jsonb | Optional tags (default: `{ tags: [] }`) |
| processedAt | timestamp | When processing completed |
| createdAt | timestamp | Upload time |
| updatedAt | timestamp | Last modification time |

When a file is uploaded:

  1. Upload — File streams to S3 storage
  2. Metadata — File record created in database with status pending
  3. Queue — Processing job dispatched via Upstash QStash
  4. Content Extraction — Worker extracts text (documents) or generates descriptions (images)
  5. Chunking — Content split into chunks (800 tokens, 100 token overlap)
  6. Embedding — Each chunk converted to a 1536-dimension vector
  7. Storage — Chunks and embeddings saved to fileContents table
  8. Complete — File status updated to completed
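
The eight steps split between the platform (steps 1 to 3) and the worker (steps 4 to 8). The sketch below wires them together against a hypothetical `Deps` interface standing in for S3, PostgreSQL, QStash, and the extraction and embedding models. None of these method names come from those services' real SDKs, and file bodies are simplified to strings.

```typescript
type FileStatus = "pending" | "processing" | "completed" | "failed";

interface FileRecord {
  id: string;
  name: string;
  path: string; // storage key
  status: FileStatus;
  processedAt: Date | null;
}

interface Chunk {
  fileId: string;
  content: string;
  embedding: number[];
}

// Hypothetical stand-ins for S3, the database, QStash, and the models.
interface Deps {
  putObject(path: string, body: string): Promise<void>;
  getObject(path: string): Promise<string>;
  enqueue(job: { fileId: string }): Promise<void>;
  extractText(raw: string): Promise<string>;
  chunk(text: string): string[];
  embed(text: string): Promise<number[]>;
  files: Map<string, FileRecord>;
  chunks: Chunk[];
}

// Platform side, steps 1-3: store the bytes, record metadata, enqueue a job.
async function handleUpload(deps: Deps, id: string, name: string, body: string): Promise<FileRecord> {
  const path = `uploads/${id}/${name}`; // hypothetical key layout
  await deps.putObject(path, body);
  const record: FileRecord = { id, name, path, status: "pending", processedAt: null };
  deps.files.set(id, record);
  await deps.enqueue({ fileId: id });
  return record;
}

// Worker side, steps 4-8: extract, chunk, embed, store, mark completed.
async function processFile(deps: Deps, fileId: string): Promise<FileStatus> {
  const record = deps.files.get(fileId);
  if (!record) throw new Error(`Unknown file ${fileId}`);
  record.status = "processing";
  try {
    const raw = await deps.getObject(record.path);
    const text = await deps.extractText(raw);
    const embedded: Chunk[] = [];
    for (const piece of deps.chunk(text)) {
      embedded.push({ fileId, content: piece, embedding: await deps.embed(piece) });
    }
    deps.chunks.push(...embedded); // store only after every chunk is embedded
    record.status = "completed";
    record.processedAt = new Date();
  } catch {
    record.status = "failed";
  }
  return record.status;
}
```

Chunks are written only after every chunk has been embedded, so a failure partway through leaves no half-indexed file behind. That ordering is a choice made for this sketch, not something the page specifies.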

Files are broken into chunks for precise semantic search.

  • Precision — Find the exact relevant section, not just the file
  • Context — Overlapping chunks preserve context across boundaries
  • Token Limits — Fits within embedding model constraints
  • Performance — Smaller vectors enable faster similarity search

Chunks are stored in the fileContents table:

| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key (auto-generated) |
| fileId | UUID | Reference to parent file |
| content | text | The text content of the chunk |
| embeddedContent | vector(1536) | Vector embedding for similarity search |
| metadata | json | Optional metadata about the chunk |
| createdAt | timestamp | Creation time |
| updatedAt | timestamp | Last modification time |

Chunking parameters:

  • Chunk size: 800 tokens
  • Overlap: 100 tokens
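
With an 800-token window and 100 tokens of overlap, each chunk starts 700 tokens after the previous one. A minimal sliding-window sketch; it assumes whitespace-separated words as a stand-in for tokens, whereas a real pipeline would count tokens with the embedding model's tokenizer. Both function names are hypothetical.

```typescript
// Split `tokens` into windows of `size` tokens. Each window starts
// `size - overlap` tokens after the previous one, so neighbours share
// `overlap` tokens. Defaults mirror the documented 800 / 100 settings.
function chunkTokens<T>(tokens: readonly T[], size = 800, overlap = 100): T[][] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("Expected size > 0 and 0 <= overlap < size");
  }
  const step = size - overlap;
  const chunks: T[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + size));
    if (start + size >= tokens.length) break; // this window already reaches the end
  }
  return chunks;
}

// Assumption: whitespace-separated words stand in for model tokens here.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  const words = text.split(/\s+/).filter((word) => word.length > 0);
  return chunkTokens(words, size, overlap).map((chunk) => chunk.join(" "));
}
```

For a 2,000-token document, windows start at tokens 0, 700, and 1,400, and the last chunk holds the final 600 tokens.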

Files can exist in multiple spaces simultaneously using the filesInSpace junction table.

```mermaid
erDiagram
    Files ||--o{ filesInSpace : "many-to-many"
    Spaces ||--o{ filesInSpace : "many-to-many"

    Files {
        uuid id
        string name
        string path
    }

    Spaces {
        uuid id
        string name
        uuid parentSpaceId
    }

    filesInSpace {
        uuid id
        uuid fileId
        uuid spaceId
    }
```
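
The filesInSpace junction table has these fields:

| Field | Type | Description |
|---|---|---|
| id | UUID | Primary key |
| fileId | UUID | Reference to file |
| spaceId | UUID | Reference to space |
| createdAt | timestamp | When the link was created |
| updatedAt | timestamp | Last modification time |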

Benefits of this design:

  • No file duplication in storage
  • Single source of truth for file content and embeddings
  • Flexible organization — add a file to any number of spaces
  • Easy reorganization without moving data
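
Because a link is just a row in filesInSpace, adding, removing, or moving a file between spaces never touches the stored file or its embeddings. An in-memory sketch of that behavior; the `FileSpaceLinks` class is hypothetical, and the duplicate check assumes the real table prevents linking the same file to the same space twice.

```typescript
interface FileInSpace {
  fileId: string;
  spaceId: string;
  createdAt: Date;
}

// Hypothetical in-memory stand-in for the filesInSpace junction table.
class FileSpaceLinks {
  private rows: FileInSpace[] = [];

  // Adds a link row; the file's stored object and embeddings are not copied.
  link(fileId: string, spaceId: string): void {
    const exists = this.rows.some((r) => r.fileId === fileId && r.spaceId === spaceId);
    if (!exists) {
      this.rows.push({ fileId, spaceId, createdAt: new Date() });
    }
  }

  unlink(fileId: string, spaceId: string): void {
    this.rows = this.rows.filter((r) => !(r.fileId === fileId && r.spaceId === spaceId));
  }

  // Reorganizing only rewrites link rows; nothing in storage moves.
  move(fileId: string, fromSpaceId: string, toSpaceId: string): void {
    this.unlink(fileId, fromSpaceId);
    this.link(fileId, toSpaceId);
  }

  filesIn(spaceId: string): string[] {
    return this.rows.filter((r) => r.spaceId === spaceId).map((r) => r.fileId);
  }

  spacesOf(fileId: string): string[] {
    return this.rows.filter((r) => r.fileId === fileId).map((r) => r.spaceId);
  }
}
```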

The heart of Curiositi is semantic search — finding files by meaning, not just keywords.

  1. Query Embedding — Your search text is converted to a 1536-dimension vector
  2. Similarity Search — pgvector finds the closest matching content chunks using cosine similarity
  3. Ranking — Results ranked by similarity score
  4. Aggregation — Matching chunks grouped by source file
  5. Response — Files returned with relevance scores
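
Steps 2 to 5 can be illustrated in memory: score each chunk against the query vector with cosine similarity, keep the best matches, group them by file, and rank the files. This is a brute-force sketch, not how pgvector executes the query, and it assumes a file's relevance is its best chunk's score; `semanticSearch`, `limit`, and the result shape are illustrative.

```typescript
interface ChunkRow {
  fileId: string;
  content: string;
  embedding: number[];
}

interface FileHit {
  fileId: string;
  score: number; // best chunk similarity within this file
  chunks: { content: string; score: number }[];
}

// cos(a, b) = (a . b) / (|a| |b|); returns 0 if either vector is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Embedding dimensions differ");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function semanticSearch(query: number[], rows: ChunkRow[], limit = 20): FileHit[] {
  // Steps 2-3: score every chunk and keep the `limit` best.
  const ranked = rows
    .map((row) => ({ row, score: cosineSimilarity(query, row.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);

  // Step 4: group the surviving chunks by source file.
  const byFile = new Map<string, FileHit>();
  for (const { row, score } of ranked) {
    const hit: FileHit = byFile.get(row.fileId) ?? { fileId: row.fileId, score, chunks: [] };
    hit.chunks.push({ content: row.content, score });
    hit.score = Math.max(hit.score, score);
    byFile.set(row.fileId, hit);
  }

  // Step 5: files ordered by their best chunk.
  return Array.from(byFile.values()).sort((x, y) => y.score - x.score);
}
```

In PostgreSQL, pgvector's `<=>` operator returns cosine distance, which equals 1 minus cosine similarity, so ordering by `<=>` ascending produces the same ranking as sorting by similarity descending here.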

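Curiositi uses 1536-dimension embeddings. Texts with similar meanings map to nearby vectors, as in this illustration (only the first few values of each vector are shown):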

```mermaid
mindmap
  root((Embeddings))
    "Quarterly sales report"
      0.023
      -0.156
      0.892
      ...
    "Q4 revenue summary"
      0.019
      -0.142
      0.887
      ...
```

Curiositi uses Better Auth for authentication.

  • Email/Password — Standard credential-based login
  • Google OAuth — Sign in with Google

Sessions are stored in PostgreSQL. Each session tracks the user’s active workspace (via activeOrganizationId), which scopes all subsequent queries.
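
A sketch of what "scopes all subsequent queries" means in practice, assuming every workspace-owned row carries an organizationId as the tables above show. The `Session` shape is reduced to the two fields that matter here, and the helper names are hypothetical rather than part of Better Auth.

```typescript
// Reduced session shape: only the fields this sketch needs.
interface Session {
  userId: string;
  activeOrganizationId: string | null;
}

// Any workspace-owned row (files, spaces, ...) carries its owning workspace.
interface WorkspaceOwned {
  organizationId: string;
}

// Hypothetical guard: without an active workspace there is nothing to scope to.
function requireActiveWorkspace(session: Session): string {
  if (session.activeOrganizationId === null) {
    throw new Error("No active workspace selected");
  }
  return session.activeOrganizationId;
}

// Keep only rows belonging to the session's active workspace.
function scopeToWorkspace<T extends WorkspaceOwned>(session: Session, rows: readonly T[]): T[] {
  const organizationId = requireActiveWorkspace(session);
  return rows.filter((row) => row.organizationId === organizationId);
}
```

In a SQL query, the same guard becomes a WHERE condition on organizationId bound to the session's activeOrganizationId.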

| Role | Capabilities |
|---|---|
| Owner | Full control, member management |
| Admin | Create spaces, upload files, manage content |
| Member | Upload files, search, read access |
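
The role table can be expressed as a permission check. This sketch assumes each role includes everything granted to the role below it (the table does not state this explicitly), and the action names are invented for illustration.

```typescript
type Role = "owner" | "admin" | "member";

// Hypothetical action names derived from the capabilities table.
type Action =
  | "search"
  | "read"
  | "uploadFile"
  | "createSpace"
  | "manageContent"
  | "manageMembers";

// Assumption: each role inherits the actions of the role below it.
const memberActions: readonly Action[] = ["search", "read", "uploadFile"];
const adminActions: readonly Action[] = [...memberActions, "createSpace", "manageContent"];
const ownerActions: readonly Action[] = [...adminActions, "manageMembers"];

const permissions: Record<Role, ReadonlySet<Action>> = {
  member: new Set(memberActions),
  admin: new Set(adminActions),
  owner: new Set(ownerActions),
};

// True if a workspace member with `role` may perform `action`.
function can(role: Role, action: Action): boolean {
  return permissions[role].has(action);
}
```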

How data moves through Curiositi:

```mermaid
flowchart TB
    User["User<br/>(Browser)"] -->|"Upload"| Platform["Platform<br/>(TanStack Start)"]
    Platform -->|"Enqueue job"| Queue["Queue<br/>(QStash/bunqueue)"]
    Platform -->|"Store metadata"| DB[(PostgreSQL<br/>+ pgvector)]
    Queue -->|"Process"| Worker["Worker<br/>(Hono)"]
    Worker -->|"Download/Upload"| S3["S3 Storage"]
    Worker -->|"Store embeddings"| DB
    User -->|"Query"| Platform
    Platform -->|"Search"| DB
    DB -->|"Results"| Platform
```