
File Storage — Overview

The File Storage service is the shared binary storage layer for all modules. Modules never write directly to S3 — they call kernel.files() and the service handles upload, virus scanning, image processing, access control, and lifecycle management.


Technical Stack

| Component | Technology | Role |
|---|---|---|
| Go runtime | aws-sdk-go-v2 | All S3 operations |
| Object storage | SeaweedFS (self-hosted) or AWS S3 | Binary storage backend |
| Upload strategy | Presigned URLs | Direct client-to-S3 upload, zero gateway bandwidth cost |
| AV scanning | Two-bucket staging pipeline | Every upload scanned before becoming accessible |
| Image processing | bimg (libvips, Go binding) | Crop, resize, convert, compress (used by Netflix, AWS) |
| Metadata store | PostgreSQL | File records, status, ownership, soft-delete state |

Staging + AV Scan Pipeline

Every file goes through a two-bucket pipeline before it is accessible. The staging bucket is never exposed publicly.

Client uploads:
  Presigned POST → S3 staging bucket: {tenantId}/staging/{fileId}
  Status: pending_scan

S3 Event Notification → File Service → AV scanner
  Clean: move to main bucket → {tenantId}/{bucket}/{fileId}
    Status: available
  Infected: delete from staging
    Status: rejected
    Notify tenant admin

A file with status pending_scan is not accessible via GET /files/:id. Any attempt returns 404 Not Found until the AV scan completes and the file moves to available.
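
The scan step above can be sketched as a pure decision function. This is a minimal illustration, not the service's actual code: the names ScanOutcome, StagingKey, and ResolveScan are hypothetical, and the real S3 copy and delete calls are left out.

```go
package main

import "fmt"

// FileStatus mirrors the status values named in the pipeline above.
type FileStatus string

const (
	StatusPendingScan FileStatus = "pending_scan"
	StatusAvailable   FileStatus = "available"
	StatusRejected    FileStatus = "rejected"
)

// ScanOutcome is a hypothetical type describing what the File Service
// should do once the AV scanner reports back.
type ScanOutcome struct {
	Status        FileStatus
	MoveTo        string // destination key in the main bucket; empty if rejected
	DeleteStaging bool   // true when the staged object is deleted rather than moved
	NotifyAdmin   bool   // true when the tenant admin should be notified
}

// StagingKey builds the staging object key: {tenantId}/staging/{fileId}.
func StagingKey(tenantID, fileID string) string {
	return fmt.Sprintf("%s/staging/%s", tenantID, fileID)
}

// ResolveScan maps a scan verdict to the follow-up actions listed above.
func ResolveScan(tenantID, bucket, fileID string, clean bool) ScanOutcome {
	if clean {
		return ScanOutcome{
			Status: StatusAvailable,
			MoveTo: fmt.Sprintf("%s/%s/%s", tenantID, bucket, fileID),
		}
	}
	return ScanOutcome{Status: StatusRejected, DeleteStaging: true, NotifyAdmin: true}
}

func main() {
	fmt.Println(StagingKey("tenant-1", "file-42"))
	fmt.Printf("%+v\n", ResolveScan("tenant-1", "documents", "file-42", true))
	fmt.Printf("%+v\n", ResolveScan("tenant-1", "documents", "file-42", false))
}
```

Keeping the decision separate from the storage calls makes the clean and infected branches easy to unit-test without a running S3 backend.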

Staging Overflow Protection

| Parameter / condition | Value / behavior | Env variable |
|---|---|---|
| Staging file TTL | 24 hours | FILES_STAGING_TTL_HOURS=24 |
| Max pending files per tenant | 100 | FILES_STAGING_MAX_PER_TENANT=100 |
| TTL exceeded | File auto-rejected + uploader notified | n/a |
| Max pending exceeded | 503 Service Unavailable | n/a |
| AV scan queue backlog > 1,000 | Priority inversion: newest files first | n/a |
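
A rough sketch of how these limits might be loaded and applied. The two environment variable names come from the table; everything else (StagingLimits, AdmitUpload, Expired, the fallback to defaults on invalid input) is an assumption made for illustration. It also assumes that "exceeded" means a new upload is refused once the tenant already has the maximum number of pending files.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strconv"
	"time"
)

// StagingLimits holds the two tunables from the table above.
type StagingLimits struct {
	TTL          time.Duration
	MaxPerTenant int
}

// envInt reads a positive integer from the environment and falls back
// to def when the variable is unset or invalid (assumed behavior).
func envInt(name string, def int) int {
	v, err := strconv.Atoi(os.Getenv(name))
	if err != nil || v <= 0 {
		return def
	}
	return v
}

// LoadStagingLimits applies the documented defaults: 24 hours and 100 files.
func LoadStagingLimits() StagingLimits {
	return StagingLimits{
		TTL:          time.Duration(envInt("FILES_STAGING_TTL_HOURS", 24)) * time.Hour,
		MaxPerTenant: envInt("FILES_STAGING_MAX_PER_TENANT", 100),
	}
}

// AdmitUpload returns the HTTP status for a new upload, given how many
// files the tenant already has in pending_scan.
func (l StagingLimits) AdmitUpload(pending int) int {
	if pending >= l.MaxPerTenant {
		return http.StatusServiceUnavailable
	}
	return http.StatusOK
}

// Expired reports whether a staged file of the given age has outlived
// the TTL and should be auto-rejected.
func (l StagingLimits) Expired(age time.Duration) bool {
	return age > l.TTL
}

func main() {
	l := LoadStagingLimits()
	fmt.Println(l.AdmitUpload(l.MaxPerTenant-1), l.AdmitUpload(l.MaxPerTenant))
	fmt.Println(l.Expired(l.TTL + time.Minute))
}
```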

Bucket Layout

Once scanned, all files for all tenants reside in a single main S3 bucket, partitioned by {tenantId}/{bucket}/:

| Logical bucket | S3 path prefix | Contents |
|---|---|---|
| avatars | {tenantId}/avatars/ | User and org avatars |
| assets | {tenantId}/assets/ | PWA assets, icons, branding |
| documents | {tenantId}/documents/ | Reports, contracts, exports |
| exports | {tenantId}/exports/ | Data export archives |
| modules | {tenantId}/modules/ | Module-specific file attachments |

One S3 bucket per deployment, not per tenant. This keeps S3 bucket counts manageable (AWS S3: 100-bucket soft limit without quota request) while maintaining strict tenant isolation via prefix and JWT-level access control.
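
As an illustration of the prefix scheme, the sketch below builds an object key and rejects unknown logical buckets and path-like segments. ObjectKey and both error values are hypothetical names; the segment validation is an assumption about what a careful implementation would check, not documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// logicalBuckets lists the five logical buckets from the table above.
var logicalBuckets = map[string]bool{
	"avatars": true, "assets": true, "documents": true,
	"exports": true, "modules": true,
}

var (
	ErrUnknownBucket  = errors.New("unknown logical bucket")
	ErrInvalidSegment = errors.New("tenant or file ID is empty or contains a path separator")
)

// ObjectKey returns {tenantId}/{bucket}/{fileId}. IDs that could escape
// the tenant prefix (slashes, dot segments) are refused.
func ObjectKey(tenantID, bucket, fileID string) (string, error) {
	if !logicalBuckets[bucket] {
		return "", ErrUnknownBucket
	}
	for _, seg := range []string{tenantID, fileID} {
		if seg == "" || seg == "." || seg == ".." || strings.Contains(seg, "/") {
			return "", ErrInvalidSegment
		}
	}
	return tenantID + "/" + bucket + "/" + fileID, nil
}

func main() {
	key, err := ObjectKey("tenant-1", "avatars", "file-42")
	fmt.Println(key, err)

	_, err = ObjectKey("tenant-1/../tenant-2", "avatars", "file-42")
	fmt.Println(err)
}
```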


Tenant Isolation

File access is enforced at two levels:

  1. S3 key prefix: every file path is prefixed with {tenantId}/, so files from different tenants never share a prefix.
  2. JWT check: before serving any request, the File Service verifies that the tenantId in the JWT matches the tenantId in the file record. A request that fails this check is refused, so knowing another tenant's file ID is not enough to read the file.
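
The two checks can be sketched together as follows. Claims, FileRecord, and AuthorizeFileAccess are hypothetical names, and the sketch assumes the JWT signature has already been verified elsewhere.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Claims is a hypothetical view of already-verified JWT claims.
type Claims struct {
	TenantID string
}

// FileRecord is a hypothetical subset of the metadata row for a file.
type FileRecord struct {
	ID       string
	TenantID string
	Key      string // object key, expected to start with {tenantId}/
}

var ErrForbidden = errors.New("file does not belong to the caller's tenant")

// AuthorizeFileAccess applies both isolation levels: the tenant ID in the
// token must match the record, and the object key must sit under that
// tenant's prefix.
func AuthorizeFileAccess(c Claims, f FileRecord) error {
	if c.TenantID == "" || c.TenantID != f.TenantID {
		return ErrForbidden
	}
	if !strings.HasPrefix(f.Key, c.TenantID+"/") {
		return ErrForbidden
	}
	return nil
}

func main() {
	rec := FileRecord{ID: "file-42", TenantID: "tenant-1", Key: "tenant-1/documents/file-42"}
	fmt.Println(AuthorizeFileAccess(Claims{TenantID: "tenant-1"}, rec))
	fmt.Println(AuthorizeFileAccess(Claims{TenantID: "tenant-2"}, rec))
}
```

Checking the prefix with a trailing slash matters: a bare prefix match on "tenant-1" would also accept keys under "tenant-10/".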

REST API

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/files/upload | Multipart upload (server-proxied) |
| POST | /api/v1/files/presign | Get presigned upload URL for direct S3 upload |
| GET | /api/v1/files/:id | Download file (binary stream) |
| GET | /api/v1/files/:id/url | Get presigned download URL |
| GET | /api/v1/files | List files (paginated, filterable) |
| DELETE | /api/v1/files/:id | Soft delete (moves to trash, 30-day restore window) |
| DELETE | /api/v1/files/:id/permanent | Immediate permanent delete (irreversible) |
| POST | /api/v1/files/:id/restore | Restore from trash (within 30 days) |
| POST | /api/v1/files/:id/process | Process image (resize, crop, convert) |
| GET | /api/v1/files/:id/thumbnail/:preset | Get thumbnail by preset name |
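
A small client-side sketch of how requests for a few of these endpoints might be built with net/http. The Client type, the base URL, the preset name "small", and the use of a Bearer Authorization header are all assumptions; the page names the endpoints but not the request headers or bodies.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// Client is a hypothetical wrapper around the File Storage REST API.
type Client struct {
	BaseURL string // e.g. "https://files.example.test" (placeholder host)
	Token   string // JWT, assumed to be sent as a Bearer token
}

// request builds a request for /api/v1/files followed by the given
// path segments, escaping each segment.
func (c Client) request(method string, segments ...string) (*http.Request, error) {
	path := "/api/v1/files"
	for _, s := range segments {
		path += "/" + url.PathEscape(s)
	}
	req, err := http.NewRequest(method, c.BaseURL+path, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+c.Token)
	return req, nil
}

// SoftDelete targets DELETE /api/v1/files/:id.
func (c Client) SoftDelete(id string) (*http.Request, error) {
	return c.request(http.MethodDelete, id)
}

// Restore targets POST /api/v1/files/:id/restore.
func (c Client) Restore(id string) (*http.Request, error) {
	return c.request(http.MethodPost, id, "restore")
}

// Thumbnail targets GET /api/v1/files/:id/thumbnail/:preset.
func (c Client) Thumbnail(id, preset string) (*http.Request, error) {
	return c.request(http.MethodGet, id, "thumbnail", preset)
}

func main() {
	c := Client{BaseURL: "https://files.example.test", Token: "example-token"}
	req, err := c.Thumbnail("file-42", "small")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```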

File Status Lifecycle

           ┌──────────────┐
           │ pending_scan │  (in staging bucket, not accessible)
           └──────┬───────┘
          AV scan │
      ┌───────────┴───────────┐
      │ clean                 │ infected
      ▼                       ▼
┌───────────┐          ┌─────────────┐
│ available │          │  rejected   │
└─────┬─────┘          └─────────────┘
      │ DELETE /files/:id
      ▼
┌───────────┐
│  deleted  │  (soft delete, in trash, restorable 30 days)
└─────┬─────┘
      │ 30 days elapsed OR DELETE permanent
      ▼
┌───────────┐
│ destroyed │  (physically removed from S3, irreversible)
└───────────┘
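
The lifecycle can be expressed as a small transition table. This is a sketch under assumptions: the event names and the Next function are hypothetical. The transitions follow the diagram, plus two that appear elsewhere on this page: restore from trash (POST /files/:id/restore) and auto-rejection when the staging TTL is exceeded.

```go
package main

import "fmt"

type Status string
type Event string

const (
	PendingScan Status = "pending_scan"
	Available   Status = "available"
	Rejected    Status = "rejected"
	Deleted     Status = "deleted"
	Destroyed   Status = "destroyed"
)

// Event names are illustrative; the page does not define them.
const (
	ScanClean      Event = "scan_clean"
	ScanInfected   Event = "scan_infected"
	StagingExpired Event = "staging_ttl_exceeded"
	SoftDelete     Event = "soft_delete"
	Restore        Event = "restore"
	Purge          Event = "purge" // 30 days elapsed or permanent delete
)

// transitions lists every allowed move; anything absent is illegal.
// rejected and destroyed have no outgoing transitions.
var transitions = map[Status]map[Event]Status{
	PendingScan: {ScanClean: Available, ScanInfected: Rejected, StagingExpired: Rejected},
	Available:   {SoftDelete: Deleted},
	Deleted:     {Restore: Available, Purge: Destroyed},
}

// Next returns the status that follows from applying ev, or an error
// (with the status unchanged) when the move is not allowed.
func Next(from Status, ev Event) (Status, error) {
	to, ok := transitions[from][ev]
	if !ok {
		return from, fmt.Errorf("no transition from %q on %q", from, ev)
	}
	return to, nil
}

func main() {
	s := PendingScan
	for _, ev := range []Event{ScanClean, SoftDelete, Restore, SoftDelete, Purge} {
		next, err := Next(s, ev)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s --%s--> %s\n", s, ev, next)
		s = next
	}
}
```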