Async Document Workflow

Production-grade · Async pipeline

From upload to export, never stuck on a spinner.

Document workflows die quietly. A worker crashes. A queue backs up. The user sees a spinner forever and assumes the upload failed. This system streams every stage as it happens over Server-Sent Events, with a Celery worker pool, Redis Pub/Sub, PostgreSQL persistence and a Docker compose that brings the whole thing up on one command.

5 · Pipeline stages, every one streamed
5 · Accepted formats: PDF, TXT, CSV, JSON, MD
50 MB · Default file-size ceiling
20+ · Makefile targets, one to learn
1 · docker compose up

Act I · The Problem

The user sees one spinner. The system knows ten things.

A document upload kicks off file validation, text extraction, analysis, categorization, a human review step and an export. Every one of those can fail. None of it is visible to the person who clicked Upload. The product feels broken even when it is working.

Spinner forever.

Worker crashed. Queue backed up. The UI has no signal so it shows the same thing it always shows. The user reloads, retries, gives up.

No human-in-the-loop.

Most pipelines auto-finalise. Real workflows need a person to approve, reject or request reprocessing before the export.

Demo only.

Plenty of "async pipeline" repos. Almost none ship with health checks, rate limits, JWT auth, backups, and a one-command compose that boots cleanly.

Act II · The Promise

Every stage. Streamed. While it happens.

The frontend opens an EventSource. The backend subscribes to a per-document Redis channel. The Celery worker publishes progress as it works. The client gets a live event for every stage transition. A real progress bar that reflects a real worker.
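The flow above can be sketched end to end without a Redis server. This is a minimal illustration, not the project's code: `InMemoryBus` stands in for Redis Pub/Sub, and the channel name, the stage names and the `percent` field are assumptions made for the example. What it shows is the shape of the idea: the worker publishes one message per stage, and the streaming side frames each message as a Server-Sent Event until a terminal stage arrives.

```python
import asyncio
import json


def channel_for(document_id: str) -> str:
    """Hypothetical naming scheme for the per-document channel."""
    return f"documents:{document_id}:progress"


def format_sse(event: str, data: dict) -> str:
    """Frame one Server-Sent Event: an event line, a data line, a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


class InMemoryBus:
    """Stand-in for Redis Pub/Sub so the sketch runs without a server."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[asyncio.Queue]] = {}

    def subscribe(self, channel: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers.setdefault(channel, []).append(queue)
        return queue

    async def publish(self, channel: str, message: dict) -> None:
        for queue in self._subscribers.get(channel, []):
            await queue.put(message)


async def worker(bus: InMemoryBus, document_id: str) -> None:
    """Publish one progress message per stage, as the worker task would."""
    stages = ["extract", "analyze", "categorize"]  # assumed stage names
    for index, stage in enumerate(stages, start=1):
        percent = round(100 * index / len(stages))
        await bus.publish(channel_for(document_id), {"stage": stage, "percent": percent})
    # "done" is a hypothetical terminal marker that tells the stream to close.
    await bus.publish(channel_for(document_id), {"stage": "done", "percent": 100})


async def stream(bus: InMemoryBus, document_id: str):
    """What an SSE endpoint would yield: framed events until the terminal one."""
    queue = bus.subscribe(channel_for(document_id))
    while True:
        message = await queue.get()
        yield format_sse("progress", message)
        if message["stage"] == "done":
            break


async def demo() -> list[str]:
    bus = InMemoryBus()
    frames: list[str] = []

    async def consume() -> None:
        async for frame in stream(bus, "doc-1"):
            frames.append(frame)

    consumer = asyncio.create_task(consume())
    await asyncio.sleep(0)  # let the consumer subscribe before the worker publishes
    await worker(bus, "doc-1")
    await consumer
    return frames


frames = asyncio.run(demo())
```

Subscribing before publishing matters: Pub/Sub does not replay messages, so a client that connects late only sees later stages. That is one reason to keep the current state in the database as well as on the channel.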

Act III · The Architecture

Three planes. One compose.

The request path: browser to nginx; nginx to Next.js for the UI and FastAPI for the API; FastAPI to Celery via the Redis broker; Celery to PostgreSQL for persistence and to Redis Pub/Sub for progress. Every container has a health check, a resource limit and a structured log line.

Edge

nginx · ports 80/443

Reverse proxy with rate limiting and security headers. Serves Next.js on the root and FastAPI under /api/v1.

UI

Next.js 14 · TypeScript

Tailwind CSS, EventSource for SSE, hot-reload dev mode via source mount. JWT auth per request.

API

FastAPI · async/await

Pydantic v2 schemas, async SQLAlchemy, dependency injection. Repository pattern, thin route handlers, business logic in services.
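A rough sketch of that layering, using only the standard library. It is synchronous and in-memory where the real stack is async SQLAlchemy on PostgreSQL, and every class and function name here is hypothetical. The allowed formats and the 50 MB ceiling come from the figures at the top of this page; reading "MB" as 1024 × 1024 bytes is an assumption.

```python
import uuid
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Optional, Protocol

ALLOWED_EXTENSIONS = {".pdf", ".txt", ".csv", ".json", ".md"}
MAX_BYTES = 50 * 1024 * 1024  # 50 MB default ceiling (binary megabytes assumed)


@dataclass
class Document:
    filename: str
    size_bytes: int
    status: str = "uploaded"
    id: uuid.UUID = field(default_factory=uuid.uuid4)


class DocumentRepository(Protocol):
    """The only interface the service knows about; storage details stay behind it."""

    def add(self, document: Document) -> None: ...

    def get(self, document_id: uuid.UUID) -> Optional[Document]: ...


class InMemoryDocumentRepository:
    """Stand-in for a database-backed repository."""

    def __init__(self) -> None:
        self._rows: dict[uuid.UUID, Document] = {}

    def add(self, document: Document) -> None:
        self._rows[document.id] = document

    def get(self, document_id: uuid.UUID) -> Optional[Document]:
        return self._rows.get(document_id)


class ValidationError(ValueError):
    """Raised when an upload breaks a format or size rule."""


class DocumentService:
    """Business rules live here, not in the route handler."""

    def __init__(self, repo: DocumentRepository) -> None:
        self._repo = repo

    def upload(self, filename: str, size_bytes: int) -> Document:
        suffix = PurePath(filename).suffix.lower()
        if suffix not in ALLOWED_EXTENSIONS:
            raise ValidationError(f"unsupported format: {suffix or 'none'}")
        if size_bytes > MAX_BYTES:
            raise ValidationError(f"file exceeds {MAX_BYTES} bytes")
        document = Document(filename=filename, size_bytes=size_bytes)
        self._repo.add(document)
        return document


def upload_handler(service: DocumentService, filename: str, size_bytes: int) -> dict:
    """Thin handler: pass the input through, delegate, shape the response."""
    document = service.upload(filename, size_bytes)
    return {"id": str(document.id), "status": document.status}


repo = InMemoryDocumentRepository()
service = DocumentService(repo)
response = upload_handler(service, "report.PDF", 1024)
```

Because the service depends on the `DocumentRepository` protocol rather than on a concrete class, tests can use the in-memory version and production can swap in a database-backed one without touching the rules.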

Workers

Celery 5 · documents queue

Multi-stage tasks for extract, analyze, categorize. Automatic retry on transient failures. Flower dashboard at port 5555.
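To make "automatic retry on transient failures" concrete, here is a standard-library sketch of retry with exponential backoff. Celery offers this kind of behaviour through task options such as `autoretry_for` and `retry_backoff`; whether this project uses those options is an assumption, and `TransientError`, `run_with_retry` and the delay values below are hypothetical names and numbers chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, dropped connections)."""


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential delays: base, 2*base, 4*base, ... each capped at `cap` seconds."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


def run_with_retry(
    task: Callable[[], T],
    max_retries: int = 3,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `task`, retrying only on TransientError; re-raise once retries run out."""
    delays = backoff_delays(max_retries, base)
    for attempt in range(max_retries + 1):
        try:
            return task()
        except TransientError:
            if attempt == max_retries:
                raise
            sleep(delays[attempt])
    raise AssertionError("unreachable")


# Usage: a stage that fails twice, then succeeds on the third call.
calls = {"count": 0}


def flaky_extract() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("storage timeout")
    return "extracted text"


slept: list[float] = []
result = run_with_retry(flaky_extract, sleep=slept.append)  # records delays instead of sleeping
```

Only `TransientError` is retried. Anything else, such as a corrupt file, propagates immediately, which keeps a permanently broken document from occupying a worker.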

State

PostgreSQL 16

Document metadata, status transitions, audit trail. UUID primary keys. Trigram extension (pg_trgm) for fuzzy text search. Alembic migrations.

Bus

Redis 7 · broker + Pub/Sub

Celery broker, result backend, cache. Per-document Pub/Sub channel that the SSE endpoint subscribes to and forwards to the client.

Act IV · The API

REST + SSE. Versioned from day one.

Real endpoints. JWT auth. Idempotency-friendly. Versioned under /api/v1. Auto-generated OpenAPI docs at /api/v1/docs. Every error has the same shape: {"error": {"code": ..., "message": ..., "details": ...}}.
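One way to keep that error shape consistent is to route every failure through a single exception type. The sketch below is an assumption about how that could look, not the project's implementation: `ApiError`, the `file_too_large` code, the 413 status and the contents of `details` are all hypothetical.

```python
import json
from typing import Optional


class ApiError(Exception):
    """Hypothetical exception carrying everything the error body needs."""

    def __init__(
        self,
        status_code: int,
        code: str,
        message: str,
        details: Optional[dict] = None,
    ) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.code = code
        self.message = message
        self.details = details or {}  # empty object when there is nothing to add

    def to_body(self) -> dict:
        """Build the documented envelope: error -> code, message, details."""
        return {
            "error": {
                "code": self.code,
                "message": self.message,
                "details": self.details,
            }
        }


err = ApiError(
    413,
    "file_too_large",
    "File exceeds the 50 MB limit.",
    {"max_bytes": 50 * 1024 * 1024},
)
body = json.dumps(err.to_body())
```

A client can then branch on `error.code` and show `error.message` without caring which endpoint failed.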

Verb  Endpoint                        What it does
POST  /api/v1/documents               Upload a document. Returns id, status=uploaded, kicks off Celery task.
GET   /api/v1/documents/{id}          Get current state, metadata, summary, keywords.
GET   /api/v1/documents/{id}/stream   SSE stream of progress events for that document.
POST  /api/v1/review/{id}/approve     Human-in-the-loop approval. Marks document finalised.
POST  /api/v1/review/{id}/reject      Reject and stop the pipeline. Audit logged.
POST  /api/v1/review/{id}/reprocess   Send back to the worker pool. Status returns to processing.
GET   /api/v1/export?format=json|csv  Bulk export of finalised documents.
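The review endpoints imply a small state machine. The sketch below is one plausible reading of it, not the project's schema: `uploaded`, `processing` and `finalised` appear in the endpoint descriptions, while `pending_review`, `rejected` and the worker-side actions `start` and `complete` are assumed names. Each accepted transition is appended to an audit list, in the spirit of the audit trail.

```python
from enum import Enum


class Status(str, Enum):
    UPLOADED = "uploaded"
    PROCESSING = "processing"
    PENDING_REVIEW = "pending_review"  # assumed name for "waiting on a human"
    FINALISED = "finalised"
    REJECTED = "rejected"  # assumed name for the state after a reject


# (current status, action) -> next status. "start" and "complete" are
# hypothetical worker-side actions; the other three mirror the review endpoints.
TRANSITIONS = {
    (Status.UPLOADED, "start"): Status.PROCESSING,
    (Status.PROCESSING, "complete"): Status.PENDING_REVIEW,
    (Status.PENDING_REVIEW, "approve"): Status.FINALISED,
    (Status.PENDING_REVIEW, "reject"): Status.REJECTED,
    (Status.PENDING_REVIEW, "reprocess"): Status.PROCESSING,
}


class InvalidTransition(Exception):
    """Raised when an action is not allowed from the current status."""


def transition(status: Status, action: str, audit: list) -> Status:
    """Return the next status and append the change to the audit trail."""
    try:
        next_status = TRANSITIONS[(status, action)]
    except KeyError:
        raise InvalidTransition(
            f"cannot {action} a document that is {status.value}"
        ) from None
    audit.append((status.value, action, next_status.value))
    return next_status


# Usage: a document that is sent back once, then approved.
audit: list = []
status = Status.UPLOADED
for action in ("start", "complete", "reprocess", "complete", "approve"):
    status = transition(status, action, audit)
```

Keeping the allowed transitions in one table means an approve on an already finalised document fails loudly instead of silently rewriting history.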

Act V · The Stack

What is inside.

  • Next.js 14
  • React
  • TypeScript 5
  • Tailwind CSS 3
  • FastAPI 0.109+
  • Python 3.11+
  • Celery 5.3+
  • Flower 2.0
  • PostgreSQL 16
  • Redis 7
  • SQLAlchemy 2 async
  • Alembic 1.13
  • nginx Alpine
  • Docker 24+ with Compose
  • JWT
  • Server-Sent Events

Act VI · Proof

It boots. It tests. It backs up.

One command · make up

Or docker compose up -d. All services come up with health checks, dependency ordering, and isolated networking. make health verifies every container.

20+ Makefile targets

build, up, down, logs, status, health, migrate, backup, restore, shell, test, lint. The README walks through each. A new engineer is productive in under an hour.

Backup · restore · monitor

Database backup and restore scripts. Flower dashboard for tasks. Automated deploy script for Ubuntu. Resource limits on every container.

Honest limits, in the README

SSE connections are long-lived. No document versioning. Single-region deploy. IP-based rate limiting. All of it written down, so you know what you are getting.

Want a pipeline that tells the user the truth, every second?

I build production-grade async systems with real progress, real workers and real recovery. The kind that stay up in production, not just in a demo.