No. 24 · AI · Agents · HR
The fine line between
shortlisted and
rejected.
रेशा · Resha · the fibre
A recruiter receives 600 resumes for one role. They open 30. Twenty die in the first ninety seconds. Brilliant candidates fall on "page two". Resha gives every candidate the same fair, three-tier review: on-device, local LLM, cloud fallback. Same prompt, same rubric, every time.
Act I · The Funnel
Six hundred resumes.
One role.
Ninety seconds each.
This is a math problem before it is a hiring problem. A recruiter physically cannot give a fair review to 600 resumes. So they don't. They scan, they skim, they reject on font choice, and the company loses the engineer they needed.
- 600 resumes received for one role
- A filename or first-line pass thins the pile
- 30 actually opened
- 10 survive the first 90 seconds
The other 590 are not bad candidates. They are unread.
Act II · The Promise
Three inference tiers.
One verdict you can defend.
Cloud goes down. Bandwidth dies. The intern's laptop has no GPU. Resha works anyway. The orchestrator picks the best available tier and routes the analysis; when two or more tiers agree, consensus weighting raises the confidence score. A sketch of that routing follows the three tiers below.
On-device
SBERT semantic match · TF-IDF
Pure Python, CPU-only, runs anywhere. spaCy entity extraction over skills, experience, and education. all-MiniLM-L6-v2 embeds the resume and the job description; cosine similarity scores the match.
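A minimal sketch of that match in Python, assuming the sentence-transformers library and the model named above; `semantic_match` is an illustrative helper, not Resha's API:

```python
# Sketch of the on-device semantic match: all-MiniLM-L6-v2 + cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small enough for CPU-only

def semantic_match(resume_text: str, jd_text: str) -> float:
    # Embed both texts in one batch, then score with cosine similarity.
    emb = model.encode([resume_text, jd_text], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()
```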
Local LLM
Ollama · Phi-3, Qwen2.5, Gemma 3
Runs on the candidate's data, on your machine. No internet required. Five-minute timeout, model warm-up on startup, LRU cache for repeats. `ollama pull gemma3:1b` and you are off.
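The local tier in sketch form, assuming Ollama's standard HTTP API on its default port; `local_verdict` and the cache size are illustrative, not Resha's code:

```python
# Sketch: local LLM tier via Ollama's HTTP API, with timeout and repeat-cache.
from functools import lru_cache
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama default port

@lru_cache(maxsize=256)  # identical prompts hit the cache, not the model
def local_verdict(prompt: str, model: str = "gemma3:1b") -> str:
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,  # the five-minute ceiling from the text
    )
    resp.raise_for_status()
    return resp.json()["response"]
```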
Cloud LLM
Google Gemini fallback
When you have signal and you want the highest ceiling. Sub-3s round-trip, real chain-of-thought streaming back to the dev panel, full reasoning trace logged to SQLite for audit.
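How the three tiers could compose, as a hedged sketch: `Verdict`, `route`, and the consensus bump of +0.1 per agreeing tier are illustrative stand-ins, not Resha's actual orchestrator:

```python
# Sketch of three-tier routing with consensus weighting; names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    tier: str
    shortlist: bool
    confidence: float

def route(resume: str, jd: str,
          tiers: list[tuple[str, Callable[[str, str], Optional[Verdict]]]]) -> Verdict:
    """Run every available tier with the same prompt and rubric."""
    verdicts = []
    for name, analyze in tiers:
        try:
            v = analyze(resume, jd)
            if v:
                verdicts.append(v)
        except Exception:
            continue  # tier unavailable: cloud down, no GPU, no bandwidth
    if not verdicts:
        raise RuntimeError("no inference tier available")
    top = max(verdicts, key=lambda v: v.confidence)
    agree = sum(1 for v in verdicts if v.shortlist == top.shortlist)
    if agree >= 2:  # two or more tiers agree: raise the confidence
        top.confidence = min(1.0, top.confidence + 0.1 * (agree - 1))
    return top
```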
Act III · The Reasoning
Real chain-of-thought.
Not simulated steps.
Resha streams every reasoning step from the LLM as it happens, over Server-Sent Events with buffering disabled, so a recruiter or auditor can watch the model think, in order, in real time.
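A sketch of that plumbing with FastAPI's `StreamingResponse`; the hard-coded steps stand in for the real LLM stream, and `X-Accel-Buffering: no` is the usual way to keep Nginx from buffering SSE:

```python
# Sketch: stream reasoning steps as Server-Sent Events with buffering disabled.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def reasoning_steps(resume: str, jd: str):
    # In Resha this would iterate the LLM's live token/step stream.
    for step in ("parsing resume", "matching skills", "scoring", "verdict"):
        yield f"data: {step}\n\n"  # SSE frame: data line plus blank line

@app.post("/api/dev/analyze-stream")
async def analyze_stream(resume: str, jd: str):
    return StreamingResponse(
        reasoning_steps(resume, jd),
        media_type="text/event-stream",
        headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
    )
```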
Act IV · The Microservice
Drop into any ATS.
FastAPI in, JSON out.
| Method | Endpoint | Purpose |
|---|---|---|
| POST | /api/analyze | SBERT semantic analysis · resume vs JD |
| POST | /api/shortlist | Cloud LLM shortlist verdict |
| POST | /api/analyze-file | Upload PDF/DOCX/TXT for analysis |
| GET | /api/history | SQLite audit trail of decisions |
| POST | /api/dev/analyze-stream | Streaming chain-of-thought · SSE |
| POST | /api/dev/hybrid-analyze | Multi-model consensus verdict |
| POST | /api/dev/warmup | Preload Ollama model into RAM |
| DELETE | /api/dev/cache | Clear the LRU response cache |
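Calling it from an ATS is one POST. A sketch, assuming the live base URL from Act V; the JSON field names and auth header are illustrative, not Resha's schema:

```python
# Sketch client call; field names and header name are assumptions.
import requests

r = requests.post(
    "https://reas.dmj.one/task2/api/analyze",
    json={"resume": "…resume text…", "job_description": "…JD text…"},
    headers={"X-API-Key": "<your-key>"},  # API-key auth per Act V
    timeout=30,
)
print(r.json())
```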
Act V · Proof
Production-deployed. Audit-ready.
Live · reas.dmj.one/task2/
One-click systemd deployment. Nginx reverse proxy on port 22000, isolated from the /task1/ service. Auto-generated secret keys, hardened security headers.
SQLite audit trail
Every decision logged with the inputs, the tier used, the verdict, the reasoning. Defensible if a candidate ever asks why.
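One plausible shape for that table, as a sketch; the schema here is inferred from the text, not the shipped one:

```python
# Sketch: one row per decision; columns mirror what the text says is logged.
import sqlite3

conn = sqlite3.connect("audit.db")
conn.execute("""
CREATE TABLE IF NOT EXISTS decisions (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,
    created_at      TEXT DEFAULT CURRENT_TIMESTAMP,
    resume_text     TEXT NOT NULL,   -- the inputs
    job_description TEXT NOT NULL,
    tier            TEXT NOT NULL,   -- on-device / local / cloud
    verdict         TEXT NOT NULL,   -- shortlisted / rejected
    reasoning       TEXT             -- full reasoning trace
)
""")
conn.commit()
```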
Strict CSP, HSTS, X-Frame-Options
API-key authentication, no plaintext secrets, security headers on every response. Drop into a regulated environment without rewriting policy.
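Those headers fit naturally in a FastAPI middleware. A sketch of the pattern, with header values as illustrative defaults:

```python
# Sketch: hardened security headers on every response via FastAPI middleware.
from fastapi import FastAPI, Request

app = FastAPI()

@app.middleware("http")
async def security_headers(request: Request, call_next):
    response = await call_next(request)
    response.headers["Content-Security-Policy"] = "default-src 'self'"
    response.headers["Strict-Transport-Security"] = "max-age=31536000; includeSubDomains"
    response.headers["X-Frame-Options"] = "DENY"
    return response
```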
Four local model presets
Gemma 3 1B (~1GB), Qwen2.5 3B (~2GB), Gemma 2B (~1.5GB), TinyLlama (~700MB). Pick by hardware; the orchestrator handles the rest.
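Picking by hardware can be as simple as matching free memory to the footprints above. A sketch, assuming psutil is available; the thresholds are illustrative:

```python
# Sketch: choose the largest local preset that fits in free memory.
import psutil  # assumption: psutil installed for memory introspection

PRESETS = [  # (min free GB, Ollama tag), largest first
    (2.5, "qwen2.5:3b"),
    (2.0, "gemma:2b"),
    (1.5, "gemma3:1b"),
    (0.0, "tinyllama"),
]

def pick_model() -> str:
    free_gb = psutil.virtual_memory().available / 2**30
    return next(tag for min_gb, tag in PRESETS if free_gb >= min_gb)
```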
The Stack
Hybrid by design. Offline by default.
- Python 3.10+
- FastAPI
- SBERT MiniLM-L6
- spaCy en_core_web_sm
- PyTorch CPU
- Ollama
- Gemma 3 / Qwen2.5
- Gemini fallback
- SQLite audit
- SSE streaming
- systemd
- Nginx reverse-proxy
If 590 unread resumes can become 590 fair reviews, your hiring can change too.
I build inference orchestrators that route across edge, local and cloud, and stream their reasoning so humans stay in the loop. If your AI needs to defend itself, talk to me.