No. 24 · AI · Agents · HR

The fine line between
shortlisted and
rejected. रेशा · Resha · the fibre

A recruiter receives 600 resumes for one role. They open 30. Twenty die in the first ninety seconds. Brilliant candidates fall on "page two". Resha gives every candidate the same fair, three-tier review: on-device, local LLM, cloud fallback. Same prompt, same rubric, every time.

3 · Inference tiers
~50ms · Onboard tier latency
95%+ · Consensus accuracy
0 · GPUs required
PDF / DOCX / TXT · File formats parsed

Act I · The Funnel

Six hundred resumes.
One role.
Ninety seconds each.

This is a math problem before it is a hiring problem. A recruiter physically cannot give a fair review to 600 resumes. So they don't. They scan, they skim, they reject on font choice, and the company loses the engineer they needed.

600 · Resumes received for one role
120 · Filename or first-line pass
30 · Actually opened
10 · Survive the first 90 seconds

The other 590 are not bad candidates. They are unread.

Act II · The Promise

Three inference tiers.
One verdict you can defend.

Cloud goes down. Bandwidth dies. The intern's laptop has no GPU. Resha works anyway. The orchestrator picks the best available tier and routes the analysis. When two or more tiers agree, consensus weighting raises the confidence.
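The routing idea can be sketched as a simple availability probe: try the highest-quality tier that answers, fall through to the one that always works. Tier names, health probes, and accuracy figures below are illustrative, not Resha's internals.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tier:
    name: str
    accuracy: float                 # rough expected accuracy, per the write-up
    available: Callable[[], bool]   # health probe for this tier

def pick_tier(tiers: list[Tier]) -> Tier:
    """Return the highest-accuracy tier whose health probe passes.

    The onboard tier is pure Python on CPU, so its probe is always
    true and the orchestrator can never come back empty-handed.
    """
    for tier in sorted(tiers, key=lambda t: t.accuracy, reverse=True):
        if tier.available():
            return tier
    raise RuntimeError("no inference tier available")

tiers = [
    Tier("onboard", 0.80, lambda: True),     # CPU-only, always up
    Tier("local-llm", 0.90, lambda: False),  # e.g. Ollama not running
    Tier("cloud-llm", 0.95, lambda: False),  # e.g. no network
]
```

With both upper tiers down, `pick_tier(tiers)` degrades gracefully to the onboard tier rather than failing the request.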

Tier 01
Onboard

SBERT semantic match · TF-IDF

Pure Python, CPU-only, runs anywhere. spaCy entity extraction over skills, experience and education. all-MiniLM-L6-v2 embeds the resume and job description; cosine similarity scores the match.

~50ms · ~80% accuracy
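The scoring step is easiest to see with the TF-IDF half of this tier. The sketch below uses plain term-frequency vectors; the production tier uses all-MiniLM-L6-v2 embeddings as the vectors, but the cosine step is the same.

```python
import math
import re
from collections import Counter

def _tf(text: str) -> Counter:
    # Bag-of-words term frequencies; keeps tokens like "c++" and "c#".
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))

def cosine_match(resume: str, job_description: str) -> float:
    """Cosine similarity between two term-frequency vectors, in [0, 1].

    Illustrative stand-in for the onboard tier's embedding match.
    """
    a, b = _tf(resume), _tf(job_description)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Identical texts score 1.0, disjoint vocabularies score 0.0, and partial skill overlap lands in between, which is exactly the shape of signal the shortlist threshold needs.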
Tier 02
Local LLM

Ollama · Phi-3, Qwen2.5, Gemma 3

Runs on the candidate's data, on your machine. No internet required. Five-minute timeout, model warm-up on startup, LRU cache for repeats. ollama pull gemma3:1b and you are off.

3-5s · ~90% accuracy
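The repeat-prompt cache can be sketched like this. `make_cached_llm` is a hypothetical helper, and the `generate` callable stands in for the real Ollama HTTP call that Resha guards with its five-minute timeout; only the LRU behaviour is shown.

```python
from functools import lru_cache
from typing import Callable

def make_cached_llm(generate: Callable[[str], str], maxsize: int = 128):
    """Wrap a local-LLM call with an LRU cache keyed on the prompt.

    Repeated resume/JD pairs hit the cache instead of re-running
    a 3-5 second inference.
    """
    @lru_cache(maxsize=maxsize)
    def cached(prompt: str) -> str:
        return generate(prompt)
    return cached

# Usage with a counting fake in place of the Ollama backend:
calls = []
llm = make_cached_llm(lambda p: (calls.append(p) or f"verdict for: {p}"))
```

Calling `llm("resume A")` twice runs the backend once; the second call is a cache hit.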
Tier 03
Cloud LLM

Google Gemini fallback

When you have signal and you want the highest ceiling. Sub-3s round-trip, real chain-of-thought streaming back to the dev panel, full reasoning trace logged to SQLite for audit.

1-3s · ~95% accuracy
Weighted consensus · ~95%+ accuracy when two or more tiers agree. The fibre.
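A minimal sketch of the weighted vote, assuming illustrative tier weights (the real calibration is not published here). Agreement between tiers is what pushes the confidence up.

```python
WEIGHTS = {"onboard": 0.20, "local": 0.35, "cloud": 0.45}  # illustrative

def consensus(verdicts: dict[str, bool], weights: dict[str, float]) -> tuple[bool, float]:
    """Weighted vote over per-tier shortlist verdicts.

    Returns (verdict, confidence): the side with more weight wins,
    and confidence is that side's share of the total weight.
    """
    yes = sum(weights[t] for t, v in verdicts.items() if v)
    total = sum(weights[t] for t in verdicts)
    confidence = max(yes, total - yes) / total
    return yes >= total - yes, confidence
```

Three agreeing tiers give confidence 1.0; a lone dissenting onboard tier still yields a shortlist verdict, but at lower confidence.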

Act III · The Reasoning

Real chain-of-thought.
Not simulated steps.

Resha streams every reasoning step from the LLM as it happens. Server-Sent Events with buffering disabled, so a recruiter or auditor can watch the model think, in order, in real time.
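The wire format is plain Server-Sent Events: one frame per reasoning step, a blank line between frames. A formatter like the one below is the whole encoding side; the `reasoning` event name is an assumption, not Resha's actual schema.

```python
def sse_frame(event: str, data: str) -> str:
    """Format one Server-Sent Events frame as it goes over the wire.

    Each line of a multi-line payload gets its own `data:` field;
    the trailing blank line terminates the frame.
    """
    lines = "".join(f"data: {line}\n" for line in data.splitlines() or [""])
    return f"event: {event}\n{lines}\n"
```

Streaming each step as its own frame, with proxy buffering off, is what lets an auditor watch the model think in order instead of receiving one block after the fact.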

Act IV · The Microservice

Drop into any ATS.
FastAPI in, JSON out.

Method · Endpoint · Purpose

POST · /api/analyze · SBERT semantic analysis, resume vs JD
POST · /api/shortlist · Cloud LLM shortlist verdict
POST · /api/analyze-file · Upload PDF/DOCX/TXT for analysis
GET · /api/history · SQLite audit trail of decisions
POST · /api/dev/analyze-stream · Streaming chain-of-thought over SSE
POST · /api/dev/hybrid-analyze · Multi-model consensus verdict
POST · /api/dev/warmup · Preload the Ollama model into RAM
DELETE · /api/dev/cache · Clear the LRU response cache
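Integration from an ATS can be as small as one JSON POST. The field names `resume` and `job_description` below are assumptions about the request schema, so check the service's docs before wiring this in.

```python
import json

def build_analyze_request(base_url: str, resume: str, job_description: str):
    """Build the URL, body, and headers for a POST to /api/analyze.

    Returned pieces can be handed to any HTTP client; the payload
    field names here are illustrative, not a published contract.
    """
    url = base_url.rstrip("/") + "/api/analyze"
    body = json.dumps({"resume": resume, "job_description": job_description}).encode()
    headers = {"Content-Type": "application/json"}
    return url, body, headers
```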

Act V · Proof

Production-deployed. Audit-ready.

Live · reas.dmj.one/task2/

One-click systemd deployment. Nginx reverse-proxy, port 22000, isolated from /task1/ service. Auto-generated secret keys, hardened security headers.
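An illustrative nginx location for a deployment shaped like this; the paths, port, and flags are assumptions based on the description above, not the live config.

```nginx
# Proxy /task2/ to the Resha service on port 22000, isolated from /task1/.
location /task2/ {
    proxy_pass http://127.0.0.1:22000/;
    proxy_http_version 1.1;
    proxy_buffering off;   # keep SSE chain-of-thought frames unbuffered
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```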

SQLite audit trail

Every decision logged with the inputs, the tier used, the verdict, the reasoning. Defensible if a candidate ever asks why.

Strict CSP, HSTS, X-Frame-Options

API-key authentication, no plaintext secrets, security headers on every response. Drop into a regulated environment without rewriting policy.
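As a sketch, the hardened headers can be one merge applied to every response, the way an ASGI middleware would. The policy values shown are representative defaults, not Resha's exact strings.

```python
SECURITY_HEADERS = {
    # Representative values; the deployed policy strings may differ.
    "Content-Security-Policy": "default-src 'self'",
    "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
    "X-Frame-Options": "DENY",
    "X-Content-Type-Options": "nosniff",
}

def with_security_headers(response_headers: dict) -> dict:
    """Merge hardened headers onto a response's headers.

    Route-specific headers win on conflict, so individual endpoints
    can still override a policy deliberately.
    """
    return {**SECURITY_HEADERS, **response_headers}
```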

Four local model presets

Gemma 3 1B (~1GB), Qwen2.5 3B (~2GB), Gemma 2B (~1.5GB), TinyLlama (~700MB). Pick by hardware; the orchestrator handles the rest.
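The pick-by-hardware rule can be sketched as choosing the largest preset that fits in free RAM. Footprints come from the list above; apart from `gemma3:1b` (which the write-up pulls explicitly), the Ollama tags are guesses at the named presets.

```python
PRESETS = [
    # (Ollama tag, approx RAM footprint in GB, from the preset list)
    ("gemma3:1b", 1.0),
    ("qwen2.5:3b", 2.0),
    ("gemma:2b", 1.5),
    ("tinyllama", 0.7),
]

def pick_preset(free_ram_gb: float) -> str:
    """Return the biggest preset that fits the available RAM.

    A sketch of the selection rule only; the real orchestrator may
    weigh more than memory.
    """
    fitting = [(tag, gb) for tag, gb in PRESETS if gb <= free_ram_gb]
    if not fitting:
        raise RuntimeError("no preset fits; free some RAM")
    return max(fitting, key=lambda p: p[1])[0]
```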

The Stack

Hybrid by design. Offline by default.

  • Python 3.10+
  • FastAPI
  • SBERT MiniLM-L6
  • spaCy en_core_web_sm
  • PyTorch CPU
  • Ollama
  • Gemma 3 / Qwen2.5
  • Gemini fallback
  • SQLite audit
  • SSE streaming
  • systemd
  • Nginx reverse-proxy

If 590 unread resumes can become 590 fair reviews, your hiring can change too.

I build inference orchestrators that route across edge, local and cloud, and stream their reasoning so humans stay in the loop. If your AI needs to defend itself, talk to me.