
No. 20 · AI · Browser ML · Sentio

Train an enterprise classifier.
In your browser tab.

Most chatbot platforms charge six figures and train in someone else's cloud. The data leaves the building. Compliance teams panic. NLU Bot Trainer ships a 5-classifier stacking ensemble of 171,772 parameters that runs entirely in the browser. Zero data egress. No GPU. No SaaS.

Training · Local browser 94%

  • Logistic Reg. · 12K
  • Complement NB · 7K
  • Linear SVM · 12K
  • MLP · 128h · 133K
  • Gradient Boost · 7K

171,772 params · 2.0 MB · Inference 1-6 ms · Egress 0 bytes

  • 5 classifiers, stacking
  • 171K total parameters
  • 30 s full ensemble training
  • 1-6 ms inference latency
  • 0 bytes egress

Act I · The Problem

The chatbot platform owns the data.

Enterprise NLU is a six-figure subscription that ships customer transcripts to a third-party cloud. For regulated industries, that is the whole problem.

i.

Six figures a year, plus per-call.

The big NLU platforms bill flat fees and per-message. Pricing scales with success. You succeed harder, you pay harder.

ii.

Your customer data lives elsewhere.

Training requires upload. Inference requires upload. Compliance teams cannot sign that off under DPDP, GDPR or HIPAA.

iii.

One model, one bias.

Linear models miss feature interactions. Naive Bayes assumes independent features and struggles when they correlate. SVMs can overfit when margins are tight. A single algorithm is a single weakness.

iv.

You cannot vendor-out.

Train on Lex, leave Lex, retrain on Dialogflow. Most teams stop trying. The ground truth gets stuck in someone else's format.

Act II · The Stacking Ensemble

Five algorithms.
One vote.

Each classifier fails differently. Combined, their errors partly cancel, so the ensemble usually beats its best single member. Cross-validated meta-weights decide which classifier to trust on which intent.
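
A minimal sketch of how meta-weights could turn five opinions into one vote. The function names and the per-classifier, per-intent weight layout are assumptions for illustration; the shipped meta-learner may combine outputs differently.

```typescript
// Hypothetical shapes (not from the product):
//   probs[c][k]   = probability classifier c assigns to intent k
//   weights[c][k] = how much the meta-learner trusts classifier c on intent k
type Matrix = number[][];

function stackPredict(probs: Matrix, weights: Matrix): number[] {
  const numIntents = probs[0].length;
  const scores = new Array<number>(numIntents).fill(0);
  for (let c = 0; c < probs.length; c++) {
    for (let k = 0; k < numIntents; k++) {
      scores[k] += weights[c][k] * probs[c][k]; // weighted vote per intent
    }
  }
  // Normalise so the combined scores sum to 1; fall back to uniform if all are zero.
  const total = scores.reduce((a, b) => a + b, 0);
  return total > 0
    ? scores.map((s) => s / total)
    : scores.map(() => 1 / numIntents);
}

function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) {
    if (xs[i] > xs[best]) best = i;
  }
  return best;
}
```

With equal weights this reduces to plain probability averaging. The weights are where the cross-validation pays off: a classifier that is reliable on one intent and noisy on another gets a different say on each.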

Every algorithm (MurmurHash3, Pegasos SVM, Complement NB, backprop MLP, gradient boosted stumps) is implemented from scratch in TypeScript. Zero ML dependencies. Ships as static files.
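
To give a feel for what "from scratch" means here, a sketch of a binary Pegasos linear SVM in plain TypeScript. This is an illustration, not the project's code: the function names, the default hyperparameters, and the fixed visiting order are assumptions (the published algorithm samples examples at random, and a multi-intent classifier would need a one-vs-rest wrapper on top).

```typescript
function dot(a: number[], b: number[]): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

// Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss.
// Labels must be +1 or -1. Examples are visited in a fixed cycle so the sketch
// is deterministic (an assumption made for reproducibility).
function trainPegasos(
  xs: number[][],
  ys: number[],
  lambda = 0.01,
  epochs = 50
): number[] {
  const w = new Array<number>(xs[0].length).fill(0);
  let t = 0;
  for (let e = 0; e < epochs; e++) {
    for (let i = 0; i < xs.length; i++) {
      t += 1;
      const eta = 1 / (lambda * t);            // step size decays as 1 / (lambda * t)
      const margin = ys[i] * dot(w, xs[i]);    // computed before the update
      for (let j = 0; j < w.length; j++) {
        w[j] *= 1 - eta * lambda;              // regularisation shrink
        if (margin < 1) w[j] += eta * ys[i] * xs[i][j]; // hinge-loss correction
      }
    }
  }
  return w;
}

function predictSign(w: number[], x: number[]): number {
  return dot(w, x) >= 0 ? 1 : -1;
}
```

No matrix library, no autograd: two loops and a dot product. That is why the whole ensemble can ship as static files.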

Act III · Built for production

Self-learning, drift-aware,
seven export formats.

Self-learning loop

Evaluates, diagnoses weak intents, augments data, pseudo-labels high-confidence predictions, curriculum-orders, retrains, validates. Accepts only if accuracy does not regress. Fully autonomous.
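
Three of those steps are small enough to sketch: pseudo-labelling, curriculum ordering, and the acceptance gate. The type names, the 0.95 confidence threshold, and the use of confidence as a proxy for difficulty are all assumptions for illustration, not the product's actual settings.

```typescript
interface Prediction {
  text: string;
  intent: string;
  confidence: number; // in [0, 1]
}

interface Example {
  text: string;
  intent: string;
}

// Pseudo-labelling: keep only predictions the current model is very sure about.
// The 0.95 default is a hypothetical threshold.
function pseudoLabel(preds: Prediction[], minConfidence = 0.95): Example[] {
  return preds
    .filter((p) => p.confidence >= minConfidence)
    .map((p) => ({ text: p.text, intent: p.intent }));
}

// Curriculum ordering: easiest first, using confidence as a stand-in for difficulty.
// Returns a sorted copy and leaves the input untouched.
function curriculumOrder(preds: Prediction[]): Prediction[] {
  return [...preds].sort((a, b) => b.confidence - a.confidence);
}

// Acceptance gate: the retrained candidate replaces the current model only if
// held-out accuracy does not regress.
function acceptCandidate(currentAccuracy: number, candidateAccuracy: number): boolean {
  return candidateAccuracy >= currentAccuracy;
}
```

The gate is what makes autonomy safe: a bad augmentation round costs a retrain, not a worse model in production.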

Drift detection

Page-Hinkley for concept drift. DDM for error-rate drift. Vocabulary distribution monitored in real time. Dashboard shows you the moment behaviour shifts.
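
For the curious, a sketch of a Page-Hinkley test that flags an upward shift in a monitored signal, such as a per-prediction error indicator. The class shape and the default `delta` and `threshold` values are assumptions; the product's own tuning is not stated here.

```typescript
// Page-Hinkley test for an increase in the mean of a stream.
// delta: tolerated drift magnitude; threshold: alarm level (both hypothetical defaults).
class PageHinkley {
  private n = 0;
  private mean = 0;
  private cumulative = 0;
  private minCumulative = 0;
  private delta: number;
  private threshold: number;

  constructor(delta = 0.005, threshold = 5) {
    this.delta = delta;
    this.threshold = threshold;
  }

  // Feed one observation; returns true once drift is signalled.
  update(x: number): boolean {
    this.n += 1;
    this.mean += (x - this.mean) / this.n;            // running mean
    this.cumulative += x - this.mean - this.delta;    // cumulative deviation
    this.minCumulative = Math.min(this.minCumulative, this.cumulative);
    return this.cumulative - this.minCumulative > this.threshold;
  }
}
```

Feed it 0 for a correct prediction and 1 for a wrong one: a stable stream never trips the alarm, while a sustained run of errors pushes the cumulative deviation past the threshold within a handful of observations.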

Model registry

Semantic versioning. Champion / challenger lifecycle. A/B testing with configurable traffic splits. Rollback in one click.
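
One way a configurable traffic split could work without a server: hash a session identifier into a bucket and compare it with the challenger's share. The FNV-1a hash, the function names, and the session-id input are my choices for this sketch, not a description of the shipped registry. The hash runs over UTF-16 code units rather than bytes, which is fine for bucketing but is not byte-exact FNV-1a for non-ASCII input.

```typescript
// 32-bit FNV-1a over the string's UTF-16 code units.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return h >>> 0; // force unsigned
}

type Variant = "champion" | "challenger";

// challengerShare is the fraction of traffic (0 to 1) routed to the challenger.
function assignVariant(sessionId: string, challengerShare: number): Variant {
  const bucket = fnv1a(sessionId) / 0x100000000; // maps the hash into [0, 1)
  return bucket < challengerShare ? "challenger" : "champion";
}
```

Because the assignment is a pure function of the session id, the same user sees the same model on every message, and changing the split is a one-number config change.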

Seven-platform export

Rasa YAML 3.1, Dialogflow ES, Lex V2, LUIS, Wit.ai, CSV, JSON. Vendor-out is built into the product.
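
As an illustration of how thin an exporter can be, a sketch that groups labelled utterances by intent and writes them in Rasa's NLU training-data layout. The `Example` type and function name are hypothetical, and the sketch does no escaping: it assumes intent names and utterances contain no characters that need YAML quoting.

```typescript
interface Example {
  text: string;
  intent: string;
}

// Emits Rasa-style NLU YAML: one "- intent:" entry per intent, with its
// utterances as a literal block of "- " lines. No escaping is performed.
function toRasaNlu(examples: Example[]): string {
  const byIntent = new Map<string, string[]>();
  for (const ex of examples) {
    const list = byIntent.get(ex.intent) ?? [];
    list.push(ex.text);
    byIntent.set(ex.intent, list);
  }

  const lines: string[] = ['version: "3.1"', "nlu:"];
  byIntent.forEach((texts, intent) => {
    lines.push(`- intent: ${intent}`);
    lines.push("  examples: |");
    for (const t of texts) lines.push(`    - ${t}`);
  });
  return lines.join("\n") + "\n";
}
```

The ground truth stays in one neutral in-memory shape; each target platform is just another serialiser over it.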

Scales for free

Every user brings their own compute: the browser. Ten users or ten million, the server load per user is the same, because all it does is serve static files.

WCAG 2.2 AA

Full keyboard navigation. Alt+1 through Alt+8 page switching. ARIA labels, screen reader support, reduced motion, skip navigation.

The Stack

Pure TypeScript math.
No Python runtime. No GPU. No API keys.

  • TypeScript 5.5
  • Next.js 14.2
  • React 18.3
  • Pure-JS MurmurHash3
  • Pegasos Linear SVM
  • Complement Naive Bayes
  • Backprop MLP
  • Gradient Boosted Stumps
  • localStorage model registry
  • Page-Hinkley drift
  • DDM error-rate drift
  • Docker · Vercel · Any VM
  • AGPL 3.0

If your data cannot leave the building, train it where it lives.

I build privacy-first ML for regulated industries. From-scratch algorithms, no SaaS dependency, drop into any browser. Vendor-out is the product.