No. 20 · AI · Browser ML · Sentio
Train an enterprise classifier.
In your browser tab.
Most chatbot platforms charge six figures and train in someone else's cloud. The data leaves the building. Compliance teams panic. NLU Bot Trainer ships a 5-classifier stacking ensemble of 171,772 parameters that runs entirely in the browser. Zero data egress. No GPU. No SaaS.
Act I · The Problem
The chatbot platform owns the data.
Enterprise NLU is a six-figure subscription that ships customer transcripts to a third-party cloud. For regulated industries, that is the whole problem.
Six figures a year, plus per-call.
The big NLU platforms bill flat fees and per-message. Pricing scales with success. You succeed harder, you pay harder.
Your customer data lives elsewhere.
Training requires upload. Inference requires upload. Compliance teams cannot sign that off under DPDP, GDPR or HIPAA.
One model, one bias.
Linear models miss feature overlap. Naive Bayes struggles with correlations. SVMs overfit tight margins. A single algorithm is a single weakness.
You cannot vendor-out.
Train on Lex, leave Lex, retrain on Dialogflow. Most teams stop trying. The ground truth gets stuck in someone else's format.
Act II · The Stacking Ensemble
Five algorithms.
One vote.
Each classifier fails differently, so their errors partly cancel. A stacked ensemble typically beats its best single member, though no dataset guarantees it. Cross-validated meta-weights decide who to trust on what.
Classifier 01
Logistic Regression
12K parameters
Strong on linear boundaries. Misses overlapping features.
Classifier 02
Complement Naive Bayes v2
7K parameters
Strong on small data. Struggles with correlations.
Classifier 03
Pegasos Linear SVM
12K parameters
Sharp margins. Overfits tight clusters.
Classifier 04
MLP · 128 hidden
133K parameters
Catches non-linear patterns. Hungry for data.
Classifier 05
Gradient Boosted Stumps
7K parameters
Great on sharp splits. Misses smooth boundaries.
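A minimal sketch of how the five outputs could be combined into one vote. It assumes each base classifier emits one probability per intent and that the meta-weights form a per-classifier, per-intent table. The names `stackPredict`, `argmax` and the weight layout are illustrative, not the product's actual API.

```typescript
// Hypothetical shape: one probability per intent, from one base classifier.
type Probabilities = number[];

/**
 * Weighted vote over base-classifier outputs.
 * weights[k][c] = how much classifier k is trusted on intent c
 * (assumed to come from cross-validation, which is not shown here).
 */
function stackPredict(baseOutputs: Probabilities[], weights: number[][]): Probabilities {
  const numClasses = baseOutputs[0].length;
  const scores: number[] = new Array<number>(numClasses).fill(0);
  for (let k = 0; k < baseOutputs.length; k++) {
    for (let c = 0; c < numClasses; c++) {
      scores[c] += weights[k][c] * baseOutputs[k][c];
    }
  }
  // Renormalise so the result is again a probability distribution.
  const total = scores.reduce((a, b) => a + b, 0);
  return total > 0 ? scores.map((s) => s / total) : scores.map(() => 1 / numClasses);
}

/** Index of the largest entry, i.e. the winning intent. */
function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) {
    if (xs[i] > xs[best]) best = i;
  }
  return best;
}
```

With two classifiers that disagree, the one carrying the larger weight wins the vote. That is the sense in which the meta-weights decide who to trust.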
Every algorithm (MurmurHash3, Pegasos SVM, Complement NB, backprop MLP, gradient boosted stumps) is implemented from scratch in TypeScript. Zero ML dependencies. Ships as static files.
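For a sense of what "from scratch" looks like, here is a sketch of 32-bit MurmurHash3 in plain TypeScript, followed by one common use for it in text classifiers: hashing tokens into a fixed-size feature vector. That the product uses the hash this way is an assumption; `hashFeatures` and the choice of dimension are hypothetical.

```typescript
/** MurmurHash3 (x86, 32-bit) over the UTF-8 bytes of a string. */
function murmur3(key: string, seed: number = 0): number {
  const data = new TextEncoder().encode(key);
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  let h = seed >>> 0;

  // Body: mix one little-endian 4-byte block at a time.
  const nblocks = data.length >> 2;
  for (let i = 0; i < nblocks; i++) {
    const o = i * 4;
    let k = data[o] | (data[o + 1] << 8) | (data[o + 2] << 16) | (data[o + 3] << 24);
    k = Math.imul(k, c1);
    k = (k << 15) | (k >>> 17);
    k = Math.imul(k, c2);
    h ^= k;
    h = (h << 13) | (h >>> 19);
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }

  // Tail: the 1 to 3 bytes left over after the last full block.
  const tail = nblocks * 4;
  const rem = data.length & 3;
  if (rem > 0) {
    let k = 0;
    if (rem === 3) k ^= data[tail + 2] << 16;
    if (rem >= 2) k ^= data[tail + 1] << 8;
    k ^= data[tail];
    k = Math.imul(k, c1);
    k = (k << 15) | (k >>> 17);
    k = Math.imul(k, c2);
    h ^= k;
  }

  // Finalisation: avalanche the bits.
  h ^= data.length;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

/** Hashing trick (assumed usage): count tokens into `dim` buckets. */
function hashFeatures(tokens: string[], dim: number): Float32Array {
  const vec = new Float32Array(dim);
  for (const token of tokens) {
    vec[murmur3(token) % dim] += 1;
  }
  return vec;
}
```

Hashing keeps the feature vector at a fixed width without storing a vocabulary, which suits a model that has to live in a browser tab.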
Act III · Built for production
Self-learning, drift-aware,
seven export formats.
Self-learning loop
Evaluates, diagnoses weak intents, augments data, pseudo-labels high-confidence predictions, curriculum-orders, retrains, validates. Accepts only if accuracy does not regress. Fully autonomous.
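Two steps of that loop are easy to sketch: pseudo-labelling and the acceptance gate. The types, the function names and the 0.9 confidence threshold are assumptions for illustration; the only rule taken from the text is that a retrained model is kept when accuracy does not regress.

```typescript
// Hypothetical shapes for illustration.
interface Prediction {
  text: string;
  intent: string;
  confidence: number; // in [0, 1]
}

interface LabeledExample {
  text: string;
  intent: string;
}

/** Keep only predictions confident enough to be reused as training data. */
function pseudoLabel(predictions: Prediction[], minConfidence: number = 0.9): LabeledExample[] {
  return predictions
    .filter((p) => p.confidence >= minConfidence)
    .map((p) => ({ text: p.text, intent: p.intent }));
}

/** Acceptance gate: the candidate model must not score below the current one. */
function acceptCandidate(currentAccuracy: number, candidateAccuracy: number): boolean {
  return candidateAccuracy >= currentAccuracy;
}
```

The gate is what makes an autonomous loop safe to leave running: a bad round of augmentation costs one retrain, not the deployed model.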
Drift detection
Page-Hinkley for concept drift. DDM for error-rate drift. Vocabulary distribution monitored in real time. Dashboard shows you the moment behaviour shifts.
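A minimal sketch of the Page-Hinkley test in its textbook form, watching for an upward shift in a monitored signal. The stream being monitored (for example a 0/1 error indicator per prediction) and the default `delta` and `lambda` values are assumptions, not the product's settings.

```typescript
/**
 * Page-Hinkley change detector for an increase in the mean of a stream.
 * delta  = tolerated drift magnitude per observation
 * lambda = alarm threshold
 */
class PageHinkley {
  private readonly delta: number;
  private readonly lambda: number;
  private count = 0;
  private mean = 0;
  private cumulative = 0; // running sum of deviations from the mean
  private minimum = 0;    // smallest cumulative value seen so far

  constructor(delta: number = 0.005, lambda: number = 50) {
    this.delta = delta;
    this.lambda = lambda;
  }

  /** Feed one observation; returns true once drift is signalled. */
  update(x: number): boolean {
    this.count += 1;
    this.mean += (x - this.mean) / this.count; // incremental mean
    this.cumulative += x - this.mean - this.delta;
    this.minimum = Math.min(this.minimum, this.cumulative);
    return this.cumulative - this.minimum > this.lambda;
  }
}
```

A smaller `lambda` reacts faster but raises more false alarms; a larger one waits for more evidence before the dashboard flags a shift.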
Model registry
Semantic versioning. Champion / challenger lifecycle. A/B testing with configurable traffic splits. Rollback in one click.
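One way a configurable traffic split could work is to hash a stable session identifier into a bucket, so the same user always sees the same model. This is a sketch under that assumption; `routeRequest`, the FNV-1a hash and the session-id input are illustrative choices, not the registry's documented behaviour.

```typescript
type Variant = "champion" | "challenger";

/** FNV-1a, 32-bit, over the string's UTF-16 code units. */
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

/**
 * Deterministic A/B routing.
 * challengerShare is the fraction of traffic (0 to 1) sent to the challenger.
 */
function routeRequest(sessionId: string, challengerShare: number): Variant {
  const bucket = (fnv1a(sessionId) % 10000) / 10000; // value in [0, 1)
  return bucket < challengerShare ? "challenger" : "champion";
}
```

Setting the share to 0 is then the same operation as a rollback: every session routes to the champion again.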
Seven-platform export
Rasa YAML 3.1, Dialogflow ES, Lex V2, LUIS, Wit.ai, CSV, JSON. Vendor-out is built into the product.
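As an illustration of what an exporter involves, the sketch below writes labelled utterances in the Rasa NLU training-data layout (version "3.1"). The `LabeledUtterance` type and `toRasaNlu` name are hypothetical, and the sketch assumes intent names and utterances contain no newlines or YAML-special characters that would need escaping.

```typescript
// Hypothetical shape for one training example.
interface LabeledUtterance {
  text: string;
  intent: string;
}

/** Serialise examples as Rasa NLU training data, grouped by intent. */
function toRasaNlu(examples: LabeledUtterance[]): string {
  // Group utterances by intent, preserving first-seen order.
  const byIntent = new Map<string, string[]>();
  for (const ex of examples) {
    const list = byIntent.get(ex.intent) ?? [];
    list.push(ex.text);
    byIntent.set(ex.intent, list);
  }

  const lines: string[] = ['version: "3.1"', "nlu:"];
  byIntent.forEach((texts, intent) => {
    lines.push(`- intent: ${intent}`, "  examples: |");
    for (const t of texts) {
      lines.push(`    - ${t}`);
    }
  });
  return lines.join("\n") + "\n";
}
```

Because the training data is just text and labels, each of the other formats is another small serialiser over the same structure.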
Scales for free
Every user brings their own compute, the browser. Ten users or ten million, the server load is identical: it serves static files.
WCAG 2.2 AA
Full keyboard navigation. Alt+1 through Alt+8 page switching. ARIA labels, screen reader support, reduced motion, skip navigation.
The Stack
Pure TypeScript math.
No Python runtime. No GPU. No API keys.
- TypeScript 5.5
- Next.js 14.2
- React 18.3
- Pure-JS MurmurHash3
- Pegasos Linear SVM
- Complement Naive Bayes
- Backprop MLP
- Gradient Boosted Stumps
- localStorage model registry
- Page-Hinkley drift
- DDM error-rate drift
- Docker · Vercel · Any VM
- AGPL 3.0
If your data cannot leave the building, train it where it lives.
I build privacy-first ML for regulated industries. From-scratch algorithms, no SaaS dependency, drop into any browser. Vendor-out is the product.