Scry sees
what your screen sees.
/skraɪ/ · to see, to reveal, to divine hidden knowledge.
Scry is a research instrument. It demonstrates how screen-reading and AI response automation can be weaponised against unsuspecting systems, so blue teams can defend against the same techniques. It is published for education, red-team simulation and authorised assessment. It is not a productivity tool. It is not a cheat tool. It is not for use without permission.
Section 01 · Boundary of Use
Read this first.
Most security tools stop at an "informational use only" disclaimer. Scry's threat model assumes a user who genuinely wants to research, demonstrate, or red-team, so the boundary is named, controlled and enforced by design.
Permitted: blue-team defence research, red-team engagements with written authorisation, classroom labs in offensive-security curricula, controlled simulations on systems you own.
Prohibited: use on systems you do not own or are not contracted to test, use during examinations or assessments where you are the subject, use to violate any platform's terms of service.
Liability: the author and dmj.one accept no liability for misuse. Using the software is acceptance of these terms.
Distribution: the optional ephemeral licensing system requires the owner to sign each session's challenge code. No licence, no run.
Section 02 · Why It Exists
Defenders cannot defend against
what they have never seen.
Screen-reading AI is now a commodity. Multimodal models can interpret arbitrary content rendered to a display. Combined with input automation, the assistive-tech surface becomes an exfiltration surface. Scry exists so blue teams can build detections before adversaries ship them.
Threat scenario A
An adversary places a screen-reading agent on a target endpoint. It interprets sensitive renderings (CRM, finance, source code) and exfiltrates structured summaries via a covert channel.
→ defender needs to detect input-injection + clipboard-streaming patterns
Threat scenario B
Insider with legitimate access uses screen-reading AI to extract knowledge faster than DLP can flag it. No file is ever opened in a way that triggers existing alerts.
→ defender needs OCR-egress detection in EDR signatures
Section 03 · Architecture
Two engines.
Triple-press activation.
SHA-256 integrity at every boot.
Phase 01
Capture
Screen capture pipeline. Region, full-screen, or active-window. Captured frames are hashed with SHA-256 and discarded after analysis. Built on PyAutoGUI and Pillow.
Phase 02
Read
Hybrid OCR. The Gemini Flash multimodal model handles structured visual content. Tesseract OCR runs in parallel for redundancy and as an offline fallback.
Phase 03
Reason
Gemini Flash interprets the captured content according to mode (MCQ, descriptive, free-form). Output is constrained to the configured response schema.
Phase 04
Respond
Optional input automation. Human-like typing simulation, configurable mouse speed, randomised inter-key delays. Off by default.
Phase 05
Stream
Clipboard streaming on triple-press hotkey. Backspace pauses, 9 stops, an arrow key speeds up. All operator-controlled, no autonomous loop.
Phase 06
Self-update
Source mode auto-pulls from main; binary mode auto-updates from signed GitHub releases. Versions are tracked in src/version.py, and GitHub Actions builds the EXE.
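The boot-time SHA-256 check named at the top of this section can be illustrated with a minimal sketch. It assumes a manifest that maps relative file paths to expected hex digests; the manifest format and the names `sha256_file` and `verify_manifest` are hypothetical and are not taken from Scry's source.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest differs.

    Assumption: `manifest` maps relative paths to expected hex digests.
    An empty result means every listed file matched.
    """
    failures = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file() or sha256_file(target) != expected:
            failures.append(relative_path)
    return failures
```

A launcher following this pattern would refuse to start when the returned list is non-empty. The manifest has to be protected from modification as well, otherwise the check only catches accidental corruption.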
Section 04 · Built-In Constraints
Designed to be hard to misuse.
| Threat | Mitigation | Outcome |
|---|---|---|
| Stolen .env file | Machine-bound encryption. Key derived from CPU, disk serial, MAC, install path, install ID. | Useless on any other machine |
| Project copied to another path | Install-path hash mismatch invalidates the encrypted API key. | Re-authorisation required |
| Casual hotkey trigger | Triple-press activation. Any other key in between resets the count. | No accidental activation |
| Attacker has full source | Ephemeral licensing system; the owner's private key is never in the repo. | Cannot generate session licences |
| Stolen one-time licence key | Each licence is challenge-bound and session-bound, and expires on close. | Single-use only |
| Tampered binary | SHA-256 integrity verification at boot. Modified files refuse to run. | Boot aborted |
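As an illustration of the first two rows, the sketch below derives a key from machine identifiers using PBKDF2-HMAC-SHA256 from the standard library. The choice of KDF, the iteration count, the length-prefixed encoding, the use of the install ID as salt and the function name are all assumptions; the table only states which inputs feed the key.

```python
import hashlib


def derive_machine_key(
    cpu_id: str,
    disk_serial: str,
    mac_address: str,
    install_path: str,
    install_id: str,
    iterations: int = 200_000,  # assumed work factor, not from the source
) -> bytes:
    """Derive a 32-byte key that changes if any identifier changes."""
    # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
    fields = [cpu_id, disk_serial, mac_address, install_path]
    material = b"".join(
        len(value.encode("utf-8")).to_bytes(4, "big") + value.encode("utf-8")
        for value in fields
    )
    # Assumption: the install ID doubles as the salt.
    salt = install_id.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", material, salt, iterations)
```

Because the install path is part of the key material, copying the project to another directory yields a different key, and a secret encrypted under the old key can no longer be decrypted. That matches the "re-authorisation required" outcome in the table.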
Section 05 · Operator Controls
Every hotkey requires three intentional presses.
MCQ mode
Capture, OCR, structured multiple-choice answer.
Reset on any other key
Descriptive mode
Capture and produce free-form descriptive analysis.
Reset on any other key
Clipboard stream
Type clipboard contents character-by-character. Backspace pauses, 9 stops.
Operator-paced
Force stop
Terminates every running Scry process immediately.
Always-available kill switch
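The triple-press rule used throughout this section can be sketched as a small state machine. The time window, the injected clock and the key name `"f8"` are assumptions added for illustration; the source only specifies three presses and a reset on any other key.

```python
import time
from typing import Callable


class TriplePress:
    """Fire once when the same hotkey is pressed three times in a row."""

    def __init__(
        self,
        hotkey: str,
        window_seconds: float = 1.5,  # assumed limit, not from the source
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.hotkey = hotkey
        self.window_seconds = window_seconds
        self._clock = clock
        self._count = 0
        self._first_press = 0.0

    def feed(self, key: str) -> bool:
        """Process one key press; return True on the third consecutive hit."""
        now = self._clock()
        if key != self.hotkey:
            self._count = 0  # any other key in between resets the count
            return False
        if self._count and now - self._first_press > self.window_seconds:
            self._count = 0  # sequence took too long, start over
        if self._count == 0:
            self._first_press = now
        self._count += 1
        if self._count == 3:
            self._count = 0
            return True
        return False
```

Injecting the clock keeps the logic testable without real delays. A caller would create one instance per hotkey, for example `TriplePress("f8")`, and pass every key event to `feed`.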
Section 06 · Proof
Engineered for research integrity.
CI · flake8 · bandit · pytest
GitHub Actions runs on every push: linting with flake8, static security analysis with bandit, unit tests with pytest. Build status is shown on the README badge.
Signed release pipeline
Bump src/version.py and push to main. Actions builds the obfuscated executable, creates a GitHub release and uploads the signed binary; clients then self-update.
True one-click bootstrap
Double-click Scry.vbs on a fresh machine. It auto-installs Python if missing, creates a venv, installs dependencies and opens the control centre in the browser.
Mobile licence signer
Owner runs license_signer.py --server. Mobile-friendly web interface signs challenge codes from a phone. Private key never on the operator machine.
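The challenge-bound, session-bound behaviour of the licensing flow can be sketched without committing to a particular crypto library. In the sketch below the signature check is injected as a callable, so the class only models the session rules: a fresh challenge per run, one licence per session, nothing valid after close. The class name, method names and challenge length are hypothetical and do not come from license_signer.py.

```python
import secrets
from typing import Callable

# (message, signature) -> True if the signature is valid for the message.
Verifier = Callable[[bytes, bytes], bool]


class LicenceSession:
    """One run of the program: a fresh challenge that one licence can unlock."""

    def __init__(self, verify: Verifier) -> None:
        self._verify = verify
        # Random challenge shown to the owner for signing (length is assumed).
        self.challenge = secrets.token_hex(16)
        self._unlocked = False
        self._closed = False

    def unlock(self, signature: bytes) -> bool:
        """Accept a signature over this session's challenge, at most once."""
        if self._closed or self._unlocked:
            return False
        self._unlocked = self._verify(self.challenge.encode("ascii"), signature)
        return self._unlocked

    @property
    def active(self) -> bool:
        return self._unlocked and not self._closed

    def close(self) -> None:
        """End the session; the licence is not valid after this point."""
        self._closed = True
```

With Ed25519, the verifier would wrap a public-key check. In the `cryptography` package, for instance, `Ed25519PublicKey.verify(signature, data)` raises `InvalidSignature` on failure, so a wrapper would catch that exception and return `False`. Only the public key needs to ship with the client, which is what keeps the private key off the operator machine.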
The Stack
Python only. Auditable.
- Python 3.10+
- Gemini Flash
- Tesseract OCR
- PyAutoGUI
- Pillow
- pytest
- flake8
- bandit
- PyInstaller
- GitHub Actions
- Ed25519 licensing
- SHA-256 integrity
If you can build the threat,
you can also build the defence.
I build security research tools that are easy to study, hard to misuse, and impossible to ignore. If your team needs to threat-model the assistive-tech surface, talk to me.
By using or distributing Scry you accept the authorised-use terms above.