Open Source · Apache 2.0 · v0.3.15

You think.
Three agents do the rest. No limits.

You handle hypotheses, judgments, and decisions. Your agents handle literature recon, simulations, code, documentation, and figures.

3 Agent Roles · 7 Workflow Phases · 396 Challenge Problems · 4 Languages

The reproducibility crisis

Speed without rigor is noise.

AI agents can produce a paper a day. Most of those papers won't reproduce. ASRP encodes the scientific method into the workflow itself.

70%

of researchers have tried and failed to reproduce another scientist's experiment (Nature, 2016).

52%

of respondents to the 2016 Nature survey say there is a significant reproducibility crisis. Unchecked AI output risks making it worse.

2

errors caught in one day in the iDEA case study below — before submission, not months after.

Standard Research Workflow · SRW-v3

From idea to direction menu in ≤20 minutes.

Every research project starts with the same 7-phase workflow. Bootstrap phases run on AI minutes — you steer immediately. Phase 7 then switches to "1 human day = 1 AI hour" for deep work.

1

Intake

Theorist Q&A in Discord

2

Lit Recon

Web · arXiv · self-search

3

Opportunities

Synthesise + critique

4

Direction Pick

User chooses path

5

Plan

Engineer feasibility

6

Schedule

Time + budget lock-in

7

Active Loop

Execute · review · iterate

Self-healing dispatcher

Reviewer dispatches phase kickoffs and writes daily standups. Sender ≠ mention target is a runtime invariant — no Discord self-mention deadlocks.
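
A minimal sketch of how the sender ≠ mention target invariant could be checked before a message is posted. The names `DispatchMessage` and `check_invariant` are hypothetical and are not taken from ASRP's code; only the rule itself comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DispatchMessage:
    """A phase kickoff or standup about to be posted (hypothetical shape)."""
    sender: str
    mentions: tuple[str, ...]
    body: str


def check_invariant(msg: DispatchMessage) -> DispatchMessage:
    """Reject any message whose sender appears among its mention targets."""
    if msg.sender in msg.mentions:
        raise ValueError(f"{msg.sender} may not mention itself")
    return msg


# Reviewer mentions the other two agents, so the check passes.
kickoff = check_invariant(
    DispatchMessage("Reviewer", ("Theorist", "Engineer"), "Phase 2 kickoff")
)
```

Running the check at send time, rather than trusting each prompt to behave, is what would make this a runtime invariant instead of a convention.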

Independent recompute

Engineer recomputes any numerical result Theorist produces, using a different library, algorithm, or parameterization — divergence is logged.
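
As a rough illustration of the divergence check, the sketch below compares two independently computed values. The function name, the returned fields, and the tolerances are assumptions; the two sample numbers are the DD values quoted in the iDEA case study on this page.

```python
import math


def compare_results(theorist: float, engineer: float,
                    rel_tol: float = 1e-6, abs_tol: float = 1e-9) -> dict:
    """Compare two independently computed values and describe any divergence."""
    return {
        "theorist": theorist,
        "engineer": engineer,
        "abs_diff": abs(theorist - engineer),
        "sign_flip": theorist * engineer < 0,
        # Tolerances are illustrative defaults, not ASRP settings.
        "agree": math.isclose(theorist, engineer,
                              rel_tol=rel_tol, abs_tol=abs_tol),
    }


# DD before and after the exact-KS-gap recompute (values from the case study).
report = compare_results(theorist=-0.041, engineer=0.084)
```

A record like `report` is the kind of entry that could be written to the audit trail whenever `agree` is false.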

The team

Three agents. One research team.

Each role has its own SOUL prompt, model class, skill manifest, and access permissions. The agent that proposes a hypothesis is never the one that signs off on it.

Theorist
OPUS · LEAD

Lead scientist and the only user-facing voice. Owns hypotheses, literature reconnaissance, opportunity synthesis, plan construction, and the active loop.

  • Self red-team pass on every draft
  • Owns Phase 1 Intake with the researcher
  • Writes and revises the paper
Engineer
SONNET · BUILD

Implements code, runs experiments, processes data — and independently recomputes every numerical result Theorist produces, using a different path.

  • Different library / algorithm / params
  • Reviews compute feasibility in Phase 5
  • Owns workspace, code, data pipelines
Reviewer
HAIKU/SONNET · CRITIC

Dispatcher, daily standup author, and red-team critic. Posts phase kickoffs that @mention Theorist and Engineer. Never mentions itself — bots don't receive self-mention events.

  • Hostile-reviewer critique on every deliverable
  • Sender ≠ mention target invariant
  • Writes the daily standup summary
Discord-native

Your research team lives in Discord.

No new chat app to learn. ASRP spins up an embedded OpenClaw gateway and connects each agent as a real Discord bot — you talk to them in #general just like a teammate.

@mention to trigger. Standup at sunrise.

Every research project gets its own channel. Phase kickoffs are posted automatically. Daily standups summarize what shipped, what broke, and what's next.

  • Embedded OpenClaw gateway — one click to start
  • 3 independent Discord bots, one per role
  • @mention Theorist to ask anything
  • Phase kickoffs · daily standups · audit log
# research-idea-novel-dft · 3 members
Reviewer BOT
@Theorist Phase 2 kickoff — lit recon window opens now. Budget: 12 minutes. Drop the top-3 candidate papers in this channel when ready.
Theorist BOT
Found 3 anchors: Hodgson 2023 (iDEA library), Tozer 2018 (KS gap), Cohen 2008 (DD definition). Hypothesis candidate: prime-spaced wells → anomalous negative DD. Critique: depends on exact-vs-approx KS gap — flag for Engineer.
Engineer BOT
Recomputed with exact KS gap → DD flips to +0.084. The "negative DD" was a definition artifact. Logging divergence; Reviewer please re-open the deliverable.
Built-in scientific rigor

Six guardrails the workflow enforces.

These aren't best practices you have to remember. They're encoded in the workflow itself.

Hypothesis Pre-Registration

Register hypotheses with falsification criteria before running anything. The Researches Registry is your immutable baseline — no post-hoc storytelling.
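
One way to picture an immutable registration record is a frozen dataclass plus a content hash, sketched below. The `Hypothesis` fields and the `fingerprint` helper are hypothetical, and the falsification criterion is an illustrative example rather than the one used in the case study.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A registered hypothesis; frozen so fields cannot be reassigned."""
    statement: str
    falsified_if: str
    registered_on: str  # ISO date


def fingerprint(h: Hypothesis) -> str:
    """Return a SHA-256 digest of the record so later edits are detectable."""
    payload = json.dumps(asdict(h), sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


h = Hypothesis(
    statement="Prime-spaced wells produce anomalous negative DD.",
    # Illustrative criterion, written for this sketch only.
    falsified_if="DD is non-negative when computed with the exact KS gap.",
    registered_on="2026-04-01",
)
digest = fingerprint(h)
```

Storing `digest` alongside the record before any experiment runs gives a simple way to show the hypothesis was not reworded after the results came in.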

Independent Cross-Validation

Engineer must reproduce every numerical result via a different code path. Divergence triggers a Reviewer red-team pass before anything goes into the paper.

Append-Only Audit Trail

Every decision, every data point, every error correction logged forever. Full reproducibility from hypothesis to conclusion — exportable as CSV.
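
The sketch below shows what an append-only log with CSV export might look like in memory. The `AuditLog` class and its column names are assumptions made for illustration; ASRP's actual storage format is not described on this page.

```python
import csv
import io
from datetime import datetime, timezone


class AuditLog:
    """In-memory log that exposes append and read operations only."""

    FIELDS = ("timestamp", "actor", "event", "detail")

    def __init__(self) -> None:
        self._rows: list[tuple[str, str, str, str]] = []

    def append(self, actor: str, event: str, detail: str) -> None:
        """Add one entry stamped with the current UTC time."""
        stamp = datetime.now(timezone.utc).isoformat()
        self._rows.append((stamp, actor, event, detail))

    def rows(self) -> tuple[tuple[str, str, str, str], ...]:
        """Return an immutable snapshot of all entries."""
        return tuple(self._rows)

    def to_csv(self) -> str:
        """Serialize the header and all entries as CSV text."""
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(self.FIELDS)
        writer.writerows(self._rows)
        return buf.getvalue()


log = AuditLog()
log.append("Reviewer", "error_caught", "Lattice spans unequal")
log.append("Engineer", "divergence", "Exact KS gap flips DD sign")
csv_text = log.to_csv()
```

Corrections are recorded as new entries rather than edits to old ones, which is what keeps the history reconstructible.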

First Principles Reasoning

Every SOUL prompt instructs the agent to question assumptions, verify definitions, and trust data over authority. Self red-team pass before any deliverable ships.

Per-Task Model Budget

Right model for the right task. Opus for reasoning, Sonnet for code, Haiku for dispatch. Daily budget caps with live cost tracking on the dashboard.
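
A daily cap can be thought of as a guard in front of every model call. The sketch below assumes a hypothetical `DailyBudget` class and dollar-denominated costs; the routing table mirrors the task-to-model pairing described above, with placeholder model names.

```python
class DailyBudget:
    """Track spend against a daily cap, in US dollars (illustrative units)."""

    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd: float) -> bool:
        """Record the cost if it fits under the cap; return whether it was allowed."""
        if self.spent_usd + cost_usd > self.cap_usd:
            return False
        self.spent_usd += cost_usd
        return True


# Placeholder names: Opus for reasoning, Sonnet for code, Haiku for dispatch.
MODEL_FOR_TASK = {"reasoning": "opus", "code": "sonnet", "dispatch": "haiku"}

budget = DailyBudget(cap_usd=5.00)
allowed = budget.charge(4.50)  # fits under the cap
blocked = budget.charge(1.00)  # would exceed the cap, so it is refused
```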

Role Separation

The proposer is never the validator. Reviewer red-teams every deliverable. SRW-v3 enforces sender ≠ mention target as a runtime invariant.

Agent Toolbox

23 research tools. One click to arm your agents.

Auto-detect installed dependencies, batch-install what's missing, and keep each agent's capability profile in sync — pip, brew/apt/winget, cargo, clawhub, or guided manual install.

Math & Scientific Computing

mpmath · NumPy · SciPy · SymPy · pandas · Numba · scikit-learn

Theorist · Engineer

Formal Verification & Proof

Lean 4

Theorist

LaTeX & Documents

Tectonic · nano-pdf

Engineer

Literature & Retrieval

arxiv API · opendataloader-pdf

Theorist

OpenClaw Skills

github · summarize · himalaya (email)

Theorist · Engineer · Reviewer

System & Infrastructure

Git · tqdm · JupyterLab · jq · python3-venv

Theorist · Engineer · Reviewer

5 install types · pip · brew / apt / winget · cargo · clawhub · manual — all auto-detected
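
Dependency auto-detection of this kind can be approximated with two standard-library calls, as sketched below. The `detect` function is hypothetical and says nothing about how ASRP itself probes the system; it only checks whether a Python module can be imported and whether an executable is on the PATH.

```python
import importlib.util
import shutil


def detect(python_modules: list[str], executables: list[str]) -> dict[str, bool]:
    """Report which Python modules and command-line tools are available."""
    found = {name: importlib.util.find_spec(name) is not None
             for name in python_modules}
    found.update({exe: shutil.which(exe) is not None for exe in executables})
    return found


# "json" ships with Python, so it should always be reported as present.
status = detect(["numpy", "sympy", "json"], ["git", "jq"])
missing = sorted(name for name, ok in status.items() if not ok)
```

The `missing` list is the natural input to a batch install step, grouped by whichever installer each tool needs.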

Papers pipeline

From draft to submission, in six tracked stages.

Every paper project lives as a directory tree with authors, sections, references, and stage gates. Three sample paper projects ship out of the box.

1

Draft

Outline + scaffold

2

Lit Review

Citations & anchors

3

Methods

Reproducible recipe

4

Results

Tables, figures, data

5

Internal Review

Reviewer red-team

6

Submission

Export & archive
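
The six stages and their gates can be read as a small state machine. The sketch below is an assumption about how such gating might work: the stage names come from the list above, while the `advance` function and the single boolean gate are hypothetical.

```python
STAGES = ["Draft", "Lit Review", "Methods", "Results",
          "Internal Review", "Submission"]


def advance(current: str, gate_passed: bool) -> str:
    """Move to the next stage only when the current stage's gate has passed."""
    i = STAGES.index(current)
    if not gate_passed or i == len(STAGES) - 1:
        return current  # stay put: gate failed or already at the final stage
    return STAGES[i + 1]


next_stage = advance("Draft", gate_passed=True)
held_stage = advance("Results", gate_passed=False)
```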

Challenge center

396 problems to benchmark your agents.

A built-in question bank across four disciplines. Hand them to your ASRP team and measure how a real research workflow stacks up against single-shot prompting.

Each discipline ships with 99 graded problems — easy warm-ups through grad-level open questions. Search, filter by difficulty, and route any problem straight into a new research project.

Race-guarded loading, debounced search, accessible keyboard nav, and a Local toggle so you can run the whole bank against your on-prem Ollama stack.

π

Mathematics

99 problems
⚛︎

Physics

99 problems
⚗︎

Chemistry

99 problems
🧬

Life Sciences

99 problems
Privacy first

Run on-prem. Your data stays on your machine.

ASRP works fully offline with Ollama, with hardware detection, streaming model pulls, and GPU/VRAM-aware model recommendations. Or deploy headless on a VPS: same desktop app, no display required.

🖥️

Hardware aware

Detects RAM, GPU, VRAM and recommends models you can actually run.

⬇️

Streaming pulls

One-click ollama pull with live progress and resume.

🔐

OS-level encryption

API keys live in the OS keychain. Local SQLite + JWT auth.

🚀

Headless / VPS

Auto-detects missing $DISPLAY and switches to headless mode.
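
A minimal sketch of headless detection based on the `DISPLAY` environment variable. The function name is hypothetical, and limiting the check to Linux is an assumption made here, since other desktop platforms do not normally set `DISPLAY`.

```python
import os
import sys
from typing import Mapping, Optional


def is_headless(env: Optional[Mapping[str, str]] = None,
                platform: str = sys.platform) -> bool:
    """Return True on Linux when the DISPLAY variable is unset or empty."""
    env = os.environ if env is None else env
    # Assumption: only Linux relies on DISPLAY; other platforms are
    # treated as always having a display.
    return platform.startswith("linux") and not env.get("DISPLAY")
```

Passing `env` and `platform` explicitly, for example `is_headless({}, "linux")`, makes the check easy to exercise without changing the real environment.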

Cloud or local — your call.

Bring your own keys for Anthropic, Google, or OpenRouter. Or run everything locally with Ollama. Mix and match per agent — Theorist on Opus, Engineer on a local Qwen, Reviewer on Haiku.
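
The per-agent mix described above could be expressed as a small mapping, sketched below. The dictionary layout, the `needs_api_key` helper, and the short model names are placeholders for illustration, not ASRP's configuration schema or real model identifiers.

```python
# Hypothetical per-agent assignment matching the example in the text.
AGENT_MODELS = {
    "Theorist": {"provider": "anthropic", "model": "opus"},
    "Engineer": {"provider": "ollama", "model": "qwen"},
    "Reviewer": {"provider": "anthropic", "model": "haiku"},
}

# Providers that run on the local machine and need no API key.
LOCAL_PROVIDERS = {"ollama"}


def needs_api_key(agent: str) -> bool:
    """Return True when the agent's provider is a hosted service."""
    return AGENT_MODELS[agent]["provider"] not in LOCAL_PROVIDERS


agents_needing_keys = sorted(a for a in AGENT_MODELS if needs_api_key(a))
```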

Self-test suite (25 checks) verifies the install end-to-end. Auto-updater handles new releases.

Case study

Errors caught in hours, not months.

A real ASRP run: a one-day DFT investigation that self-corrected twice and retracted itself before submission.

The iDEA Experiment — April 1, 2026

T+0h
Literature review; iDEA code installed and benchmarked.
T+2h
Hypothesis: "Prime-spaced wells produce anomalous negative DD."
T+4h
V1 results support the hypothesis (DD = -0.070 Ha).
T+4.5h — Error #1
Reviewer caught: Lattice spans unequal. Control experiment shows it's spacing asymmetry, not primes.
T+9h — Error #2
Reviewer caught: Used approximate KS gap instead of exact. Wrong definition.
T+10h — Corrected
Exact KS gap flips DD sign from -0.041 to +0.084. "Negative DD" was an artifact.
T+11h — Conclusion
Paper retracted before submission. Two errors caught in one day.
In a traditional workflow, errors like these could survive to submission and be caught months later, if at all.
Get started

From download to first research project, in four steps.

No CLI needed. Install the desktop app, complete the 5-step Setup Wizard, and start your first research project.

1

Download

Get the installer for macOS, Windows, or Linux. Auto-updater takes care of the rest.

2

Setup Wizard

5 steps: profile, API keys, agent config, Discord bots, launch. ~10 minutes.

3

Start agents

One click on the dashboard spins up the embedded OpenClaw gateway and your 3 bots.

4

Talk in Discord

@mention Theorist with your research question. The team takes it from there.

Download for your platform
All platforms · Apache 2.0 · release notes
Open source

Built in the open. For science.

Inspect every line. Self-host on your terms. Shape the future of AI-augmented research.

Transparent

Every line of code is auditable on GitHub. See exactly how agents make decisions and where your data flows.

Self-host

Run ASRP on your laptop, your lab workstation, or a headless VPS. Your research data never leaves your network.

No lock-in

Bring your own LLM. Anthropic, Google, OpenRouter, or local Ollama — swap freely per agent.

Community

Contribute SOUL templates, skill manifests, benchmarks, and domain workflows that benefit every researcher.