Open Source · Apache 2.0 · v0.3.15

You think.
Three agents do the rest. No limits.

You handle hypotheses, judgments, and decisions. Your agents handle literature recon, simulations, code, documentation, and figures.

Download for your platform · View on GitHub

3

Agent Roles

7

Workflow Phases

396

Challenge Problems

Languages

The reproducibility crisis

Speed without rigor is noise.

AI agents can produce a paper a day. Most of those papers won't reproduce. ASRP encodes the scientific method into the workflow itself.

70%

of researchers have failed to reproduce another scientist's experiment. (Nature, 2016)

52%

say there is a significant reproducibility crisis (Nature, 2016). Unchecked AI output risks making it worse.

2

errors caught in one day in the iDEA case study below, before submission rather than months after.

Standard Research Workflow · SRW-v3

From idea to direction menu in ≤20 minutes.

Every research project starts with the same 7-phase workflow. Phases 1 through 6 are bootstrap phases that run in minutes of agent time, so you can steer right away. Phase 7, the Active Loop, then switches to a "1 human day = 1 AI hour" cadence for deep work.

Intake

Theorist Q&A in Discord

Lit Recon

Web · arXiv · self-search

Opportunities

Synthesise + critique

Direction Pick

User chooses path

Plan

Engineer feasibility

Schedule

Time + budget lock-in

Active Loop

Execute · review · iterate

Self-healing dispatcher

Reviewer dispatches phase kickoffs and writes daily standups. Sender ≠ mention target is a runtime invariant — no Discord self-mention deadlocks.
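One way to picture the "sender ≠ mention target" invariant is a message type that refuses to be built when it would mention its own sender. This is a minimal sketch, not ASRP's implementation: the `Dispatch` class, `SelfMentionError`, and `phase_kickoff` are hypothetical names, and no Discord library is involved.

```python
from dataclasses import dataclass

AGENTS = {"Theorist", "Engineer", "Reviewer"}  # role names taken from this page


class SelfMentionError(ValueError):
    """Raised when a message would @mention its own sender."""


@dataclass(frozen=True)
class Dispatch:
    """Hypothetical outbound message; validated before anything is sent."""
    sender: str
    mentions: tuple[str, ...]
    body: str

    def __post_init__(self) -> None:
        # Check the invariant at construction time, so an invalid
        # dispatch can never reach a later send step.
        if self.sender in self.mentions:
            raise SelfMentionError(f"{self.sender} cannot mention itself")
        unknown = set(self.mentions) - AGENTS
        if unknown:
            raise ValueError(f"unknown mention targets: {sorted(unknown)}")


def phase_kickoff(phase: int, title: str) -> Dispatch:
    """Build a Reviewer kickoff that mentions the other two roles."""
    targets = tuple(sorted(AGENTS - {"Reviewer"}))
    return Dispatch("Reviewer", targets, f"Phase {phase} kickoff: {title}")
```

Validating in the constructor, rather than at send time, means every code path that produces a message is covered by the same check.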

Independent recompute

Engineer recomputes any numerical result Theorist produces, using a different library, algorithm, or parameterization — divergence is logged.
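The recompute idea can be illustrated with a toy quantity evaluated along two unrelated paths: numerical quadrature and a closed form through a special function. The integral, the tolerance, and the `cross_check` helper are assumptions chosen for illustration; they are not taken from ASRP.

```python
import math
from typing import Callable


def simpson(f: Callable[[float], float], a: float, b: float, n: int = 200) -> float:
    """Composite Simpson's rule on n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Odd interior nodes get weight 4, even interior nodes weight 2.
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def cross_check(primary: float, recomputed: float, rel_tol: float = 1e-8) -> dict:
    """Compare two independently obtained values and report divergence."""
    return {
        "primary": primary,
        "recomputed": recomputed,
        "abs_diff": abs(primary - recomputed),
        "diverged": not math.isclose(primary, recomputed, rel_tol=rel_tol),
    }


# Path 1: quadrature of exp(-x^2) on [0, 1].
theorist_value = simpson(lambda x: math.exp(-x * x), 0.0, 1.0)
# Path 2: the same integral in closed form via the error function.
engineer_value = math.sqrt(math.pi) / 2 * math.erf(1.0)

report = cross_check(theorist_value, engineer_value)
```

The two paths share no code beyond `math`, so agreement within tolerance is meaningful evidence, and a `diverged` flag is the kind of record that would be logged.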

The team

Three agents. One research team.

Each role has its own SOUL prompt, model class, skill manifest, and access permissions. The agent that proposes a hypothesis is never the one that signs off on it.

Theorist

OPUS · LEAD

Lead scientist and the only user-facing voice. Owns hypotheses, literature reconnaissance, opportunity synthesis, plan construction, and the active loop.

Self red-team pass on every draft
Owns Phase 1 Intake with the researcher
Writes and revises the paper

Engineer

SONNET · BUILD

Implements code, runs experiments, processes data — and independently recomputes every numerical result Theorist produces, using a different path.

Different library / algorithm / params
Reviews compute feasibility in Phase 5
Owns workspace, code, data pipelines

Reviewer

HAIKU/SONNET · CRITIC

Dispatcher, daily standup author, and red-team critic. Posts phase kickoffs that @mention Theorist and Engineer. Never mentions itself — bots don't receive self-mention events.

Hostile-reviewer critique on every deliverable
Sender ≠ mention target invariant
Writes the daily standup summary

Discord-native

Your research team lives in Discord.

No new chat app to learn. ASRP spins up an embedded OpenClaw gateway and connects each agent as a real Discord bot — you talk to them in #general just like a teammate.

@mention to trigger. Standup at sunrise.

Every research project gets its own channel. Phase kickoffs are posted automatically. Daily standups summarize what shipped, what broke, and what's next.

Embedded OpenClaw gateway — one click to start
3 independent Discord bots, one per role
@mention Theorist to ask anything
Phase kickoffs · daily standups · audit log

# research-idea-novel-dft · 3 members

Reviewer BOT

@Theorist Phase 2 kickoff — lit recon window opens now. Budget: 12 minutes. Drop the top-3 candidate papers in this channel when ready.

Theorist BOT

Found 3 anchors: Hodgson 2023 (iDEA library), Tozer 2018 (KS gap), Cohen 2008 (DD definition). Hypothesis candidate: prime-spaced wells → anomalous negative DD. Critique: depends on exact-vs-approx KS gap — flag for Engineer.

Engineer BOT

Recomputed with exact KS gap → DD flips to +0.084. The "negative DD" was a definition artifact. Logging divergence; Reviewer please re-open the deliverable.

Built-in scientific rigor

Six guardrails the workflow enforces.

These aren't best practices you have to remember. They're encoded in the workflow itself.

Hypothesis Pre-Registration

Register hypotheses with falsification criteria before running anything. The Researches Registry is your immutable baseline — no post-hoc storytelling.
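As a rough sketch of what an immutable baseline could look like, the snippet below fingerprints a hypothesis together with its falsification criteria and refuses to overwrite an existing entry. The `Hypothesis` and `Registry` classes and the SHA-256 fingerprint are assumptions for illustration; the page does not describe how the registry is actually stored.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Hypothesis:
    hypothesis_id: str
    statement: str
    falsification_criteria: tuple[str, ...]

    def digest(self) -> str:
        """SHA-256 over a canonical JSON form of the record."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


class Registry:
    """Hypothetical in-memory registry: entries can be added, never replaced."""

    def __init__(self) -> None:
        self._entries: dict[str, str] = {}

    def register(self, hypothesis: Hypothesis) -> str:
        if not hypothesis.falsification_criteria:
            raise ValueError("at least one falsification criterion is required")
        if hypothesis.hypothesis_id in self._entries:
            raise KeyError(f"{hypothesis.hypothesis_id} is already registered")
        self._entries[hypothesis.hypothesis_id] = hypothesis.digest()
        return self._entries[hypothesis.hypothesis_id]

    def matches(self, hypothesis: Hypothesis) -> bool:
        """True only if the record is byte-for-byte what was registered."""
        return self._entries.get(hypothesis.hypothesis_id) == hypothesis.digest()
```

Any later rewording of the statement or its criteria changes the digest, so `matches` returns False and the post-hoc edit is visible.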

Independent Cross-Validation

Engineer must reproduce every numerical result via a different code path. Divergence triggers a Reviewer red-team pass before anything goes into the paper.

Append-Only Audit Trail

Every decision, every data point, every error correction logged forever. Full reproducibility from hypothesis to conclusion — exportable as CSV.
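One common way to make a log tamper-evident is to chain each entry to the hash of the previous one. The sketch below does that and exports the result as CSV. Hash chaining is a technique swapped in here for illustration; the page only states that the trail is append-only and exportable, and the `AuditTrail` class and its column names are hypothetical.

```python
import csv
import hashlib
import io

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


class AuditTrail:
    FIELDS = ("seq", "actor", "event", "prev_hash", "entry_hash")

    def __init__(self) -> None:
        self._rows: list[dict[str, str]] = []

    @staticmethod
    def _hash(seq: str, actor: str, event: str, prev_hash: str) -> str:
        joined = "|".join((seq, actor, event, prev_hash))
        return hashlib.sha256(joined.encode("utf-8")).hexdigest()

    def append(self, actor: str, event: str) -> str:
        """Add one entry linked to the hash of the entry before it."""
        prev_hash = self._rows[-1]["entry_hash"] if self._rows else GENESIS
        seq = str(len(self._rows) + 1)
        entry_hash = self._hash(seq, actor, event, prev_hash)
        self._rows.append({"seq": seq, "actor": actor, "event": event,
                           "prev_hash": prev_hash, "entry_hash": entry_hash})
        return entry_hash

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered row breaks it."""
        prev_hash = GENESIS
        for row in self._rows:
            expected = self._hash(row["seq"], row["actor"], row["event"], prev_hash)
            if row["prev_hash"] != prev_hash or row["entry_hash"] != expected:
                return False
            prev_hash = row["entry_hash"]
        return True

    def to_csv(self) -> str:
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=self.FIELDS)
        writer.writeheader()
        writer.writerows(self._rows)
        return buffer.getvalue()
```

The class exposes no update or delete method, and `verify` catches edits made behind its back, which is the property an append-only trail is meant to give.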

First Principles Reasoning

Every SOUL prompt instructs the agent to question assumptions, verify definitions, and trust data over authority. Self red-team pass before any deliverable ships.

Per-Task Model Budget

Right model for the right task. Opus for reasoning, Sonnet for code, Haiku for dispatch. Daily budget caps with live cost tracking on the dashboard.
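The routing-plus-cap idea might be sketched as follows. The task kinds mirror the sentence above, but the `DailyBudget` class, the dollar amounts, and the choice to reject a call outright once the cap would be exceeded are assumptions, not ASRP behavior.

```python
from dataclasses import dataclass, field

# Task kind -> model class, following the pairing described above.
ROUTING = {"reasoning": "opus", "code": "sonnet", "dispatch": "haiku"}


@dataclass
class DailyBudget:
    cap_usd: float
    spent_usd: float = 0.0
    by_model: dict[str, float] = field(default_factory=dict)

    def charge(self, task_kind: str, cost_usd: float) -> str:
        """Record a call's cost and return the model class it was routed to."""
        if cost_usd < 0:
            raise ValueError("cost must be non-negative")
        model = ROUTING[task_kind]
        if self.spent_usd + cost_usd > self.cap_usd:
            raise RuntimeError("daily budget cap reached")
        self.spent_usd += cost_usd
        self.by_model[model] = self.by_model.get(model, 0.0) + cost_usd
        return model

    @property
    def remaining_usd(self) -> float:
        return self.cap_usd - self.spent_usd
```

Tracking spend per model as well as in total is what makes a live cost breakdown possible on a dashboard.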

Role Separation

The proposer is never the validator. Reviewer red-teams every deliverable. SRW-v3 enforces sender ≠ mention target as a runtime invariant.

Agent Toolbox

23 research tools. One click to arm your agents.

Auto-detect installed dependencies, batch-install what's missing, and keep each agent's capability profile in sync — pip, brew/apt/winget, cargo, clawhub, or guided manual install.

Math & Scientific Computing

mpmath · NumPy · SciPy · SymPy · pandas · Numba · scikit-learn

Theorist · Engineer

Formal Verification & Proof

Lean 4

Theorist

LaTeX & Documents

Tectonic · nano-pdf

Engineer

Literature & Retrieval

arxiv API · opendataloader-pdf

Theorist

OpenClaw Skills

github · summarize · himalaya (email)

Theorist · Engineer · Reviewer

System & Infrastructure

Git · tqdm · JupyterLab · jq · python3-venv

Theorist · Engineer · Reviewer

5 install types · pip · brew / apt / winget · cargo · clawhub · manual — all auto-detected

Papers pipeline

From draft to submission, in six tracked stages.

Every paper project lives as a directory tree with authors, sections, references, and stage gates. Three sample paper projects ship out of the box.

Draft

Outline + scaffold

Lit Review

Citations & anchors

Methods

Reproducible recipe

Results

Tables, figures, data

Internal Review

Reviewer red-team

Submission

Export & archive
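The six stages above can be modeled as an ordered enum with a gate between neighbors. This is only a sketch: the `Stage` enum and `advance` function are hypothetical, and what a real stage gate checks is not described on this page.

```python
from enum import IntEnum


class Stage(IntEnum):
    DRAFT = 1
    LIT_REVIEW = 2
    METHODS = 3
    RESULTS = 4
    INTERNAL_REVIEW = 5
    SUBMISSION = 6


def advance(current: Stage, gate_passed: bool) -> Stage:
    """Move to the next stage only when the current stage's gate passes."""
    if current is Stage.SUBMISSION:
        raise ValueError("already at the final stage")
    if not gate_passed:
        return current  # stay put until the gate is satisfied
    return Stage(current + 1)
```

Because stages are ordered integers, skipping a stage is impossible by construction: the only transition is to the immediate successor.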

Challenge center

396 problems to benchmark your agents.

A built-in question bank across four disciplines. Hand them to your ASRP team and measure how a real research workflow stacks up against single-shot prompting.

Each discipline ships with 99 graded problems — easy warm-ups through grad-level open questions. Search, filter by difficulty, and route any problem straight into a new research project.

Race-guarded loading, debounced search, accessible keyboard nav, and a Local toggle so you can run the whole bank against your on-prem Ollama stack.

Mathematics

99 problems

⚛︎

Physics

99 problems

⚗︎

Chemistry

99 problems

🧬

Life Sciences

99 problems

Privacy first

Run on-prem. Your data stays on your machine.

ASRP works fully offline with Ollama: it detects your hardware, streams model pulls, and is GPU/VRAM aware. Or deploy headless on a VPS with the same app, no display required.

🖥️

Hardware aware

Detects RAM, GPU, VRAM and recommends models you can actually run.

⬇️

Streaming pulls

One-click ollama pull with live progress and resume.

🔐

OS-level encryption

API keys live in the OS keychain. Local SQLite + JWT auth.

🚀

Headless / VPS

Auto-detects missing $DISPLAY and switches to headless mode.
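A `$DISPLAY` check of this kind can be sketched in a few lines. The function name is hypothetical, and restricting the check to Linux is an assumption made here because macOS and Windows sessions do not normally set `DISPLAY`.

```python
import os
import sys
from typing import Mapping, Optional


def should_run_headless(env: Optional[Mapping[str, str]] = None,
                        platform: Optional[str] = None) -> bool:
    """Return True when no X display is available on a Linux host."""
    env = os.environ if env is None else env
    platform = sys.platform if platform is None else platform
    if not platform.startswith("linux"):
        # Assumption: DISPLAY is only consulted on Linux.
        return False
    # An unset or empty DISPLAY both count as "no display".
    return not env.get("DISPLAY")
```

Passing `env` and `platform` as parameters keeps the check testable without touching the real environment.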

Cloud or local — your call.

Bring your own keys for Anthropic, Google, or OpenRouter. Or run everything locally with Ollama. Mix and match per agent — Theorist on Opus, Engineer on a local Qwen, Reviewer on Haiku.

Self-test suite (25 checks) verifies the install end-to-end. Auto-updater handles new releases.

Case study

Errors caught in hours, not months.

A real ASRP run: a one-day DFT investigation that self-corrected twice and retracted itself before submission.

The iDEA Experiment — April 1, 2026

T+0h

Literature review; iDEA code installed and benchmarked.

T+2h

Hypothesis: "Prime-spaced wells produce anomalous negative DD."

T+4h

V1 results support the hypothesis (DD = -0.070 Ha).

T+4.5h — Error #1

Reviewer caught: Lattice spans unequal. Control experiment shows it's spacing asymmetry, not primes.

T+9h — Error #2

Reviewer caught: Used approximate KS gap instead of exact. Wrong definition.

T+10h — Corrected

Exact KS gap flips DD sign from -0.041 to +0.084. "Negative DD" was an artifact.

T+11h — Conclusion

Paper retracted before submission. Two errors caught in one day.

In traditional research, these errors would survive to submission and be caught months later — if at all.
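A note on why the choice of gap matters, assuming DD in the timeline denotes the derivative discontinuity of density functional theory (the page does not spell the acronym out). In the standard definition, the fundamental gap splits into the Kohn-Sham gap plus the discontinuity:

```latex
E_g = I - A = \varepsilon_g^{\mathrm{KS}} + \Delta_{xc},
\qquad
\Delta_{xc} = (I - A) - \left(\varepsilon_{\mathrm{LUMO}}^{\mathrm{KS}} - \varepsilon_{\mathrm{HOMO}}^{\mathrm{KS}}\right)
```

Here I is the ionization energy and A the electron affinity. Because the discontinuity is a difference of two gaps, an approximate KS gap that overestimates the exact one can push it below zero even when the exact value is positive, which would be consistent with the sign flip reported at T+10h.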

Get started

From download to first research, in four steps.

No CLI needed. Install the desktop app, complete the 5-step Setup Wizard, and start your first research project.

Download

Get the installer for macOS, Windows, or Linux. Auto-updater takes care of the rest.

Setup Wizard

5 steps: profile, API keys, agent config, Discord bots, launch. ~10 minutes.

Start agents

One click on the dashboard spins up the embedded OpenClaw gateway and your 3 bots.

Talk in Discord

@mention Theorist with your research question. The team takes it from there.

Download for your platform

All platforms · Apache 2.0 · release notes

Open source

Built in the open. For science.

Inspect every line. Self-host on your terms. Shape the future of AI-augmented research.

Transparent

Every line of code is auditable on GitHub. See exactly how agents make decisions and where your data flows.

Self-host

Run ASRP on your laptop, your lab workstation, or a headless VPS. Your research data never leaves your network.

No lock-in

Bring your own LLM. Anthropic, Google, OpenRouter, or local Ollama — swap freely per agent.

Community

Contribute SOUL templates, skill manifests, benchmarks, and domain workflows that benefit every researcher.
