Reliable sandboxes for AI benchmarks.
Harbor-native sandboxes with gateway-enforced anti-cheating, credential isolation, and full audit trails per task.
    $ harbor run \
        --dataset terminal-bench@2.0 \
        --agent claude-code \
        --model anthropic/claude-opus-4-7 \
        --env islo \
        --n-concurrent 300

Gateway profiles
Per-sandbox network rules. Control what agents can reach by host, path, method, and rate limit.
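As a rough illustration of how a rule with host, path, method, and rate-limit fields might be evaluated, here is a minimal Python sketch. The `Rule` class, the `is_allowed` helper, the glob-style path matching, and the sliding one-minute window are all assumptions made for illustration; the page does not describe how the gateway implements any of this.

```python
import fnmatch
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One hypothetical allow rule; field names are illustrative only."""
    host: str
    path: str                 # glob pattern, e.g. "/v1/*"
    methods: frozenset
    rate_per_min: int
    _hits: deque = field(default_factory=deque, repr=False)

    def allows(self, method, host, path, now=None):
        """Return True if the request matches and is within the rate limit."""
        now = time.monotonic() if now is None else now
        if host != self.host or method not in self.methods:
            return False
        if not fnmatch.fnmatchcase(path, self.path):
            return False
        # Sliding one-minute window: forget requests older than 60 seconds.
        while self._hits and now - self._hits[0] >= 60:
            self._hits.popleft()
        if len(self._hits) >= self.rate_per_min:
            return False
        self._hits.append(now)
        return True


def is_allowed(rules, method, host, path, now=None):
    """Default-deny (an assumption): pass only if some rule allows the request."""
    return any(rule.allows(method, host, path, now) for rule in rules)
```

With a rule for `api.github.com`, `/v1/*`, and `GET, POST`, a `DELETE` or a request to any other host is refused, and requests beyond the per-minute budget are refused until older ones age out of the window.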
    host     api.github.com
    path     /v1/*
    action   allow
    methods  GET, POST
    rate     100 req/min

Content filters
Scan response bodies for leaked answers. If an agent fetches a solution, the gateway blocks it.
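One simple way to picture a response-body filter is a substring scan against snippets of the protected answer. The sketch below assumes that approach; the function names, the whitespace normalization, and the 403 replacement response are hypothetical and not taken from the product.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so trivial reformatting does not evade the match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def leaks_answer(body, protected_snippets):
    """Return True if any non-empty protected snippet appears in the body."""
    haystack = normalize(body)
    return any(normalize(s) in haystack for s in protected_snippets if s.strip())


def filter_response(status, body, protected_snippets):
    """Return the (status, body) pair handed to the agent; block on a match."""
    if leaks_answer(body, protected_snippets):
        # Assumed behavior: swap the upstream response for a refusal.
        return 403, "blocked by content filter"
    return status, body
```

A real filter would likely need fuzzier matching than exact substrings, since a leaked solution can be renamed or reordered.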
    → GET github.com/repo/issues/418
    ← 200 body received "def solve(puzzle): …"
    ✗ BLOCKED: content filter match

Credential injection
Agents get access, never keys. Secrets are injected by the proxy and stay out of trajectories.
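The idea can be sketched as a proxy step that records what the agent sent and only then adds the secret to the outgoing request. This is a minimal Python sketch under assumptions: the rule dictionaries reuse the `match`, `header`, and `value` field names, while `render`, `forward`, and the trajectory list are hypothetical helpers, not the product's API.

```python
import re

# Matches placeholders of the form {{ secrets.NAME }}.
_PLACEHOLDER = re.compile(r"\{\{\s*secrets\.(\w+)\s*\}\}")


def render(template, secrets):
    """Replace each {{ secrets.NAME }} placeholder with its stored value."""
    return _PLACEHOLDER.sub(lambda m: secrets[m.group(1)], template)


def forward(host, agent_headers, rules, secrets, trajectory):
    """Log the agent's own headers, then return the headers to send upstream."""
    # The trajectory sees only what the agent sent, so it never holds the key.
    trajectory.append({"host": host, "headers": dict(agent_headers)})
    upstream = dict(agent_headers)
    for rule in rules:
        if rule["match"] == host:
            upstream[rule["header"]] = render(rule["value"], secrets)
    return upstream
```

Because the secret is added after logging and on a copy of the headers, neither the agent's view of the request nor the recorded trajectory contains it.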
    match   api.openai.com
    header  Authorization
    value   {{ secrets.OPENAI_KEY }}

| Feature | Islo | DIY | Daytona | Modal | Others |
|---|---|---|---|---|---|
| Snapshot-based environments: instant resets, no cold starts | ✓ | partial | ✓ | ✓ | ✓ |
| Gateway profiles: per-sandbox network policies, host/path rules, rate limits | ✓ | — | — | — | — |
| Content filters: inspect response bodies to stop agents from fetching leaked answers | ✓ | — | — | — | — |
| Credential injection proxy: inject secrets without exposing them to agents | ✓ | — | — | — | — |
| Cost limits per run: cap spend per task, no runaway budgets | ✓ | — | — | — | — |
| Zero infra to maintain: fully managed, no orchestration to operate | ✓ | — | ✓ | ✓ | ✓ |
Compute pricing

| Resource | Price |
|---|---|
| CPU time | $0.07/CPU-hour |
| Memory time | $0.04/GB-hour |
| Storage time | $0.0007/GB-hour |
Example: 500 tasks on SWE-bench-verified (~10 min avg)

| Component | Cost |
|---|---|
| CPU | $23.33 |
| Memory | $3.33 |
| Storage | $0.12 |
| Total | ~$26.78 |
| Per task | ~$0.054 |
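The example figures can be reproduced from the listed rates with a short calculation. The page does not state the sandbox size, so the 4 vCPU, 1 GB memory, and 2 GB storage used below are assumptions inferred from the totals; with those values the arithmetic matches the numbers shown.

```python
# Published rates from the pricing list.
CPU_PER_HOUR = 0.07        # dollars per CPU-hour
MEM_PER_GB_HOUR = 0.04     # dollars per GB-hour
DISK_PER_GB_HOUR = 0.0007  # dollars per GB-hour


def run_cost(tasks, avg_minutes, vcpus, mem_gb, disk_gb):
    """Return per-component, total, and per-task cost for a batch of tasks."""
    hours = tasks * avg_minutes / 60
    cpu = hours * vcpus * CPU_PER_HOUR
    mem = hours * mem_gb * MEM_PER_GB_HOUR
    disk = hours * disk_gb * DISK_PER_GB_HOUR
    total = cpu + mem + disk
    return {"cpu": cpu, "memory": mem, "storage": disk,
            "total": total, "per_task": total / tasks}


# Sandbox size is an assumption (4 vCPU, 1 GB RAM, 2 GB disk), not from the page.
cost = run_cost(tasks=500, avg_minutes=10, vcpus=4, mem_gb=1, disk_gb=2)
print({name: round(value, 3) for name, value in cost.items()})
```

Under those assumptions, 500 tasks at 10 minutes each is about 83.3 sandbox-hours, which gives roughly $23.33 for CPU, $3.33 for memory, and $0.12 for storage.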