Reliable computers for AI benchmarks.

Harbor-native computers with gateway-enforced anti-cheating, credential isolation, and full audit trails per task.

↓

terminal

$ harbor run \
    --dataset terminal-bench@2.0 \
    --agent claude-code \
    --model anthropic/claude-opus-4-7 \
    --env islo \
    --n-concurrent 300

Gateway profiles

Per-computer network rules. Control what agents can reach by host, path, method, and rate limit.

gateway.rule

host    api.github.com
path    /v1/*
action  allow
methods GET, POST
rate    100 req/min

Content filters

Scan response bodies for leaked answers. If an agent fetches a solution, the gateway blocks it.

gateway.log

→ GET github.com/repo/issues/418
← 200 body received
  "def solve(puzzle): …"
✗ BLOCKED — content filter match

Credential injection

Agents get access, never keys. Secrets are injected by the proxy and stay out of trajectories.

inject.yaml

match   api.openai.com
header  Authorization
value   {{ secrets.OPENAI_KEY }}

	Islo	DIY	Daytona	Modal	Others
Snapshot-based environments Instant resets — no cold starts	✓	partial	✓	✓	✓
Gateway profiles Per-computer network policies, host/path rules, rate limits	✓	—	—	—	—
Content filters Inspect response bodies — stop agents from fetching leaked answers	✓	—	—	—	—
Credential injection proxy Inject secrets without exposing them to agents	✓	—	—	—	—
Cost limits per run Cap spend per task — no runaway budgets	✓	—	—	—	—
Zero infra to maintain Fully managed — no orchestration to operate	✓	—	✓	✓	✓

compute pricing

CPU Time

$0.07/CPU-hour

Memory Time

$0.04/GB-hour

Storage Time

$0.0007/GB-hour

Example: 500 tasks on SWE-bench-verified (~10 min avg)

CPU

$23.33

Memory

$3.33

Storage

$0.12

Total / task

~$26.78 / $0.054

Stop building infra. Start shipping evals.

Get started →read the quickstart

$15K credits for startups/academic →

Docs Blog Sign in