Reliable sandboxes
for AI benchmarks.

Harbor-native sandbox infrastructure that prevents agents from cheating.

terminal
$ harbor run \
--dataset terminal-bench@2.0 \
--agent claude-code \
--model anthropic/claude-opus-4-6 \
--env islo \
--n-concurrent 500

Why Islo?

Gateway profiles

Per-sandbox network rules. Control what agents can reach by host, path, method, and rate limit.

gateway.rule
host     api.github.com
path     /v1/*
action   allow
methods  GET, POST
rate     100 req/min
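A rule like the one above can be thought of as a predicate over (host, path, method) plus a rate limit. This is a conceptual sketch only, not Islo's implementation; the `GatewayRule` class and its fields are hypothetical names, and the rate limiter here is a simple fixed 60-second window.

```python
import fnmatch
import time

# Hypothetical sketch of a per-sandbox gateway rule: match requests by
# exact host, glob path, and allowed methods, and enforce a fixed-window
# rate limit (at most `rate_per_min` allowed requests per 60 seconds).
class GatewayRule:
    def __init__(self, host, path, methods, rate_per_min):
        self.host = host
        self.path = path
        self.methods = set(methods)
        self.rate_per_min = rate_per_min
        self._window_start = 0.0
        self._count = 0

    def allows(self, host, path, method, now=None):
        now = time.monotonic() if now is None else now
        if host != self.host or method not in self.methods:
            return False
        if not fnmatch.fnmatch(path, self.path):
            return False
        # Start a fresh counting window every 60 seconds.
        if now - self._window_start >= 60:
            self._window_start = now
            self._count = 0
        if self._count >= self.rate_per_min:
            return False  # over the per-minute rate limit
        self._count += 1
        return True

# The rule from the snippet above, expressed with this sketch:
rule = GatewayRule("api.github.com", "/v1/*", {"GET", "POST"}, rate_per_min=100)
```

A real gateway would also need per-sandbox rule sets and shared state across proxy workers; the point here is only the shape of the match.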

Content filters

Scan response bodies for leaked answers. If an agent fetches a solution from GitHub, the gateway blocks it.

gateway.log
GET github.com/org/repo/issues/418
200 body received
"def solve(puzzle): …"
✗ BLOCKED — content filter match
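The blocking step above amounts to scanning each response body against patterns tied to known answers before it reaches the agent. A minimal sketch, with hypothetical pattern names (Islo's actual filter rules are not shown here):

```python
import re

# Hypothetical leak patterns: reference-solution signatures and canary
# strings that only appear in answer keys.
LEAK_PATTERNS = [
    re.compile(r"def\s+solve\s*\("),
    re.compile(r"CANARY-[0-9a-f]{8}"),
]

def filter_response(body: str) -> str:
    """Return the body unchanged, or raise if any leak pattern matches."""
    for pattern in LEAK_PATTERNS:
        if pattern.search(body):
            raise PermissionError(f"content filter match: {pattern.pattern}")
    return body
```

In the log above, the `def solve(puzzle):` body would trip the first pattern and the response would be dropped before the agent sees it.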

Credential injection

Agents get access, never keys. Secrets are injected by the proxy and stay out of trajectories.

inject.yaml
match:  api.openai.com
header: Authorization
value:  "{{ secrets.OPENAI_KEY }}"
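Mechanically, the proxy rewrites outbound headers just before forwarding upstream, so the real key never appears in the agent's request or its recorded trajectory. A sketch under assumed names (`INJECT_RULES`, `inject`, and the `OPENAI_KEY` env var are illustrative, not Islo's API):

```python
import os

# Hypothetical proxy-side injection table mirroring inject.yaml: when a
# request targets the matched host, the proxy sets the header from a
# secret resolved at forward time.
INJECT_RULES = [
    {
        "match": "api.openai.com",
        "header": "Authorization",
        "value": lambda: f"Bearer {os.environ['OPENAI_KEY']}",
    },
]

def inject(host: str, headers: dict) -> dict:
    """Return the headers to forward upstream; agent-visible headers are untouched."""
    forwarded = dict(headers)  # copy, so the original dict stays clean
    for rule in INJECT_RULES:
        if host == rule["match"]:
            forwarded[rule["header"]] = rule["value"]()
    return forwarded
```

Because the secret is resolved inside the proxy process, it can be rotated without touching any sandbox, and a trajectory dump only ever contains the placeholder-free original headers.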
Islo vs. others
Snapshot-based environments
Gateway profiles
Content filters
Credential injection proxy
Cost limits per run
Built-in trajectory storage
Zero infra to maintain

Pricing

Pay for actual CPU cycles, resident memory, and storage. Nothing else.

CPU Time
$0.07/CPU-hour
Memory Time
$0.04/GB-hour
Storage Time
$0.0007/GB-hour
Example: 500 tasks on SWE-bench-verified (~10 min avg)
CPU
$23.33
Memory
$3.33
Storage
$0.12
Total / per task
~$26.78 / ~$0.054
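The three meters compose linearly, so the cost of any run is a straight sum at the published rates. A minimal sketch of the billing arithmetic (the resource totals you pass in are whatever your tasks actually consume):

```python
# Published rates from the pricing table above.
CPU_RATE = 0.07        # $ per CPU-hour
MEM_RATE = 0.04        # $ per GB-hour of resident memory
STORAGE_RATE = 0.0007  # $ per GB-hour of storage

def run_cost(cpu_hours: float, mem_gb_hours: float, storage_gb_hours: float) -> float:
    """Total cost in dollars for one run's metered usage."""
    return (cpu_hours * CPU_RATE
            + mem_gb_hours * MEM_RATE
            + storage_gb_hours * STORAGE_RATE)
```

Dividing the result by the task count gives the per-task figure quoted in the example.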

Who we are

Proven at scale

We're the team behind Incredibuild — a decade of making parallel, compute-heavy workloads fast and reliable. Running thousands of sandboxes in parallel is core to what we do.

Built from real feedback

We've worked directly with the teams building evals for frontier labs. We understand exactly where existing tools fall short — and built Islo to close those gaps.

Fully committed

This is Incredibuild's strategic focus — not a side project. We're investing in the long-term infrastructure layer for AI evaluation, and we're here to support teams that depend on it.

Stop building infra. Start shipping evals.