
# How We Built a Cloud IDE That Starts in Under 2 Seconds

A deep dive into our dual-runtime architecture: WebContainers for frontend and pre-warmed Docker containers with gVisor for backend.

YaliCode Team · March 15, 2026 · 7 min read


When we set out to build [YaliCode](https://yalicode.dev), we had a simple requirement: a developer should be able to open a browser tab, pick a language, and be running code in under two seconds. No Docker pull. No VM spin-up. No "initializing workspace" spinner.

Here is how we got there.

## The Problem With Traditional Cloud IDEs

Most cloud IDEs follow the same pattern: spin up a virtual machine or a container per user, install dependencies, mount a filesystem, and expose a code-server instance. This works, but it comes with unavoidable latency. Cold starts of 10 to 30 seconds are common, and the infrastructure cost per concurrent user is high.

We needed something different. Our users range from students writing their first Python loop to experienced developers testing a quick algorithm in Rust. They need instant feedback, not a provisioning pipeline.

## Dual-Runtime Architecture

The core insight behind YaliCode is that frontend and backend code have fundamentally different execution requirements. A React app needs a Node.js runtime with a dev server and hot module replacement. A Python script needs a sandboxed container with stdin/stdout capture. Trying to serve both from one abstraction leads to compromises on both sides.

So we built two runtimes.

### Runtime 1: WebContainers for Frontend Frameworks

For frontend templates like React, Vue, Svelte, Next.js, Angular, and Astro, we use [WebContainers](https://webcontainers.io/) — a browser-native Node.js runtime built by StackBlitz. WebContainers run entirely in the browser tab. There is no server involved.

When you open the [YaliCode editor](https://yalicode.dev/editor) and select a React template, here is what happens:

1. The WebContainer boots inside a Service Worker (roughly 400ms)

2. Template files are mounted into the virtual filesystem

3. `npm install` runs inside the browser against a local package cache

4. The dev server starts, and the preview iframe loads

The entire flow takes about 1.5 seconds on a modern machine. Because it runs client-side, we can support unlimited concurrent users on frontend templates with zero server cost.

```typescript
// Simplified WebContainer boot sequence
const container = await WebContainer.boot();
await container.mount(templateFiles);

const install = await container.spawn('npm', ['install']);
await install.exit;

const devServer = await container.spawn('npm', ['run', 'dev']);
```

We built a bidirectional file sync layer that keeps the Monaco editor state and the WebContainer filesystem in lockstep. Edits in the editor are debounced and written to the container; changes from the container (like code generation) are polled back every three seconds.
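The editor-to-container direction of that sync layer is essentially a debounce: rapid keystrokes are coalesced, and only the latest version of a file is flushed after a quiet period. A minimal sketch of that idea, written in Python with asyncio for illustration (the class and method names here are hypothetical, not our actual sync layer, which runs in the browser):

```python
import asyncio

class DebouncedSync:
    """Illustrative debounce for editor-to-container writes: rapid edits
    within the delay window are coalesced and only the last one is flushed."""

    def __init__(self, flush, delay=0.3):
        self.flush = flush      # coroutine that writes one file to the container
        self.delay = delay      # quiet period before flushing, in seconds
        self._task = None
        self._pending = None

    def on_edit(self, path, contents):
        # Each new edit supersedes the pending one and restarts the timer.
        self._pending = (path, contents)
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.ensure_future(self._flush_later())

    async def _flush_later(self):
        await asyncio.sleep(self.delay)
        await self.flush(*self._pending)
```

The reverse direction (container to editor) is simpler: a periodic poll every three seconds needs no debouncing, since it already batches changes by construction.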

### Runtime 2: Pre-Warmed Docker Containers for Backend Languages

For the 23 backend languages we support — Python, JavaScript, TypeScript, Go, Rust, C, C++, Java, Ruby, Bash, PHP, C#, Kotlin, Swift, Perl, Lua, R, Dart, SQL, Haskell, Elixir, Zig, and Assembly — code runs in Docker containers on our servers.

The naive approach would be to start a container per execution request. Even with lightweight images, `docker run` adds 500ms to 1s of overhead. Multiply that across thousands of requests and the experience degrades fast.

Instead, we maintain a pre-warmed container pool. Here is how it works:

| Stage | What Happens | Time |
|-------|--------------|------|
| Boot | Background process keeps N containers warm per language | 0ms (already running) |
| Inject | Code is base64-encoded and written to the container's tmpfs | ~10ms |
| Execute | Container runs the code with strict resource limits | Varies by language |
| Recycle | Container is destroyed after single use; pool replenishes | Async |

Containers are single-use. After one execution, they are destroyed and replaced. This eliminates any risk of state leaking between users and keeps the filesystem clean.

The pool scales dynamically based on demand. If Python requests spike, we warm more Python containers. If Haskell is quiet, we keep the minimum.
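The acquire-and-replenish cycle can be sketched with an asyncio queue. This is an illustrative model, not our production pool (the class and helper names are hypothetical): `acquire()` hands out an already-running container immediately and kicks off a background warm-up to replace it.

```python
import asyncio

class WarmPool:
    """Sketch of a single-use pre-warmed container pool: acquire() returns
    a ready container with no wait, then replenishes asynchronously."""

    def __init__(self, start_container, size=4):
        self._start = start_container   # coroutine that boots one container
        self._size = size
        self._ready = asyncio.Queue()

    async def fill(self):
        # Warm the pool up front, before any execution requests arrive.
        for _ in range(self._size):
            await self._ready.put(await self._start())

    async def acquire(self):
        container = await self._ready.get()        # 0ms wait if pool is warm
        asyncio.ensure_future(self._replenish())   # replace it in the background
        return container

    async def _replenish(self):
        await self._ready.put(await self._start())
```

The single-use guarantee lives outside this sketch: the caller destroys the container after one execution, so nothing acquired from the pool is ever returned to it.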

## Redis Caching for Deterministic Code

A surprising number of execution requests are identical. Students working through the same tutorial, developers testing the same snippet from a blog post, or someone hitting "Run" repeatedly without changing their code.

We cache results in Redis using a SHA-256 hash of the code, language, and version as the key. Cache entries have a one-hour TTL.

```python
cache_key = hashlib.sha256(
    f"{language}:{version}:{code}".encode()
).hexdigest()

cached = await redis.get(cache_key)
if cached:
    return json.loads(cached)  # Cache hit: ~1ms response
```

We skip caching for non-deterministic code — anything using `random`, `time`, `date`, network calls, or user input. The detection runs before execution and checks for known patterns in each language.
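A minimal version of that pattern check might look like the following. The patterns shown are illustrative examples, not our actual detection list, and `is_cacheable` is a hypothetical name:

```python
import re

# A few per-language markers of non-deterministic output (illustrative only).
NONDETERMINISTIC_PATTERNS = {
    "python": [r"\brandom\b", r"\btime\b", r"\bdatetime\b", r"\binput\s*\("],
    "javascript": [r"Math\.random", r"Date\.now", r"new Date"],
}

def is_cacheable(language: str, code: str) -> bool:
    """Return False if the code matches any known non-deterministic pattern."""
    for pattern in NONDETERMINISTIC_PATTERNS.get(language, []):
        if re.search(pattern, code):
            return False
    return True
```

A false positive here is cheap (the code simply runs again next time), while a false negative would serve a stale result, so it pays to keep the patterns aggressive.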

On a typical day, our cache hit rate sits between 30 and 40 percent. That is a significant reduction in container usage.

## Sandbox Security

Running arbitrary user code is inherently dangerous. Our sandbox applies defense in depth:

- No network access: Containers have networking completely disabled. No outbound HTTP, no DNS, no sockets (unless dependency installation is explicitly requested).
- Read-only root filesystem: The only writable areas are tmpfs mounts at `/tmp` (128MB) and `/home/runner` (64MB). Nothing persists.
- Dropped capabilities: `cap_drop: ALL` removes every Linux capability. The process cannot change ownership, bind to ports, or load kernel modules.
- PID limit: Each container is capped at 512 processes, preventing fork bombs.
- Resource limits: 1 CPU, configurable memory (128MB default for free tier), and a strict execution timeout (10 seconds for free, 30 seconds for pro).
- Seccomp profile: A custom seccomp whitelist allows roughly 250 safe syscalls and blocks dangerous ones like `ptrace`, `mount`, `reboot`, and `bpf`.
- Crypto mining detection: Before execution, we scan code against known mining patterns and block matches immediately.
- Code is injected via base64 encoding — never bind mounts. The host filesystem is never exposed to the container.
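Most of these restrictions map directly onto container-creation options. As a sketch, here is how they could be expressed using the Docker SDK for Python (docker-py); the function name, seccomp profile path, and runner image in the comment are illustrative, not our production configuration:

```python
def sandbox_options(memory: str = "128m") -> dict:
    """Build the keyword arguments for docker-py's containers.run()."""
    return {
        "network_disabled": True,            # no outbound HTTP, DNS, or sockets
        "read_only": True,                   # read-only root filesystem
        "tmpfs": {"/tmp": "size=128m", "/home/runner": "size=64m"},
        "cap_drop": ["ALL"],                 # drop every Linux capability
        "pids_limit": 512,                   # fork-bomb protection
        "nano_cpus": 1_000_000_000,          # 1 CPU
        "mem_limit": memory,                 # 128MB default for free tier
        "security_opt": ["seccomp=seccomp-profile.json"],
        # "runtime": "runsc",                # optional: run under gVisor
    }

# Usage (requires a Docker daemon; image name is hypothetical):
# client = docker.from_env()
# client.containers.run("yalicode/python-runner", detach=True, **sandbox_options())
```

The execution timeout is enforced outside the container options, by the process that supervises the run.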

### gVisor Runtime

For an additional layer of isolation, we support [gVisor](https://gvisor.dev/) (`runsc`) as an alternative container runtime. gVisor intercepts all syscalls in userspace, providing a second boundary between user code and the host kernel. This is the same technology Google uses to sandbox untrusted workloads in Cloud Run.

## Putting It All Together

When you press Ctrl+Enter in the [YaliCode editor](https://yalicode.dev/editor), here is the full request lifecycle:

1. The client sends your code to `/api/execute`

2. The API route authenticates you and checks your daily execution limit

3. The request is forwarded to the execution API (FastAPI on port 8585)

4. Redis cache is checked — if hit, the result is returned in under 5ms

5. If miss, a pre-warmed container is pulled from the pool (0ms wait)

6. Your code is base64-encoded and written to the container's tmpfs

7. The container executes with all sandbox restrictions active

8. stdout, stderr, and the exit code are captured

9. The result is cached (if deterministic) and returned to the client

Total time for a cache miss on a simple Python script: 300 to 800ms. For a cache hit: under 50ms.
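Steps 4 through 9 condense into a short handler. The sketch below is illustrative, not our actual service: `pool`, `cache`, and `deterministic` are stand-ins for the Redis client, the warm pool, and the pattern check described earlier.

```python
import asyncio
import base64
import hashlib
import json

async def execute(code: str, language: str, version: str,
                  cache: dict, pool, deterministic=lambda c: True):
    """Condensed sketch of the execution path: cache check, warm-pool run,
    cache fill. In production the cache is Redis with a one-hour TTL."""
    key = hashlib.sha256(f"{language}:{version}:{code}".encode()).hexdigest()
    if key in cache:                                     # step 4: cache hit
        return json.loads(cache[key])

    container = await pool.acquire()                     # step 5: pre-warmed, 0ms wait
    payload = base64.b64encode(code.encode()).decode()   # step 6: base64 injection
    result = await container.run(payload)                # steps 7-8: sandboxed run
    if deterministic(code):                              # step 9: cache if safe
        cache[key] = json.dumps(result)
    return result
```

Authentication and rate limiting (steps 1 through 3) sit in front of this path, so by the time it runs, the request is already trusted to consume a container.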

## What We Learned

Building a fast cloud IDE is less about any single optimization and more about eliminating every unnecessary step in the pipeline. Pre-warming containers removes cold start. Caching removes redundant execution. WebContainers remove the server entirely for frontend work. Base64 injection removes filesystem setup.

Every millisecond you shave off the feedback loop makes the developer experience feel more like a local editor and less like a remote service.

## Try It Yourself

Open [yalicode.dev/editor](https://yalicode.dev/editor), pick any of our 23 supported languages, and run your first snippet. The timer starts when you press Ctrl+Enter.
