Engineering a Planet in the Browser: The Decisions Behind OVGrid

Real-scale terrain, exact-ground physics, weather and a living crowd — on WebGPU. The hard problems, and the calls we made to solve them.

OVGrid is a real-scale planet you explore as an avatar, in a browser tab: a 795 km-radius cube-sphere with procedural terrain, weather, vegetation, a day/night cycle, multiplayer presence and tradable land. It runs on WebGPU, with no install and no account.

▶ Play it: https://world.ovgrid.com

Most of the interesting work wasn't "make it pretty" — it was a series of hard engineering problems where the naive solution simply doesn't survive at planetary scale. This post walks through those problems and the decisions we took. Almost every section maps to a deep-dive doc in the repo, linked as we go.

1. The scale problem: why a planet shakes — and Floating Origin

A GPU works in 32-bit floats: ~6–7 decimal digits of precision. At 10 units from the origin you can resolve a millimetre; at 100,000 units that millimetre is simply discarded. On a 795 km planet, vertices near the surface round to the same representable value, adjacent triangles collapse or invert, and the terrain jitters — even with the camera dead still.
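
To put numbers on that, here is a minimal sketch that measures the gap between neighbouring Float32 values at a given magnitude. It assumes one world unit is one metre (an assumption of this sketch, not something stated above), and float32Spacing is a hypothetical helper name.

```typescript
// Distance from a positive Float32 value to the next representable one above it.
function float32Spacing(x: number): number {
  const f = new Float32Array([x]);        // rounds x to Float32
  const bits = new Uint32Array(f.buffer); // the same 4 bytes, viewed as an integer
  const start = f[0];
  bits[0] += 1;                           // next bit pattern = next Float32 (positive finite x)
  return f[0] - start;
}

console.log(float32Spacing(10));     // under a micrometre
console.log(float32Spacing(100000)); // 0.0078125, already coarser than a millimetre
console.log(float32Spacing(795000)); // 0.0625, about 6 cm at the planet's surface
```

Under that one-metre assumption, two surface vertices a few centimetres apart can land on the same Float32 value, which is the collapse described above.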

You can't fix this by "using doubles everywhere": WGSL, WebGPU's shading language, has no 64-bit float type, so everything the shaders compute is Float32 at best. The architectural answer is Floating Origin: keep the camera permanently at (0,0,0) and move the world around it.

The CPU holds every coordinate in Float64 (full precision).
Each frame, positions are rebased relative to the camera before they reach the GPU, so the numbers the shader sees are always tiny and precise.
The view matrix is used without translation — the camera never moves, the world does.

// CPU (Float64): everything is camera-relative before it touches the GPU
const cameraRelativePos = absolutePos.map((v, i) => v - cameraOrigin[i]);  // small, precise
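
Why the order of operations matters can be shown with a small comparison. This is a sketch under assumptions: units are metres, the scene values are made up, and toCameraRelative is a hypothetical name rather than a function from the repo.

```typescript
type Vec3 = [number, number, number];

// Subtract in Float64 first, narrow to Float32 last.
function toCameraRelative(absolutePos: Vec3, cameraOrigin: Vec3): Float32Array {
  return new Float32Array([
    absolutePos[0] - cameraOrigin[0],
    absolutePos[1] - cameraOrigin[1],
    absolutePos[2] - cameraOrigin[2],
  ]);
}

// Hypothetical scene: camera on the surface, one vertex 1 mm further along X.
const cameraOrigin: Vec3 = [795000, 0, 0];
const vertex: Vec3 = [795000.001, 0, 0];

// Rebased on the CPU: the millimetre survives the trip to Float32.
const rebased = toCameraRelative(vertex, cameraOrigin)[0];

// Narrowed first, subtracted afterwards (what a shader fed absolute coordinates would do).
const narrowedFirst = Math.fround(vertex[0]) - Math.fround(cameraOrigin[0]);

console.log(rebased, narrowedFirst);
```

The first value stays close to 0.001; the second comes out as 0, because both inputs round to the same Float32 before the subtraction happens.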

The decision that paid off most: delegate everything possible to the GPU and keep the CPU as the source of truth for precision. Normals and the sphere normal were removed from the vertex buffer (11 → 5 floats per vertex) and reconstructed analytically in the shader — a 55% VRAM cut (112 MB → 51 MB) that also killed "texture swimming." Where a calculation genuinely needs Float64 but runs on the GPU (the ray–sphere intersection discriminant, a subtraction of two ~40-million values), we precompute it on the CPU and pass the small result down.

The result: jitter-free rendering across all 22 LOD levels, no CPU fallback. → floating-origin-guide.md

2. A whole planet of terrain: a GPU-driven quadtree

The planet is a cube-sphere partitioned by a dynamic quadtree: nodes subdivide when the camera gets close, giving 22 visual LOD levels (centimetres at ground level) mapped onto 14 physical subdivisions, while the horizon stays stable.

The rendering decision here was vertex pulling. Instead of uploading vertices every frame, a single static 25×25 unit grid (625 vertices) lives on the GPU permanently. Each visible chunk is an instance that contributes only its metadata (centre, scale, LOD); the vertex shader projects the flat quad onto the sphere, displaces it with fractal noise (getTerrainHeight), and re-centres it on the camera — all on the GPU. The CPU sends zero vertices per frame, just a small list of visible instance indices.
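
As a rough CPU-side illustration of what the vertex shader does with that static grid, the sketch below rebuilds one vertex from its index and per-instance metadata. The ChunkInstance layout, the single +Z cube face and the flat height stand-in are all assumptions made for the example; the real shader also re-centres the result on the camera, which is left out here.

```typescript
type Vec3 = [number, number, number];

const GRID = 25;          // 25×25 static grid, as in the text
const RADIUS_M = 795000;  // planet radius, assuming metres

// Hypothetical per-instance metadata: chunk centre and edge length
// on the +Z cube face, in [-1, 1] face units.
interface ChunkInstance { centreU: number; centreV: number; scale: number; }

function gridVertexToSphere(
  vertexIndex: number,
  chunk: ChunkInstance,
  terrainHeight: (dir: Vec3) => number,
): Vec3 {
  // Recover the vertex's place in the flat grid from its index alone.
  const col = vertexIndex % GRID;
  const row = Math.floor(vertexIndex / GRID);
  const u = chunk.centreU + (col / (GRID - 1) - 0.5) * chunk.scale;
  const v = chunk.centreV + (row / (GRID - 1) - 0.5) * chunk.scale;

  // Point on the cube face, pushed out to the unit sphere.
  const len = Math.hypot(u, v, 1);
  const dir: Vec3 = [u / len, v / len, 1 / len];

  // Displace along the sphere normal by the terrain height.
  const r = RADIUS_M + terrainHeight(dir);
  return [dir[0] * r, dir[1] * r, dir[2] * r];
}

const flat = () => 0; // stand-in for getTerrainHeight
const faceCentre = gridVertexToSphere(312, { centreU: 0, centreV: 0, scale: 2 }, flat);
const faceCorner = gridVertexToSphere(0, { centreU: 0, centreV: 0, scale: 2 }, flat);
console.log(faceCentre, faceCorner);
```

Nothing in the function depends on uploaded vertex data: the index and a few instance values are enough, which is why the per-frame upload can shrink to a list of visible instances.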

The cleverest decision was the seam fix. Cracks between chunks of different LOD are the classic planetary-terrain headache. Rather than stitch geometry, we made the noise octave count depend on distance-to-camera, not on the chunk's LOD level. Neighbouring chunks see the same distance at every point along their shared edge, so they compute the exact same height there and the seam closes mathematically. The "skirts" that used to hide cracks became a 500 m safety seal, nothing more. → terrain-mesh.md
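
The idea can be sketched in a few lines. Everything specific here is an assumption for illustration: the octave limits, the 100 m full-detail distance, and a sum of sines standing in for the real noise.

```typescript
// Hypothetical tuning values, not taken from the OVGrid source.
const MAX_OCTAVES = 12;
const MIN_OCTAVES = 3;
const FULL_DETAIL_DISTANCE_M = 100;

// One octave fewer each time the distance to the camera doubles.
function octavesForDistance(distanceM: number): number {
  const ratio = Math.max(distanceM, FULL_DETAIL_DISTANCE_M) / FULL_DETAIL_DISTANCE_M;
  return Math.max(MIN_OCTAVES, MAX_OCTAVES - Math.floor(Math.log2(ratio)));
}

// Stand-in fractal: a sum of sine octaves instead of real noise.
function fractalHeight(x: number, z: number, octaves: number): number {
  let height = 0;
  let amplitude = 100;
  let frequency = 0.001;
  for (let i = 0; i < octaves; i++) {
    height += amplitude * Math.sin(x * frequency) * Math.cos(z * frequency);
    amplitude *= 0.5;
    frequency *= 2;
  }
  return height;
}

// Distance-driven: the chunk's LOD is not an input at all, so any two
// chunks evaluating the same edge point get the same answer.
function heightByDistance(x: number, z: number, camX: number, camZ: number): number {
  return fractalHeight(x, z, octavesForDistance(Math.hypot(x - camX, z - camZ)));
}

// LOD-driven, for contrast: neighbours at different LODs disagree on the edge.
function heightByLod(x: number, z: number, chunkLod: number): number {
  return fractalHeight(x, z, chunkLod);
}

console.log(heightByLod(1234, 567, 5) === heightByLod(1234, 567, 6)); // false: a crack
```

A hard integer step like this would make detail pop as the camera moves; a production version would more likely blend the last octave in gradually, which this sketch does not attempt.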

3. What to draw: hybrid two-stage culling

Drawing a whole planet's worth of quads is wasteful when you can only see a sliver. We cull in two stages:

Horizon culling — spherical geometry: a surface point P is visible from camera C only if dot(normalize(P), normalize(C)) > R / |C|. This discards the entire far hemisphere. It runs first because it's cheap.
Frustum / back-cone culling on a GPU compute shader — everything outside the view (or behind the camera) is dropped in parallel, and the survivors are written to a buffer drawn via indirect draw, with no CPU readback.

Adaptive margins (wider for big root quads, tighter for detail) prevent popping during fast rotation. Net effect: 60–80% of quads never reach the rasterizer. → culling-system.md
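
The horizon test from the first stage is small enough to write out. This is a minimal sketch with made-up camera values; isAboveHorizon is a hypothetical name, and the adaptive margins mentioned above are not included.

```typescript
type Vec3 = [number, number, number];

const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const norm = (a: Vec3): number => Math.hypot(a[0], a[1], a[2]);

// Visible only if dot(normalize(P), normalize(C)) > R / |C|.
function isAboveHorizon(point: Vec3, camera: Vec3, radius: number): boolean {
  const cosAngle = dot(point, camera) / (norm(point) * norm(camera));
  return cosAngle > radius / norm(camera);
}

const R = 795000;                       // assuming metres
const camera: Vec3 = [0, 0, R + 1000];  // hypothetical camera 1 km above the surface

// Surface point at a given angle (radians) away from the point under the camera.
const onSurface = (angle: number): Vec3 => [R * Math.sin(angle), 0, R * Math.cos(angle)];

console.log(isAboveHorizon(onSurface(0.04), camera, R));    // true: inside the horizon
console.log(isAboveHorizon(onSurface(0.06), camera, R));    // false: just past it
console.log(isAboveHorizon(onSurface(Math.PI), camera, R)); // false: far side of the planet
```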

4. The decision I'm proudest of: the GPU physics "treadmill"

Here's a subtle trap. The terrain height is computed by a noise function. If you run "the same" function on the CPU (Float64) to do collision, it does not match the GPU's Float32 result — different implementations, different rounding. That precision gap means the avatar jitters, sinks, or floats above the ground you can see.

The decision: don't recompute the ground on the CPU at all. Bake it on the GPU, from the exact same shader code as the visible terrain, and treat that as the single source of truth.

A 64×64 grid at 8 m spacing (a 512 m square) is baked around the avatar with a compute shader — the same getTerrainHeight the visual terrain uses.
It only re-bakes when the avatar drifts more than 200 m from the last centre — a "treadmill" that follows you while keeping GPU→CPU readback rare (~1–5 ms, 1–2 frames of latency).
Collision uses bilinear interpolation between the four nearest samples for millimetric accuracy anywhere in the 512 m area.
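
The CPU side of the steps above can be sketched as follows. The HeightGrid layout, the flat local X/Z coordinates and the function names are assumptions for illustration; the GPU bake and the readback are out of scope here.

```typescript
const GRID_SIZE = 64;           // samples per side
const SPACING_M = 8;            // metres between samples
const REBAKE_DISTANCE_M = 200;  // drift that triggers a new bake

// Hypothetical layout of the data read back from the GPU.
interface HeightGrid {
  originX: number;        // local position of sample (0, 0), in metres
  originZ: number;
  heights: Float32Array;  // GRID_SIZE * GRID_SIZE samples, row-major
}

// Bilinear interpolation between the four nearest baked samples.
function sampleHeight(grid: HeightGrid, x: number, z: number): number {
  const last = GRID_SIZE - 1;
  // Continuous grid coordinates, clamped so the four neighbours always exist.
  const gx = Math.min(Math.max((x - grid.originX) / SPACING_M, 0), last);
  const gz = Math.min(Math.max((z - grid.originZ) / SPACING_M, 0), last);
  const x0 = Math.min(Math.floor(gx), last - 1);
  const z0 = Math.min(Math.floor(gz), last - 1);
  const tx = gx - x0;
  const tz = gz - z0;

  const at = (ix: number, iz: number): number => grid.heights[iz * GRID_SIZE + ix];
  const lerp = (a: number, b: number, t: number): number => a + (b - a) * t;

  const rowNear = lerp(at(x0, z0), at(x0 + 1, z0), tx);
  const rowFar = lerp(at(x0, z0 + 1), at(x0 + 1, z0 + 1), tx);
  return lerp(rowNear, rowFar, tz);
}

// The treadmill check: re-bake only once the avatar has drifted far enough.
function needsRebake(grid: HeightGrid, avatarX: number, avatarZ: number): boolean {
  const half = ((GRID_SIZE - 1) * SPACING_M) / 2;
  const dx = avatarX - (grid.originX + half);
  const dz = avatarZ - (grid.originZ + half);
  return Math.hypot(dx, dz) > REBAKE_DISTANCE_M;
}
```

Because the samples come from the same shader as the visible mesh, the interpolation only has to be smooth between samples; it never has to reproduce the noise function itself.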

The avatar now walks on exactly what you see. The same grid powers the camera's terrain probe (it shrinks the orbit distance instead of clipping into a hill) and — as of the latest work — the NPCs' swim/walk decisions. → GPU_PHYSICS_GRID.md

5. Populating the world cheaply: GPU scatter + a dither dissolve

Trees, palms, rocks and flowers are scattered entirely on the GPU, on a grid of cells around the camera with a unified ~3 km reach (past that they're sub-pixel — spending compute there buys nothing). Biome latitude bands decide what grows where, with noise on the borders so the biome lines aren't straight.

The LOD decision worth stealing: instead of alpha-blending distant instances (which forces transparency sorting), they dissolve with an ordered 4×4 Bayer dither — a clean, structured screen-door fade that needs no blending and no sort. It's the same trick used for the avatars now.
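
A minimal sketch of the per-pixel test, written in TypeScript rather than shader code. The matrix is the standard 4×4 Bayer pattern; keepFragment and the opacity parameter are hypothetical names for what a fragment shader would do with a discard.

```typescript
// Standard 4×4 Bayer matrix: the values 0..15 in ordered-dither layout.
const BAYER_4X4: number[][] = [
  [0, 8, 2, 10],
  [12, 4, 14, 6],
  [3, 11, 1, 9],
  [15, 7, 13, 5],
];

// Returns true if the pixel should be drawn, false if it should be discarded.
// opacity runs from 0 (fully dissolved) to 1 (fully solid).
function keepFragment(pixelX: number, pixelY: number, opacity: number): boolean {
  const threshold = (BAYER_4X4[pixelY & 3][pixelX & 3] + 0.5) / 16;
  return opacity > threshold;
}

// Count how many pixels of one 4×4 tile survive at a given opacity.
function keptInTile(opacity: number): number {
  let kept = 0;
  for (let y = 0; y < 4; y++) {
    for (let x = 0; x < 4; x++) {
      if (keepFragment(x, y, opacity)) kept++;
    }
  }
  return kept;
}

console.log(keptInTile(0.25), keptInTile(0.5), keptInTile(1)); // 4 8 16
```

Every surviving pixel is written fully opaque, so the depth buffer keeps working as usual and draw order stops mattering.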

6. A living crowd that respects the ground (NPCs)

NPCs were a good case study in honest constraints. They can only know the exact ground inside that 512 m physics grid; beyond it, they'd happily walk on the surface of the sea. Two decisions:

Swim/walk is read from the baked grid, the same ground truth the player uses — so an NPC swims in real water and walks on real land, not on the CPU-noise "other planet."
The finite roster is toroidally wrapped around the player, tuned to the same ~3 km reach as the trees. Whatever an NPC's home is, it folds into the precise range so the crowd always populates the scene naturally — and fades out at the horizon with the same Bayer dissolve, hiding the wrap seam. (We tried a tight wrap first; it clustered NPCs on top of the player. Widening it to the tree distance restored the natural, scattered feel.)

The takeaway: when a system can only be correct within a bounded region, bound the system to that region and fade the edge — don't fake correctness outside it.

7. Weather without a GPU budget

Rain you'd expect to be expensive. The decision that made it nearly free: a world-anchored treadmill. A small disk of ~25,000 drops (40 m radius) is tile-wrapped around the camera — as the avatar walks +X, every drop slides −X, and when one exits the disk it re-enters from the opposite side, at the boundary where the radial fade is already zero, so the jump is invisible.

// Tile-wrap: the disk feels infinite, but only ~25k drops ever exist
visX_m = fract((baseX_m - camOffX_m) / (2*R) + 0.5) * (2*R) - R;

Two subtle calls made it feel right:

Damp the slide to 0.1×. A physically exact 1:1 slide reads as "rain shot sideways"; at 0.1 you still see drops drift past as you walk, but the fall reads vertical.
Anchor to the avatar's position, not the camera's. The third-person camera orbits the avatar — keying off camera movement would slide the rain every time you just looked around.
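
Putting the wrap and the two calls together for one axis, as a sketch. Where exactly the 0.1 damping is applied in the real shader is not stated, so applying it to the avatar offset is an assumption here, as are the names.

```typescript
const DISK_RADIUS_M = 40;   // R in the formula above
const SLIDE_DAMPING = 0.1;  // 1.0 would be the physically exact slide

const fract = (x: number): number => x - Math.floor(x);

// Visible X of one drop relative to the avatar, always inside [-R, R).
function wrapDropX(baseX_m: number, avatarX_m: number): number {
  const offsetX_m = avatarX_m * SLIDE_DAMPING; // keyed to the avatar, not the orbiting camera
  const period = 2 * DISK_RADIUS_M;
  return fract((baseX_m - offsetX_m) / period + 0.5) * period - DISK_RADIUS_M;
}

console.log(wrapDropX(0, 0));     // 0
console.log(wrapDropX(0, 100));   // -10: the avatar walked +X, the drop slid -X
console.log(wrapDropX(-35, 100)); // 35: left the disk on one side, re-entered on the other
```

The drop count never changes; only the wrapped position does, so the cost stays flat however far the avatar walks.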

The whole thing costs ~0.06% of an Apple M1 GPU. Snow reuses the identical treadmill with a different flake shape and a slower fall. Clouds are baked once into a texture and projected with triplanar mapping (three orthogonal planes, blended) to avoid the polar UV singularity that streaked the old spherical projection on a small planet — three parallax layers share one texture, ~0.1–0.3 ms a frame. → rain-system.md · clouds-performance.md

8. A believable sky anywhere on the planet

The planet rotates in world space, so an avatar at longitude L sees the sun in a different place than one at L + π at the same instant. The HUD therefore shows local solar time, computed from the avatar's longitude relative to the sun's azimuth — not the global clock, which would happily say "14:53" while the local sun is at midnight.
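
As a sketch, local solar time falls out of the longitude difference directly. The sign convention is an assumption (longitude increasing eastward, with places east of the sub-solar point being later in their day, as on Earth), and both function names are hypothetical.

```typescript
const TWO_PI = 2 * Math.PI;

// Local solar time in hours, in [0, 24). Noon is where the avatar's
// longitude matches the longitude the sun is directly above.
function localSolarHours(avatarLongitude: number, sunLongitude: number): number {
  const turns = (avatarLongitude - sunLongitude) / TWO_PI;
  const hours = 12 + turns * 24;
  return ((hours % 24) + 24) % 24;
}

// Format fractional hours as HH:MM for the HUD.
function formatClock(hours: number): string {
  const totalMinutes = Math.floor(hours * 60) % (24 * 60);
  const hh = Math.floor(totalMinutes / 60);
  const mm = totalMinutes % 60;
  return (hh < 10 ? "0" : "") + hh + ":" + (mm < 10 ? "0" : "") + mm;
}

console.log(formatClock(localSolarHours(0, 0)));           // 12:00, sun overhead
console.log(formatClock(localSolarHours(Math.PI, 0)));     // 00:00, opposite side of the planet
console.log(formatClock(localSolarHours(Math.PI / 2, 0))); // 18:00
```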

The storm sky has the same curvature trap: the procedural lightning bolt is built in the avatar's local tangent frame, not world axes — otherwise it tilts to horizontal as you walk around the planet's curve. An autonomous weather system drives it all from FBM noise, with a Stardew-style calendar whose forecast icons are classified by the same blend the engine renders, so the icon always matches the sky.
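
A local tangent frame is cheap to build from the avatar's position alone. In this sketch the planet centre is the world origin and the rotation axis is +Y; both are assumptions, as are the names, and the frame is undefined exactly at the poles.

```typescript
type Vec3 = [number, number, number];

const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];
const normalize = (a: Vec3): Vec3 => {
  const len = Math.hypot(a[0], a[1], a[2]);
  return [a[0] / len, a[1] / len, a[2] / len];
};

const POLE: Vec3 = [0, 1, 0]; // assumed rotation axis

interface TangentFrame { east: Vec3; up: Vec3; north: Vec3; }

// "Up" is away from the planet centre; east and north span the ground plane.
function tangentFrameAt(position: Vec3): TangentFrame {
  const up = normalize(position);
  const east = normalize(cross(POLE, up));
  const north = cross(up, east);
  return { east, up, north };
}

// Convert a bolt vertex given as (east, up, north) offsets into world space.
function localToWorld(anchor: Vec3, frame: TangentFrame, local: Vec3): Vec3 {
  const [e, u, n] = local;
  return [
    anchor[0] + frame.east[0] * e + frame.up[0] * u + frame.north[0] * n,
    anchor[1] + frame.east[1] * e + frame.up[1] * u + frame.north[1] * n,
    anchor[2] + frame.east[2] * e + frame.up[2] * u + frame.north[2] * n,
  ];
}

// The same "2 km straight up" bolt tip at two different places on the planet.
const avatarA: Vec3 = [795000, 0, 0];
const avatarB: Vec3 = [0, 0, 795000];
const tipA = localToWorld(avatarA, tangentFrameAt(avatarA), [0, 2000, 0]);
const tipB = localToWorld(avatarB, tangentFrameAt(avatarB), [0, 2000, 0]);
console.log(tipA, tipB);
```

Built on world axes instead, one of those two bolts would lie flat along the ground.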

9. UI inside a 3D world

Configuration panels (Debug, Calendar), avatar nameplates and parcel tooltips don't sit as flat DOM overlays — they render inside the world as billboards. Real HTML/CSS is rasterized into a GPU texture via the experimental HTML-in-Canvas API (copyElementImageToTexture), then drawn as a camera-facing quad anchored in the world and depth-tested, so terrain and avatars occlude it.

Because the API doesn't route clicks to 3D-placed elements, interaction is a perspective-correct raycast: solve the screen point → panel UV mapping in clip space (using the w component, not affine interpolation, which drifts at an angle), hit-test against the live control rects, and replay the hit as a synthetic DOM event on the real control. Build a panel with the shared markup and it just works in-world, no per-panel code. → IN_WORLD_UI.md
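
One way to get a perspective-correct screen-to-panel mapping is a ray-plane intersection in world space. That is a swapped-in technique, not the clip-space solve described above, though it avoids the same affine drift. The Panel shape and names below are hypothetical, and building the pick ray from the mouse position is assumed to happen elsewhere.

```typescript
type Vec3 = [number, number, number];

const sub = (a: Vec3, b: Vec3): Vec3 => [a[0] - b[0], a[1] - b[1], a[2] - b[2]];
const dot = (a: Vec3, b: Vec3): number => a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
const cross = (a: Vec3, b: Vec3): Vec3 => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];

// Hypothetical billboard description: right/up are unit vectors, sizes in world units.
interface Panel { centre: Vec3; right: Vec3; up: Vec3; width: number; height: number; }

// Returns [u, v] in 0..1 (v growing downward, like DOM pixels), or null on a miss.
function rayToPanelUV(rayOrigin: Vec3, rayDir: Vec3, panel: Panel): [number, number] | null {
  const normal = cross(panel.right, panel.up);
  const denom = dot(rayDir, normal);
  if (Math.abs(denom) < 1e-9) return null;   // ray parallel to the panel

  const t = dot(sub(panel.centre, rayOrigin), normal) / denom;
  if (t <= 0) return null;                   // panel is behind the ray origin

  const hit: Vec3 = [
    rayOrigin[0] + rayDir[0] * t,
    rayOrigin[1] + rayDir[1] * t,
    rayOrigin[2] + rayDir[2] * t,
  ];
  const rel = sub(hit, panel.centre);
  const u = dot(rel, panel.right) / panel.width + 0.5;
  const v = 0.5 - dot(rel, panel.up) / panel.height;
  if (u < 0 || u > 1 || v < 0 || v > 1) return null;
  return [u, v];
}

const panel: Panel = { centre: [0, 0, -5], right: [1, 0, 0], up: [0, 1, 0], width: 2, height: 1 };
console.log(rayToPanelUV([0, 0, 0], [0.5, 0.25, -5], panel)); // [0.75, 0.25]
```

Multiplying u and v by the panel's pixel size gives the point to hit-test against the control rects.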

10. Avatars: unique per person, tiny over the wire

Every avatar is a voxel character built at runtime from a compact parameter set, rigged to a Mixamo skeleton and animated with crossfaded clips. The default look is seeded from your identity, so a fresh visitor never collides with someone else's appearance — yet the customization travels as a handful of bytes, perfect for P2P broadcast. (Prefer a polished look? The CharacterSystem hot-swaps to a Ready Player Me GLB live, pausing render, clearing buffers and rebuilding bind groups.)
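
To make "seeded from your identity" and "a handful of bytes" concrete, here is a sketch with an invented four-field parameter set. FNV-1a is a well-known string hash chosen for the example; the actual seeding scheme, field names and wire format used by OVGrid are not described above, so all of them are assumptions.

```typescript
// FNV-1a, 32-bit: a small, well-known string hash used here as the seed.
function fnv1a32(text: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i) & 0xff;         // low byte only; enough for ASCII ids
    hash = Math.imul(hash, 0x01000193) >>> 0;  // 32-bit multiply by the FNV prime
  }
  return hash >>> 0;
}

// Hypothetical parameter set: one byte per field.
interface AvatarLook { skinTone: number; hairStyle: number; hairColour: number; outfit: number; }

// Same identity in, same default look out.
function defaultLook(identity: string): AvatarLook {
  const h = fnv1a32(identity);
  return {
    skinTone: h & 0xff,
    hairStyle: (h >>> 8) & 0xff,
    hairColour: (h >>> 16) & 0xff,
    outfit: (h >>> 24) & 0xff,
  };
}

// Four bytes on the wire.
function encodeLook(look: AvatarLook): Uint8Array {
  return new Uint8Array([look.skinTone, look.hairStyle, look.hairColour, look.outfit]);
}

function decodeLook(bytes: Uint8Array): AvatarLook {
  return { skinTone: bytes[0], hairStyle: bytes[1], hairColour: bytes[2], outfit: bytes[3] };
}

const look = defaultLook("visitor-1234"); // hypothetical identity string
console.log(look, encodeLook(look).length); // a look object, and 4
```

A hash only makes clashes unlikely, not impossible: two identities can still land on the same default look, and a parameter space this small makes that more likely than a real one would.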

And the part with no server at all

Here's the thing that ties it together: there is no backend. Multiplayer presence, identity, land and assets all run on GenosDB, a serverless, peer-to-peer, real-time database — WebRTC transport, Nostr signaling, no game server in the middle. Sign in with biometrics or a seed phrase via its Security Manager; your avatar appears in other players' worlds over the open internet.

That side of the story — how one distributed database drives the entire live state of a 3D world — is a whole post of its own, on genosdb.com. This one was about the world; that one is about the engine room.

Why build it on the open web

OVGrid is, deliberately, a web app — a static WebGPU build, instant and linkable, no install and no gatekeeper. It's an open social world, not a game: a persistent planet people explore and inhabit, with structured games as optional layers to come later. It's built incrementally, in the open, and it's MIT-licensed.

A few entry points if you want to go deeper:

Play: https://world.ovgrid.com
Code: https://github.com/estebanrfp/ovgrid
Every tunable, documented: docs/configuration
Ask the codebase in natural language: DeepWiki

If you've ever wondered what it takes to put a real-scale planet in a browser tab — the repo is open, and most of these decisions have their own deep-dive doc. Go fly around it, then read how it holds together.
