present · rust · c++ · webassembly · web audio api · design · in progress

Correspondence 通感

a generative music engine where pixel color becomes jazz piano in real time. designed around 留白 — the chinese aesthetic principle of meaningful negative space.
demo · in development · source · private during prototype

Every painting has music inside it. Every piece of music has an image inside it. The Chinese literary tradition has a name for this crossing: 通感 — the deliberate flow between senses, where the feeling that lives between the image and the sound becomes the meaning. Wang Wei wrote poems that were paintings. Debussy made sound that behaves like light on water. Monet left empty water in his lily ponds because the empty water was where the painting lived.

Correspondence is that idea made computational — a system that reads the pixels of a portrait and synthesizes jazz piano in real time. The colors become notes. The brightness becomes tempo. The empty pixels become silence — not absence, but the active silence that lets the surrounding sounds breathe. The image becomes a score. The user becomes the performer.

01 system overview

Correspondence is a four-layer pipeline. A visual input becomes structured musical parameters, which feed a procedural synthesis engine, which outputs audio with low enough latency to feel responsive to the user's gesture. Each layer is intentionally separated so the mapping rules can evolve without rewriting the synthesis layer, and vice versa.
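One way to keep that separation honest in code is to share a single parameter type between the mapping and synthesis layers and put each layer behind its own trait. The sketch below is a minimal illustration under that assumption. `MusicalParams`, `Hsb`, `MappingRules`, `Synth` and `step` are hypothetical names, not the project's actual interfaces.

```rust
/// Hypothetical contract between layer 2 (mapping) and layer 3 (synthesis).
/// Only this struct is shared; each side can change freely behind its trait.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct MusicalParams {
    pub midi_note: u8,  // pitch, derived from hue
    pub velocity: f32,  // 0.0..=1.0, derived from saturation
    pub tempo_bpm: f32, // derived from brightness
}

/// HSB sample handed over by layer 1: hue in degrees, s and b in 0.0..=1.0.
#[derive(Debug, Clone, Copy)]
pub struct Hsb {
    pub h: f32,
    pub s: f32,
    pub b: f32,
}

/// Layer 2: one implementation per character. `None` means no new note here.
pub trait MappingRules {
    fn map(&self, color: Hsb) -> Option<MusicalParams>;
}

/// Layer 3: consumes parameters without knowing which character produced them.
pub trait Synth {
    fn trigger(&mut self, params: MusicalParams);
}

/// Glue between the layers: the pipeline only ever sees the two traits.
/// Returns true if a note was triggered.
pub fn step(rules: &dyn MappingRules, synth: &mut dyn Synth, color: Hsb) -> bool {
    match rules.map(color) {
        Some(p) => {
            synth.trigger(p);
            true
        }
        None => false,
    }
}
```

With this shape, swapping one character's ruleset for another is a new `MappingRules` implementation, and the synthesis code does not change.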

┌─────────────────────────────────────────────────────────────┐
│                           BROWSER                           │
│                                                             │
│  ┌─────────────┐      ┌─────────────┐      ┌────────────┐   │
│  │   CANVAS    │      │   MAPPING   │      │ GENERATIVE │   │
│  │   (input)   │ ───▶ │   ENGINE    │ ───▶ │    JAZZ    │   │
│  │             │      │             │      │   ENGINE   │   │
│  │  pixel rgb  │      │ hsb → midi  │      │            │   │
│  │  + cursor   │      │   + rules   │      │   rust /   │   │
│  └─────────────┘      └─────────────┘      │  c++ wasm  │   │
│         ▲                                  └─────┬──────┘   │
│         │                                        │          │
│         │                                        ▼          │
│         │                                 ┌─────────────┐   │
│         │                                 │  WEB AUDIO  │   │
│         └──── user gesture ◀──────────────┤   OUTPUT    │   │
│                                           └─────────────┘   │
└─────────────────────────────────────────────────────────────┘

layer 1 — visual input

A character's pixel portrait is rendered to a canvas element. The system reads pixel values continuously from a small region around the user's cursor or touch position. RGB is converted to HSB so that perceptual properties drive the music instead of raw color channels — hue, saturation, and brightness map more naturally to musical intuition than the red, green, and blue components do.
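The read-and-convert step can be sketched as two small functions. The first averages an RGBA window laid out like canvas `ImageData` (4 bytes per pixel, row-major). The second converts the result to HSB using the standard HSV formulas. The window shape, the `radius` parameter and both function names are assumptions for illustration.

```rust
/// Convert 8-bit RGB to HSB (also called HSV).
/// Returns hue in degrees [0, 360), saturation and brightness in [0, 1].
pub fn rgb_to_hsb(r: u8, g: u8, b: u8) -> (f32, f32, f32) {
    let (r, g, b) = (r as f32 / 255.0, g as f32 / 255.0, b as f32 / 255.0);
    let max = r.max(g).max(b);
    let min = r.min(g).min(b);
    let delta = max - min;

    let hue = if delta == 0.0 {
        0.0 // grey: hue is undefined, use 0 by convention
    } else if max == r {
        60.0 * ((g - b) / delta).rem_euclid(6.0)
    } else if max == g {
        60.0 * ((b - r) / delta + 2.0)
    } else {
        60.0 * ((r - g) / delta + 4.0)
    };
    let saturation = if max == 0.0 { 0.0 } else { delta / max };
    (hue, saturation, max)
}

/// Average the RGBA pixels in a square of `radius` around (cx, cy),
/// clamped to the image bounds. Assumes width > 0 and height > 0.
pub fn sample_region(
    rgba: &[u8],
    width: usize,
    height: usize,
    cx: usize,
    cy: usize,
    radius: usize,
) -> (u8, u8, u8, u8) {
    // Clamp the center first so the window is never empty.
    let (cx, cy) = (cx.min(width - 1), cy.min(height - 1));
    let (x0, x1) = (cx.saturating_sub(radius), (cx + radius).min(width - 1));
    let (y0, y1) = (cy.saturating_sub(radius), (cy + radius).min(height - 1));
    let mut sum = [0u32; 4];
    let mut count = 0u32;
    for y in y0..=y1 {
        for x in x0..=x1 {
            let i = (y * width + x) * 4;
            for c in 0..4 {
                sum[c] += rgba[i + c] as u32;
            }
            count += 1;
        }
    }
    let avg = |c: usize| (sum[c] / count) as u8;
    (avg(0), avg(1), avg(2), avg(3))
}
```

Averaging in RGB before converting is a deliberate choice. Hue is circular, so averaging hues of 350° and 10° directly would give 180° (cyan) instead of a red.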

layer 2 — mapping engine

The mapping layer is where character identity lives. Each character has its own ruleset that translates HSB values into musical parameters. The mappings are deliberately not linear and not universal — BoJack's blue-yellow palette feeds a different scale than Princess Carolyn's pink. The point is not to convert color into sound mechanically. The point is to find the feeling that lives between them.
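A character-specific pitch rule might rotate the hue wheel to a signature hue and then quantize it onto that character's scale. This is a sketch under those assumptions. The text does not say which scales BoJack or Princess Carolyn actually use, so `DORIAN`, `MAJOR_PENTATONIC` and the `CharacterRules` fields are hypothetical placeholders.

```rust
/// Hypothetical per-character ruleset. The scales and offsets used with it
/// are placeholders, not the project's actual choices.
pub struct CharacterRules {
    pub root_midi: u8,        // lowest note, e.g. 48 = C3
    pub scale: &'static [u8], // semitone offsets within one octave
    pub octaves: u8,          // how many octaves the hue wheel spans
    pub hue_offset_deg: f32,  // rotates the wheel so a signature hue lands on the root
}

impl CharacterRules {
    /// Quantize a hue (degrees) onto this character's scale, so every color
    /// lands on a note the character "owns" rather than any chromatic pitch.
    pub fn hue_to_midi(&self, hue_deg: f32) -> u8 {
        let steps = self.scale.len() * self.octaves as usize;
        let t = (hue_deg - self.hue_offset_deg).rem_euclid(360.0) / 360.0; // 0.0..=1.0
        let idx = ((t * steps as f32) as usize).min(steps - 1);
        let octave = (idx / self.scale.len()) as u8;
        let degree = idx % self.scale.len();
        self.root_midi + 12 * octave + self.scale[degree]
    }
}

/// Placeholder palettes: a Dorian set and a major pentatonic set.
pub const DORIAN: &[u8] = &[0, 2, 3, 5, 7, 9, 10];
pub const MAJOR_PENTATONIC: &[u8] = &[0, 2, 4, 7, 9];
```

Two characters that share a hue still sound different here, because the scale, root and rotation all belong to the character rather than to the color.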

layer 3 — generative jazz engine

The synthesis core is written in Rust with some C++ DSP primitives, compiled to WebAssembly. It is not a sample player: every note is generated from scratch by oscillator banks shaped by ADSR envelopes, then passed through a gentle reverb tail. Phrases are produced by walking probabilistic state machines that encode each character's musical personality. The engine has pauses built in. It is allowed to stop.
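A single voice of such an engine can be sketched as one sine and one triangle oscillator sharing a running phase, multiplied by a linear ADSR envelope. The 0.7/0.3 partial mix, the linear envelope segments and the names `Adsr`, `Voice` and `midi_to_hz` are assumptions. The C++ primitives are not represented. `render` mixes into a caller-owned buffer and never allocates, which is the property an audio callback needs.

```rust
use std::f32::consts::TAU;

/// Linear ADSR envelope; times in seconds. `release` must be > 0.
#[derive(Clone, Copy)]
pub struct Adsr {
    pub attack: f32,
    pub decay: f32,
    pub sustain: f32,
    pub release: f32,
}

impl Adsr {
    /// Level at time `t` after note-on, for a note released at time `gate`.
    pub fn level(&self, t: f32, gate: f32) -> f32 {
        let held = |t: f32| {
            if t < self.attack {
                t / self.attack
            } else if t < self.attack + self.decay {
                1.0 - (1.0 - self.sustain) * (t - self.attack) / self.decay
            } else {
                self.sustain
            }
        };
        if t < gate {
            return held(t);
        }
        let r = (t - gate) / self.release;
        if r >= 1.0 { 0.0 } else { held(gate) * (1.0 - r) }
    }
}

pub fn midi_to_hz(note: u8) -> f32 {
    440.0 * 2f32.powf((note as f32 - 69.0) / 12.0)
}

/// One note: a two-oscillator "bank" (sine + triangle) under an envelope.
pub struct Voice {
    freq_hz: f32,
    sample_rate: f32,
    phase: f32,   // running phase in cycles, 0.0..1.0
    elapsed: u64, // samples since note-on
    gate_s: f32,  // how long the note is held before release
    env: Adsr,
}

impl Voice {
    pub fn new(freq_hz: f32, sample_rate: f32, gate_s: f32, env: Adsr) -> Self {
        Voice { freq_hz, sample_rate, phase: 0.0, elapsed: 0, gate_s, env }
    }

    /// Mix this voice into `out` (adds, does not overwrite).
    pub fn render(&mut self, out: &mut [f32]) {
        let step = self.freq_hz / self.sample_rate;
        for s in out.iter_mut() {
            let t = self.elapsed as f32 / self.sample_rate;
            let sine = (TAU * self.phase).sin();
            let tri = 1.0 - 4.0 * (self.phase - 0.5).abs(); // -1..1
            *s += (0.7 * sine + 0.3 * tri) * self.env.level(t, self.gate_s);
            self.phase = (self.phase + step).fract();
            self.elapsed += 1;
        }
    }

    /// True once the release segment has fully decayed.
    pub fn finished(&self) -> bool {
        self.elapsed as f32 / self.sample_rate >= self.gate_s + self.env.release
    }
}
```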

layer 4 — real-time loop

As the user moves across the portrait, the colors under their cursor feed the engine continuously. Audio output runs through the Web Audio API at a target buffer of ~5ms. The image is the score, the user is the performer, and the music shifts as they explore. The same portrait is never played the same way twice.
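The ~5ms target is easier to reason about in frames. The sketch below assumes a 48 kHz context and assumes the engine runs in an AudioWorklet, which the text does not specify. In that case audio is processed in render quanta of 128 frames by default, so a buffer target effectively rounds up to whole quanta. End-to-end latency also includes the browser's and the operating system's own output buffering, which `AudioContext` reports through `baseLatency` and `outputLatency`. The helper names here are hypothetical.

```rust
/// Default Web Audio render quantum, in frames.
pub const RENDER_QUANTUM: usize = 128;

/// Frames needed to hold `ms` milliseconds of audio at `sample_rate` Hz.
pub fn frames_for_ms(ms: f64, sample_rate: f64) -> usize {
    (ms / 1000.0 * sample_rate).round() as usize
}

/// Duration in milliseconds of `frames` frames at `sample_rate` Hz.
pub fn ms_for_frames(frames: usize, sample_rate: f64) -> f64 {
    frames as f64 / sample_rate * 1000.0
}

/// Whole render quanta needed to cover `ms` (ceiling division).
pub fn quanta_for_ms(ms: f64, sample_rate: f64) -> usize {
    let frames = frames_for_ms(ms, sample_rate);
    (frames + RENDER_QUANTUM - 1) / RENDER_QUANTUM
}
```

At 48 kHz, 5ms is 240 frames. One quantum is about 2.67ms, so covering the target takes two quanta (256 frames, about 5.33ms).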

02 mapping design

The relationship between color and sound is the soul of the project. The current mapping is a starting point — a simple, defensible baseline that will evolve through listening tests once the engine is running.

hue → pitch · mapped to a character-specific scale, not the full chromatic range
saturation → dynamics · desaturated colors play softer, fully saturated colors play louder
brightness → tempo, articulation · dark regions slow down and stretch, bright regions move faster and brighter
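The saturation and brightness rows of that baseline can be sketched as one function. All ranges and curves here are placeholder assumptions meant to be tuned by listening. The `Expression` struct and the `expression` function are hypothetical. The power curve on saturation is one way to honor the "not linear" rule: muted colors stay quiet longer before the dynamics rise.

```rust
/// Hypothetical expression parameters derived from one HSB sample.
#[derive(Debug, PartialEq)]
pub struct Expression {
    pub velocity: u8,   // MIDI-style, 20..=127 here
    pub tempo_bpm: f32, // 60..=140 here
    pub legato: f32,    // note length as a fraction of the beat
}

/// Map saturation and brightness (both nominally 0.0..=1.0) to expression.
pub fn expression(saturation: f32, brightness: f32) -> Expression {
    let s = saturation.clamp(0.0, 1.0);
    let b = brightness.clamp(0.0, 1.0);
    // Saturation -> dynamics, on a gentle power curve instead of a straight line.
    let velocity = (20.0 + 107.0 * s.powf(1.5)).round() as u8;
    // Brightness -> tempo: dark regions slow down, bright regions move faster.
    let tempo_bpm = 60.0 + 80.0 * b;
    // Brightness -> articulation: dark regions stretch notes, bright ones shorten them.
    let legato = 0.95 - 0.55 * b;
    Expression { velocity, tempo_bpm, legato }
}
```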

An empty pixel — pure background — produces no note. But it also does not produce nothing. It produces an active pause that lets the surrounding music expand into it. This is the most important design decision in the project, and it comes directly from the principle of 留白.
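One way to make "not nothing" concrete is for the background case to return an explicit rest event with a length and a let-ring flag, rather than an empty value. This is a minimal sketch. The background test (near-transparent alpha, or within a tolerance of a known background color), the tolerance of 12 and the `Event` variants are all assumptions.

```rust
/// What the engine receives for one sample point.
#[derive(Debug, PartialEq)]
pub enum Event {
    Note { midi: u8 },
    /// An active pause: it has a length, and held notes and the reverb
    /// tail are allowed to ring into it.
    Rest { beats: f32, let_ring: bool },
}

/// Hypothetical background test: near-transparent pixels, or pixels close
/// to the portrait's background color, count as empty.
pub fn is_background(rgba: [u8; 4], background: [u8; 3], tolerance: u8) -> bool {
    if rgba[3] < 16 {
        return true;
    }
    (0..3).all(|c| rgba[c].abs_diff(background[c]) <= tolerance)
}

/// Empty pixels become a rest event, never "no event".
pub fn event_for(
    rgba: [u8; 4],
    background: [u8; 3],
    note_for_color: impl Fn([u8; 4]) -> u8,
) -> Event {
    if is_background(rgba, background, 12) {
        Event::Rest { beats: 1.0, let_ring: true }
    } else {
        Event::Note { midi: note_for_color(rgba) }
    }
}
```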

03 technical decisions

core language
Rust — for memory safety without garbage collection. Real-time audio cannot tolerate unpredictable GC pauses, and Rust is the closest a modern language gets to C++ control with stronger guarantees.
dsp primitives
C++ — for mature audio libraries and decades of DSP heritage. Used selectively where Rust crates are still maturing.
delivery
WebAssembly — runs the engine inside any modern browser at near-native speed. No installation, no platform fragmentation, low friction for first listeners.
audio output
Web Audio API — for the final output stage and the audio graph. The Wasm module produces sample buffers; Web Audio routes them to the system.
synthesis
Oscillator banks (sine + triangle), ADSR envelopes, convolution reverb. No samples, no recordings — every note is built from waveforms in real time.
latency target
~5ms audio buffer, below the threshold at which gesture and sound start to feel disconnected.
visual layer
Canvas / WebGL for pixel reading and portrait rendering. Lightweight, no framework dependency for the engine.
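Convolution reverb comes down to convolving the dry signal with an impulse response and mixing the result back in. The text does not say whether this stage runs in the Rust/C++ core or in Web Audio's `ConvolverNode`, so the sketch below shows only the math, in Rust. To stay consistent with "no recordings", it uses a synthetic impulse response (exponentially decaying noise from a tiny linear congruential generator). The function names, decay and wet amount are assumptions. Direct convolution costs O(N·M), which is why practical convolvers use FFT-based partitioned convolution instead.

```rust
/// Direct-form convolution: out[n] = sum over k of dry[n - k] * ir[k].
pub fn convolve(dry: &[f32], ir: &[f32]) -> Vec<f32> {
    if dry.is_empty() || ir.is_empty() {
        return Vec::new();
    }
    let mut out = vec![0.0; dry.len() + ir.len() - 1];
    for (n, &x) in dry.iter().enumerate() {
        for (k, &h) in ir.iter().enumerate() {
            out[n + k] += x * h;
        }
    }
    out
}

/// Synthetic impulse response: noise under an exponential decay.
/// A small LCG keeps the sketch free of external crates.
pub fn synthetic_ir(len: usize, sample_rate: f32, decay_s: f32, seed: u32) -> Vec<f32> {
    let mut state = seed;
    (0..len)
        .map(|i| {
            state = state.wrapping_mul(1_664_525).wrapping_add(1_013_904_223);
            let noise = (state >> 8) as f32 / (1u32 << 24) as f32 * 2.0 - 1.0; // -1..1
            let t = i as f32 / sample_rate;
            noise * (-t / decay_s).exp()
        })
        .collect()
}

/// Wet/dry mix; a "gentle" tail means a small `wet` value.
pub fn reverb(dry: &[f32], ir: &[f32], wet: f32) -> Vec<f32> {
    let tail = convolve(dry, ir);
    tail.iter()
        .enumerate()
        .map(|(i, &w)| {
            let d = dry.get(i).copied().unwrap_or(0.0);
            (1.0 - wet) * d + wet * w
        })
        .collect()
}
```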

Real-time audio is that kind of problem: small budgets, hard constraints, no room for "usually works."

04 留白 · negative space as a design principle

留白 (liúbái) means "leaving white" — the negative space in Chinese ink painting that is not absence but presence in a different form. A Wang Wei landscape with mist covering half the mountain makes you feel the mountain more completely than a fully painted one would. The mist holds the mountain in place.

Most generative music systems try to fill every moment. Correspondence does the opposite, and the principle shows up at every layer of the system:

In the portrait — empty pixels produce active silence, not nothing.
In the engine — pauses are first-class citizens, not gaps between notes.
In the interface — nothing competes for attention except the portrait and the sound.
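In the engine, "first-class" can mean that rest is a state in the phrase machine, chosen with its own probability like any note. A first-order Markov chain is one common way to build the probabilistic state machines described earlier. The sketch assumes that approach. The four states, the "sparse" transition table and the xorshift generator are hypothetical placeholders for a character's personality.

```rust
/// Hypothetical phrase states. `Rest` is an ordinary state, so silence is
/// chosen, weighted and sequenced like any note.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Rest,
    Step,
    Leap,
    Repeat,
}

const STATES: [State; 4] = [State::Rest, State::Step, State::Leap, State::Repeat];

/// Row = current state, columns = probability of each next state (rows sum to 1).
/// Placeholder numbers for a "sparse" personality that rests often.
pub const SPARSE: [[f32; 4]; 4] = [
    // Rest  Step  Leap  Repeat
    [0.30, 0.45, 0.15, 0.10], // from Rest
    [0.35, 0.40, 0.15, 0.10], // from Step
    [0.50, 0.30, 0.05, 0.15], // from Leap: land, then breathe
    [0.40, 0.40, 0.10, 0.10], // from Repeat
];

/// Pick the next state from a uniform random number `u` in [0, 1).
pub fn next_state(table: &[[f32; 4]; 4], current: State, u: f32) -> State {
    let row = &table[current as usize];
    let mut acc = 0.0f32;
    for (i, &p) in row.iter().enumerate() {
        acc += p;
        if u < acc {
            return STATES[i];
        }
    }
    STATES[3] // guard against rounding when u is very close to 1
}

/// Walk `len` steps using an xorshift32 generator. `seed` must be nonzero.
pub fn walk(table: &[[f32; 4]; 4], start: State, len: usize, mut seed: u32) -> Vec<State> {
    let mut out = Vec::with_capacity(len);
    let mut s = start;
    for _ in 0..len {
        seed ^= seed << 13;
        seed ^= seed >> 17;
        seed ^= seed << 5;
        let u = (seed >> 8) as f32 / (1u32 << 24) as f32; // [0, 1)
        s = next_state(table, s, u);
        out.push(s);
    }
    out
}
```

Under this design, giving a character more space is a matter of raising the Rest column, with no change to the walking code.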

Jazz already understands this. Miles Davis built entire phrases out of the notes he chose not to play. The empty pixel and the unplayed note are the same gesture in two different traditions.

05 what doesn't exist yet

I looked for prior art before starting. There are color-to-pitch sonification toys, but most use direct linear mappings — blue becomes a low note, red becomes a high note — without character-specific rules or musical structure. There are AI music generators like Suno that produce songs from text prompts, but they are disconnected from visual data and the human is not really in control. There are pixel art tools, and there are generative music projects, but no system combines them in a way that lets a specific character's visual identity become a specific musical identity that the user can play in real time.

That is the space Correspondence is trying to explore.

06 roadmap

designed
System architecture and mapping rules
four-layer pipeline, character-specific mappings, philosophical grounding in 留白
building
Rust audio synthesis core
oscillator banks, envelopes, audio output through Web Audio. the foundation everything else depends on.
next
WebAssembly bridge and canvas integration
expose engine to browser, hook up pixel reading, real-time gesture loop.
then
First character — BoJack Horseman
portrait, mapping rules, generative ruleset. the proof that a character can become an instrument.
later
Five characters and an interactive demo
share the first version publicly. listen to what changes when the engine meets real listeners.

07 influences

08 related work

Correspondence grew out of back-in-the-90s, where I first synthesized jazz piano in the browser using the Web Audio API. The Voices tab in that project planted the question: what if the character's portrait could generate the music instead of the other way around?

· · ·