Every painting has music inside it. Every piece of music has an image inside it. The Chinese literary tradition has a name for this crossing: 通感 (tōnggǎn), the deliberate flow between senses, where the feeling that lives between the image and the sound becomes the meaning. Wang Wei wrote poems that were paintings. Debussy made sound that behaves like light on water. Monet left empty water in his lily ponds because the empty water was where the painting lived.
Correspondence is that idea made computational — a system that reads the pixels of a portrait and synthesizes jazz piano in real time. The colors become notes. The brightness becomes tempo. The empty pixels become silence — not absence, but the active silence that lets the surrounding sounds breathe. The image becomes a score. The user becomes the performer.
01 system overview
Correspondence is a four-layer pipeline. A visual input becomes structured musical parameters, which feed a procedural synthesis engine, which outputs audio with low enough latency to feel responsive to the user's gesture. Each layer is intentionally separated so the mapping rules can evolve without rewriting the synthesis layer, and vice versa.
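One way to keep the layers separable is to put a trait boundary between them, so a character's mapping can change without the synthesis code ever knowing. The sketch below assumes that shape; every name in it (`Hsb`, `MusicalEvent`, `Mapping`, `Synth`, `Pipeline`, `RecordingSynth`, `PlaceholderMapping`) is hypothetical, and the placeholder mapping is a deliberately naive stand-in, not one of the project's character rulesets.

```rust
/// Perceptual color sample read from the portrait (layer 1 output).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Hsb {
    pub hue: f32,        // degrees, 0.0..360.0
    pub saturation: f32, // 0.0..=1.0
    pub brightness: f32, // 0.0..=1.0
}

/// Structured musical parameters (layer 2 output, layer 3 input).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum MusicalEvent {
    Note { midi: u8, velocity: f32, tempo_bpm: f32 },
    Rest { tempo_bpm: f32 },
}

/// Layer 2: each character supplies its own implementation.
pub trait Mapping {
    fn map(&self, color: Hsb) -> MusicalEvent;
}

/// Layer 3: the engine only ever sees MusicalEvents, never pixels.
pub trait Synth {
    fn play(&mut self, event: MusicalEvent);
}

/// Swapping `M` changes the character; `S` stays untouched.
pub struct Pipeline<M: Mapping, S: Synth> {
    pub mapping: M,
    pub synth: S,
}

impl<M: Mapping, S: Synth> Pipeline<M, S> {
    /// One tick of the real-time loop (layer 4): sample, map, synthesize.
    pub fn tick(&mut self, color: Hsb) {
        let event = self.mapping.map(color);
        self.synth.play(event);
    }
}

/// Hypothetical stand-in synth that just records what it was asked to play.
#[derive(Default)]
pub struct RecordingSynth {
    pub events: Vec<MusicalEvent>,
}

impl Synth for RecordingSynth {
    fn play(&mut self, event: MusicalEvent) {
        self.events.push(event);
    }
}

/// Hypothetical placeholder mapping: near-black rests, hue picks a semitone.
pub struct PlaceholderMapping;

impl Mapping for PlaceholderMapping {
    fn map(&self, c: Hsb) -> MusicalEvent {
        let tempo_bpm = 60.0 + 80.0 * c.brightness;
        if c.brightness < 0.05 {
            return MusicalEvent::Rest { tempo_bpm };
        }
        let step = (c.hue / 30.0) as u8 % 12; // 12 hue sectors -> 12 semitones
        MusicalEvent::Note { midi: 60 + step, velocity: c.saturation, tempo_bpm }
    }
}
```

Because `Pipeline` is generic over both traits, a listening test could swap `PlaceholderMapping` for a character ruleset, or `RecordingSynth` for a real engine, without touching the other side.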
layer 1 — visual input
A character's pixel portrait is rendered to a canvas element. The system reads pixel values continuously from a small region around the user's cursor or touch position. RGB is converted to HSB so that perceptual properties drive the music instead of raw color channels — hue, saturation, and brightness map more naturally to musical intuition than the red, green, and blue components do.
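A minimal sketch of this layer, assuming the canvas pixels (which `getImageData()` exposes as row-major RGBA bytes) have been copied into a Rust slice. The conversion is the standard RGB to HSB (also called HSV) formula. `sample_region` and its `radius` parameter are hypothetical. It averages RGB over the window *before* converting, because averaging hue directly breaks at the 0°/360° wrap-around.

```rust
/// Hue in degrees [0, 360); saturation and brightness in [0, 1].
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Hsb {
    pub hue: f32,
    pub saturation: f32,
    pub brightness: f32,
}

/// Standard RGB -> HSB conversion for 8-bit channels.
pub fn rgb_to_hsb(r: u8, g: u8, b: u8) -> Hsb {
    let (r, g, b) = (r as f32 / 255.0, g as f32 / 255.0, b as f32 / 255.0);
    let max = r.max(g).max(b);
    let min = r.min(g).min(b);
    let delta = max - min;

    let hue = if delta == 0.0 {
        0.0 // achromatic: hue is undefined, report 0
    } else if max == r {
        60.0 * ((g - b) / delta).rem_euclid(6.0)
    } else if max == g {
        60.0 * ((b - r) / delta + 2.0)
    } else {
        60.0 * ((r - g) / delta + 4.0)
    };
    let saturation = if max == 0.0 { 0.0 } else { delta / max };
    Hsb { hue, saturation, brightness: max }
}

/// Average RGB over a (2r+1) x (2r+1) window around (cx, cy), clipped to the
/// image, then convert once. Assumes width, height > 0 and (cx, cy) in bounds.
pub fn sample_region(
    rgba: &[u8],
    width: usize,
    height: usize,
    cx: usize,
    cy: usize,
    radius: usize,
) -> Hsb {
    let (x0, x1) = (cx.saturating_sub(radius), (cx + radius).min(width - 1));
    let (y0, y1) = (cy.saturating_sub(radius), (cy + radius).min(height - 1));
    let (mut sr, mut sg, mut sb, mut n) = (0u32, 0u32, 0u32, 0u32);
    for y in y0..=y1 {
        for x in x0..=x1 {
            let i = (y * width + x) * 4; // 4 bytes per pixel: R, G, B, A
            sr += rgba[i] as u32;
            sg += rgba[i + 1] as u32;
            sb += rgba[i + 2] as u32;
            n += 1;
        }
    }
    rgb_to_hsb((sr / n) as u8, (sg / n) as u8, (sb / n) as u8)
}
```

For example, a cursor sitting between a red pixel and a blue pixel averages to a dark magenta (hue 300°), where naively averaging the two hues (0° and 240°) would give green-cyan (120°).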
layer 2 — mapping engine
The mapping layer is where character identity lives. Each character has its own ruleset that translates HSB values into musical parameters. The mappings are deliberately not linear and not universal — BoJack's blue-yellow palette feeds a different scale than Princess Carolyn's pink. The point is not to convert color into sound mechanically. The point is to find the feeling that lives between them.
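A sketch of what a per-character ruleset could look like. Every value here (roots, scales, "home" hues, tempo ranges, curve exponents) is a hypothetical illustration, not the project's actual mapping. Two ideas carry the point: hue is measured relative to the character's own home hue, so the same color lands on a different degree of a different scale; and brightness drives tempo through a power curve rather than a straight line.

```rust
/// A hypothetical per-character mapping from HSB to musical parameters.
pub struct Ruleset {
    pub name: &'static str,
    pub root_midi: u8,
    pub scale: &'static [u8],    // semitone offsets from the root
    pub home_hue: f32,           // hue (degrees) that lands on the root
    pub tempo_range: (f32, f32), // bpm at brightness 0 and 1
    pub tempo_curve: f32,        // exponent: > 1 stays slow until bright
}

#[derive(Debug)]
pub struct Params {
    pub midi: u8,
    pub velocity: f32,
    pub tempo_bpm: f32,
}

impl Ruleset {
    pub fn map(&self, hue: f32, saturation: f32, brightness: f32) -> Params {
        // Position of the hue relative to this character's palette, in [0, 1].
        let rel = (hue - self.home_hue).rem_euclid(360.0) / 360.0;
        // Spread the hue circle across two octaves of the character's scale.
        let steps = self.scale.len() * 2;
        let i = ((rel * steps as f32) as usize).min(steps - 1);
        let octave = (i / self.scale.len()) as u8;
        let degree = self.scale[i % self.scale.len()];
        let midi = self.root_midi + 12 * octave + degree;

        let (slow, fast) = self.tempo_range;
        let tempo_bpm = slow + (fast - slow) * brightness.clamp(0.0, 1.0).powf(self.tempo_curve);
        // Floor of 0.2 so washed-out colors still sound, just quietly.
        let velocity = 0.2 + 0.8 * saturation.clamp(0.0, 1.0);
        Params { midi, velocity, tempo_bpm }
    }
}

/// Hypothetical: blue home hue, D Dorian, slow unless the pixel is bright.
pub fn bojack() -> Ruleset {
    Ruleset {
        name: "BoJack",
        root_midi: 50,
        scale: &[0, 2, 3, 5, 7, 9, 10],
        home_hue: 220.0,
        tempo_range: (60.0, 110.0),
        tempo_curve: 2.0,
    }
}

/// Hypothetical: pink home hue, F Lydian, quick to speed up.
pub fn princess_carolyn() -> Ruleset {
    Ruleset {
        name: "Princess Carolyn",
        root_midi: 53,
        scale: &[0, 2, 4, 6, 7, 9, 11],
        home_hue: 330.0,
        tempo_range: (80.0, 150.0),
        tempo_curve: 0.7,
    }
}
```

Fed the same blue pixel, this BoJack ruleset returns its root (MIDI 50) while the Princess Carolyn ruleset returns MIDI 69, because the blue sits far from her pink home hue.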
layer 3 — generative jazz engine
The synthesis core is written in Rust with some C++ DSP primitives, compiled to WebAssembly. It is not a sample player — every note is generated from scratch using oscillator banks shaped by ADSR envelopes and a gentle reverb tail. Phrases are produced by walking probabilistic state machines that encode each character's musical personality. The engine has pauses built in. It is allowed to stop.
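A minimal sketch of three of these ingredients in plain Rust, without the C++ primitives or WebAssembly bindings: a linear ADSR envelope, a small additive oscillator bank it shapes, and a Markov chain whose states include a rest, so silence is something the walk can *choose*. The partial mix, envelope shape, transition weights, and the xorshift PRNG (used only to avoid external crates) are all assumptions for illustration.

```rust
use std::f32::consts::TAU;

/// Linear ADSR envelope: times in seconds, sustain is a level in [0, 1].
#[derive(Clone, Copy, Debug)]
pub struct Adsr {
    pub attack: f32,
    pub decay: f32,
    pub sustain: f32,
    pub release: f32,
}

impl Adsr {
    /// Level while the key is held: rise, fall to sustain, then hold.
    fn held(&self, t: f32) -> f32 {
        if t < self.attack {
            t / self.attack
        } else if t < self.attack + self.decay {
            1.0 - (1.0 - self.sustain) * (t - self.attack) / self.decay
        } else {
            self.sustain
        }
    }

    /// Level `t` seconds after note-on for a key released at `gate` seconds;
    /// the release ramps linearly to zero from wherever the envelope was.
    pub fn level(&self, t: f32, gate: f32) -> f32 {
        if t < gate {
            self.held(t)
        } else {
            (self.held(gate) * (1.0 - (t - gate) / self.release)).max(0.0)
        }
    }
}

/// Render one note from a tiny additive oscillator bank (hypothetical mix).
pub fn render_note(freq: f32, env: Adsr, gate: f32, sample_rate: f32, out: &mut [f32]) {
    let partials: [(f32, f32); 3] = [(1.0, 0.6), (2.0, 0.25), (3.0, 0.15)]; // (harmonic, gain)
    for (n, sample) in out.iter_mut().enumerate() {
        let t = n as f32 / sample_rate;
        let osc: f32 = partials.iter().map(|&(h, g)| g * (TAU * freq * h * t).sin()).sum();
        *sample = osc * env.level(t, gate);
    }
}

/// Dependency-free xorshift64 PRNG; the seed must be non-zero.
pub struct XorShift(pub u64);

impl XorShift {
    /// Uniform float in [0, 1) built from the top 24 bits.
    pub fn next_f32(&mut self) -> f32 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 40) as f32 / (1u64 << 24) as f32
    }
}

/// States 0..=2 are scale degrees; state 3 is a rest, a state like any other.
pub const REST: usize = 3;

/// Row i: probabilities of moving from state i to each state (hypothetical).
pub const TRANSITIONS: [[f32; 4]; 4] = [
    [0.1, 0.5, 0.2, 0.2],
    [0.4, 0.1, 0.3, 0.2],
    [0.3, 0.3, 0.1, 0.3],
    [0.5, 0.2, 0.2, 0.1], // after a rest, usually resolve to the first degree
];

/// Walk the chain for `len` steps by inverse-CDF sampling of each row.
pub fn walk(start: usize, len: usize, rng: &mut XorShift) -> Vec<usize> {
    let mut state = start;
    let mut phrase = Vec::with_capacity(len);
    for _ in 0..len {
        let r = rng.next_f32();
        let row = &TRANSITIONS[state];
        let mut acc = 0.0_f32;
        let mut next = row.len() - 1; // fallback if rounding leaves r >= acc
        for (j, &p) in row.iter().enumerate() {
            acc += p;
            if r < acc {
                next = j;
                break;
            }
        }
        state = next;
        phrase.push(state);
    }
    phrase
}
```

With a fixed seed the walk is reproducible, which makes it easy to audition one character's transition table; in the live loop the seed (or the row weights themselves) could be nudged by the colors under the cursor.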
layer 4 — real-time loop
As the user moves across the portrait, the colors under their cursor feed the engine continuously. Audio output runs through the Web Audio API at a target buffer of ~5ms. The image is the score, the user is the performer, and the music shifts as they explore. The same portrait is never played the same way twice.
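Some quick arithmetic puts the ~5 ms target in context. By default the Web Audio API renders audio in blocks ("render quanta") of 128 frames, so a latency target effectively rounds up to whole blocks. The helper names below are hypothetical. Browsers treat `AudioContext`'s `latencyHint` as a hint only; the figure actually achieved can be read back from the context's `baseLatency` and `outputLatency` properties.

```rust
/// Default Web Audio render quantum, in sample frames.
pub const RENDER_QUANTUM: u32 = 128;

/// Frames needed to cover `ms` milliseconds at `sample_rate` Hz.
pub fn frames_for_ms(ms: f64, sample_rate: f64) -> f64 {
    ms * sample_rate / 1000.0
}

/// Round a latency target up to whole render quanta; returns
/// (frames, resulting latency in ms).
pub fn quantized_latency(ms: f64, sample_rate: f64) -> (u32, f64) {
    let quanta = (frames_for_ms(ms, sample_rate) / RENDER_QUANTUM as f64).ceil() as u32;
    let frames = quanta * RENDER_QUANTUM;
    (frames, frames as f64 / sample_rate * 1000.0)
}
```

At 48 kHz, 5 ms is 240 frames, just under two quanta, so the nearest block-aligned buffer is 256 frames (about 5.33 ms). At 44.1 kHz the same target is 220.5 frames, which also rounds up to 256 frames (about 5.8 ms).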
02 mapping design
The relationship between color and sound is the soul of the project. The current mapping is a starting point — a simple, defensible baseline that will evolve through listening tests once the engine is running.
An empty pixel — pure background — produces no note. But it also does not produce nothing. It produces an active pause that lets the surrounding music expand into it. This is the most important design decision in the project, and it comes directly from the principle of 留白.
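One way to make that pause *active* rather than absent is to give rests their own event type, so a run of background pixels becomes a single, longer rest that occupies time in the phrase instead of being skipped. This is a sketch under stated assumptions: "background" is taken to mean fully transparent or within a tolerance of the portrait's background color, each pixel is one beat, and `is_background`, `scan_to_events`, and the `pitch` callback are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Event {
    /// A sounding note: MIDI pitch and length in beats.
    Note { midi: u8, beats: u32 },
    /// An active pause: it takes up time, so the sequencer waits through it
    /// (the synth can let earlier notes' release and reverb tails ring here).
    Rest { beats: u32 },
}

/// Hypothetical background test: fully transparent, or within `tol` of the
/// portrait's background color on every RGB channel.
pub fn is_background(px: [u8; 4], bg: [u8; 3], tol: u8) -> bool {
    px[3] == 0 || (0..3).all(|c| px[c].abs_diff(bg[c]) <= tol)
}

/// Turn a scanline into events: one beat per pixel, with consecutive
/// background pixels merged into one longer rest.
pub fn scan_to_events(
    pixels: &[[u8; 4]],
    bg: [u8; 3],
    tol: u8,
    pitch: impl Fn([u8; 4]) -> u8,
) -> Vec<Event> {
    let mut events: Vec<Event> = Vec::new();
    for &px in pixels {
        if is_background(px, bg, tol) {
            match events.last_mut() {
                // Extend the current rest rather than starting a new one.
                Some(Event::Rest { beats }) => *beats += 1,
                _ => events.push(Event::Rest { beats: 1 }),
            }
        } else {
            events.push(Event::Note { midi: pitch(px), beats: 1 });
        }
    }
    events
}
```

On a white-background portrait, for instance, a scanline of ink, white, transparent, white, ink becomes a note, a three-beat rest, and a note; the gap keeps its full length in the phrase instead of collapsing to nothing.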
03 technical decisions
Real-time audio belongs to a family of problems defined by small budgets and hard constraints, with no room for "usually works."
04 留白 · negative space as a design principle
留白 (liúbái) means "leaving white" — the negative space in Chinese ink painting that is not absence but presence in a different form. A Wang Wei landscape with mist covering half the mountain makes you feel the mountain more completely than a fully painted one would. The mist holds the mountain in place.
Most generative music systems try to fill every moment. Correspondence does the opposite. The principle expresses itself across the system at every layer:
- In the portrait: empty pixels produce active silence, not nothing.
- In the engine: pauses are first-class citizens, not gaps between notes.
- In the interface: nothing competes for attention except the portrait and the sound.
Jazz already understands this. Miles Davis built entire phrases out of the notes he chose not to play. The empty pixel and the unplayed note are the same gesture in two different traditions.
05 what doesn't exist yet
I looked for prior art before starting. There are color-to-pitch sonification toys, but most use direct linear mappings — blue becomes a low note, red becomes a high note — without character-specific rules or musical structure. There are AI music generators like Suno that produce songs from text prompts, but they are disconnected from visual data and the human is not really in control. There are pixel art tools, and there are generative music projects, but no system combines them in a way that lets a specific character's visual identity become a specific musical identity that the user can play in real time.
That is the space Correspondence is trying to explore.
06 roadmap
07 influences
- Wang Wei — the master of cross-sensory poetry and painting
- Claude Debussy — Clair de Lune, La Mer, sound that behaves like light on water
- Claude Monet — water lilies, the empty water that holds the painting together
- Miles Davis — the silence between notes
- Bill Evans — Peace Piece, jazz piano that thinks out loud
- The Legend of 1900 — Magic Waltz, Playing Love
- BoJack Horseman — the blue-and-yellow California melancholy of the opening title
08 related work
Correspondence grew out of back-in-the-90s, where I first synthesized jazz piano in the browser using the Web Audio API. The Voices tab in that project planted the question: what if the character's portrait could generate the music instead of the other way around?