Markov Melody

A language model for notes.

Swap a 50,000-word vocabulary for a handful of note events — each a pitch and a duration — and a "language model" shrinks to a table of counts: what event tends to follow what. Learn it from nursery tunes, then sample new melodies (pitch and rhythm) and play them on a flute. The whole "model" is a few hundred numbers.

Context 2 events

Temperature 0.9

Length 40

Tempo 120

The model — pitch transitions

Row = current pitch, column = next pitch; brighter = more likely (summed over durations). The bright diagonal band is stepwise motion — idiomatic flute writing.
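One way such a heatmap could be computed, as a rough sketch: count every adjacent pair of events, ignore the durations, and normalize each row so it sums to 1. The function name pitch_transitions, the representation of an event as a (pitch, duration-in-beats) tuple, and the row normalization are all assumptions made for illustration; the page does not say how it scales brightness.

```python
from collections import Counter, defaultdict

def pitch_transitions(tunes):
    """Return {current_pitch: {next_pitch: probability}}, summed over durations.

    Assumes each tune is a list of (pitch, duration) tuples (hypothetical format).
    """
    counts = defaultdict(Counter)
    for tune in tunes:
        # Walk adjacent event pairs; durations are deliberately ignored.
        for (p0, _d0), (p1, _d1) in zip(tune, tune[1:]):
            counts[p0][p1] += 1

    probs = {}
    for p0, row in counts.items():
        total = sum(row.values())
        # Row-normalize so each row reads as "what tends to follow p0".
        probs[p0] = {p1: c / total for p1, c in row.items()}
    return probs
```

Because durations are dropped before counting, C4 as a quarter note followed by D4 and C4 as an eighth note followed by D4 land in the same cell.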

Vocabulary

pitches

durations (beats)

Tokens are pitch × duration pairs. Raise Context for longer-range phrasing; raise Temperature to wander off the trained tunes.

How it works. Every tune is a string of events, each a (pitch, duration) pair, e.g. G4·♩. For context length k we count how often each event follows each k-event window; that table is the whole model. To generate, look up the current window, draw the next event with probability proportional to its count raised to the power 1/temperature, slide the window forward by one event, and repeat, backing off to a shorter window whenever the current one was never seen in training. Because each token carries both pitch and duration, the model learns melody and rhythm together: sampled tunes tend to reproduce the corpus's habits, such as long notes landing on strong beats and short notes running in pairs.
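The count, sample, slide, back-off loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the page's actual code: the names train, sample_next and generate are hypothetical, events are assumed to be (pitch, duration-in-beats) tuples, back-off is assumed to drop the oldest event one at a time down to the empty window, and temperature is assumed to be greater than zero.

```python
import random
from collections import Counter, defaultdict

def train(tunes, k):
    """Count how often each event follows each window of 0..k preceding events."""
    table = defaultdict(Counter)
    for tune in tunes:
        for i, event in enumerate(tune):
            # Store every window length up to k so back-off has something to find.
            for n in range(min(k, i) + 1):
                window = tuple(tune[i - n:i])
                table[window][event] += 1
    return dict(table)

def sample_next(table, window, temperature, rng):
    """Draw the next event with weight count ** (1 / temperature)."""
    window = tuple(window)
    # Back off: drop the oldest event until the window was seen in training.
    while window and window not in table:
        window = window[1:]
    row = table[window]
    events = list(row)
    weights = [row[e] ** (1.0 / temperature) for e in events]
    return rng.choices(events, weights=weights, k=1)[0]

def generate(table, k, length, temperature, seed=0):
    """Sample a melody of `length` events, sliding a window of at most k events."""
    rng = random.Random(seed)
    melody = []
    for _ in range(length):
        window = melody[-k:] if k > 0 else []
        melody.append(sample_next(table, window, temperature, rng))
    return melody
```

With the settings shown on the page, a call would look like generate(train(tunes, 2), 2, 40, 0.9), where tunes is a list of event lists. The empty window acts as the last resort: it holds the overall event frequencies, so generation never gets stuck as long as the corpus is not empty.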

Tie to Neuron Lab. That transition table is the same object as the maze agent's forward model in Imagine a Move — M answers "what state follows this one?", here "what note-event follows this one?" (one is a VSA bundle, one is a count table; same job). And temperature is the agent's exploration knob ε: low replays the safe learned line, high explores and risks nonsense. Widening the token from pitch to pitch×duration is the same trade every LLM makes — a richer vocabulary captures more, but needs more data to stay coherent.
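To see the knob at work, here is a small sketch of what raising counts to 1/temperature does to a single row of the table. The function name tempered and the example counts [8, 1, 1] are made up for illustration.

```python
def tempered(counts, temperature):
    """Probabilities proportional to count ** (1 / temperature); temperature > 0."""
    weights = [c ** (1.0 / temperature) for c in counts]
    total = sum(weights)
    return [w / total for w in weights]

# One hypothetical row: a favourite continuation seen 8 times, two rare ones seen once.
for t in (0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in tempered([8, 1, 1], t)])
```

The favourite's share is about 0.97 at temperature 0.5, exactly 0.8 at 1.0 (the raw proportions), and about 0.59 at 2.0. Low temperature sharpens the row toward the learned line; high temperature flattens it, so rare continuations get picked more often.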