a language model for notes.
Swap a 50,000-word vocabulary for a handful of note events — each a pitch and a duration — and a "language model" shrinks to a table of counts: what event tends to follow what. Learn it from nursery tunes, then sample new melodies (pitch and rhythm) and play them on a flute. The whole "model" is a few hundred numbers.
Row = current pitch, column = next pitch; brighter = more likely (summed over durations). The bright diagonal band is stepwise motion — idiomatic flute writing.
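The heat map described above can be derived from the event-level counts by adding up every duration that shares a pitch. The following is a minimal sketch, not the page's own code: the `event_counts` table, its values, and the function name `pitch_transition_matrix` are hypothetical, and events are assumed to be `(pitch, duration_in_beats)` tuples.

```python
from collections import Counter, defaultdict

# Hypothetical first-order counts: current event -> Counter of next events.
# Events are (pitch, duration-in-beats) pairs; the numbers are made up.
event_counts = {
    ("C4", 1): Counter({("D4", 1): 3, ("D4", 2): 1, ("E4", 1): 1}),
    ("D4", 1): Counter({("C4", 1): 2, ("E4", 1): 2}),
    ("D4", 2): Counter({("C4", 1): 1}),
}

def pitch_transition_matrix(event_counts):
    """Sum event counts over durations, then normalise each row to probabilities."""
    pitch_counts = defaultdict(Counter)
    for (cur_pitch, _), followers in event_counts.items():
        for (next_pitch, _), n in followers.items():
            pitch_counts[cur_pitch][next_pitch] += n
    matrix = {}
    for cur_pitch, row in pitch_counts.items():
        total = sum(row.values())
        matrix[cur_pitch] = {p: n / total for p, n in row.items()}
    return matrix

matrix = pitch_transition_matrix(event_counts)
```

Each row of `matrix` sums to 1, so a cell's brightness can be read as the probability of moving from the row pitch to the column pitch.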
Tokens are pitch × duration pairs. Raise Context for longer-range phrasing; raise Temperature to wander off the trained tunes.
Example event: G4·♩ (the pitch G4 held for a quarter note). For context length k, we count how often each event follows each window of k events; that table is the whole model. To generate, look up the current window, draw the next event with probability proportional to its count raised to the power 1/temperature, slide the window forward by one event, and repeat. If the current window never occurred in the corpus, back off to a shorter one. The model learns melody and rhythm together: long notes tend to land on strong beats and short notes tend to run in pairs, just as in the corpus.
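The count, sample, slide, and back-off loop can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the code behind this page: the two-tune `CORPUS`, the function names `train`, `sample_next`, and `generate`, and the choice to store counts for every window length from 0 to k (so that back-off always has something to fall back on) are all hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus: each tune is a list of (pitch, duration-in-beats) events.
CORPUS = [
    [("C4", 1), ("C4", 1), ("G4", 1), ("G4", 1), ("A4", 1), ("A4", 1), ("G4", 2)],
    [("E4", 1), ("D4", 1), ("C4", 1), ("D4", 1), ("E4", 1), ("E4", 1), ("E4", 2)],
]

def train(tunes, k):
    """Count how often each event follows each window of 0..k preceding events."""
    table = defaultdict(Counter)
    for tune in tunes:
        for i, event in enumerate(tune):
            for n in range(min(k, i) + 1):
                window = tuple(tune[i - n:i])  # n == 0 gives the empty window ()
                table[window][event] += 1
    return table

def sample_next(table, history, k, temperature, rng):
    """Draw the next event, backing off to shorter windows when one is unseen."""
    for n in range(min(k, len(history)), -1, -1):
        window = tuple(history[len(history) - n:])
        counts = table.get(window)
        if counts:
            events = list(counts)
            # Weight each candidate by count ** (1 / temperature); temperature must be > 0.
            weights = [counts[e] ** (1.0 / temperature) for e in events]
            return rng.choices(events, weights=weights)[0]
    raise ValueError("model is empty; train on at least one event")

def generate(table, k, length, temperature=1.0, seed=0):
    """Sample a melody of `length` events, sliding the window as it grows."""
    rng = random.Random(seed)
    melody = []
    for _ in range(length):
        melody.append(sample_next(table, melody, k, temperature, rng))
    return melody

table = train(CORPUS, k=2)
melody = generate(table, k=2, length=8, temperature=1.0)
```

The empty window `()` holds plain event frequencies, so the back-off chain always ends in a usable distribution as long as the corpus is not empty.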
M answers "what state follows this one?"; here the table answers "what note-event follows this one?" (one is a VSA bundle, the other a count table; same job). Temperature plays the role of the agent's exploration knob ε: a low value replays the safe learned line, while a high value explores and risks nonsense. Widening the token from pitch to pitch × duration is the same trade every LLM makes: a richer vocabulary captures more, but needs more data to stay coherent.
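To see how temperature shifts the balance between replaying and exploring, it helps to apply the count ** (1/temperature) weighting to a single row of the table. The counts below are hypothetical and `next_event_probs` is an illustrative helper, not part of the page's code.

```python
def next_event_probs(counts, temperature):
    """Turn raw follow-counts into sampling probabilities at a given temperature."""
    weights = {e: n ** (1.0 / temperature) for e, n in counts.items()}
    total = sum(weights.values())
    return {e: w / total for e, w in weights.items()}

# Hypothetical counts for one window: G4 followed it 8 times, A4 twice.
counts = {"G4": 8, "A4": 2}

for t in (0.5, 1.0, 2.0):
    probs = next_event_probs(counts, t)
    print(t, {e: round(p, 3) for e, p in probs.items()})
```

With these counts, the common continuation G4 is chosen about 94% of the time at temperature 0.5, 80% at 1.0, and 67% at 2.0, so lower temperatures stay closer to the most-seen line and higher ones give rare continuations more room.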