Valence Tuning

The same graph as lab 11, but now after every round each fighter looks at what happened, finds the rule it just used, and nudges that rule's valence: +δ on success, −δ on failure. Rules never seen before are created from experience. Run a 10-round match and watch Alice's and Bob's graphs diverge: they start identical and end as two different policies.

Round 0 / 10 · alice HP 100 bob HP 100


δ 0.05

Bob: Scripted = 10-round deterministic trace from dojo_f_v1. Random = uniform over 4 kinds.


How it works. Valence is the single dimension that drives both decision and learning: pick_counter reads it, reinforce_rule writes it. Bounding each update to ±δ keeps drift visible without whipsawing. Rules absent from the seed library are created from experience at 0.5 ± δ and tagged with source=experience, so seeded and learned rules stay distinguishable.
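The read/write split can be sketched in a few lines of Python. This is a minimal illustration, not the code from the lab. The Rule dataclass, the move names ("strike", "block", "dodge", "grab"), the "seed" tag and the clamp to [0, 1] are all assumptions made for the example. Only the function names, the reinforce_rule argument order, the 0.5 ± δ creation value and the source=experience tag come from the text. The sketch also assumes delta is signed, with +δ passed on success and −δ on failure.

```python
from dataclasses import dataclass

DELTA = 0.05  # matches the default δ shown on this page


@dataclass
class Rule:
    # Hypothetical rule record; field names are illustrative.
    opp_kind: str        # opponent move this rule responds to
    counter_kind: str    # counter the rule recommends
    valence: float       # preference for this rule
    source: str = "seed"


def pick_counter(beliefs, opp_kind):
    """Read valence: return the counter of the highest-valence matching rule."""
    matches = [r for r in beliefs if r.opp_kind == opp_kind]
    if not matches:
        return None
    return max(matches, key=lambda r: r.valence).counter_kind


def reinforce_rule(beliefs, counter_kind, opp_kind, delta):
    """Write valence: nudge the used rule by a signed delta, or create it."""
    matches = [
        r for r in beliefs
        if r.opp_kind == opp_kind and r.counter_kind == counter_kind
    ]
    if matches:
        # Update the highest-valence instance of the rule that was used.
        rule = max(matches, key=lambda r: r.valence)
        # Clamping to [0, 1] is an assumption, not stated in the text.
        rule.valence = min(1.0, max(0.0, rule.valence + delta))
    else:
        # Unseen rule: created from experience at 0.5 +/- delta.
        rule = Rule(opp_kind, counter_kind, 0.5 + delta, source="experience")
        beliefs.append(rule)
    return rule


beliefs = [Rule("strike", "block", 0.60), Rule("strike", "dodge", 0.55)]
first_choice = pick_counter(beliefs, "strike")        # "block" wins at 0.60
reinforce_rule(beliefs, "block", "strike", -DELTA)    # block fails once
reinforce_rule(beliefs, "block", "strike", -DELTA)    # and fails again
second_choice = pick_counter(beliefs, "strike")       # "dodge" now leads
learned = reinforce_rule(beliefs, "dodge", "grab", +DELTA)  # new rule
```

Two failures are enough to flip the preferred counter here because the seeded valences start only 0.05 apart. With a larger gap the same policy change would take more rounds, which is the sense in which a small δ keeps drift gradual.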

Tie to Eddy. This lab mirrors examples/cognition_dojo_f_v1.py, which applies reinforce_rule(beliefs, counter_kind, opp_kind, delta) to the highest-valence instance of the rule that was used. Across a 10-round match Alice's and Bob's graphs measurably diverge: same seed library, different experience, different policy. The same outcome → ±δ convention generalises to the want graph elsewhere in Eddy (taichi want_from_practice).
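One way to put a number on that divergence is to compare the two graphs rule by rule. The sketch below is an assumption-laden illustration, not part of cognition_dojo_f_v1.py. It represents each graph as a plain dict from (opp_kind, counter_kind) to valence, treats a rule missing from one graph as sitting at the neutral 0.5, and sums the absolute differences. The helper name valence_divergence and the move names are hypothetical.

```python
NEUTRAL = 0.5  # assumed valence for a rule one fighter has not learned yet


def valence_divergence(a, b, neutral=NEUTRAL):
    """Sum of absolute per-rule valence differences between two graphs."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, neutral) - b.get(k, neutral)) for k in keys)


# Both fighters start from the same (hypothetical) seed library.
seed = {("strike", "block"): 0.60, ("grab", "dodge"): 0.60}
alice = dict(seed)
bob = dict(seed)
start_gap = valence_divergence(alice, bob)   # identical graphs

# One round of different experience for each fighter.
alice[("strike", "block")] += 0.05           # Alice's block succeeded
bob[("strike", "block")] -= 0.05             # Bob's block failed
bob[("grab", "strike")] = 0.55               # Bob learned a new rule
end_gap = valence_divergence(alice, bob)
```

The gap starts at zero and grows only where the two fighters' experience differed, so the per-rule terms also show which rules account for the policy split.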