The term softmax for the action selection rule (2.2) is due to Bridle
(1990). This rule appears to have been first proposed by Luce
(1959). The parameter τ is called the temperature in simulated
annealing algorithms (Kirkpatrick, Gelatt, and Vecchi, 1983).