The term softmax for the action selection rule (2.2) is due to Bridle (1990). This rule appears to have been first proposed by Luce (1959). The parameter of this rule is called the temperature in simulated annealing algorithms (Kirkpatrick, Gelatt and Vecchi, 1983).

Richard Sutton
Fri May 30 10:02:27 EDT 1997