The Party Problem
Created by Rich Sutton, January 9, 2006
This page describes a Markov decision process (MDP) based on life as a student and the decisions one must make both to have a good time and to remain in good academic standing. This MDP is used as an example and in some homework exercises for CMPUT 499/609 (a course at the University of Alberta).
[State-transition diagram of the Party Problem MDP]
States:

R = Rested
T = Tired
D = homework Done
U = homework Undone
8p = eight o'clock pm

Actions:

P = Party
R = Rest
S = Study
any = any action (every available action has the same effect)
Note that not all actions are possible in all states.
Red numbers are rewards
Green numbers are transition probabilities (all those not labeled are probability 1.0)
The gray rectangle denotes a terminal state.
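
For the assignments below, it is convenient to encode the diagram as a table mapping each (state, action) pair to its possible outcomes. The Python sketch below shows one such encoding; the state names, rewards, and probabilities in it are placeholders only and must be replaced with the actual values (red and green numbers) read from the diagram.

    # One possible encoding of the diagram: each (state, action) pair maps to a
    # list of (probability, next_state, reward) outcomes.  All names and numbers
    # below are PLACEHOLDERS; replace them with the values from the diagram.
    MDP = {
        ("RU8p", "P"): [(1.0, "TU10p", +2.0)],   # placeholder: Party while Rested/Undone at 8pm
        ("RU8p", "R"): [(1.0, "RU10p",  0.0)],   # placeholder
        ("RU8p", "S"): [(1.0, "RD10p", -1.0)],   # placeholder
        # ... remaining (state, action) pairs from the diagram ...
    }
    TERMINAL = "end"   # the gray rectangle in the diagram (placeholder name)

    def actions(state):
        """Actions available in a state (not every action is possible everywhere)."""
        return sorted({a for (s, a) in MDP if s == state})
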
Party Problem Programming Assignment #1:

    Implement a program that models the Party Problem described above. Use any programming language of your choice. Assume that the agent follows an equiprobable random policy (i.e., the probability of picking a particular action in a given state is 1 divided by the number of actions that can be performed from that state). Run your program for 50 episodes. For each episode, have your program print out, in a readable form, the agent's sequence of experience (the ordered sequence of states, actions, and rewards occurring in the episode) as well as the sum of the rewards received in that episode (the return with respect to the start state).
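
The following Python sketch shows one way such a simulator might look, assuming the MDP is stored as a (state, action) -> [(probability, next state, reward)] table as sketched above. The few transitions included here are placeholders, not the actual dynamics of the Party Problem.

    import random

    # Placeholder dynamics: (state, action) -> list of (probability, next_state, reward).
    # Replace these entries with the actual transitions and rewards from the diagram.
    MDP = {
        ("RU8p",  "P"): [(1.0, "TU10p", +2.0)],
        ("RU8p",  "S"): [(1.0, "end",   -1.0)],
        ("TU10p", "R"): [(0.5, "end",    0.0), (0.5, "end", +1.0)],
    }
    START_STATE = "RU8p"
    TERMINALS = {"end"}

    def actions(state):
        """Actions available in a given state."""
        return sorted({a for (s, a) in MDP if s == state})

    def step(state, action):
        """Sample (next_state, reward) from the transition distribution."""
        outcomes = MDP[(state, action)]
        roll, cumulative = random.random(), 0.0
        for prob, next_state, reward in outcomes:
            cumulative += prob
            if roll <= cumulative:
                return next_state, reward
        return outcomes[-1][1], outcomes[-1][2]   # guard against rounding error

    def run_episode():
        """One episode under the equiprobable random policy; returns trajectory and return."""
        state, trajectory, episode_return = START_STATE, [], 0.0
        while state not in TERMINALS:
            action = random.choice(actions(state))        # equiprobable random policy
            next_state, reward = step(state, action)
            trajectory.append((state, action, reward))
            episode_return += reward
            state = next_state
        return trajectory, episode_return

    if __name__ == "__main__":
        for episode in range(50):
            trajectory, episode_return = run_episode()
            sequence = " ".join(f"{s} --{a}/{r:+g}-->" for s, a, r in trajectory)
            print(f"Episode {episode + 1:2d}: {sequence} [terminal]   Return = {episode_return:+g}")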

    What to Hand In (on paper):

The second part of this assignment (below) is due one week after the first (see schedule).

 
Party Problem Programming Assignment #2:

    Implement the policy iteration algorithm (described in Figure 4.3, p. 98) to learn the optimal policy for the Party Problem described above. Set the initial policy to "Rock & Roll all night and Party every day" (i.e., the policy should choose to party regardless of what state the agent is in). Perform each policy evaluation step until the largest change in any state's value (capital delta in Figure 4.3) is smaller than 0.001 (theta in Figure 4.3). Print out the policy and the value function for each iteration (policy change) of the algorithm in a readable form.
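
Below is a sketch of how policy iteration in the style of Figure 4.3 might be organized in Python. The dynamics are again placeholders to be replaced with the values from the diagram, and the discount rate GAMMA is set to 1 here as an assumption, since the assignment does not specify one.

    # Policy iteration in the style of Figure 4.3, for a tabular MDP encoded as
    # (state, action) -> list of (probability, next_state, reward).
    # The dynamics below are PLACEHOLDERS; substitute the actual Party Problem
    # transitions and rewards from the diagram.
    MDP = {
        ("RU8p",  "P"): [(1.0, "TU10p", +2.0)],
        ("RU8p",  "S"): [(1.0, "TU10p", -1.0)],
        ("TU10p", "P"): [(1.0, "end",   +2.0)],
        ("TU10p", "R"): [(0.5, "end",    0.0), (0.5, "end", +3.0)],
    }
    STATES = sorted({s for (s, _) in MDP})
    GAMMA = 1.0      # assumption: undiscounted episodic task
    THETA = 0.001    # evaluation stopping threshold required by the assignment

    def actions(state):
        return sorted({a for (s, a) in MDP if s == state})

    def lookahead(state, action, V):
        """Expected one-step return: sum over outcomes of p * (r + GAMMA * V(next))."""
        return sum(p * (r + GAMMA * V.get(next_state, 0.0))
                   for p, next_state, r in MDP[(state, action)])

    def policy_iteration():
        # Initial policy: "Party every day" -- choose P wherever it is available.
        policy = {s: ("P" if "P" in actions(s) else actions(s)[0]) for s in STATES}
        V = {s: 0.0 for s in STATES}
        iteration, stable = 0, False
        while not stable:
            # Policy evaluation: sweep until the largest change (delta) drops below THETA.
            while True:
                delta = 0.0
                for s in STATES:
                    old_value = V[s]
                    V[s] = lookahead(s, policy[s], V)
                    delta = max(delta, abs(old_value - V[s]))
                if delta < THETA:
                    break
            # Policy improvement: make the policy greedy with respect to V.
            stable = True
            for s in STATES:
                best_action = max(actions(s), key=lambda a: lookahead(s, a, V))
                if best_action != policy[s]:
                    policy[s] = best_action
                    stable = False
            iteration += 1
            print(f"Iteration {iteration}:")
            for s in STATES:
                print(f"  {s}: action = {policy[s]}, V = {V[s]:.3f}")
        return policy, V

    if __name__ == "__main__":
        policy_iteration()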

    What to Hand In (on paper):
