Monte Carlo Programming Assignment

	Reinforcement Learning and Artificial Intelligence (RLAI)
	CMPUT 499/609: Monte Carlo Programming Assignment

Objective & Description

The goal of this assignment is to become familiar with the Monte Carlo and TD learning algorithms and to get comfortable working with the RL-Glue framework.

In this assignment, you will implement a Monte Carlo Exploring Starts (ES) agent and a Sarsa agent with exploring starts to play variants of the game Blackjack. You are asked to find each agent's policy for playing Blackjack under each of the following two conditions:

The number of cards in a suit that take a value of ten (i.e., 10s, Jacks, Queens, and Kings) is doubled from 4 to 8. Thus each card dealt has probability 8/17 of being a ten and probability 1/17 of being each of Ace, two, three, ..., and nine.
The number of cards in a suit that take a value of ten (i.e., 10s, Jacks, Queens, and Kings) is halved from 4 to 2. Thus each card dealt has probability 2/11 of being a ten and probability 1/11 of being each of Ace, two, three, ..., and nine.

    Remember that cards are dealt with replacement, as if from an infinite deck, so there is no need to keep track of which cards have already been dealt. The state space remains the same as in the original problem.
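As a sanity check on the two deck conditions, the following Python sketch computes the per-card probabilities and draws cards with replacement. The function names `card_probabilities` and `make_card_sampler` are hypothetical and are not part of RL-Glue or the supplied Blackjack environment, and the encoding (1 = Ace, 10 = any ten-valued card) is an assumption made for illustration.

```python
import random
from fractions import Fraction

def card_probabilities(ten_count):
    """Exact probability of each card value when a suit holds
    one each of Ace..nine plus `ten_count` ten-valued cards."""
    total = 9 + ten_count  # 9 non-ten ranks plus the ten-valued cards
    return {v: Fraction(ten_count if v == 10 else 1, total)
            for v in range(1, 11)}

def make_card_sampler(ten_count, rng=None):
    """Return a draw() function that samples card values with replacement,
    as if from an infinite deck (assumed encoding: 1 = Ace, 10 = ten)."""
    rng = rng or random.Random()
    values = list(range(1, 11))
    weights = [1] * 9 + [ten_count]

    def draw():
        return rng.choices(values, weights=weights, k=1)[0]

    return draw
```

Calling `card_probabilities(8)` gives 8/17 for a ten and 1/17 for every other value, and `card_probabilities(2)` gives 2/11 and 1/11, matching the two conditions above.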

    The Blackjack environment can be found as part of the RL-Glue framework in the directory 'Env'. Also, much of the Grid World Benchmark code that comes with RL-Glue will come in handy.

    The Monte Carlo ES algorithm is described in Figure 5.4, p. 120 of the textbook. The Sarsa algorithm is described in Figure 6.9, p. 146 of the textbook, but you will have to modify it to use exploring starts. Have the agent learn over the course of 100,000 episodes. A description of the game of Blackjack appears as part of Example 5.1 in the textbook.
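The following Python sketch illustrates the pieces that differ between the two agents: the exploring start itself, a sample-average Monte Carlo update, and a Sarsa backup. It is not the RL-Glue agent interface and not a complete solution. All names here are hypothetical, and several things are assumptions: the state is taken to be (player sum 12 to 21, dealer's showing card 1 to 10, usable ace) as in the textbook's Blackjack example, actions are encoded as 0 = stick and 1 = hit, the task is undiscounted with the only nonzero reward at the end of the episode, and the step size `alpha=0.1` is an arbitrary placeholder.

```python
import random
from collections import defaultdict

STICK, HIT = 0, 1            # assumed action encoding
ACTIONS = (STICK, HIT)

def exploring_start(rng):
    """Pick the first state and first action uniformly at random,
    so every state-action pair has a nonzero chance of starting an episode."""
    player_sum = rng.randint(12, 21)
    dealer_showing = rng.randint(1, 10)   # 1 = Ace
    usable_ace = rng.random() < 0.5
    return (player_sum, dealer_showing, usable_ace), rng.choice(ACTIONS)

def greedy_action(q, state):
    """Greedy action with respect to the current action values."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

def mc_es_update(q, counts, visited, episode_return):
    """Sample-average update for each state-action pair visited in one episode.
    Assumes every pair shares the same return (undiscounted, terminal reward only)."""
    for pair in set(visited):
        counts[pair] += 1
        q[pair] += (episode_return - q[pair]) / counts[pair]

def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=1.0):
    """One Sarsa backup; pass s_next=None when the episode has ended."""
    target = reward
    if s_next is not None:
        target += gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
```

In both agents, only the first state and action of each episode come from `exploring_start`; subsequent actions would be chosen greedily from the current action values.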

    What to hand in (on paper):

Graphs of the policy of the two agents for both conditions described above. An example of what I'm looking for can be found on the left side of Figure 5.5 in the textbook. You can draw the graphs by hand (carefully, cleanly, clearly) but you must compute the policies with your program.
The source code of your agents.
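Before drawing the policy graphs, it may help to print each computed policy as a text grid. The Python sketch below is one possible way to do that. The function `policy_grid` is hypothetical, and it assumes the policy is stored as a dictionary from (player sum, dealer's showing card, usable ace) to an action, with 1 meaning hit and anything else meaning stick.

```python
def policy_grid(policy, usable_ace):
    """Render a policy as text: one row per player sum (21 at the top,
    12 at the bottom), one column per dealer showing card (A, 2..10).
    'H' marks hit and 'S' marks stick (assumed encoding: 1 = hit)."""
    header = "     " + " ".join(f"{d:>2}" for d in ["A"] + list(range(2, 11)))
    lines = [header]
    for player_sum in range(21, 11, -1):
        cells = []
        for dealer in range(1, 11):
            action = policy[(player_sum, dealer, usable_ace)]
            cells.append(" H" if action == 1 else " S")
        lines.append(f"{player_sum:>4} " + " ".join(cells))
    return "\n".join(lines)

# Placeholder policy for demonstration only: stick on 20 or 21, otherwise hit.
demo_policy = {(p, d, ace): (0 if p >= 20 else 1)
               for p in range(12, 22)
               for d in range(1, 11)
               for ace in (False, True)}
```

Printing `policy_grid(demo_policy, usable_ace=False)` and `policy_grid(demo_policy, usable_ace=True)` gives one grid for each half of the policy, which can then be copied onto paper.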

Reminder: Assignments must be your own work. It is ok to talk to other students, but not to exchange code of any kind. A good rule of thumb is that if you talk to someone, don't bring a pencil. Email is generally not a good idea, but follow the spirit of this rule of thumb in any email.
