Reinforcement Learning and Artificial Intelligence (RLAI)
CMPUT 499/609: Monte Carlo Programming Assignment
Objective & Description
The goal of this assignment is to become familiar with the Monte Carlo
and TD learning algorithms and to get comfortable working with the
RL-Glue framework.
In this assignment, you will implement a Monte Carlo Exploring Starts (ES)
agent and a Sarsa ES agent to play variants of the game Blackjack.
You are asked to come up with these agents' policies for playing Blackjack
assuming that:
The number of cards in a suit that take a value of ten (i.e., 10s, Jacks, Queens, and Kings) is doubled from 4 to 8. Thus each card dealt has probability 8/17 of being a ten and probability 1/17 of being each of Ace, two, three, ..., and nine.
The number of cards in a suit that take a value of ten (i.e., 10s, Jacks, Queens, and Kings) is halved from 4 to 2. Thus each card dealt has probability 2/11 of being a ten and probability 1/11 of being each of Ace, two, three, ..., and nine.
Remember that cards are dealt with replacement,
as if from an infinite deck, so there is no need to keep track of which
cards have already been dealt. The state space remains the same
as in the original problem.
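To make the two card distributions concrete, here is a minimal sketch of dealing with replacement from a modified suit. It is not part of the RL-Glue Blackjack environment; the names make_deck, draw_card, and ten_probability are hypothetical and chosen only for illustration.

```python
import random
from fractions import Fraction

def make_deck(tens_per_suit):
    """Card values for one suit: Ace (1), 2..9, plus `tens_per_suit` ten-valued cards."""
    return list(range(1, 10)) + [10] * tens_per_suit

def draw_card(deck, rng=random):
    """Deal one card with replacement, as if from an infinite deck."""
    return rng.choice(deck)

def ten_probability(deck):
    """Exact probability that a dealt card is worth ten."""
    return Fraction(deck.count(10), len(deck))

doubled = make_deck(8)  # condition 1: 17 cards per suit
halved = make_deck(2)   # condition 2: 11 cards per suit
```

Because every draw samples uniformly from the same list, no record of previously dealt cards is needed.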
The Blackjack environment can be found as part of
the RL-Glue framework in the directory 'Env'. Also, much of the Grid
World Benchmark code that comes with RL-Glue will come in handy.
The Monte Carlo ES algorithm is described in Figure
5.4, p. 120 of the textbook. The Sarsa algorithm is described in Figure
6.9, p. 146 of the textbook, but you will have to modify it to use
exploring starts. Have the agent learn over the course of
100,000 episodes. A description of the game of Blackjack appears as
part of Example 5.1 in the textbook.
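One way to combine Sarsa with exploring starts is sketched below. It is only an illustration under stated assumptions, not the textbook's pseudocode or the RL-Glue agent interface: states are assumed to be (player sum 12-21, dealer's showing card 1-10, usable ace) as in Example 5.1, the action encoding, step size alpha, and all function names are hypothetical, and the task is treated as undiscounted (gamma = 1).

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # assumed encoding: 0 = stick, 1 = hit

def exploring_start(rng):
    """Pick the first state and first action of an episode uniformly at random."""
    state = (rng.randint(12, 21), rng.randint(1, 10), rng.choice((False, True)))
    action = rng.choice(ACTIONS)
    return state, action

def greedy_action(q, state):
    """After the random start, follow the action with the highest estimate."""
    return max(ACTIONS, key=lambda a: q[(state, a)])

def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=1.0):
    """One tabular Sarsa backup; next_state is None when the episode has ended."""
    if next_state is None:
        target = reward
    else:
        target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (target - q[(state, action)])

q = defaultdict(float)  # action-value table, initialised to zero
```

In this sketch the exploring start supplies all of the exploration, so every later action in the episode can be chosen greedily.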
What to hand in (on paper):
Graphs of the policy of the two agents for both
conditions described above. An example of what I'm looking for can be
found on the left side of Figure 5.5 in the textbook. You can
draw the graphs by hand (carefully, cleanly, clearly) but you must
compute the policies with your program.
The source code of your agents.
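If you draw the policy graphs by hand, a plain-text dump of the learned policy can help. The sketch below assumes the policy is stored as a dictionary keyed by (player sum, dealer's showing card, usable ace) with 1 meaning hit and 0 meaning stick; that representation and the name render_policy are hypothetical, not something RL-Glue provides.

```python
def render_policy(policy, usable_ace):
    """Return a text grid: rows are player sums 21 down to 12, columns are dealer cards A..10."""
    lines = []
    for player_sum in range(21, 11, -1):
        row = "".join(
            "H" if policy.get((player_sum, dealer, usable_ace), 0) == 1 else "S"
            for dealer in range(1, 11)
        )
        lines.append(f"{player_sum:2d} {row}")
    lines.append("   A23456789T")  # dealer's showing card; T stands for ten
    return "\n".join(lines)

# Illustrative policy only (hit below 17), not a learned result.
example_policy = {(p, d, False): int(p < 17)
                  for p in range(12, 22) for d in range(1, 11)}
grid = render_policy(example_policy, usable_ace=False)
```

Calling it once with usable_ace=True and once with usable_ace=False gives the two panels for one agent under one condition.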
Reminder: Assignments must be your own work. It is ok to
talk to other students, but not to exchange code of any kind. A
good rule of thumb is that if you talk to someone, don't bring a
pencil. Email is generally not a good idea, but follow the spirit
of this rule of thumb in any email.