Date |
Day |
Topic |
Assignment due |
10-Jan |
Tuesday |
Intro, logistics, requirements,
expectations |
|
12-Jan |
Thursday |
Introduction |
Read all of Chapter 1; 2 thought
questions |
17-Jan |
Tuesday |
Bandit Methods |
Read the nonstarred sections of
Chapter 2 plus 2.8; 2 thought questions |
19-Jan |
Thursday |
Bandit Methods |
Exercises 2.1, 2.5, and 2.55; 2.8 is
extra credit |
24-Jan |
Tuesday |
Markov Decision Processes |
Read all of Chapter 3; 2 thought
questions |
26-Jan |
Thursday |
Value Functions |
Exercises 3.4, 3.5, 3.8 (omit the final part regarding Eq. 3.10),
3.9, 3.10, 3.11, 3.15, 3.17; 3.6 is extra credit |
31-Jan |
Tuesday |
RL-Glue, RL-Library |
Implement Party Problem MDP;
generate 50 episodes with the random policy and compute the average
return at start state; compute state values |
2-Feb |
Thursday |
Dynamic Programming |
Read Chapter 4; 2 thought questions |
7-Feb |
Tuesday |
Dynamic Programming |
Exercises 4.1, 4.2, 4.3, 4.5, 4.9;
Implement policy iteration on the Party Problem; show sequence of
policies and value functions, starting with the policy that always parties |
9-Feb |
Thursday |
Monte Carlo Methods |
Read Chapter 5; 2 thought questions |
14-Feb |
Tuesday |
Monte Carlo Control |
Exercises 5.1, 5.2, 5.5 |
16-Feb |
Thursday |
Temporal Difference Learning |
Read Chapter 6; 2 thought questions;
apply MC ES to the blackjack environment using RL-Glue; plot policies
for the 'twice/half as many 10s' cases |
28-Feb |
Tuesday |
Temporal Difference Learning |
Exercises 6.1, 6.2, 6.3, 6.8, 6.9, 6.10, 6.12 |
2-Mar |
Thursday |
Special lecture; the exam questions |
Apply Sarsa control, epsilon-greedy with
epsilon = 0.1, to the cat-and-mouse problem [cancelled] |
7-Mar |
Tuesday |
Midterm Exam |
|
9-Mar |
Thursday |
Integrating Monte Carlo and
Temporal-difference Methods |
Read Chapter 7; 2 thought questions |
14-Mar |
Tuesday |
Eligibility Traces |
Exercises 7.2 and 7.6 |
16-Mar |
Thursday |
Function Approximation |
Read Chapter 8; 2 thought questions |
21-Mar |
Tuesday |
Function Approximation |
Exercises 8.1, 8.2, 8.6, and 8.7;
first function approximation programming assignment |
23-Mar |
Thursday |
Policy Gradient Methods with
Function Approximation |
Read LSTD(lambda) by Boyan; 2
thought questions |
28-Mar |
Tuesday |
Integrating Learning and Planning:
Dyna |
Read Chapter 9; 2 thought questions;
second function approximation programming assignment |
30-Mar |
Thursday |
Model-based backups |
Exercises 9.1, 9.2, 9.3, 9.5 (9.6 is
extra credit); read Chapter 10 |
4-Apr |
Tuesday |
Guest lecture by Michael Bowling |
Read the Bowling and Veloso paper; 2 thought questions; 1-page mini-project proposal due |
6-Apr |
Thursday |
Advanced topic: Temporal abstraction
and hierarchy |
Read the options paper, sections 1-3 and 7-8; 2 thought questions |
11-Apr |
Tuesday |
Advanced topic: Hidden State:
POMDPs, PSRs, TD nets |
Read Chapter 11; 2 thought questions |
13-Apr |
Thursday |
Mini-projects |
Mini-project due |
19-Apr |
Wednesday |
Final Exam |
MEC 4 3 |