Reinforcement Learning and Artificial Intelligence (RLAI)
Thought questions
For each reading (or as indicated in the schedule) you should submit
two thought questions in hardcopy in class on the day they are due.
Send them by email to sutton@cs.ualberta.ca only if this is not
possible for you. (This is a change in policy.)
A thought question is a question about the overall content of a
reading. It is meant to force you to actually think about what
you have read and to react to it in some meaningful way. Your questions
can be general, even vague, but they should be heartfelt. For
each question, list at least two possible answers or kinds of
answers. Be prepared to read one of your questions and its
possible answers aloud in class. There will be a random draw to
determine which students will read their questions. If you can't attend
a class, ask another student to read your question should your name
come up in the random draw. If you don't attend and there is no
one else to read your question, then you will get zero points for that
reading's thought questions.
This year what you send in doesn't have to be in the form of
questions. The grading will be as follows (×2 for the two
questions):
- 0 if you do nothing
- 1 for sending something
- 2 for saying something about the content of the chapter
- 3 for convincing me that you read and thought about the contents of the chapter
Thought questions are due by class time of the lecture where the chapter is covered.
Points to remember:
If you have to submit by email:
- Paste questions into the body of the email (no attachments)
- Put "[thought]" in the subject line
Questions must be in before the start of class for full marks, but you must still do them even if they are late.
Give answers YOU think are correct. If you are restating the content of the textbook, you must add your own thoughts.
Brainstorm. A great thought question is the start of a research idea,
not a potential exam question. Think about the chapter as a whole as
well as its picky details. It may help to ask yourself, "What caught my
attention in this chapter? How would I begin to investigate this
interesting question?"
I will monitor this page. Extend it with questions you want to discuss.
Examples of thought questions
Here are two questions that are of the right form but don't show any real
engagement with the content of the reading. You would get 1
point for each of these.
1. Is reinforcement learning really applicable to more than a narrow range of topics?
A) Yes
B) No
2. Do we need to know about policy gradient search for the midterm?
A) Yes
B) No
C) I'm not telling
Below are some 3-point questions/answers. Some of these are quite
long, but that is not essential or even desirable. They just have
to show that you have put some thought into the chapter, preferably about
its overall content rather than about one small part of it.
1. How could curiosity fit in the reinforcement learning framework?
A) The concept of curiosity could be applied to the exploration policy (as a meta-layer, somehow).
B) A "curiosity satisfaction" number could be added
to the reward signal from the environment to encourage an agent to
learn about new things.
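(Purely as an illustration of answer B, and not something from the reading: one simple way to add a "curiosity satisfaction" number to the reward is a count-based novelty bonus that shrinks as a state becomes familiar. The class name and the 1/sqrt(count) form below are my own assumptions for the sketch.)

    import collections

    class CuriosityBonus:
        """Adds a novelty bonus to the environment reward (illustrative only)."""
        def __init__(self, scale=0.1):
            self.scale = scale
            self.visits = collections.Counter()

        def augment(self, state, env_reward):
            # The more often a state has been visited, the smaller the bonus.
            self.visits[state] += 1
            return env_reward + self.scale / (self.visits[state] ** 0.5)

    curiosity = CuriosityBonus(scale=0.5)
    print(curiosity.augment("s1", 1.0))   # first visit: large bonus
    for _ in range(9):
        curiosity.augment("s1", 1.0)
    print(curiosity.augment("s1", 1.0))   # eleventh visit: bonus has shrunk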
2. When you are formulating an application as a reinforcement learning problem, what is a good way to inject domain knowledge?
A) Constrain the action set in certain states (don't allow bad moves)
B) Add reward to behaviors that you know (heuristically) are good.
C) Change the initial value function.
D) Describe the state space with added parameters representing knowledge about the domain.
E) Use a model to generate extra data
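(Again just an illustration, this time of answer A, not from the text: domain knowledge can be injected by filtering the action set before the agent chooses. The "never undo the previous move" rule below is a made-up example of a known-bad action.)

    ALL_ACTIONS = ["up", "down", "left", "right"]
    REVERSE = {"up": "down", "down": "up", "left": "right", "right": "left"}

    def allowed_actions(state, last_action):
        """Remove actions the designer knows are bad in this state (illustrative)."""
        actions = list(ALL_ACTIONS)
        if last_action in REVERSE:
            actions.remove(REVERSE[last_action])   # don't immediately undo the last move
        return actions

    print(allowed_actions(state=None, last_action="up"))   # 'down' has been removed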
1. In section X, why do we do only a one-step backup of the value function?
A) No particular reason; we'll get to other options.
B) The improvement in the error of the value
function is most significant for the first update. With discounted
reward, the improvement decreases on each step and it is more time
efficient to focus on the most important updates.
C) If there is a stochastic reward signal and the
update is a step in the wrong direction, or if there was significant
error in the value in the first place and the update only took a small
step in the right direction, the benefit of propagating the new value
back is not worth the work.
D) In certain cases, it would be worth propagating
it back, like when data is sparse. Propagating the changes back on each
step is not as clear as repeating the trajectory.
E) It requires a transition model (like for Dynamic
Programming). It doesn't make sense to just use the trajectory
experienced.
F) Each step would take an increasing amount of
computation, and we only have a time step's worth of computation
resources. But this could be addressed by putting a limit on the number
of updates per time step.
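(For anyone who wants to see what a one-step backup looks like in code, here is a minimal tabular TD(0) sketch of my own, not from the text: only the value of the current state is updated from the single observed transition, and the change is not propagated back to earlier states.)

    def td0_backup(V, s, r, s_next, alpha=0.1, gamma=0.9):
        """One-step temporal-difference update of the value table V (illustrative)."""
        target = r + gamma * V.get(s_next, 0.0)
        V[s] = V.get(s, 0.0) + alpha * (target - V.get(s, 0.0))
        return V

    V = {}
    V = td0_backup(V, s="A", r=1.0, s_next="B")
    print(V)   # only V["A"] changes; earlier states are untouched by a one-step backup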
2. In the elevator control problem, the elevators were treated as
completely independent agents and each elevator controller did not
consider the behavior of the other elevators. Doesn't that lead to
problems like more than one elevator rushing to pick up the same person?
A) Yes, but the simplification made the elevator
controllers better, which balanced out occasional wrong decisions.
Multiple elevators picking up the same person is not as serious a
problem as a person not being picked up, and the value function
reflects this.
B) Yes, but it happens rarely enough that it doesn't matter.
C) No. Other constraints added to the problem (not
allowing an elevator to stop when another elevator was stopped)
prevented this from happening.
D) No. Having a shared reward signal adds enough
interaction, and the elevators independently select actions that end up
working together.