What are POMDPs?
- a problem formulation that enables optimal sequential decisions to be made in uncertain environments
We're covering…
Sequential Decision Making
- Markov decision processes (MDPs)
- partially observable Markov decision processes (POMDPs)
Solution Methods
- Algos to solve MDPs/POMDPs: online and offline solvers
- value function approximation
- Simulations
- State Estimation using Particle Filters
- Deep Reinforcement Learning
- Imitation Learning
- Black-box Validation
Common problems
- MDP: Grid World – Agent moving around a grid world, looking for rewards
- POMDP: Crying Baby – when to feed baby based on crying observations
- MDP: 1D Random Walk – Agent moves around number line
- POMDP: 2D Random Walk – estimating state of a moving agent based on observations
- MDP: Mountain Car – Reach a goal up a hill, starting in a valley
- MDP: Swinging Pendulum – Balance a swinging pendulum upright
Markov decision process
An MDP is a problem formulation that defines how an agent takes sequential actions from states in its environment, guided by rewards, while accounting for uncertainty in how it transitions from state to state
Formally, an MDP is defined by the tuple \((S, A, T, R, \gamma)\):
- \(S\) - state space - `POMDPs.states`
- \(A\) - action space - `POMDPs.actions`
- \(T(s' \mid s, a)\) - transition function - `POMDPs.transition`
- \(R(s, a)\) - reward function - `POMDPs.reward`
- \(\gamma \in [0, 1]\) - discount factor - `POMDPs.discount`
Note: Remember, an MDP is a problem formulation and not an algo
- an MDP formulation enables the use of solution methods, i.e. algos
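As a sketch, the five elements can be collected into a tiny concrete MDP. This is Python for illustration (the `POMDPs.*` names above refer to the Julia POMDPs.jl interface), and the two-state "stay/move" problem is a made-up example, not one from the list above:

```python
# A minimal MDP written out as a plain (S, A, T, R, gamma) tuple.
# The two-state "stay/move" problem is a made-up illustration.

S = ["s1", "s2"]        # state space
A = ["stay", "move"]    # action space
gamma = 0.9             # discount factor in [0, 1]

def T(s, a):
    """Transition function: distribution over next states s' given (s, a)."""
    if a == "stay":
        return {s: 1.0}
    other = "s2" if s == "s1" else "s1"
    return {other: 0.8, s: 0.2}   # moving succeeds 80% of the time

def R(s, a):
    """Reward function: reward for taking action a in state s."""
    return 1.0 if (s == "s2" and a == "stay") else 0.0

mdp = (S, A, T, R, gamma)
```

Note that `T` returns a probability distribution rather than a single next state; that is exactly what makes the formulation handle uncertainty.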
MDP Example: Grid World
- In grid world, an agent moves around a grid attempting to collect as much reward as possible, avoiding negative rewards
Defn: State space S
- the set of all possible states an agent can be in (discrete or continuous), e.g. all possible (x,y) cells in a 10x10 grid (i.e. 100 discrete states)
Defn: Action space A
- four (discrete) cardinal directions [up, down, left, right]
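For the grid world, both spaces can be enumerated in a couple of lines (a Python sketch; the 10x10 size comes from the example above):

```python
# State space S: all (x, y) cells in a 10x10 grid -> 100 discrete states.
states = [(x, y) for x in range(10) for y in range(10)]

# Action space A: the four discrete cardinal directions.
actions = ["up", "down", "left", "right"]
```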
Defn: Transition function/model T(s'|s,a)
- defines how the agent transitions from the current state s to the next state s' when taking action a
- returns a probability distribution over all possible next states s' given (s,a), e.g. a stochastic transition (incorporates randomness/uncertainty)
- Action a = up from state s: 70% chance of transitioning correctly, 30% chance (10% × 3) of transitioning incorrectly
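A minimal sketch of that stochastic grid-world transition in Python (the 10x10 grid and the 70/10/10/10 split follow the example above; the helper names are mine, and here moves off the edge simply leave the agent in place):

```python
# Stochastic grid-world transition T(s' | s, a):
# the intended direction happens with probability 0.7, and each of the
# other three directions happens with probability 0.1.

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
WIDTH = HEIGHT = 10  # 10x10 grid -> 100 discrete states

def clamp(x, y):
    """Keep the agent on the grid: moves off the edge leave it in place."""
    return (min(max(x, 0), WIDTH - 1), min(max(y, 0), HEIGHT - 1))

def transition(s, a):
    """Return a dict mapping each next state s' to its probability."""
    probs = {}
    for direction, (dx, dy) in MOVES.items():
        p = 0.7 if direction == a else 0.1
        s_next = clamp(s[0] + dx, s[1] + dy)
        # Blocked moves (off the grid) pile probability onto the same cell.
        probs[s_next] = probs.get(s_next, 0.0) + p
    return probs
```

From an interior cell the distribution has four entries; in a corner, two of the "incorrect" moves are blocked and their probability mass collapses onto staying in place, so the probabilities still sum to 1.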