Decision Making Under Uncertainty

What are POMDPs?

  • a problem formulation that enables optimal sequential decisions to be made in uncertain environments

We're covering…

  • Sequential Decision Making
    • Markov decision processes (MDPs)
    • partially observable Markov decision processes (POMDPs)
  • Solution Methods = algorithms to solve MDPs/POMDPs
    • online and offline solvers
    • value function approximation
  • Simulations
  • State Estimation using Particle Filters
  • Deep Reinforcement Learning
  • Imitation Learning
  • Black-box Validation

Common problems

  • MDP: Grid World – Agent moving around a grid world, looking for rewards
  • POMDP: Crying Baby – when to feed baby based on crying observations
  • MDP: 1D Random Walk – Agent moves around number line
  • POMDP: 2D Random Walk – estimating state of a moving agent based on observations
  • MDP: Mountain Car – Reach a goal up a hill, starting in a valley
  • MDP: Swinging Pendulum – Balance a swinging pendulum upright

Markov decision process

An MDP is a problem formulation that defines how an agent takes sequential actions from states in its environment, guided by rewards, while accounting for uncertainty in how it transitions from state to state

  • formally defined by the tuple \((S, A, T, R, \gamma)\):

    • \(S\) - State Space - `POMDPs.states`
    • \(A\) - Action Space - `POMDPs.actions`
    • \(T(s' \mid s, a)\) - Transition function - `POMDPs.transition`
    • \(R(s, a)\) - Reward function - `POMDPs.reward`
    • \(\gamma \in [0, 1]\) - Discount factor - `POMDPs.discount`
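The five components map naturally onto code. A minimal sketch in plain Python (illustrative only — the `POMDPs.*` functions named above belong to the Julia POMDPs.jl interface; the `MDP` container here is a hypothetical stand-in):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class MDP:
    states: list          # S - state space
    actions: list         # A - action space
    transition: Callable  # T(s'|s,a): maps (s, a) to a dict {s': probability}
    reward: Callable      # R(s,a): maps (s, a) to a float
    discount: float       # gamma in [0, 1]
```

For example, `MDP(states=[0, 1], actions=["stay"], transition=lambda s, a: {s: 1.0}, reward=lambda s, a: 1.0, discount=0.9)` defines a trivial two-state problem where every action keeps the agent in place.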

Note: Remember! An MDP is a problem formulation, not an algorithm

  • an MDP formulation enables the use of solution methods, i.e. algorithms

MDP Example: Grid World

  • In grid world, an agent moves around a grid attempting to collect as much reward as possible while avoiding negative rewards

Defn: State space S

  • the set of all possible states an agent can be in (discrete or continuous), e.g. all possible (x,y) cells in a 10x10 grid (i.e. 100 discrete states)
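For the 10x10 grid, this discrete state space can be enumerated directly (a sketch, assuming states are integer (x, y) coordinates):

```python
from itertools import product

# All (x, y) cells of a 10x10 grid -> 100 discrete states
states = list(product(range(10), range(10)))
```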

Defn: Action space A

  • four (discrete) cardinal directions [up, down, left, right]

Defn: Transition function/model T(s'|s,a)

  • defines how the agent transitions from the current state s to the next state s' when taking action a
    • returns a probability distribution over all possible next states s' given (s,a)
    • e.g. a stochastic transition (incorporates randomness/uncertainty): taking action a = up from state s has a 70% chance of transitioning correctly and a 30% chance (10% x 3 other directions) of transitioning incorrectly
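The stochastic transition above can be sketched as a function returning a distribution over next states (illustrative Python, not the POMDPs.jl API; the grid size and the 70/10/10/10 split follow the grid-world example, and the clamp-at-walls behavior is an assumption):

```python
# Displacement (dx, dy) for each action
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
N = 10  # 10x10 grid

def clamp(s):
    """Keep the agent on the grid: bumping into a wall leaves it in place."""
    x, y = s
    return (min(max(x, 0), N - 1), min(max(y, 0), N - 1))

def transition(s, a):
    """T(s'|s,a): 70% intended direction, 10% for each other direction."""
    dist = {}
    for action, (dx, dy) in MOVES.items():
        p = 0.7 if action == a else 0.1
        s2 = clamp((s[0] + dx, s[1] + dy))
        dist[s2] = dist.get(s2, 0.0) + p  # merge outcomes that coincide at walls
    return dist
```

For example, `transition((5, 5), "up")` assigns 0.7 to (5, 6) and 0.1 each to (5, 4), (4, 5), and (6, 5); at a corner the clamped outcomes merge, so the probabilities still sum to 1.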