•Markov Decision Processes (MDP) (see JWSR 05)
–Definition: M = (S, A, T, C)
•S: States (fully observable)
•A: Actions (may be non-deterministic)
•T: Transition function, T: S × A → Π(S)
•C: Cost function, C: S × A → ℝ
•Perform stochastic optimization using Dynamic Programming
•Value function serves as the heuristic
•Optimal Policy π : S → A
–(Minimize expected cost)
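The dynamic-programming step above can be sketched with value iteration on a tiny cost-based MDP. The toy states, transition probabilities, and costs below are illustrative assumptions, not from the source; only the structure M = (S, A, T, C) and the minimize-expected-cost objective follow the slide.

```python
# Value-iteration sketch for a cost-based MDP M = (S, A, T, C).
# The toy problem (3 states, 2 actions) is an illustrative assumption.
S = [0, 1, 2]          # states; state 2 is an absorbing goal
A = ["stay", "go"]     # actions

# T[s][a] -> list of (next_state, probability); actions are stochastic
T = {
    0: {"stay": [(0, 1.0)], "go": [(1, 0.8), (0, 0.2)]},
    1: {"stay": [(1, 1.0)], "go": [(2, 0.9), (1, 0.1)]},
    2: {"stay": [(2, 1.0)], "go": [(2, 1.0)]},
}

# C[s][a]: immediate cost of taking action a in state s
C = {
    0: {"stay": 1.0, "go": 1.0},
    1: {"stay": 1.0, "go": 1.0},
    2: {"stay": 0.0, "go": 0.0},   # zero cost once at the goal
}

def value_iteration(gamma=0.95, tol=1e-6):
    """Iterate the Bellman backup until the value function converges."""
    V = {s: 0.0 for s in S}
    while True:
        delta = 0.0
        for s in S:
            # Expected cost of each action: immediate cost + discounted future
            q = {a: C[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                 for a in A}
            best = min(q.values())           # minimize expected cost
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy pi: S -> A with respect to the converged values
    policy = {s: min(A, key=lambda a: C[s][a] +
                     gamma * sum(p * V[s2] for s2, p in T[s][a]))
              for s in S}
    return V, policy

V, pi = value_iteration()
print(pi)   # optimal action per state
```

The converged value function V plays the role of the heuristic noted above, and the greedy policy pi realizes π : S → A.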