Markov Decision Process (MDP)
A Markov Decision Process (MDP) is a framework used in decision theory, operations research, and reinforcement learning to model environments for decision-making where outcomes are partly due to chance and partly under the control of a decision-maker. MDPs are characterized by a set of states, a set of actions available in each state, transition probabilities that determine the likelihood of moving from one state to another given an action, and a reward function that assigns rewards to state-action pairs.
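As a rough illustration (not tied to any particular library), these components can be written down directly; the sketch below is a small hypothetical two-state MDP in Python, with made-up probabilities and rewards.

# A tiny hypothetical MDP with two states and two actions.
# transitions[state][action] is a list of (probability, next_state) pairs;
# rewards[(state, action)] is the immediate reward for taking that action in that state.
states = ["s0", "s1"]
actions = ["stay", "move"]

transitions = {
    "s0": {"stay": [(1.0, "s0")],
           "move": [(0.8, "s1"), (0.2, "s0")]},   # moving may fail 20% of the time (assumed value)
    "s1": {"stay": [(1.0, "s1")],
           "move": [(0.8, "s0"), (0.2, "s1")]},
}

rewards = {
    ("s0", "stay"): 0.0,
    ("s0", "move"): 1.0,   # attempting to move out of s0 pays off (hypothetical value)
    ("s1", "stay"): 0.0,
    ("s1", "move"): 0.0,
}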
The objective in an MDP is to find a policy (a strategy for choosing actions based on the current state) that maximizes the expected cumulative reward over time, often discounted by a factor between 0 and 1 and referred to as the return. MDPs assume the Markov property, meaning that the future state depends only on the current state and the action taken, not on the sequence of events that preceded it.
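One standard way to compute such a policy for a finite MDP is value iteration, which repeatedly applies the Bellman optimality update until the value estimates stop changing. The sketch below assumes the dictionary representation from the example above and a discount factor gamma; the names and defaults are illustrative, not a fixed API.

def value_iteration(states, actions, transitions, rewards, gamma=0.9, tol=1e-6):
    """Compute an optimal policy for a finite MDP by value iteration (a sketch)."""
    V = {s: 0.0 for s in states}   # value estimates, initialised to zero
    while True:
        delta = 0.0
        for s in states:
            # Bellman optimality update: the value of s depends only on s and the
            # chosen action (the Markov property), not on how the agent got there.
            best = max(
                rewards[(s, a)] + gamma * sum(p * V[s2] for p, s2 in transitions[s][a])
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Extract a greedy policy from the converged values.
    policy = {
        s: max(actions,
               key=lambda a: rewards[(s, a)]
               + gamma * sum(p * V[s2] for p, s2 in transitions[s][a]))
        for s in states
    }
    return policy, V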
In robotics, an MDP can model a robot's navigation through a maze where each location in the maze is a state, the actions are the directions the robot can move, the transition probabilities reflect the uncertainty in the robot's movement (e.g., slipping or wheel error), and rewards are assigned for reaching the goal and penalties for hitting obstacles.
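To make this concrete, a minimal sketch (reusing the value_iteration function above, with assumed slip probability, step cost, and goal reward) might model a short corridor leading to a goal cell:

# Hypothetical 4-cell corridor: the robot wants to reach c3.
# With probability 0.1 a move "slips" and the robot stays where it is (assumed value).
cells = ["c0", "c1", "c2", "c3"]
moves = ["left", "right"]

def step(cell, move):
    i = cells.index(cell)
    j = max(0, i - 1) if move == "left" else min(len(cells) - 1, i + 1)
    return cells[j]

maze_T = {c: {m: [(0.9, step(c, m)), (0.1, c)] for m in moves} for c in cells}
maze_R = {(c, m): (1.0 if step(c, m) == "c3" and c != "c3" else -0.04)
          for c in cells for m in moves}   # small step cost, larger reward for reaching the goal

policy, values = value_iteration(cells, moves, maze_T, maze_R, gamma=0.95)
print(policy)   # every non-goal cell should choose "right"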
The solution to the MDP, in this case, would be a policy that guides the robot to the goal efficiently while minimizing collisions. Another example is automated game playing: in a board game, the states are the possible configurations of the game board, the actions are the legal moves, the transition probabilities may account for elements of chance (such as dice rolls), and the rewards are associated with winning, losing, or gaining strategic advantages.
The MDP framework helps in developing strategies that maximize the chances of winning the game.