
Markov Decision Process (MDP)

A mathematical model for sequential decision-making in which outcomes are partly random and partly under the control of a decision-maker.
Definition

A Markov Decision Process (MDP) is a framework used in decision theory, operations research, and reinforcement learning to model decision-making in environments where outcomes are partly due to chance and partly under the control of a decision-maker. An MDP is characterized by a set of states, a set of actions available in each state, transition probabilities that give the likelihood of moving from one state to another when a given action is taken, and a reward function that assigns a reward to each state-action pair.
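The four components listed above can be written down directly as data. The following is a minimal sketch, assuming a hypothetical two-state MDP; the state names, action names, probabilities, rewards, and the helper `is_valid_mdp` are all invented for illustration and do not come from any particular library.

```python
from typing import Dict, Tuple

State = str
Action = str

# Hypothetical two-state MDP, for illustration only.
# transitions[(state, action)] maps each possible next state to its probability.
transitions: Dict[Tuple[State, Action], Dict[State, float]] = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.5, "warm": 0.5},
    ("warm", "slow"): {"cool": 0.5, "warm": 0.5},
    ("warm", "fast"): {"warm": 1.0},
}

# rewards[(state, action)] is the reward for taking that action in that state.
rewards: Dict[Tuple[State, Action], float] = {
    ("cool", "slow"): 1.0,
    ("cool", "fast"): 2.0,
    ("warm", "slow"): 1.0,
    ("warm", "fast"): -10.0,
}


def is_valid_mdp(transitions, rewards, tol: float = 1e-9) -> bool:
    """Check that every (state, action) pair has a reward and that its
    next-state probabilities are non-negative and sum to 1."""
    for key, dist in transitions.items():
        if key not in rewards:
            return False
        if any(p < 0 for p in dist.values()):
            return False
        if abs(sum(dist.values()) - 1.0) > tol:
            return False
    return True
```

Calling `is_valid_mdp(transitions, rewards)` on the tables above returns `True`; a distribution whose probabilities do not sum to 1 would make it return `False`.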

The objective in an MDP is to find a policy (a strategy for choosing actions based on the current state) that maximizes the expected return, where the return is the sum of rewards collected over time, often discounted so that later rewards count for less. MDPs assume the Markov property: the probability distribution over the next state depends only on the current state and the action taken, not on the sequence of events that preceded it.
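One standard way to compute such a policy for a small, fully known MDP is value iteration. The sketch below assumes a hypothetical two-state MDP and a discount factor of 0.9; the tables `P` and `R`, the constant `GAMMA`, and the function names are illustrative assumptions, not part of any library.

```python
GAMMA = 0.9  # assumed discount factor

STATES = ["cool", "warm"]
ACTIONS = ["slow", "fast"]

# Hypothetical model: P[(s, a)] maps next states to probabilities,
# R[(s, a)] is the immediate reward.
P = {
    ("cool", "slow"): {"cool": 1.0},
    ("cool", "fast"): {"cool": 0.5, "warm": 0.5},
    ("warm", "slow"): {"cool": 0.5, "warm": 0.5},
    ("warm", "fast"): {"warm": 1.0},
}
R = {
    ("cool", "slow"): 1.0,
    ("cool", "fast"): 2.0,
    ("warm", "slow"): 1.0,
    ("warm", "fast"): -10.0,
}


def q_value(V, s, a):
    """Immediate reward plus discounted expected value of the next state."""
    return R[(s, a)] + GAMMA * sum(p * V[s2] for s2, p in P[(s, a)].items())


def value_iteration(tol=1e-8):
    """Apply the Bellman optimality update until values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        new_V = {s: max(q_value(V, s, a) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_V[s] - V[s]) for s in STATES)
        V = new_V
        if delta < tol:
            break
    # The greedy policy picks, in each state, the action with the best value.
    policy = {s: max(ACTIONS, key=lambda a: q_value(V, s, a)) for s in STATES}
    return V, policy


V, policy = value_iteration()
```

For this toy model the resulting policy chooses `fast` in state `cool` and `slow` in state `warm`, because the large penalty for `fast` in `warm` outweighs the short-term gain. Value iteration needs the full transition and reward tables; when those are unknown, reinforcement learning methods estimate them or the values from experience instead.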

Examples/Use Cases:

In robotics, an MDP can model a robot's navigation through a maze where each location in the maze is a state, the actions are the directions the robot can move, the transition probabilities might reflect the uncertainty in the robot's movement (e.g., slipping or wheel error), and the rewards could be assigned to reaching the goal and avoiding obstacles. The solution to the MDP, in this case, would be a policy that guides the robot to the goal efficiently while minimizing collisions.

Another example is in automated game playing, such as a board game where the states are the possible configurations of the game board, the actions are the legal moves, the transition probabilities may account for elements of chance (like dice rolls), and the rewards are associated with winning, losing, or strategic advantages. The MDP framework helps in developing strategies that maximize the chances of winning the game.
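To make the maze example above concrete, the sketch below encodes a tiny grid as an MDP. Everything in it is an assumption chosen for illustration: the 2x3 layout, the slip probability of 0.2 (the robot stays in place instead of moving), and the reward values (+10 for heading into the goal, -5 for driving into a wall or off the grid, -1 for any other move).

```python
# Hypothetical maze: "S" start, "G" goal, "#" wall, "." free cell.
MAZE = [
    "S.#",
    "..G",
]
SLIP = 0.2  # assumed probability that the wheels slip and the robot stays put

# Each action is a (row change, column change) pair.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def intended_cell(state, action):
    """Return the cell the robot aims for, or None if it is a wall or off the grid."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < len(MAZE) and 0 <= nc < len(MAZE[0]) and MAZE[nr][nc] != "#":
        return (nr, nc)
    return None


def transition(state, action):
    """Map each possible next state to its probability."""
    target = intended_cell(state, action)
    if target is None:
        return {state: 1.0}  # blocked: the robot cannot move
    return {target: 1.0 - SLIP, state: SLIP}


def reward(state, action):
    """Reward for a state-action pair, using the assumed values above."""
    target = intended_cell(state, action)
    if target is None:
        return -5.0  # collision with a wall or the boundary
    r, c = target
    return 10.0 if MAZE[r][c] == "G" else -1.0
```

States are `(row, column)` tuples, so `transition((0, 0), "right")` gives the distribution over `(0, 1)` and `(0, 0)`. The sketch does not treat the goal as a terminal state; a fuller model would end the episode once the robot reaches it.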
