— Ch. 1 · Origins In The 1950s —
Markov decision process.
Operations research teams in the early 1950s began constructing mathematical models for sequential decision making under uncertainty. They built on earlier work by the Russian mathematician Andrey Markov, who had developed the theory of Markov chains decades before. The term Markov decision process comes from this lineage: in these models, state transitions obey the Markov property, meaning the next state depends only on the current state and the action taken, not on the full history. Since their introduction, MDPs have found use in ecology, economics, healthcare, telecommunications, and reinforcement learning. In 1953, Lloyd Shapley published a paper on stochastic games that included the value iteration method as a special case. That work marked the formal introduction of methods now central to modern artificial intelligence.
Four Components Defined
A Markov decision process is a four-tuple (S, A, P, R): a state space S, an action space A, transition probabilities P, and a reward function R. The state space may be discrete or continuous (such as the real numbers). The action space contains the actions available from a given state. The transition probability P(s′ | s, a) gives the likelihood that taking action a in state s at time t leads to state s′ at time t + 1. The immediate reward R(s, a, s′) is the expected reward received after transitioning from state s to state s′ via action a. A policy is a function that maps each state to an action, guiding decision making. The framework captures cause and effect while managing uncertainty and nondeterminism. The goal is to find a good policy: one that specifies which action to take in each state.
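To make the four-tuple concrete, here is a minimal Python sketch of a hypothetical two-state, two-action MDP, with a policy extracted by value iteration (the method mentioned above in connection with Shapley's paper). The state names, action names, probabilities, rewards, and the discount factor gamma are all illustrative assumptions, not anything given in the text.

```python
# Hypothetical toy MDP: every name and number below is an assumption
# made purely for illustration.

states = ["idle", "busy"]          # S: state space
actions = ["wait", "work"]         # A: action space

# P[s][a] maps each next state s' to Pr(s' | s, a), the transition probability.
P = {
    "idle": {"wait": {"idle": 0.9, "busy": 0.1},
             "work": {"idle": 0.2, "busy": 0.8}},
    "busy": {"wait": {"idle": 0.3, "busy": 0.7},
             "work": {"idle": 0.6, "busy": 0.4}},
}

# R[s][a]: expected immediate reward for taking action a in state s.
R = {
    "idle": {"wait": 0.0, "work": 1.0},
    "busy": {"wait": 0.5, "work": 2.0},
}

def value_iteration(gamma=0.9, tol=1e-6):
    """Return state values and a greedy policy (a map from state to action)."""
    V = {s: 0.0 for s in states}
    while True:
        # Bellman optimality update: best one-step reward plus discounted future value.
        V_new = {
            s: max(
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
                for a in actions
            )
            for s in states
        }
        delta = max(abs(V_new[s] - V[s]) for s in states)
        V = V_new
        if delta < tol:
            break
    # Policy: in each state, pick the action that maximizes the same quantity.
    policy = {
        s: max(
            actions,
            key=lambda a: R[s][a]
            + gamma * sum(p * V[s2] for s2, p in P[s][a].items()),
        )
        for s in states
    }
    return V, policy

V, policy = value_iteration()
print(policy)
```

With these particular numbers, "work" turns out to be optimal in both states; changing the rewards or transition probabilities shifts the resulting policy, which is exactly the trade-off the framework is designed to capture.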