Questions about Markov decision processes

Short answers, pulled from the article.

What is a Markov decision process and when did it emerge?

The Markov decision process emerged from operations research teams in the early 1950s, who constructed mathematical models to handle sequential decisions under uncertainty. The term arose from a lineage of probabilistic thinking that built upon earlier work by the Russian mathematician Andrey Markov on stochastic chains.

Who published a paper on stochastic games in 1953 that included value iteration as a special case?

Lloyd Shapley published a paper on stochastic games in 1953 that included value iteration as a special case. This historical moment marked the formal introduction of methods now central to modern artificial intelligence.

How does a Markov decision process function as a four-tuple containing distinct elements for states, actions, transition probabilities, and rewards?

A Markov decision process functions as a four-tuple containing distinct elements for states, actions, transition probabilities, and rewards. The set of states forms the state space, which may be discrete or continuous (such as the real numbers), while the set of actions constitutes the action space available from any given state.
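The four-tuple can be made concrete with a small sketch. Below is a hypothetical two-state, two-action MDP (all states, actions, probabilities, and rewards are invented for illustration), solved with value iteration, the method mentioned earlier in connection with Shapley's 1953 paper:

```python
# A toy MDP as a four-tuple (S, A, P, R); all numbers are illustrative.
S = (0, 1)                      # state space
A = ("stay", "move")            # action space
# P[(s, a)] maps each successor state to its transition probability.
P = {
    (0, "stay"): {0: 0.9, 1: 0.1},
    (0, "move"): {1: 1.0},
    (1, "stay"): {1: 0.9, 0: 0.1},
    (1, "move"): {0: 1.0},
}
# R[(s, a)] is the immediate reward for taking action a in state s.
R = {
    (0, "stay"): 0.0,
    (0, "move"): 1.0,
    (1, "stay"): 2.0,
    (1, "move"): 0.0,
}
GAMMA = 0.9                     # discount factor

def q_value(V, s, a):
    """One-step lookahead: reward plus discounted expected next value."""
    return R[s, a] + GAMMA * sum(p * V[s2] for s2, p in P[s, a].items())

def value_iteration(tol=1e-10):
    """Repeat Bellman optimality backups until the values stop changing."""
    V = {s: 0.0 for s in S}
    while True:
        V_new = {s: max(q_value(V, s, a) for a in A) for s in S}
        if max(abs(V_new[s] - V[s]) for s in S) < tol:
            return V_new
        V = V_new

V = value_iteration()
# Greedy policy: in each state, pick the action with the highest Q-value.
policy = {s: max(A, key=lambda a: q_value(V, s, a)) for s in S}
```

For this particular toy model, the greedy policy moves out of state 0 (collecting the one-off reward) and then stays in state 1, where the per-step reward is highest.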

Why does the curse of dimensionality restrict exact solution techniques to problems with compact representations despite polynomial time algorithms existing?

The curse of dimensionality restricts exact solution techniques because the size of the problem representation often grows exponentially with the number of state and action variables, even though algorithms polynomial in that representation exist. Online planning techniques like Monte Carlo tree search can find useful solutions in larger problems by sampling trajectories rather than enumerating every state, whereas dynamic programming requires an explicit model and sweeps over the full state space.
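A minimal sketch of the contrast: the flat table that exact dynamic programming must fill grows exponentially in the number of state variables, while a sampling-based estimate (in the spirit of Monte Carlo tree search, though this is only a plain rollout average) touches a tiny fraction of the space. The random-walk dynamics and reward below are invented for illustration:

```python
import random

# With n binary state variables, a flat value table needs 2**n entries,
# so each exact dynamic-programming sweep touches exponentially many states.
def flat_state_size(n_vars, values_per_var=2):
    return values_per_var ** n_vars

table_sizes = {n: flat_state_size(n) for n in (10, 20, 30)}

# A sampling-based planner instead estimates a state's value from simulated
# trajectories; it never enumerates the state space. Illustrative dynamics:
# a random walk on the nonnegative integers, with reward 1 only at state 0.
def rollout_value(state, rng, steps=50, gamma=0.95):
    total, discount = 0.0, 1.0
    for _ in range(steps):
        state = max(0, state + rng.choice((-1, 1)))   # random rollout policy
        total += discount * (1.0 if state == 0 else 0.0)
        discount *= gamma
    return total

# Average many rollouts from one start state to estimate its value.
estimate = sum(rollout_value(5, random.Random(i)) for i in range(200)) / 200
```

The rollout estimate is noisy, but its cost scales with the number of sampled trajectories rather than with the size of the state space.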

What is a partially observable Markov decision process, and why are constrained Markov decision processes solved with linear programs rather than dynamic programming?

When the current state remains unknown during action selection, the problem becomes a partially observable Markov decision process, or POMDP. Constrained Markov decision processes (CMDPs) introduce multiple costs incurred after applying an action instead of just one, and CMDPs are solved using linear programs only, because dynamic programming does not work for these constrained scenarios.
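The linear-programming route can be sketched concretely via the occupancy-measure formulation: optimize over the discounted state-action visitation frequencies, with flow-balance equality constraints and the costs as extra linear inequalities. Below is a minimal sketch for a hypothetical two-state CMDP using `scipy.optimize.linprog`; the dynamics, reward, cost, and the bound 0.5 are all invented for illustration, and SciPy is assumed to be available:

```python
import numpy as np
from scipy.optimize import linprog   # assumes SciPy is installed

# Toy CMDP: states {0, 1}, actions {stay, switch}, deterministic transitions
# (stay keeps the state, switch flips it), start state 0, discount 0.9.
# Reward is 1 per step spent in state 1; the cost is the same quantity,
# capped at 0.5 of the normalized discounted occupancy.
gamma = 0.9
# Variables: rho(s, a) ordered as (0,stay), (0,switch), (1,stay), (1,switch).
reward = np.array([0.0, 0.0, 1.0, 1.0])
cost = np.array([0.0, 0.0, 1.0, 1.0])

# Flow balance for each state s:
#   sum_a rho(s,a) - gamma * sum_{s',a'} P(s | s',a') rho(s',a')
#     = (1 - gamma) * mu0(s)
A_eq = np.array([
    [1 - gamma, 1.0, 0.0, -gamma],   # balance at state 0
    [0.0, -gamma, 1 - gamma, 1.0],   # balance at state 1
])
b_eq = np.array([1 - gamma, 0.0])    # start distribution mu0 = (1, 0)

res = linprog(
    c=-reward,                  # maximize reward -> minimize its negative
    A_ub=cost[np.newaxis, :],   # cost constraint: occupancy of state 1 <= 0.5
    b_ub=np.array([0.5]),
    A_eq=A_eq,
    b_eq=b_eq,
    bounds=(0, None),           # occupancy measures are nonnegative
)
optimal_reward = -res.fun       # best achievable constrained value
```

A (possibly randomized) optimal policy is then recovered by normalizing the occupancy measure per state, pi(a | s) proportional to rho(s, a); the need for randomization under cost constraints is one reason plain dynamic programming does not apply here.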