— Ch. 1 · Foundations And Definitions —
Hidden Markov model.
~5 min read · Ch. 1 of 6
In 1960, Leonard Baum and his colleagues published a series of statistical papers that introduced the hidden Markov model. This mathematical structure describes a system where an observer sees outcomes but cannot directly see the underlying process driving them. The model requires two distinct stochastic processes: one hidden state sequence and one observable output sequence. The hidden states form a Markov chain, meaning the next state depends only on the current state. Observations depend probabilistically on these hidden states in a known way. Since the states remain invisible, researchers must infer their nature by analyzing the visible data. A discrete-time example involves drawing balls from urns placed inside a sealed room. An unseen genie selects an urn and draws a ball onto a conveyor belt for observation. The observer sees the labeled balls but never knows which specific urn generated each draw. This setup creates a scenario where the probability of the third ball drawn depends on the previous urn choice, yet the urn itself remains hidden from view.
Inference Algorithms
Researchers developed computational methods to solve the problem of finding the most likely sequence of hidden states given observed outputs. The Viterbi algorithm efficiently calculates this maximum likelihood path through the state space. It evaluates joint probabilities for candidate sequences by multiplying transition and emission values. Another critical method is the forward algorithm, which computes the total probability of observing a specific sequence. This approach sums over all possible hidden state paths using dynamic programming principles. Filtering tasks determine the distribution of the final hidden state at the end of a sequence. Smoothing algorithms work backward to estimate the probability of past states relative to current observations. These techniques allow systems to make predictions about weather patterns or speech sounds based on limited data. For instance, Alice might guess Bob's activity based on his daily reports while knowing only general weather trends. The mathematical framework ensures that calculations remain tractable even when dealing with complex chains of events.