— Ch. 1 · Neuroscience Origins And Early Models —
Recurrent neural network
In 1901, Santiago Ramón y Cajal observed "recurrent semicircles" in the cerebellar cortex. These structures, formed by parallel fibers, Purkinje cells, and granule cells, suggested that neural pathways could loop back on themselves, challenging the prevailing view of the brain as a strictly feedforward system. By 1933, Lorente de Nó had discovered recurrent, reciprocal connections using Golgi's method, and he proposed that excitatory loops explain aspects of the vestibulo-ocular reflex.

During the 1940s, multiple researchers proposed feedback mechanisms within the brain. Hebb considered "reverberating circuits" as an explanation for short-term memory. The McCulloch and Pitts paper of 1943 introduced a neuron model that permitted cycles; they studied networks whose current activity can be affected by events indefinitely far in the past (a toy version of such a loop is sketched below). Recurrent inhibition was proposed in 1946 as a negative feedback mechanism in motor control, and neural feedback loops became a common topic of discussion at the Macy conferences.

In 1960, Frank Rosenblatt published "closed-loop cross-coupled perceptrons": three-layered networks with recurrent connections in the middle layer. Kaoru Nakano published similar networks in 1971, and Shun'ichi Amari followed with his own work in 1972. John Hopfield acknowledged these early efforts in his 1982 paper.
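To make the reverberating-loop idea concrete, here is a toy two-neuron circuit in the McCulloch-Pitts spirit, written as a short Python sketch. The wiring and names are illustrative reconstructions, not an example from the 1943 paper: a single input pulse at the first step keeps circulating around the loop, so the network's activity at any later time still reflects that long-past event.

```python
# A toy "reverberating circuit": two binary threshold neurons wired in a
# cycle (illustrative reconstruction, not the 1943 paper's own example).
def step(state, pulse=0):
    """One synchronous update of the two-neuron loop."""
    a, b = state
    # Neuron A fires if neuron B fired last step or an external pulse
    # arrives; neuron B fires if neuron A fired last step.
    return (1 if (b or pulse) else 0, 1 if a else 0)

state = (0, 0)
state = step(state, pulse=1)   # a single external input at t=0
for t in range(1, 7):
    print(f"t={t}: (A, B) = {state}")
    state = step(state)        # no further input; activity keeps circulating
```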
The Vanishing Gradient Crisis And LSTM
Traditional recurrent neural networks suffered from the vanishing gradient problem, which limited their ability to learn long-range dependencies: backpropagated errors shrink exponentially with the time lag between significant events, so credit for an outcome rarely reaches its distant cause (a numerical sketch of this decay appears below).

Hochreiter and Schmidhuber invented Long Short-Term Memory (LSTM) networks in 1995. The architecture solved the vanishing gradient problem by preventing backpropagated errors from vanishing or exploding: errors can flow backward through unlimited numbers of virtual layers unfolded in space, so the network can learn tasks that require memories of events thousands or even millions of discrete time steps earlier (the single-cell sketch below shows the additive cell-state update behind this).

Later, Gated Recurrent Units (GRUs), introduced in 2014, emerged as a more computationally efficient alternative. They have fewer parameters than LSTMs because they lack an output gate (the parameter comparison below makes this concrete). On polyphonic music modeling and speech signal modeling their performance was found to be similar to that of LSTM, and no consistent advantage for either architecture has been established.

Around 2006, bidirectional LSTM began to revolutionize speech recognition, outperforming traditional models in certain speech applications, and LSTM broke records in machine translation and language modeling.
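To see the decay numerically, the following sketch multiplies together the per-step Jacobians of a vanilla tanh RNN with a stable recurrent weight matrix. All sizes and constants here are illustrative assumptions, not values from the chapter.

```python
import numpy as np

# Minimal numerical sketch of the vanishing gradient (sizes and constants
# are illustrative assumptions). For a tanh RNN, the per-step Jacobian is
# d h_t / d h_{t-1} = diag(1 - h_t**2) @ W_hh, so the gradient reaching an
# early hidden state is a long product of such Jacobians.
rng = np.random.default_rng(0)
hidden = 32
W_hh = rng.normal(0.0, 1.0, (hidden, hidden))
W_hh *= 0.9 / np.linalg.norm(W_hh, 2)   # spectral norm 0.9: a stable regime

h = np.zeros(hidden)
jac = np.eye(hidden)                     # accumulated d h_t / d h_0
for t in range(1, 101):
    h = np.tanh(W_hh @ h + rng.normal(0.0, 1.0, hidden))  # random inputs
    jac = np.diag(1.0 - h**2) @ W_hh @ jac
    if t % 25 == 0:
        print(f"step {t:3d}: ||d h_t / d h_0||_2 = {np.linalg.norm(jac, 2):.3e}")
```

Each factor in the product has spectral norm at most 0.9, so the gradient norm falls at least geometrically: after a hundred steps it is vanishingly small, which is exactly the regime in which long-range dependencies cannot be learned.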
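Why the LSTM escapes this fate is easiest to see in a single cell step: the cell state is updated additively and gated elementwise rather than pushed through a fresh matrix multiplication at every step. The sketch below is a generic textbook formulation with hypothetical weight names, not the authors' original notation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell step; W maps [x; h_prev] to the four gate pre-activations."""
    n = h_prev.shape[0]
    pre = W @ np.concatenate([x, h_prev]) + b   # shape (4n,)
    i = sigmoid(pre[0*n:1*n])                   # input gate
    f = sigmoid(pre[1*n:2*n])                   # forget gate
    o = sigmoid(pre[2*n:3*n])                   # output gate
    g = np.tanh(pre[3*n:4*n])                   # candidate cell update
    # Additive cell-state update: gradients pass through this line via
    # elementwise products with f, not repeated matrix products, so they
    # need not vanish while the forget gate stays near 1.
    c = f * c_prev + i * g
    h = o * np.tanh(c)                          # gated output
    return h, c

# Tiny usage example with random weights (purely illustrative).
rng = np.random.default_rng(1)
n_in, n_hid = 3, 4
W = rng.normal(0.0, 0.5, (4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h, c = np.zeros(n_hid), np.zeros(n_hid)
for _ in range(5):
    h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
print("h after 5 steps:", np.round(h, 3))
```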
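The parameter claim is easy to check empirically. Assuming PyTorch's built-in recurrent layers (the chapter itself names no framework), counting parameters puts the GRU at exactly three quarters of the LSTM's size, one gate's worth less.

```python
import torch.nn as nn

# Parameter-count comparison with PyTorch's built-in layers (the framework
# choice is this sketch's assumption; the sizes are arbitrary). An LSTM
# has four gate weight blocks, a GRU three, a vanilla RNN just one.
input_size, hidden_size = 64, 128
layers = {
    "RNN ": nn.RNN(input_size, hidden_size),
    "GRU ": nn.GRU(input_size, hidden_size),
    "LSTM": nn.LSTM(input_size, hidden_size),
}
for name, layer in layers.items():
    count = sum(p.numel() for p in layer.parameters())
    print(f"{name}: {count:,} parameters")
```

With these sizes the GRU's count is exactly 3/4 of the LSTM's, matching the chapter's point that dropping the output gate shrinks the model.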