Neural network (machine learning)
Carl Friedrich Gauss applied a similar method to predict celestial orbits as early as 1795, and in 1805 Adrien-Marie Legendre published the method of least squares to find a rough linear fit for planetary movement data. This statistical technique became a foundation for what would be called artificial neural networks roughly a century and a half later, and it long predated any computer hardware capable of running it. The simplest form of these modern systems is a feedforward network with a single layer of output nodes using linear activation functions. Inputs are fed directly to the outputs through a set of weights, and each node computes the sum of the products of its weights and inputs. The weights are then adjusted to minimize the mean squared error between these outputs and the target values. Warren McCulloch and Walter Pitts published a non-learning computational model of neural networks in 1943, after which research split into studies of biological processes and applications to artificial intelligence. D.O. Hebb proposed a learning hypothesis based on neural plasticity in the late 1940s. Farley and Clark used computational machines to simulate a Hebbian network in 1954. Rochester, Holland, Habit, and Duda created other neural network computational machines in 1956. Frank Rosenblatt described the perceptron in 1958, in work funded by the United States Office of Naval Research. R.D. Joseph later noted that Farley and Clark had developed a perceptron-like device even before Rosenblatt's work.
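The single linear node described above can be sketched in a few lines. This is a minimal illustration, not a historical reconstruction: the data points, the learning rate, and the function names (`mse`, `fit`, `closed_form`) are assumptions chosen for the example. It adjusts one weight and one bias by gradient descent on the mean squared error and compares the result with the classical least-squares solution.

```python
# Single linear output node: prediction = w * x + b.
# Illustrative data (made up for this sketch): points lying near y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]


def mse(w, b):
    """Mean squared error between the node's outputs and the targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit(steps=5000, lr=0.01):
    """Adjust w and b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def closed_form():
    """Classical least-squares line fit, for comparison."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


w, b = fit()
w_ls, b_ls = closed_form()
print(f"gradient descent: w={w:.3f}, b={b:.3f}, mse={mse(w, b):.4f}")
print(f"closed form:      w={w_ls:.3f}, b={b_ls:.3f}")
```

Both routes should land on essentially the same line, which is the sense in which a single-layer linear network trained on squared error is a least-squares fit.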
Rosenblatt's perceptron raised public excitement for research in artificial neural networks, causing the US government to drastically increase funding. This contributed to what researchers called "the Golden Age of AI," fueled by optimistic claims about emulating human intelligence. The first perceptrons did not have adaptive hidden units. Joseph discussed multilayer perceptrons with an adaptive hidden layer in 1960. Rosenblatt cited and adopted these ideas in 1962, also crediting H.D. Block and B.W. Knight. These early efforts did not produce a working learning algorithm for hidden units. Minsky and Papert published their critique in 1969, emphasizing that basic perceptrons were incapable of processing the exclusive-or (XOR) circuit. Research stagnated in the United States following this publication, although the insight was irrelevant for the deep networks that Ivakhnenko and Amari had already developed. Alexey Ivakhnenko and Lapa published the Group Method of Data Handling in 1965, a method for training arbitrarily deep neural networks. A 1971 paper described a deep network with eight layers trained by regression analysis. Superfluous hidden units were pruned using a separate validation set. Because the activation functions were Kolmogorov-Gabor polynomials, these were also the first deep networks with multiplicative units or gates. Shun'ichi Amari published the first deep learning multilayer perceptron trained by stochastic gradient descent in 1967. In computer experiments conducted by his student Saito, a five-layer MLP learned internal representations.
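The XOR limitation mentioned above is easy to reproduce. The sketch below is illustrative only: the function names, the learning rate, and the hand-set hidden-layer weights are assumptions, not taken from any of the cited works. A single threshold unit trained with the perceptron rule learns AND but cannot fit XOR, while a network with one hidden layer computes XOR with fixed weights.

```python
def step(z):
    """Threshold activation: fires when the weighted sum is positive."""
    return 1 if z > 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Perceptron learning rule on a single threshold unit with two inputs."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            out = step(w1 * x1 + w2 * x2 + b)
            delta = target - out
            if delta != 0:
                errors += 1
                w1 += lr * delta * x1
                w2 += lr * delta * x2
                b += lr * delta
        if errors == 0:  # every sample classified correctly
            break
    return w1, w2, b


def accuracy(params, samples):
    """Fraction of samples the trained unit classifies correctly."""
    w1, w2, b = params
    hits = sum(step(w1 * x1 + w2 * x2 + b) == t for (x1, x2), t in samples)
    return hits / len(samples)


AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def xor_two_layer(x1, x2):
    """Hand-set weights (not learned): hidden units compute OR and AND."""
    h_or = step(x1 + x2 - 0.5)
    h_and = step(x1 + x2 - 1.5)
    return step(h_or - h_and - 0.5)  # OR but not AND


and_params = train_perceptron(AND)
xor_params = train_perceptron(XOR)
print("single unit on AND:", accuracy(and_params, AND))
print("single unit on XOR:", accuracy(xor_params, XOR))
print("two-layer XOR:", [xor_two_layer(a, b) for (a, b), _ in XOR])
```

No setting of the single unit's weights can classify all four XOR cases, because the two classes are not linearly separable; the hidden layer removes that restriction.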
Backpropagation is an efficient application of the chain rule, derived by Gottfried Wilhelm Leibniz in 1673, to networks of differentiable nodes. Rosenblatt introduced the term "back-propagating errors" in 1962 but did not know how to implement it. Henry J. Kelley had a continuous precursor of backpropagation in 1960 in the context of control theory. Seppo Linnainmaa published the modern form of backpropagation in his Master's thesis in 1970. G.M. Ostrovski et al. republished it in 1971. Paul Werbos applied backpropagation to neural networks in 1982; his 1974 PhD thesis did not yet describe the algorithm. David E. Rumelhart and colleagues popularized backpropagation in 1986 without citing the original work. Between 2009 and 2012, artificial neural networks began winning prizes in image recognition contests. Dan Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, and Jürgen Schmidhuber created DanNet in 2011. This network achieved superhuman performance in a visual pattern recognition contest for the first time, outperforming traditional methods by a factor of three. Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton won the large-scale ImageNet competition in October 2012 with AlexNet. Ng and Dean created a network in 2012 that learned to recognize higher-level concepts such as cats from unlabeled images. Increased computing power from GPUs and distributed computing allowed larger networks to be used. Much earlier, in 1989, Yann LeCun and colleagues had created LeNet to recognize handwritten ZIP codes on mail. Training this convolutional neural network took three days.
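To make the chain-rule idea behind backpropagation concrete, here is a minimal sketch on a two-weight network. The network shape (one tanh hidden unit, one linear output), the squared-error loss, and all names and numeric values are assumptions chosen for illustration. The backward pass multiplies local derivatives from the output back toward the input, and a finite-difference check confirms the result.

```python
import math


def forward(w1, w2, x):
    """Tiny network: x -> z = w1*x -> h = tanh(z) -> y = w2*h."""
    z = w1 * x
    h = math.tanh(z)
    y = w2 * h
    return z, h, y


def loss(w1, w2, x, t):
    """Squared error between the output y and the target t."""
    _, _, y = forward(w1, w2, x)
    return 0.5 * (y - t) ** 2


def backward(w1, w2, x, t):
    """Backpropagation: apply the chain rule from the loss back to each weight."""
    _, h, y = forward(w1, w2, x)
    dL_dy = y - t                   # derivative of 0.5 * (y - t)^2
    dL_dw2 = dL_dy * h              # y = w2 * h
    dL_dh = dL_dy * w2              # propagate the error to the hidden unit
    dL_dz = dL_dh * (1.0 - h * h)   # derivative of tanh(z) is 1 - tanh(z)^2
    dL_dw1 = dL_dz * x              # z = w1 * x
    return dL_dw1, dL_dw2


def numeric_grad(w1, w2, x, t, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    g1 = (loss(w1 + eps, w2, x, t) - loss(w1 - eps, w2, x, t)) / (2 * eps)
    g2 = (loss(w1, w2 + eps, x, t) - loss(w1, w2 - eps, x, t)) / (2 * eps)
    return g1, g2


w1, w2, x, t = 0.5, -1.5, 2.0, 1.0
analytic = backward(w1, w2, x, t)
numeric = numeric_grad(w1, w2, x, t)
print("backprop:          ", analytic)
print("finite differences:", numeric)
```

The backward pass reuses the values computed in the forward pass, which is what makes the method efficient compared with perturbing each weight separately.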
Kunihiko Fukushima introduced the neocognitron architecture in 1979, which combined convolutional layers, downsampling layers, and weight replication. This system was not trained by backpropagation, but its design became essential for computer vision. Max pooling, a popular downsampling procedure, also originated with Fukushima's work. Alex Waibel introduced the Time Delay Neural Network in 1987 to apply convolutional neural networks to phoneme recognition. Wei Zhang applied a backpropagation-trained CNN to alphabet recognition in 1988. LeNet-5, a seven-level CNN developed by Yann LeCun and colleagues in 1998, classified digits using 32x32 pixel images. Banks applied this technology to recognize hand-written numbers on checks digitized at that same resolution. In 2017, Ashish Vaswani and colleagues at Google published Attention Is All You Need, which introduced the modern Transformer architecture. These models require computation time that is quadratic in the size of the context window. Many modern large language models such as ChatGPT, GPT-4, and BERT use this architecture today. Ian Goodfellow et al. created Generative Adversarial Networks in 2014, which became state-of-the-art in generative modeling during the 2014 to 2018 period. The GAN principle was originally published by Jürgen Schmidhuber in 1991 under the name "artificial curiosity." Nvidia released StyleGAN in 2018, based on Progressive GAN by Tero Karras and colleagues. Diffusion models, introduced in 2015, have since eclipsed GANs in generative modeling, with systems such as DALL-E 2 and Stable Diffusion appearing in 2022.
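The quadratic cost of Transformer models noted above comes from comparing every token with every other token. The sketch below is a simplified illustration under stated assumptions: it uses the token vectors directly as queries, keys, and values, whereas a real Transformer applies learned projections first, and the names `softmax` and `self_attention` are chosen for this example only.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with queries = keys = values = tokens.

    Assumption for brevity: no learned projection matrices and a single head.
    """
    d = len(tokens[0])
    # One score per (query, key) pair: an n x n table for n tokens.
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        for q in tokens
    ]
    weights = [softmax(row) for row in scores]
    # Each output is a weighted average of all value vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
        for row in weights
    ]
    return weights, outputs


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
weights, outputs = self_attention(tokens)

# Doubling the context length quadruples the number of attention weights.
counts = {}
for n in (4, 8, 16):
    toks = [[float(i), 1.0] for i in range(n)]
    w, _ = self_attention(toks)
    counts[n] = len(w) * len(w[0])
    print(n, "tokens ->", counts[n], "attention weights")
```

Because the score table has one entry per pair of tokens, both time and memory for this step grow with the square of the context window.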
Machine learning separates into three main paradigms: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning uses paired inputs and desired outputs, and the learning task is to produce the desired output for each input. A commonly used cost is the mean squared error, the average squared difference between the network's output and the desired output, which training tries to minimize. Suitable tasks include pattern recognition and regression, with applications to handwriting, speech, and gesture recognition. Unsupervised learning receives input data along with a cost function that depends on the task domain. Applications include clustering, estimation of statistical distributions, compression, and filtering. Reinforcement learning involves an agent taking actions while receiving unpredictable responses from the environment. The goal is to generate the most positive, or lowest-cost, responses over time. Dynamic programming coupled with neural networks has been applied to vehicle routing, video games, natural resource management, and medicine. Self-learning in neural networks was introduced in 1982 through the Crossbar Adaptive Array system. This system computed both decisions about actions and emotions about encountered situations, without external advice. Bozinovski published work on this self-learning method in 1982 and again in 2014. Jürgen Schmidhuber proposed the neural sequence chunker in 1991, introducing concepts of self-supervised pre-training. In 1993, a neural history compressor solved a very deep learning task that required more than 1000 subsequent layers. Sepp Hochreiter identified the vanishing gradient problem in his 1991 diploma thesis and proposed recurrent residual connections to address it.
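The vanishing gradient problem mentioned above can be illustrated with a deliberately simplified model. The sketch assumes a chain of one-unit sigmoid layers with a fixed weight of 1.0; these choices, and the function names, are illustrative assumptions rather than anything from Hochreiter's thesis. By the chain rule, the gradient reaching the input is a product of one factor per layer, and each sigmoid factor is at most 0.25.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(depth, x=0.5, weight=1.0):
    """d(output)/d(input) for a chain of `depth` one-unit sigmoid layers."""
    grad = 1.0
    a = x
    for _ in range(depth):
        a = sigmoid(weight * a)
        # Chain rule: multiply by this layer's local derivative,
        # sigmoid'(z) * weight, where sigmoid'(z) = a * (1 - a) <= 0.25.
        grad *= weight * a * (1.0 - a)
    return grad


for depth in (1, 5, 10, 20):
    print(f"depth {depth:2d}: gradient = {input_gradient(depth):.3e}")
```

With every factor bounded by 0.25, the gradient shrinks at least geometrically with depth in this setup, which is why very deep or long-unrolled networks were hard to train with plain backpropagation.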
Neural networks have found applications in many disciplines, including function approximation, data processing, nonlinear system identification, pattern recognition, and sequence recognition. ANNs have been used to diagnose several types of cancer and to distinguish highly invasive cancer cell lines from less invasive ones using only cell shape information. They accelerate reliability analysis of infrastructure subject to natural disasters and predict foundation settlements. In flood mitigation, ANNs model rainfall-runoff processes. Geoscience fields such as hydrology, ocean modeling, coastal engineering, and geomorphology employ black-box models built with these systems. Cybersecurity teams use machine learning to classify Android malware and identify domains belonging to threat actors. Researchers are developing ANN systems for penetration testing and for detecting botnets, credit card fraud, and network intrusions. In physics, ANNs have been proposed for solving partial differential equations and simulating the properties of many-body open quantum systems. In brain research, they are used to study the short-term behavior of individual neurons and the dynamics arising from interactions between them. Materials science uses graph neural networks to discover new stable materials by predicting the total energy of crystals. Stock market prediction and credit scoring use ANNs to process vast amounts of financial data and recognize complex patterns. Automated surveillance and medical imaging benefit from deep convolutional neural networks that achieve state-of-the-art performance. Voice-activated systems improve through large-vocabulary continuous speech recognition. Text classification, sentiment analysis, and machine translation enable automated customer service and content moderation.
Common questions
When was the method of least squares published by Adrien-Marie Legendre?
Adrien-Marie Legendre published the method of least squares in 1805 to find a rough linear fit for planetary movement data. This statistical technique became a foundation for what would be called artificial neural networks roughly a century and a half later.
Who introduced backpropagation and when did Seppo Linnainmaa publish its modern form?
Rosenblatt introduced the term "back-propagating errors" in 1962 but did not know how to implement it. Seppo Linnainmaa published the modern form of backpropagation in his Master's thesis in 1970.
Which researchers won the ImageNet competition in October 2012 with AlexNet?
Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton won the large-scale ImageNet competition in October 2012 with AlexNet. It was DanNet, created in 2011 by Dan Ciresan and colleagues, that first achieved superhuman performance in a visual pattern recognition contest, outperforming traditional methods by a factor of three.
Who published Attention Is All You Need, and when did it introduce the Transformer architecture?
Ashish Vaswani and colleagues at Google published Attention Is All You Need in 2017, introducing the modern Transformer architecture. These models require computation time that is quadratic in the size of the context window and are used today by many large language models such as ChatGPT, GPT-4, and BERT.
How do neural networks diagnose cancer?
Artificial neural networks diagnose several types of cancers by distinguishing highly invasive cancer cell lines using only cell shape information. They also accelerate reliability analysis of infrastructures subject to natural disasters and predict foundation settlements.