Neural network (machine learning)
In 1943, two scientists named Warren McCulloch and Walter Pitts published a paper that would quietly seed the digital age, proposing that the human brain's neurons could be modeled as simple mathematical units. Their work, titled A Logical Calculus of the Ideas Immanent in Nervous Activity, did not immediately revolutionize technology, but it established the foundational idea that information processing could be distributed across a network of simple elements rather than controlled by a central processor. This concept, known as connectionism, stood in stark contrast to the von Neumann architecture that dominated early computing, where memory and processing were strictly separated. McCulloch and Pitts showed that by connecting these simple units in specific patterns, complex behaviors could emerge without any single unit possessing intelligence. Their model was non-learning, meaning the connections were fixed, yet it provided the first rigorous framework for thinking about how biological systems might compute. This theoretical groundwork remained obscure for nearly two decades, waiting for the right combination of hardware and mathematical insight to transform it from a biological curiosity into the engine of modern artificial intelligence.
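To make the idea concrete, a unit of this kind can be sketched in a few lines of modern Python. The weights and thresholds below are illustrative choices rather than values from the 1943 paper, but they show the essential point: fixed threshold units, wired together in the right pattern, compute something none of them computes alone.

```python
# A minimal sketch of a McCulloch-Pitts-style threshold unit.
# Weights and thresholds are illustrative, not taken from the 1943 paper.

def mcp_unit(inputs, weights, threshold):
    """Fire (return 1) when the weighted sum of binary inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def AND(a, b):
    return mcp_unit([a, b], weights=[1, 1], threshold=2)

def OR(a, b):
    return mcp_unit([a, b], weights=[1, 1], threshold=1)

def NOT(a):
    return mcp_unit([a], weights=[-1], threshold=0)

# Composing fixed units yields behavior no single unit has, e.g. exclusive-or:
def XOR(a, b):
    return AND(OR(a, b), NOT(AND(a, b)))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", XOR(a, b))
```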
The Perceptron's Golden Age
The year 1958 marked a turning point when psychologist Frank Rosenblatt unveiled the perceptron, a device funded by the United States Office of Naval Research that promised to mimic human learning. Rosenblatt's invention was the first implemented artificial neural network capable of adjusting its own connections, or weights, based on experience. The public and government reaction was immediate and explosive, with headlines claiming the machine could learn to see and think. This optimism fueled a golden age of artificial intelligence funding, as scientists believed they were on the verge of creating machines that could emulate human intelligence. However, the perceptron had a critical flaw: it could not solve problems that were not linearly separable, such as the exclusive-or circuit. In 1969, mathematicians Marvin Minsky and Seymour Papert published a book demonstrating this limitation, which caused the field to stagnate in the United States and led to a drastic reduction in funding. While American research hit a wall, Soviet scientists Alexey Ivakhnenko and Valentin Lapa were quietly developing the Group method of data handling in 1965, a technique that could train arbitrarily deep neural networks. Their work, largely ignored by the West, proved that deep networks were possible and could learn complex patterns, but the global focus remained fixated on the limitations of the single-layer perceptron.
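The learning rule itself is simple enough to sketch in a few lines of modern Python (Rosenblatt's Mark I was a hardware device, so this is an illustration rather than his implementation). Trained this way, a single-layer perceptron masters the linearly separable AND function but never the exclusive-or function that Minsky and Papert highlighted; the learning rate and epoch count below are arbitrary choices.

```python
# Sketch of the single-layer perceptron learning rule on two binary inputs.
# Hyperparameters (learning rate, epochs) are arbitrary illustrative choices.

def train_perceptron(samples, epochs=20, lr=0.1):
    w = [0.0, 0.0]   # connection weights
    b = 0.0          # bias term
    for _ in range(epochs):
        for (x1, x2), target in samples:
            out = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - out
            # Rosenblatt's rule: nudge each weight in proportion to the error
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def accuracy(w, b, samples):
    correct = sum(
        (1 if w[0] * x1 + w[1] * x2 + b > 0 else 0) == t
        for (x1, x2), t in samples
    )
    return correct / len(samples)

AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print("AND:", accuracy(*train_perceptron(AND_DATA), AND_DATA))  # reaches 1.0: linearly separable
print("XOR:", accuracy(*train_perceptron(XOR_DATA), XOR_DATA))  # never reaches 1.0: not separable
```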
The Rediscovery of Backpropagation
The modern era of neural networks began to take shape in the 1970s and 1980s through the rediscovery and application of backpropagation, an algorithm for training deep networks. Although the concept of back-propagating errors was introduced by Rosenblatt in 1962, he could not implement it. The mathematical machinery required to make it work, the chain rule of calculus, was derived by Gottfried Wilhelm Leibniz in 1673, but it took until 1970 for Seppo Linnainmaa to publish the modern form of the algorithm in his Master's thesis. Paul Werbos applied this method to neural networks in 1982, and David E. Rumelhart popularized it in 1986, though he did not cite the original work. This algorithm allowed networks to adjust the weights of connections in hidden layers, solving the problem that had stalled progress for decades. The development of backpropagation enabled the training of networks with multiple layers, leading to the birth of deep learning. In 1979, Kunihiko Fukushima introduced the neocognitron, a convolutional neural network that used weight sharing and max pooling, laying the groundwork for computer vision. By 1989, Yann LeCun and his team created LeNet, a convolutional neural network that could recognize handwritten ZIP codes, a task that required three days of training but demonstrated the practical power of deep networks. This era also saw the introduction of recurrent neural networks, which allowed for loops in the network architecture, enabling the modeling of sequential data and time-dependent processes.
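A compact sketch in modern NumPy shows what backpropagation buys: error signals flow backward through the hidden layer via the chain rule, and a two-layer network learns the exclusive-or function that defeated the single-layer perceptron. The layer sizes, learning rate, and loss below are arbitrary illustrative choices, not those of any historical paper.

```python
# Backpropagation on a tiny two-layer network learning XOR.
# All hyperparameters and the random seed are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)        # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)        # hidden -> output
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for step in range(5000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: apply the chain rule layer by layer
    d_out = out - y                                  # cross-entropy gradient at the output
    d_h = (d_out @ W2.T) * h * (1 - h)               # error propagated to the hidden layer

    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0)

print(out.round(2).ravel())   # typically converges toward [0, 1, 1, 0]
```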
The Deep Learning Breakthrough
Between 2009 and 2012, artificial neural networks began to win prizes in image recognition contests, approaching human-level performance on various tasks. In 2011, a convolutional neural network named DanNet, developed by Dan Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, and Jürgen Schmidhuber, achieved superhuman performance in a visual pattern recognition contest, outperforming traditional methods by a factor of three. This success was followed by AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, which won the large-scale ImageNet competition in October 2012 by a significant margin. The breakthrough was made possible by the availability of powerful graphics processing units, or GPUs, which allowed for the training of much larger networks than before. The use of GPUs reduced training times from months to days, making it feasible to train networks with many layers. This period also saw the development of generative adversarial networks, or GANs, by Ian Goodfellow and his colleagues in 2014, which became state-of-the-art in generative modeling. GANs involve two networks competing with each other, one generating data and the other trying to distinguish real from fake. This competition led to the creation of highly realistic images and sparked discussions about deepfakes. The rapid advancement of deep learning during this period transformed the field, leading to applications in image processing, speech recognition, natural language processing, and many other domains.
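The adversarial setup of a GAN can be stated compactly. In the 2014 paper's formulation it is a two-player minimax game over a single value function:

$$ \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr] $$

Here D(x) is the discriminator's estimate that a sample is real, G(z) maps random noise z to a synthetic sample, and the two expectations run over real data and noise respectively; the discriminator is trained to maximize the expression while the generator is trained to minimize it.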
The Architecture of Thought
Neural networks have evolved into a diverse family of architectures, each designed to handle specific types of data and tasks. Convolutional neural networks, or CNNs, have proven particularly successful in processing visual and other two-dimensional data, while recurrent neural networks, or RNNs, handle sequential data like speech and text. Long short-term memory, or LSTM, was introduced to solve the vanishing gradient problem in RNNs, allowing them to handle signals with mixed frequencies and large vocabularies. The Transformer architecture, introduced in 2017, has become the model of choice for natural language processing, enabling models like ChatGPT, GPT-4, and BERT to understand and generate human language. These architectures are built from layers of artificial neurons, which are connected by weighted links that determine the strength of the signal between them. The network forms a directed, weighted graph, where the output of one neuron becomes the input of another. The initial inputs are external data, such as images or documents, and the ultimate outputs accomplish the task, such as recognizing an object in an image. The network's ability to learn and model non-linearities and complex relationships is achieved by connecting neurons in various patterns, allowing the network to adapt to the data it is trained on.
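As an illustration of the Transformer's core operation, scaled dot-product attention can be sketched in a few lines of NumPy. The toy shapes and the self-attention setup below are arbitrary; real models add learned projections, multiple attention heads, masking, and positional information on top of this primitive.

```python
# Sketch of scaled dot-product attention, the core operation of the Transformer.
# Toy sizes and random inputs are illustrative only.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Each output row is a weighted mix of the rows of V; the weights reflect
    how strongly each query matches each key."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # pairwise query-key similarity
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V, weights

# Toy example: a "sequence" of 3 tokens, each a 4-dimensional vector.
rng = np.random.default_rng(1)
tokens = rng.normal(size=(3, 4))
out, w = attention(tokens, tokens, tokens)   # self-attention over the sequence
print(w.round(2))   # attention weights: how much each token attends to the others
```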
The Black Box Dilemma
Despite their success, neural networks face significant criticism regarding their transparency and reliability. A common criticism is that they require too many training samples for real-world operation, and their black-box nature makes it difficult to understand what they have learned. In 1997, Alexander Dewdney, a former Scientific American columnist, commented that neural networks tend to be mysterious and unexplainable. However, researchers have developed methods to visualize and explain what networks have learned, such as inspecting attention mechanisms, which help to uncover the principles that allow a learning machine to be successful. Another major issue is dataset bias, where low-quality data with imbalanced representativeness can lead to the model learning and perpetuating societal biases. For example, in 2018, Amazon had to scrap a recruiting tool because the model favored men over women for jobs in software engineering due to the higher number of male workers in the field. The program penalized resumes containing the word "women's" and downgraded graduates of all-women's colleges. This highlights the critical importance of data quality and the need for careful tuning to avoid discriminatory outcomes. Despite these challenges, ongoing research is aimed at addressing remaining issues such as data privacy, model interpretability, and expanding the scope of neural network applications in medicine, finance, and other fields.
The Hardware Revolution
The resurgence of neural networks in the twenty-first century is largely attributable to advances in hardware. From 1991 to 2015, computing power, especially as delivered by GPGPUs, increased around a million-fold, making the standard backpropagation algorithm feasible for training networks that are several layers deeper than before. The use of accelerators such as FPGAs and GPUs can reduce training times from months to days. Neuromorphic engineering addresses the hardware difficulty head-on, building non-von-Neumann chips that implement neural networks directly in circuitry. Another type of chip optimized for neural network processing is called a Tensor Processing Unit, or TPU. These hardware advancements have enabled the training of large and effective neural networks, which require considerable computing resources. The brain has hardware tailored to the task of processing signals through a graph of neurons, but simulating even a simplified neuron on von Neumann architecture may consume vast amounts of memory and storage. The development of specialized hardware has been crucial in making neural networks practical for real-world applications, from autonomous driving to medical diagnosis. The future of neural networks will likely depend on continued innovations in hardware, as the demand for more powerful and efficient computing grows.
The Future of Intelligence
Artificial neural networks have undergone significant advancements, particularly in their ability to model complex systems, handle large data sets, and adapt to various types of applications. Their evolution over the past few decades has been marked by a broad range of applications in fields such as image processing, speech recognition, natural language processing, finance, and medicine. In medicine, neural networks are able to process and analyze vast medical datasets, enhancing diagnostic accuracy and predicting patient outcomes for personalized treatment planning. In drug discovery, neural networks speed up the identification of potential drug candidates and predict their efficacy and safety, significantly reducing development time and costs. In content creation, neural networks such as generative adversarial networks and transformers are used to create original artworks and music compositions. The future of neural networks will likely involve further integration with other fields, such as materials science, where graph neural networks have demonstrated their capability in scaling deep learning for the discovery of new stable materials. Open challenges such as data privacy and model interpretability remain active areas of research. As neural networks continue to evolve, they will play an increasingly important role in shaping the future of technology and society.