— Ch. 1 · Foundations Of Approximation —
Universal approximation theorem.
~4 min read · Ch. 1 of 5
In 1989, George Cybenko published a result that changed the trajectory of machine learning. He proved that a feedforward neural network with just one hidden layer can approximate any continuous function on a compact domain to arbitrary accuracy. This property became known as universality. Cybenko's proof applied to sigmoidal activation functions; later work showed that the essential condition is that the activation be non-polynomial, which covers both the sigmoid and ReLU. Increasing the number of neurons in the hidden layer makes the network wider, and the theorem guarantees that with enough of them the approximation error can be driven below any chosen tolerance, which is what lets such networks model complex relationships found in real-world data. The proof is an existence result: it guarantees that suitable parameters exist but does not specify how to find them efficiently. Practical training remained a separate challenge, handled by optimization algorithms such as gradient descent with backpropagation.
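To make the widening argument concrete, here is a minimal numerical sketch, not Cybenko's construction: a single hidden layer of sigmoid units, computing a weighted sum of terms sigmoid(w_i * x + b_i), is fit to the target sin(x). The hidden weights are drawn at random and only the output weights are solved by least squares, so the effect of adding units is easy to see. The target function, the random-feature fitting scheme, and all variable names are illustrative assumptions; the sketch only assumes NumPy is available.

```python
import numpy as np

# Illustrative sketch of one-hidden-layer approximation (not Cybenko's proof):
# hidden weights are random, only the output layer is fit by least squares,
# so widening the layer directly shows the effect of adding sigmoid units.
rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)  # sample points on a compact interval
y = np.sin(x)                                       # continuous target to approximate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for width in (5, 50, 500):
    W = rng.normal(scale=3.0, size=(1, width))      # hidden-layer weights
    b = rng.normal(scale=3.0, size=(width,))        # hidden-layer biases
    H = sigmoid(x @ W + b)                          # hidden activations, shape (200, width)
    a, *_ = np.linalg.lstsq(H, y, rcond=None)       # output weights via least squares
    err = np.max(np.abs(H @ a - y))                 # worst-case (sup-norm) error
    print(f"width={width:4d}  max |error| = {err:.4f}")
```

Because the hidden weights are random, this is a crude demonstration rather than a proof; the theorem only asserts that some choice of weights achieves any desired accuracy once the hidden layer is wide enough, and in practice the measured error here typically shrinks as the width grows.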
Historical Proofs And Evolution
The timeline of discovery began with Cybenko's 1989 paper on sigmoidal activation functions. Kurt Hornik, Maxwell Stinchcombe, and Halbert White followed in the same year, extending the result to multilayer feed-forward networks. Hornik further demonstrated in 1991 that the architecture itself, rather than the specific choice of activation function, provided the approximation power. Moshe Leshno and his colleagues showed in 1993 that universality is equivalent to the activation function being non-polynomial. Allan Pinkus refined and surveyed these results in 1999. The focus then shifted toward arbitrary-depth networks, beginning around 2003 with Gustaf Gripenberg's work on networks whose width is bounded at every layer. Dmitry Yarotsky and Zhou Lu advanced the ReLU-based theory significantly in 2017, and Boris Hanin and Mark Sellke expanded these results in 2018. Patrick Kidger and Terry Lyons generalized the deep, narrow setting to general activation functions such as tanh and GELU in 2020. In 2024, Cai constructed a finite set of mappings, called a vocabulary, from which any continuous function can be approximated through composition.