— Ch. 1 · Foundations And History —
Computer vision.
In 1966, an undergraduate summer project attached a camera to a computer with the goal of having the machine describe what it saw. This early experiment marked the beginning of computer vision as a distinct discipline within university artificial intelligence research. The field aimed to mimic the human visual system as a stepping stone toward endowing robots with intelligent behavior.

Studies in the 1970s laid the foundations for many algorithms that exist today, including edge extraction and motion estimation. During this period, researchers developed concepts such as scale-space and the active contour models known as snakes. By the 1990s, statistical learning techniques were being used in practice to recognize faces, notably the eigenface method. A significant shift occurred toward the end of the decade, when computer graphics and computer vision began interacting more closely.

Recent work has seen a resurgence of feature-based methods combined with machine learning techniques and complex optimization frameworks, and deep learning algorithms now surpass prior methods on benchmark datasets for tasks ranging from classification to optical flow.
Core Algorithms And Tasks
The ImageNet Large Scale Visual Recognition Challenge tests algorithm performance on millions of images across 1,000 object classes. The current best algorithms, based on convolutional neural networks, approach human-level accuracy on these tests, though they still struggle with small or thin objects, such as an ant on a flower stem or a person holding a quill. Conversely, humans find fine-grained classification, such as distinguishing between breeds of dog, difficult, while convolutional neural networks handle it with ease.

Optical character recognition identifies characters in printed or handwritten text so that they can be encoded in formats such as ASCII. Facial matching technology enables mobile phone face unlock and smart door-locking systems. Emotion recognition classifies human emotions, though psychologists caution that internal states cannot be reliably detected from facial expressions alone.

Pose estimation determines the position or orientation of a specific object relative to the camera. Tracking follows the movement of interest points, such as vehicles, humans, or other organisms, through a sequence of images. Optical flow calculates, for each point in the image, how that point moves relative to the image plane as a result of both scene motion and camera motion.
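To make the optical flow definition concrete, the sketch below estimates a single translational flow vector between two synthetic frames using the brightness-constancy constraint (Ix·u + Iy·v = −It), the core of the Lucas–Kanade method. The function name and the test pattern are invented for illustration; a real implementation solves this least-squares system per pixel neighborhood rather than once for the whole image.

```python
import numpy as np

def estimate_translation_flow(frame1, frame2):
    """Estimate one global flow vector (u, v) between two frames by
    solving the brightness-constancy constraints Ix*u + Iy*v = -It
    in the least-squares sense (the idea behind Lucas-Kanade)."""
    # Spatial gradients of the first frame; np.gradient returns the
    # derivative along axis 0 (rows, y) first, then axis 1 (columns, x).
    Iy, Ix = np.gradient(frame1)
    # Temporal derivative: the change in brightness between the frames.
    It = frame2 - frame1
    # Stack one constraint per pixel into A @ [u, v] = b and solve.
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic test: a smooth 2-D pattern shifted 1 pixel right and 2 down,
# so the true flow is (u, v) = (1, 2).
x, y = np.meshgrid(np.arange(64), np.arange(64))
frame1 = np.sin(x / 8.0) + np.cos(y / 6.0)
frame2 = np.sin((x - 1) / 8.0) + np.cos((y - 2) / 6.0)

u, v = estimate_translation_flow(frame1, frame2)
print(f"estimated flow: u = {u:.2f}, v = {v:.2f}")  # close to (1, 2)
```

Because the constraint is a first-order Taylor approximation, the estimate is only accurate for small displacements; practical systems handle larger motion with image pyramids or, increasingly, with learned deep-network flow estimators.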