The human eye can process only 10 to 12 images per second as distinct pictures, yet it perceives motion at rates far exceeding this limit. This biological constraint, known as the flicker fusion threshold, is the invisible foundation on which all modern visual media rests. When a computer display flickers faster than about 50 times per second, the majority of observers perceive the light as stable and continuous. This stability, however, is an illusion created by persistence of vision, a phenomenon in which a visual stimulus lasting only a millisecond can be perceived to last between 100 and 400 milliseconds. This temporal gap allows a rapid sequence of static images to merge into fluid motion, tricking the brain into seeing movement where there is only a rapid succession of stillness. The threshold varies widely with the stimulus; a non-uniform image with complex detail can push the flicker fusion threshold into the hundreds of hertz, far beyond what simple modulated light requires. In one striking example, a 10-millisecond green flash followed immediately by a 10-millisecond red flash is perceived as a single yellow flash, demonstrating how the visual system integrates separate stimuli into a unified percept. This biological quirk is the reason film and video exist at all, turning the limits of human perception into the medium of storytelling.
The Silent Film Hand Crank
Early silent films operated at a chaotic range of frame rates, typically between 16 and 24 frames per second, dictated not by technology but by the person turning the camera crank. Because these cameras were hand-cranked, the frame rate often fluctuated during a scene to match the mood of the action, creating a variable speed that could make a scene appear faster or slower depending on the operator's rhythm. Projectionists in theaters held the power to alter this speed further by adjusting a rheostat that controlled the voltage powering the film-carrying mechanism, effectively changing the playback speed to suit the audience's reaction. Film companies frequently intended for theaters to project their silent films at a higher frame rate than the one used during filming, believing that faster motion was more engaging. These early rates were sufficient to create a sense of motion, yet the resulting image was often perceived as jerky and disjointed. To combat the eye strain caused by the flickering light, projectors employed dual- and triple-blade shutters that displayed each frame two or three times, raising the flicker rate to 48 or 72 flashes per second. Thomas Edison famously declared that 46 frames per second was the absolute minimum required for the eye to perceive motion without strain, setting a benchmark that would eventually drive the industry toward standardization. By the mid to late 1920s, the frame rate for silent film had stabilized into a range of 20 to 26 frames per second, bridging the gap between the chaotic early days and the precision of the sound era.
The introduction of sound film in 1926 forced the industry to abandon the variable frame rates of the silent era, as the human ear is far more sensitive to changes in frequency than the eye is. Theaters had been showing silent films at speeds ranging from 22 to 26 frames per second, but the introduction of audio required a fixed speed to prevent the pitch of the dialogue from warping as the film sped up or slowed down. The industry chose 24 frames per second as a compromise, a rate that satisfied the technical requirements of sound synchronization while remaining close to the existing silent film standards. From 1927 to 1930, as various studios updated their equipment, the rate of 24 frames per second became the universal standard for 35-millimeter sound film. At this speed, the film travels through the projector at a precise rate, allowing simple two-blade shutters to produce a projected series of images at 48 per second, satisfying Edison's earlier recommendation for eye comfort. Many modern 35-millimeter film projectors now use three-blade shutters to give 72 images per second, flashing each frame three times to further reduce eye strain and flicker. This standardization was not merely an aesthetic choice but a technical necessity that allowed the audio and visual elements to remain perfectly synchronized, creating the immersive experience that defined the golden age of cinema. The 24 frames per second standard remains the backbone of film production today, a legacy of the technological constraints that forced the industry to find a middle ground between the eye and the ear.
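The shutter arithmetic above (frame rate times shutter blades equals flashes per second) can be sketched in a few lines; the helper name is my own, chosen for illustration:

```python
def flicker_rate(frames_per_second: float, shutter_blades: int) -> float:
    """Each shutter blade interrupts the light once per frame, so the
    screen flashes frames_per_second * shutter_blades times per second."""
    return frames_per_second * shutter_blades

# Sound-era 35 mm film: 24 fps with a two- or three-blade shutter.
print(flicker_rate(24, 2))  # 48 flashes per second
print(flicker_rate(24, 3))  # 72 flashes per second
# Even a 16 fps silent film reached 48 flashes with a triple-blade shutter.
print(flicker_rate(16, 3))  # 48 flashes per second
```

This makes clear why a triple-blade shutter could lift even slow silent-era footage above Edison's 46-per-second comfort threshold without adding any new frames.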
The Art Of Twos And Threes
In the world of drawn animation, the fluidity of motion is often an illusion created by a technique known as animating on twos, where one drawing is displayed for every two frames of film. Since film typically runs at 24 frames per second, this results in a display of only 12 drawings per second, yet the fluidity remains satisfactory for most subjects. When a character performs a quick movement, however, animators must revert to animating on ones, displaying a new drawing for every frame, as twos are too slow to convey the motion adequately. A blend of these two techniques keeps the eye fooled while controlling production costs, allowing studios to produce hours of animation without creating a unique drawing for every single frame. This economy of motion was pushed to its limits in the mid-1960s with the introduction of Saturday morning cartoons, which were produced as cheaply as possible and often shot on threes or even fours, meaning three or four frames per drawing. This translates to only 8 or 6 drawings per second respectively, creating a distinct, jerky aesthetic that defined a generation of television animation. Anime also frequently utilizes these techniques, drawing on threes or twos to maintain a balance between fluidity and cost. The result is a visual language where the viewer's brain fills in the gaps, accepting the limited number of drawings as a continuous flow of movement. This approach to animation demonstrates how the human visual system can be manipulated to accept lower frame rates as fluid motion, provided the timing and composition are carefully managed.
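The economy of twos, threes, and fours reduces to a single division, sketched here with a hypothetical helper name of my own:

```python
def drawings_per_second(film_fps: int, frames_per_drawing: int) -> float:
    """Animating 'on twos' holds each drawing for two film frames,
    'on threes' for three, and so on."""
    return film_fps / frames_per_drawing

print(drawings_per_second(24, 1))  # 24.0 - on ones, for fast action
print(drawings_per_second(24, 2))  # 12.0 - on twos, the common default
print(drawings_per_second(24, 3))  # 8.0  - on threes
print(drawings_per_second(24, 4))  # 6.0  - on fours, the cheapest look
```

The numbers match the figures in the text: 12 drawings per second on twos, and only 8 or 6 on threes and fours.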
The Grid That Dictated 59.94
The frame rates of analog television were not chosen for their visual quality but were dictated by the frequency of the electric grid powering the world. Most of the world adopted 50 fields per second to match the 50-hertz mains frequency, while Canada, the United States, Mexico, the Philippines, Japan, and South Korea adopted 60 fields per second to match their 60-hertz grids. The grid frequency was extremely stable, making it a logical synchronization reference and ensuring that the television signal remained locked to the power source. The introduction of color television, however, made it necessary to lower the 60-hertz rate by 0.1 percent to avoid a display artifact known as dot crawl, which appeared on legacy black-and-white displays when they showed highly color-saturated surfaces. Lowering the rate by 0.1 percent minimized the artifact, resulting in the standard of 59.94 images per second that persists in North America, Japan, and South Korea today. This seemingly minor adjustment created a confusing legacy: video transmission standards are based on 59.94 images per second, yet the industry routinely refers to the rate as 60 frames per second. Two image sizes are typically used, the 1080-line formats (1080i interlaced or 1080p progressive) and 720p, with interlaced formats customarily stated at half their image rate and double their image height, though this is purely convention. In each format, 59.94 (or 50) images per second are produced, but in 1080i each image is a half-height field, squashed to half height in the photographic process and stretched back to fill the screen on playback in the television set. The 720p format produces full progressive images, so no squeezing or expansion is necessary.
This confusion was industry-wide in the early days of digital video software, and much software was written incorrectly, its developers believing that only 29.97 images were expected each second. While it is true that each pixel location was polled and sent only 29.97 times per second, the pixel location immediately below it was polled 1/60 of a second later, as part of a completely separate image for the next 1/60-second field.
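The 0.1 percent reduction described above is conventionally implemented as an exact 1000/1001 factor, which is why the "59.94" and "29.97" rates are not round numbers. A short sketch using Python's exact rational arithmetic:

```python
from fractions import Fraction

# NTSC color lowered the 60 Hz field rate by the factor 1000/1001.
field_rate = Fraction(60) * Fraction(1000, 1001)  # exactly 60000/1001

print(field_rate)             # 60000/1001
print(float(field_rate))      # 59.94005994005994
# The corresponding interlaced frame rate is half the field rate:
print(float(field_rate / 2))  # 29.97002997002997
```

Keeping the rate as an exact fraction rather than a rounded decimal is how professional video software avoids the slow drift that plagued early tools written against an assumed flat 29.97.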
The Real Time Battle
In computer video games, frame rate plays a critical role in the experience, because games are rendered in real time rather than pre-recorded. Sixty frames per second has long been considered the minimum for smoothly animated gameplay, ensuring that fast-paced action remains clear and responsive. Before the sixth generation of video game consoles, games designed for PAL markets had lower frame rates by design due to the 50-hertz output, which made fast-paced games, such as racing or fighting games, run noticeably slower. Less frequently, developers accounted for the difference and altered the game code to achieve nearly identical pacing across both regions, with varying degrees of success. Monitors marketed to competitive PC gamers now reach refresh rates of 360 hertz, 500 hertz, or more, providing a level of smoothness that reduces motion blur and input latency. High frame rates make action scenes look less blurry, whether sprinting through the wilderness in an open-world game, spinning rapidly to face an opponent in a first-person shooter, or tracking details during an intense fight in a multiplayer online battle arena. Some people have difficulty perceiving the differences between high frame rates, yet the competitive advantage is real. Frame time is related to frame rate but measures the time between consecutive frames; a game can average 60 frames per second and still appear choppy because of inconsistent frame times. Game reviews therefore sometimes average the worst 1 percent of frames, reported as the 99th percentile, to measure how choppy the game appears. A small difference between the average frame rate and the 99th percentile generally indicates a smooth experience. To mitigate the choppiness of poorly optimized games, players can cap the frame rate closer to their 99th-percentile figure.
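The "1 percent low" metric described above can be sketched in a few lines; the function name is my own, and the metric is computed as the text describes it, averaging the worst 1 percent of frame times and converting back to a rate:

```python
def one_percent_low(frame_times_ms):
    """Average the worst 1% of frames (the longest frame times) and
    report the result as a frame rate, the figure reviews often call
    the '99th percentile'."""
    worst = sorted(frame_times_ms, reverse=True)
    n = max(1, len(worst) // 100)        # at least one frame
    avg_worst_ms = sum(worst[:n]) / n
    return 1000.0 / avg_worst_ms

# 99 smooth frames at 16.67 ms (~60 fps) plus a single 50 ms stutter:
times = [16.67] * 99 + [50.0]
print(round(1000.0 / (sum(times) / len(times)), 1))  # average: ~58.8 fps
print(round(one_percent_low(times), 1))              # 1% low: 20.0 fps
```

One stutter frame barely moves the average, but it drags the 99th-percentile figure down to 20 fps, which is exactly why the gap between the two numbers is a better choppiness signal than the average alone.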
When a game's frame rate differs from the display's refresh rate, screen tearing can occur. Vsync mitigates this, but it caps the frame rate to the display's refresh rate, increases input lag, and can introduce judder. Variable refresh rate displays instead set their refresh rate equal to the game's frame rate, as long as it falls within the display's supported range.
The Ghost In The Machine
Frame rate up-conversion is the process of increasing the temporal resolution of a video sequence by synthesizing one or more intermediate frames between two consecutive frames. A low frame rate causes aliasing, produces abrupt motion artifacts, and degrades video quality, making temporal resolution an important factor in perceived quality. Up-conversion algorithms are widely used in applications including visual quality enhancement, video compression, and slow-motion generation. Most methods fall into two categories: optical-flow-based (or kernel-based) methods and pixel-hallucination-based methods. Flow-based methods linearly combine the predicted optical flows between the two input frames to approximate the flows from the target intermediate frame back to the inputs; some propose flow reversal for more accurate image warping, and others assign different weights to overlapping flow vectors depending on the object depth of the scene via a flow projection layer. Pixel-hallucination-based methods apply deformable convolution in a center-frame generator, replacing optical flows with offset vectors, and some also interpolate middle frames with the help of deformable convolution in the feature domain. However, because these methods hallucinate pixels directly, unlike the flow-based methods, the predicted frames tend to be blurry when fast-moving objects are present. This interplay between original and synthesized frames adds a new layer of complexity to video production, where the goal is to make motion appear smoother than the original capture allowed. The process is also essential for modern video standards that support 120, 240, or 300 frames per second, since those master rates can be evenly sampled down to standard rates such as 24, 48, and 60 frames per second for film or 25, 30, 50, or 60 frames per second for video.
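The simplest possible interpolation, ignoring motion entirely, just blends co-located pixels of the two neighboring frames; real flow-based methods warp pixels along estimated motion vectors instead. A minimal sketch, with plain nested lists standing in for grayscale images:

```python
def blend_midframe(frame_a, frame_b, t=0.5):
    """Naive intermediate frame: linearly mix co-located pixels.
    Flow-based methods instead warp pixels along motion vectors,
    avoiding the ghosting this simple blend produces on movement."""
    return [[(1 - t) * a + t * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(frame_a, frame_b)]

# Two tiny 2x2 "frames"; the synthesized midframe halves the difference.
f0 = [[0, 0], [100, 100]]
f1 = [[100, 100], [0, 0]]
print(blend_midframe(f0, f1))  # [[50.0, 50.0], [50.0, 50.0]]
```

On a moving edge like this one, the blend produces a uniform gray rather than an edge at the halfway position, which is precisely the ghosting artifact that motivates flow-based warping.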
Of course, these higher frame rates may also be displayed at their native rates, but the ability to convert lower frame rates into higher ones has become a standard tool in the video editor's arsenal.
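The even-sampling relationship between high master rates and the standard delivery rates can be checked directly; the helper name here is my own:

```python
def subsample_step(master_fps: int, target_fps: int):
    """Return how many master frames to advance per output frame,
    or None when the target rate does not divide the master evenly."""
    if master_fps % target_fps == 0:
        return master_fps // target_fps
    return None

# A 120 fps master divides evenly into the 24/30/60 family:
for target in (24, 30, 60):
    print(target, subsample_step(120, target))  # steps of 5, 4, 2
# A 300 fps master covers the 25/50 Hz family as well:
print(25, subsample_step(300, 25))              # step of 12
print(25, subsample_step(120, 25))              # None - not an even fit
```

This is why a single high-rate master (120, 240, or 300 frames per second) can serve multiple regional standards: each target rate that divides the master rate evenly can be extracted by simply keeping every Nth frame.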