In 2015, Google quietly deployed a custom silicon chip inside its data centers that would eventually outclass, for one specific kind of work, the graphics cards the rest of the tech industry relied on. This chip, the Tensor Processing Unit, was not a standard component that could be bought off the shelf, but a bespoke creation designed to solve a very specific problem: the crushing energy demands of running neural networks at global scale. While competitors were trying to force general-purpose graphics processing units to do the job, Google's engineers realized that the math required for machine learning was fundamentally different from the math required to render video games. They built a machine that sacrificed the ability to draw textures or handle complex rasterization in exchange for raw, unadulterated matrix multiplication speed. The result was a device that ran inference 15 to 30 times faster than the CPUs and GPUs of the time while delivering 30 to 80 times more operations per watt, effectively changing the economics of artificial intelligence forever. This was not a marketing gimmick but a physical reality that allowed Google to find all of the text in its Street View imagery in less than five days, a task that would have taken weeks on standard hardware.
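To see why that trade-off works, note that the heart of a neural network layer is a matrix multiply followed by a cheap element-wise operation. The following sketch is a toy illustration in plain Python, not Google's code, and the layer sizes are arbitrary:

```python
import numpy as np

def dense_layer(x, weights, bias):
    """One fully connected layer: a matrix multiply plus a ReLU."""
    return np.maximum(0.0, x @ weights + bias)

# Arbitrary example sizes: a batch of 32 flattened 28x28 images
# pushed through a 784-to-256 layer. Virtually all of the work
# here is the 32x784-by-784x256 matrix multiply.
batch = np.random.rand(32, 784)
hidden = dense_layer(batch, np.random.rand(784, 256), np.zeros(256))
print(hidden.shape)  # (32, 256)
```

A chip that does nothing but this one operation quickly can therefore run entire networks, which is exactly the bet the TPU made.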
The Systolic Array Revolution
The architecture of the first TPU was a radical departure from the von Neumann design that had dominated computing for decades, relying instead on a systolic array in which data pulses rhythmically through the chip the way a heartbeat pushes blood through a vein. Norman Jouppi, the principal architect who led the project from design to production in just 15 months, engineered a system where data did not need to be shuttled back and forth between processor and memory for every single calculation. Instead, the data moved through a 256-by-256 grid of multipliers, 65,536 multiply-accumulate units in all, with each cell passing its result to its neighbor in a synchronized dance that eliminated the memory bottleneck. This design was so efficient that the chip could fit into a standard hard drive slot in a data center rack, yet it consumed only 28 to 40 watts of power while delivering performance that dwarfed its predecessors. The first-generation chip operated at 700 megahertz and used 8-bit integer precision, a deliberate choice to maximize the number of operations per joule. By 2017, the second-generation TPU had introduced high-bandwidth memory, raising the data transfer rate to 600 gigabytes per second and adding support for floating-point arithmetic in the bfloat16 format, a capability that made it suitable for training complex models rather than just running them.
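The flavor of that dataflow can be captured in a short simulation. The sketch below models a weight-stationary systolic array in Python, with 8-bit operands feeding 32-bit accumulators as on the first TPU; the cycle timing and register layout here are simplifying assumptions for illustration, not Google's actual circuit:

```python
import numpy as np

def systolic_matmul(A, W):
    """Cycle-level toy simulation of a weight-stationary systolic array.

    Cell (k, n) permanently holds weight W[k, n]. Activations enter the
    left edge skewed one cycle per row and march right; partial sums
    march down; finished dot products exit the bottom row one diagonal
    wavefront at a time, with no trips back to memory in between.
    """
    M, K = A.shape
    K2, N = W.shape
    assert K == K2
    W = W.astype(np.int32)
    a_reg = np.zeros((K, N), dtype=np.int32)  # activation latch per cell
    p_reg = np.zeros((K, N), dtype=np.int32)  # partial-sum latch per cell
    C = np.zeros((M, N), dtype=np.int32)      # 32-bit accumulators

    for t in range(M + N + K - 2):
        # Values feeding the left edge this cycle, skewed by row index.
        left = np.array([A[t - k, k] if 0 <= t - k < M else 0
                         for k in range(K)], dtype=np.int32)
        a_in = np.concatenate([left[:, None], a_reg[:, :-1]], axis=1)  # shift right
        p_in = np.vstack([np.zeros((1, N), np.int32), p_reg[:-1]])     # shift down
        a_reg = a_in
        p_reg = p_in + a_in * W  # one multiply-accumulate per cell per cycle
        # Completed sums leave the bottom row: column n finishes
        # output row m = t - n - (K - 1) on this cycle.
        for n in range(N):
            m = t - n - (K - 1)
            if 0 <= m < M:
                C[m, n] = p_reg[K - 1, n]
    return C

# Self-check against an ordinary matrix multiply, with 8-bit operands.
rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(4, 6), dtype=np.int8)
W = rng.integers(-128, 128, size=(6, 3), dtype=np.int8)
assert np.array_equal(systolic_matmul(A, W),
                      A.astype(np.int32) @ W.astype(np.int32))
```

The point of the skewed entry times is that, once the pipeline fills, every cell performs a useful multiply-accumulate on every cycle, which is where the design's efficiency comes from.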
The Battle for the Cloud
For years, the TPU remained a secret weapon, used exclusively within Google's own infrastructure to power services like Search, Google Photos, and the AlphaGo system that defeated the world champion at the game of Go. It was not until February 12, 2018, that Google opened the gates, allowing third-party companies to access these chips through its cloud computing service. This move transformed the TPU from an internal efficiency tool into a commercial product, challenging Nvidia's dominance of the AI accelerator market. The company began offering different versions of the chip, from the multi-rack pods used to train enormous models down to small inference units for edge devices. By 2021, the fourth-generation TPU had been announced, and Google claimed its interconnect bandwidth was ten times greater than typical networking technology of the time. The race to build the fastest chip intensified, with Google claiming that the TPU v4 was 5 to 87 percent faster than Nvidia's A100 on machine learning benchmarks. The competition was not just about raw speed but about energy efficiency, as the cost of electricity for running data centers became a major factor in the profitability of AI services.
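In practice, that cloud access now looks like a few lines of framework code. The snippet below is a hypothetical session on a Cloud TPU VM with the JAX library installed; the device list and matrix sizes are illustrative, and the same code falls back to CPU or GPU elsewhere:

```python
import jax
import jax.numpy as jnp

# On a Cloud TPU VM this prints the attached TPU cores,
# e.g. [TpuDevice(id=0), ...]; elsewhere it lists CPUs or GPUs.
print(jax.devices())

@jax.jit  # XLA compiles the function for whatever backend is attached
def predict(w, x):
    return jnp.tanh(x @ w)  # once again, a matrix multiply dominates

# bfloat16 is the reduced-precision float format TPU v2 introduced.
w = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
x = jnp.ones((8, 1024), dtype=jnp.bfloat16)
print(predict(w, x).shape)  # (8, 1024)
```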