Internet Archive
Brewster Kahle founded the Internet Archive in May 1996 as a digital library with the mission of providing universal access to all knowledge. The earliest known archived page on the site was saved on the 10th of May 1996, at 14:42 UTC, capturing the download page for Internet Explorer. By October of that year, the organization had begun archiving the World Wide Web at scale. This initial phase focused on web crawling, with the aim of preserving as much public web content as possible. The Archive provides free access to collections of digitized media, including websites, software applications, music, audiovisual materials, and print items. In late 1999, the collection expanded beyond the web archive with the addition of the Prelinger Archives. The Internet Archive now includes texts, audio, moving images, and software alongside its original web focus. It also hosts several other projects, such as the NASA Images Archive, the contract crawling service Archive-It, and the wiki-editable library catalog Open Library. Soon after these expansions, the Archive began providing specialized services for the information access needs of the print-disabled community: publicly accessible books were made available in a protected Digital Accessible Information System (DAISY) format designed for readers with disabilities.
Since late 2009, the Internet Archive has been headquartered in the building that formerly housed the Fourth Church of Christ, Scientist in San Francisco, California. From 1996 to 2009, it operated from the Presidio of San Francisco, a former U.S. military base. The organization maintains data centers in three Californian cities: San Francisco, Redwood City, and Richmond. To reduce the risk of total data loss, copies of parts of the collection are stored at more distant locations, including the Bibliotheca Alexandrina in Egypt and a facility in Amsterdam. As of 2025, reports indicate that the Internet Archive operates six data centers, mainly in California, with smaller ones in other U.S. states, Canada, and Europe. These facilities have controlled access and fire protection systems and are monitored for security. All Internet Archive data centers adhere to ISO/IEC 27001 standards, and some meet additional certifications. In 2019, the organization had an annual budget of $37 million, drawn from web crawling services, various partnerships, grants, donations, and the Kahle-Austin Foundation. A December 2019 campaign set a goal of $6 million in donations. As of 2019, scanning is performed by 100 paid operators worldwide; previously, most staff worked directly in book-scanning centers.
The archived content became more easily available to the general public in 2001 with the creation of the Wayback Machine. This service lets users search and access archives of the World Wide Web, either to see what previous versions of websites looked like or to visit sites that no longer exist. The Wayback Machine was created as a joint effort between Alexa Internet and the Internet Archive. Its database holds hundreds of billions of web pages and their associated data, including images, source code, and documents. In October 2025, the Internet Archive announced that the Wayback Machine had archived one trillion webpages, equivalent to more than 100,000 terabytes of data. A separate 2025 count listed over 866 billion web pages, more than 42.5 million print materials, 13 million videos, 3 million TV news reports, 1.2 million software programs, 14 million audio files, 5 million images, and 272,660 concerts in the collection. In September 2024, Google and the Internet Archive announced a collaboration in which links to the Wayback Machine would appear in the 'more about this page' menu in Google Search. This partnership effectively replaced Google's own cache feature, which Google had retired earlier that year.
On the 1st of June 2020, four large publishing houses filed a lawsuit against the Internet Archive before the United States District Court for the Southern District of New York. The plaintiffs claimed that the organization's practice of controlled digital lending constituted copyright infringement. The operation of the National Emergency Library was part of this legal challenge. Because of the lawsuit, the Internet Archive closed the National Emergency Library on the 16th of June 2020, rather than on the planned date of the 30th of June 2020. On the 24th of March 2023, the court found in favor of the publishers: Judge Koeltl ruled that the concept was not fair use, and that the Archive had therefore infringed copyrights by lending out the publishers' books without waitlist restrictions. The negotiated judgment of the 11th of August 2023 barred the Internet Archive from digitally lending books for which electronic copies are on sale. An agreement was also reached for the Internet Archive to pay an undisclosed amount to the publishers. The Internet Archive appealed the ruling. On the 4th of September 2024, the United States Court of Appeals for the Second Circuit upheld the district court's decision, calling the fair use argument unpersuasive.
The Great 78 Project, launched in 2019, aims to digitize 250,000 78 rpm singles containing 500,000 songs from the period between 1880 and 1960. It has been developed in collaboration with the Archive of Contemporary Music and George Blood Audio, which is responsible for the audio digitization. In August 2023, the music industry giants Universal Music Group, Sony Music, and Concord sued the Internet Archive over the Great 78 Project, seeking $621 million in damages for alleged copyright infringement. The lawsuit was settled in September 2025. The Live Music Archive sub-collection includes more than 170,000 concert recordings from independent musicians as well as established artists such as the Grateful Dead and The Smashing Pumpkins. Jordan Zevon allowed the Internet Archive to host a definitive collection of concert recordings by his father, Warren Zevon, ranging from 1976 to 2001 and comprising 126 concerts with 1,137 songs. Since 2018, the Internet Archive visual arts residency has helped connect artists with the organization's more than 48 petabytes of digitized materials. Previous artists in residence include Taravat Talepasand, Whitney Lynn, and Jenny Odell.
During the week of the 27th of May 2024, the Internet Archive suffered a series of distributed denial-of-service (DDoS) attacks that made its services intermittently unavailable, sometimes for hours at a time, over several days. A hacker group called SN_BLACKMETA, which has possible links to Anonymous Sudan, claimed responsibility on May 28. Beginning on the 9th of October 2024, the Internet Archive's team confirmed DDoS attacks, site defacement, and a data breach. A pop-up on the defaced site announced a catastrophic security breach; about 31 million user accounts were compromised, in a file dated the 28th of September 2024. The attackers stole users' email addresses and bcrypt-hashed passwords. On October 11, Brewster Kahle said that the data was safe and that service would be restored in days, not weeks. On October 13, the Wayback Machine was restored in read-only mode, while the archiving of new web pages remained temporarily disabled. As of October 15, the website was still mostly offline, with the organization prioritizing data safety over service availability. On October 20, threat actors used stolen API tokens that had not been rotated to breach the organization's Zendesk email support platform.
Common questions
Who founded the Internet Archive and when was it established?
Brewster Kahle founded the Internet Archive in May 1996. The earliest known archived page on the site was saved on the 10th of May 1996 at 14:42 UTC.
Where has the Internet Archive's headquarters been located since late 2009?
Since late 2009, the Internet Archive has been headquartered in San Francisco, California, in the building that formerly housed the Fourth Church of Christ, Scientist. Before that, from 1996 to 2009, operations took place within the Presidio of San Francisco.
How many webpages had the Wayback Machine archived by October 2025?
In October 2025, the Internet Archive announced that the Wayback Machine had archived one trillion webpages, equivalent to more than 100,000 terabytes of data.
What legal ruling affected the Internet Archive's digital lending practice in September 2024?
On the 4th of September 2024, the United States Court of Appeals for the Second Circuit upheld a district court decision against the Internet Archive. In that earlier decision, Judge Koeltl had ruled that the organization infringed copyrights by lending out the publishers' books without waitlist restrictions during the operation of the National Emergency Library.
Which music industry groups sued the Internet Archive over the Great 78 Project in August 2023?
Universal Music Group, Sony Music, and Concord sued the Internet Archive over the Great 78 Project in August 2023. The lawsuit sought $621 million in damages for alleged copyright infringement and was settled in September 2025.