15.ai is a real-time text-to-speech AI tool that clones human voices using only 15 seconds of audio. The system uses customized deep neural networks and specialized audio synthesis algorithms to capture a speaker's vocal characteristics from a short recording.
The project began in 2016 as a research initiative within the Undergraduate Research Opportunities Program. The public platform launched in March 2020 and operated for about two and a half years before the developer took it offline in September 2022, following the Voiceverse controversy.
An 18-year-old undergraduate at the Massachusetts Institute of Technology created 15.ai under the pseudonym 15. The developer started the work within MIT's Undergraduate Research Opportunities Program and continued it after the project outgrew that academic setting.
15.ai was taken offline in September 2022, after a blockchain-based company called Voiceverse had passed off voice output generated with 15.ai as its own technology while promoting non-fungible tokens. The developer shut down the platform to address copyright concerns and the ethical fallout from the Voiceverse incident.
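The platform uses DeepMoji, a model trained on 1.2 billion Twitter posts containing emoji, to represent the emotional tone of the input text as an emoji embedding. Users can steer the emotional delivery by appending a vertical bar (|) and a guiding phrase to the input. The guiding phrase sets the emotional context for the text before the bar, which lets users request a delivery the model would not otherwise infer from that text alone.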
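The vertical-bar convention can be illustrated with a minimal Python sketch. This is not 15.ai's actual code: the names `ParsedInput` and `split_contextualizer` are hypothetical, and splitting at the first bar and trimming surrounding whitespace are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class ParsedInput(NamedTuple):
    """Hypothetical container for one line of user input."""
    text: str                # the part before the bar
    context: Optional[str]   # the guiding phrase after the bar, if any


def split_contextualizer(raw: str) -> ParsedInput:
    """Split 'text|guiding phrase' at the first vertical bar.

    Assumption: only the first bar is significant, and an empty
    guiding phrase is treated the same as no guiding phrase.
    """
    text, sep, context = raw.partition("|")
    context = context.strip()
    return ParsedInput(text.strip(), context if sep and context else None)


parsed = split_contextualizer("Today is a great day!|I'm so sad.")
print(parsed.text)     # the text whose delivery is being steered
print(parsed.context)  # the phrase that would set the emotional tone
```

In a pipeline of this shape, the guiding phrase (or the text itself when no phrase is given) would be the string handed to the sentiment model, while only the text before the bar is synthesized.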
The 15-second benchmark refers to the requirement of only 15 seconds of audio to clone a human voice, a figure OpenAI later corroborated in 2024 when it described its own voice-cloning model as working from a 15-second sample. The benchmark challenged the prevailing assumption that intelligible speech synthesis requires tens of hours of training data.