Generative pre-trained transformer
On the 11th of June 2018, OpenAI published a research paper titled Improving Language Understanding by Generative Pre-Training, quietly introducing a technology that would eventually reshape the global economy. The paper described GPT-1, the first model to successfully merge the transformer architecture with generative pre-training, a combination that allowed machines to learn from vast amounts of unlabeled text. Before this moment, natural language processing relied heavily on supervised learning, a method that required manually labeling millions of data points to teach a computer to understand language. That process was prohibitively expensive and slow, creating a bottleneck that limited the scale and sophistication of early AI systems. The researchers at OpenAI realized that if they could train a model to predict the next word in a sequence using only raw text, the model would implicitly learn grammar, facts, and reasoning patterns without needing a single human label. This semi-supervised approach, unsupervised pre-training followed by supervised fine-tuning, shifted the paradigm from teaching machines specific tasks to letting them learn the structure of language itself. The initial model was small by today's standards, yet it demonstrated that a neural network could be pre-trained on a diverse corpus like BookCorpus and then fine-tuned for specific tasks with remarkable efficiency. The implications were not immediately obvious to the public, but within the research community, scaling this architecture was already being discussed as the next great leap in artificial intelligence.
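The self-supervised signal described above can be illustrated with a deliberately tiny stand-in: a bigram counter that learns, from raw text alone, which word tends to follow which. This is a sketch of the next-word objective, not the transformer architecture itself, and all names here are illustrative.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for each word, which word follows it in the corpus.
    Predicting the next word is the entire training signal: no labels
    beyond the raw text itself are required."""
    words = text.split()
    successors = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        successors[current][following] += 1
    return successors

def predict_next(model, word):
    """Return the most frequent successor seen during training."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

A real GPT replaces the frequency table with a neural network over long contexts, but the supervision is the same: the text provides its own labels.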
The Staged Release
By the 14th of February 2019, the stakes had risen significantly when OpenAI unveiled GPT-2, a model more than ten times larger than its predecessor, with 1.5 billion parameters trained on 40 gigabytes of WebText. The sheer capability of this new system to generate coherent, human-like text raised immediate alarms within the organization about the potential for malicious use. The researchers feared that bad actors could use the model to generate spam, create disinformation campaigns, or automate phishing attacks with unprecedented efficiency. In a move that surprised many in the tech industry, OpenAI decided to withhold the full model from the public, opting instead for a staged release strategy. They initially published smaller versions of the model to the research community while keeping the largest version under lock and key until November 2019. This decision highlighted a growing tension between the open science movement and the safety concerns surrounding powerful AI. The model had been trained on eight million web pages, giving it a breadth of knowledge that previous systems could not match, yet the company felt the risk of releasing its full power was too great to ignore. This cautious approach set a precedent for how future AI developments would be handled, balancing the drive for innovation with the responsibility to prevent harm. The partial release allowed researchers to study the model's capabilities and limitations, laying the groundwork for the more aggressive development strategies that would follow in the coming years.
The Race for Scale
The 10th of February 2020 marked a turning point when Microsoft introduced Turing Natural Language Generation, claiming it was the largest language model ever published at 17 billion parameters. Microsoft reported that the model outperformed previous systems on tasks including text summarization and question answering, signaling that the race for scale was accelerating. Just three months later, on the 28th of May 2020, OpenAI responded with GPT-3, a model boasting 175 billion parameters trained on a dataset far larger than anything seen before. The jump in size was not merely incremental; it represented a fundamental shift in what machines could do. GPT-3 demonstrated few-shot and zero-shot learning, meaning it could perform complex tasks with only a few examples, or none at all, simply by being prompted correctly. This capability allowed the model to write code, translate languages, and answer trivia questions with a fluency that baffled many observers. Its ability to generate text that was often indistinguishable from human writing sparked a global conversation about the nature of intelligence and the future of work. The sheer computational power required to train such a massive model also highlighted the growing resource gap between large technology companies and smaller research groups. While competitors like Google and Meta began to develop their own transformer-based models, the focus remained on scaling up parameters and datasets to achieve higher performance. The era of small, specialized models was giving way to the age of foundation models, which could be adapted to a wide range of downstream tasks with minimal additional training.
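Few-shot learning, as described above, amounts to formatting worked examples into the prompt itself rather than retraining the model. A minimal sketch of that prompt construction, with entirely hypothetical helper and example data:

```python
def build_few_shot_prompt(examples, query):
    """Format labelled examples followed by an unanswered query, so the
    model continues the established pattern instead of being retrained."""
    lines = [f"Q: {q}\nA: {a}" for q, a in examples]
    lines.append(f"Q: {query}\nA:")  # trailing "A:" invites a completion
    return "\n\n".join(lines)

examples = [
    ("Translate 'chat' to English.", "cat"),
    ("Translate 'chien' to English.", "dog"),
]
prompt = build_few_shot_prompt(examples, "Translate 'oiseau' to English.")
print(prompt)
```

Zero-shot prompting is the same idea with an empty example list: the task is stated, and the model is expected to comply from pre-training alone.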
The Human Feedback Loop
On the 30th of November 2022, the world's attention shifted from research papers to a public chatbot named ChatGPT, which was launched by OpenAI and quickly became a cultural phenomenon. The model powering this chatbot, initially based on GPT-3.5, was not just a large language model but a system refined through reinforcement learning from human feedback, or RLHF. This process involved human trainers engaging in conversations with the model, playing both the user and the AI, to create a dataset that taught the system how to follow instructions and align with human preferences. The result was a model that was more helpful, less toxic, and better at understanding nuance than its predecessors. The popularity of ChatGPT was immediate and overwhelming, with millions of users flocking to the platform to ask questions, write stories, and debug code. This public success spurred widespread development of competing systems from other organizations, including Google's Gemini and Meta's Llama. The integration of GPT-4 into applications like Microsoft Copilot, GitHub Copilot, and educational platforms like Khan Academy further cemented the technology's place in daily life. The shift from raw generative models to instruction-tuned systems marked a new phase in AI development, where the goal was not just to generate text but to generate text that was useful and safe for human interaction. The success of ChatGPT also demonstrated the importance of aligning AI behavior with human values, a challenge that would become central to the field in the years to come.
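The human-feedback process described above typically starts by training a reward model on pairwise preferences: trainers mark which of two responses they prefer, and the reward model learns to score the preferred one higher. As a toy illustration of that step (scalar rewards, not OpenAI's actual implementation), the commonly used Bradley-Terry style pairwise loss looks like this:

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss for reward modelling: the loss shrinks
    as the reward model scores the human-preferred response above the
    rejected one, and grows when the ordering is wrong."""
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A reward model that already ranks the preferred answer higher incurs
# a small loss; a misordered pair incurs a large one.
print(preference_loss(2.0, 0.5))  # small
print(preference_loss(0.5, 2.0))  # large
```

The language model is then fine-tuned with reinforcement learning to produce responses this reward model scores highly, which is the "alignment with human preferences" the article refers to.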
The Multimodal Horizon
By the 14th of March 2023, the boundaries of what a language model could do expanded dramatically with the release of GPT-4, a multimodal system that could accept images as well as text as input, although its output remained text. This model represented a significant leap in the ability of AI to interact with the world beyond plain text generation. GPT-4 could analyze charts, interpret diagrams, and generate code based on visual inputs, making it a versatile tool for a wide range of applications. Handling multiple modalities opened up new possibilities for industries ranging from healthcare to entertainment, where combining text and imagery could support more sophisticated and intuitive systems. The development of multimodal models also underscored the growing complexity of AI research, as companies invested in the infrastructure needed to train and deploy these advanced systems. Processing several types of data within a single model allowed for more natural, human-like interactions, bridging the gap between text-based and visual communication, and marking a shift in focus from understanding language alone to understanding the world through multiple sensory inputs.
The Reasoning Revolution
In the early months of 2025, the focus of AI development shifted from generating text to generating thought, as models like OpenAI's o3 and DeepSeek's R1 employed reinforcement learning to produce multi-step chain-of-thought reasoning. These systems were designed to solve complex problems in domains such as mathematics and logic by breaking tasks into smaller, manageable steps before producing a final answer. Reasoning through a problem before responding marked a significant departure from the pattern-matching approach of earlier models, allowing AI to tackle problems that required planning, deduction, and strategic thinking rather than simply recalling or completing text based on statistical regularities. Training such models demanded new algorithms and infrastructure, and their arrival opened possibilities in fields ranging from finance to healthcare, where working through a problem step by step can lead to more accurate and reliable outcomes. The shift from generating text to generating thought marked a new phase in AI, one focused not just on fluent language but on deliberate, checkable reasoning.
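The step-by-step behavior described above can be mimicked in miniature: a toy solver that records its intermediate working before committing to a final answer, loosely mirroring the visible reasoning trace these models emit. Everything here is illustrative, not how any production reasoning model is implemented.

```python
def solve_with_steps(a, b, c):
    """Toy chain-of-thought: compute (a + b) * c while recording each
    intermediate step, so the answer can be checked against its working
    rather than accepted as a single opaque guess."""
    steps = []
    subtotal = a + b
    steps.append(f"Step 1: {a} + {b} = {subtotal}")
    answer = subtotal * c
    steps.append(f"Step 2: {subtotal} * {c} = {answer}")
    return steps, answer

steps, answer = solve_with_steps(3, 4, 5)
print("\n".join(steps))
print("Answer:", answer)  # 35
```

The point of the trace is exactly what the article describes: each intermediate step can be inspected and verified, which is what makes reasoning models more reliable on planning and deduction tasks than pure next-token prediction.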
The Brand War
The 23rd of April 2023 marked the beginning of a legal and commercial battle over the term GPT, as OpenAI asserted that the acronym should be regarded as a brand of the company rather than a generic technical term. OpenAI revised its terms of service to indicate that other businesses using its API to run their AI services could no longer include GPT in their names or branding. The company engaged a brand management service to notify its API customers of this policy, although these notifications stopped short of making overt legal claims. The move was part of a broader strategy to protect the value of the GPT brand, which had become synonymous with advanced AI technology. OpenAI applied to the United States Patent and Trademark Office for domestic trademark registration of the term GPT, but the USPTO declined to expedite the handling of the application, citing the term's descriptive and generic nature. The company continued to press its case through the available processes, arguing that the term had become distinctive of its specific offerings. The dispute highlighted the tension between the open science movement and the commercial interests of large technology companies: its outcome would determine whether the term could be used freely by researchers and developers or would be restricted to a single company. The brand war also underscored the importance of intellectual property in the AI industry, as companies sought to protect their innovations and maintain a competitive edge in a rapidly evolving market.