Questions about large language models

Short answers, drawn from the article.

When did Google researchers publish the paper Attention Is All You Need?

Google researchers published the paper "Attention Is All You Need" in 2017, presenting it at the NeurIPS conference. The paper introduced the transformer architecture, which replaced older recurrent neural network approaches.
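The core operation the paper introduced is scaled dot-product attention. As a rough illustrative sketch (not code from the paper itself, and with arbitrary toy dimensions), it can be written in a few lines of NumPy:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- the transformer's core operation."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise query-key similarities
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V               # attention-weighted sum of values

# Toy example: 3 tokens with 4-dimensional embeddings
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

Unlike a recurrent network, every token attends to every other token in one parallel matrix operation, which is what made the architecture so scalable.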

What is the Chinchilla scaling law for large language models?

The Chinchilla scaling law estimates training compute as roughly C = 6ND FLOPs, where N is the number of model parameters and D is the number of training tokens, i.e., about six FLOPs per parameter per token. Scaling laws predict LLM performance from the total compute used, the size of the artificial neural network, and the size of its pretraining dataset.
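The six-FLOPs rule reduces to simple arithmetic. The figures below are Chinchilla's own published training configuration (70 billion parameters, 1.4 trillion tokens):

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: C = 6 * N * D FLOPs."""
    return 6.0 * n_params * n_tokens

# Chinchilla: 70e9 parameters trained on 1.4e12 tokens
c = training_flops(70e9, 1.4e12)
print(f"{c:.2e} FLOPs")  # 5.88e+23 FLOPs
```

This estimate covers the forward and backward passes of training; inference costs roughly a third as much per token.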

How much energy does text generation require per prompt as of 2025?

Text generation requires around 0.05 Wh per prompt, while image generation, the most energy-intensive task measured, averages 2.91 Wh per prompt, roughly nine percent of a smartphone charge. Simple classification tasks consume only 0.002 to 0.007 Wh per prompt on average.
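The per-prompt figures above scale linearly with usage. A minimal sketch of the arithmetic (the million-prompt batch size is an arbitrary illustration, not from the source):

```python
# Per-prompt energy figures quoted above (as of 2025)
TEXT_WH = 0.05    # text generation, Wh per prompt
IMAGE_WH = 2.91   # image generation, Wh per prompt

def energy_kwh(n_prompts: int, wh_per_prompt: float) -> float:
    """Total energy in kWh for a batch of prompts."""
    return n_prompts * wh_per_prompt / 1000.0

print(f"{energy_kwh(1_000_000, TEXT_WH):.1f} kWh")   # 50.0 kWh
print(f"{energy_kwh(1_000_000, IMAGE_WH):.1f} kWh")  # 2910.0 kWh
```

At these rates, a million image prompts consume nearly sixty times the energy of a million text prompts.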

Why do large language models exhibit gender bias in their outputs?

Gender bias manifests through stereotypical occupational associations, such as assigning nursing roles disproportionately to women, because AI models inherit biases present in their training data. In 2023, LLMs were found to assign roles and characteristics along traditional gender norms, for example depicting nurses as women.

What legal settlement did Anthropic reach regarding memorization practices in 2025?

In 2025, Anthropic reached a preliminary agreement to settle a class action brought by authors for approximately $1.5 billion, after a judge found the company had stored millions of pirated books. By contrast, Meta obtained a favorable judgment in mid-2025, with the court finding that the plaintiffs lacked records sufficient to show infringement.