Who coined the term mechanistic interpretability and when?
Chris Olah coined the term mechanistic interpretability in the late 2010s to describe a shift in how researchers approached understanding artificial intelligence. Until then, neural networks were largely treated as opaque black boxes: inputs produced outputs with no clear account of the internal logic connecting them. Olah and his collaborators instead began treating trained networks like compiled binary programs that could be reverse-engineered to reveal the algorithms they actually implement.