A really interesting result: when asked to rate how innovative product ideas are, people tend to rate ideas generated by ChatGPT more highly than ideas generated by other people.

Equally interesting? The fact that, once you actually go through them, all of these ideas describe products which already exist.

This raises a pretty complicated question - when we evaluate how “creative” or “innovative” an LLM is (a corollary of the long-sought out-of-domain generalization for generative methods), what is actually being rated?

To further complicate matters - and abstracting the core idea of Thomas Kuhn’s “The Structure of Scientific Revolutions” - people have a hard time seeing radical change as a legitimate option.

So whatever innovation or creativity comes from a human or an LLM is subject to a fundamental force (at least for product innovation): innovation has to be recognizable. In other words, it has to happen within a relatively narrow realm of possibilities.

A big selling point of LLMs is that they can innovate - see OpenAI’s and Google’s recent efforts to inject their AI products into research. But when we look at hard data - Subbarao Kambhampati’s work, or recent papers from Apple - we see clear evidence that LLMs cannot solve truly novel problems.

So why are we seeing this grand pursuit? My bet is on the - overblown - promises the AI industry has made to justify its existence. Sam Altman has “mused” that one day we might be able to ask ChatGPT to cure cancer. He has also recently said that AI will be able to solve climate change, while claiming that most of Earth’s electricity should eventually go to running AI. I think you get the point: there is no shortage of exaggerated claims made to secure funding.

However, the kind of AI contributing to research - “solving cancer” or “saving the planet” - is not the kind of AI that Altman produces. The AlphaFold Protein Structure Database (powered, you guessed it, by the not-an-LLM AlphaFold) is one such example, providing pretty good predictions of the three-dimensional structure of proteins. GraphCast and WeatherNext are, likewise, not-an-LLM models that do a pretty good job of forecasting the weather. AI has also improved drug development and medical image processing.
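
To make the “narrow but useful” point concrete: the AlphaFold database is queryable like any other web resource. Here is a minimal sketch of fetching a predicted structure by UniProt accession; the endpoint URL and the pdbUrl response field reflect the AlphaFold DB public API as I understand it, so treat them as assumptions and check the current documentation before relying on them.

```python
# Minimal sketch: look up AlphaFold's predicted structure for a protein
# given its UniProt accession. Endpoint and field names are assumptions
# based on the AlphaFold DB public API; verify against the current docs.
import requests

ALPHAFOLD_API = "https://alphafold.ebi.ac.uk/api/prediction/{accession}"

def fetch_predicted_structure_url(accession: str) -> str:
    """Return the URL of the predicted structure file for a UniProt accession."""
    resp = requests.get(ALPHAFOLD_API.format(accession=accession), timeout=30)
    resp.raise_for_status()
    predictions = resp.json()          # a list of prediction records
    return predictions[0]["pdbUrl"]    # assumed field name for the PDB file

if __name__ == "__main__":
    # P69905 is human hemoglobin subunit alpha
    print(fetch_predicted_structure_url("P69905"))
```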

These are narrow applications - far from the grand visions that attract billions in funding - but they create real value. Even in the age of agentic AI - where we expect LLMs to interact with one another - we will still need highly accurate tools for solving real-world problems. Those tools will depend largely on specialist systems that solve specific problems with high fidelity - not on general intelligence (whatever that may end up being).

And mind you: LLMs can actually be useful for some tasks - if you are equipped with a working brain, curiosity, and a healthy dose of skepticism. They can be genuinely helpful for sifting through large documents or searching for information across multiple sources. They can even be fun for a little speculative methodological banter. It’s just that the “innovation” they can muster is only impressive when we forget that what we don’t know can fill a library (in my case, many times over).
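
That “sifting through documents” use is easy to sketch, and the shape of the prompt is where the skepticism comes in: ask for verbatim quotes, not summaries, so every passage can be checked against the source. The sketch below assumes the openai Python client and a chat-completions model name (“gpt-4o-mini”); swap in whatever client and model you actually use.

```python
# Minimal sketch of document sifting with an LLM: extract the passages
# relevant to a question as verbatim quotes, then verify them yourself.
# Assumes the openai Python client and an example model name.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def find_relevant_passages(document: str, question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; substitute your own
        messages=[
            {"role": "system",
             "content": "Quote only passages from the document that are "
                        "relevant to the question. Do not paraphrase."},
            {"role": "user",
             "content": f"Question: {question}\n\nDocument:\n{document}"},
        ],
    )
    return response.choices[0].message.content
```

Requiring quotes rather than summaries is the design choice that matters here: a quote can be checked against the original text, which keeps the working brain in the loop.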

At the end of the day, AI is already doing some pretty cool things! The vast majority of them, however, do not depend on the narrow definition of AI as text generation. And innovation - at least in the way that we, as people, can measure it - is not really a necessity for any of that.