LLMs are already good enough to revolutionize the internet

As we’ve been figuring out the new thing, I’ve spent the vast majority of my time recently writing code that uses generative AI – chatbots, data extractors, automation tools, moderation tools, ad-hoc classifiers, you name it. (I’ve also been writing a lot of code with LLMs, but that’s a topic for another time.)

People who have worked with me know that I can be a bit, uh, reserved when it comes to hype and how much technology trends really matter, but I’ve bought in pretty hard on GenAI. The thing I’ve seen that best represents my take is Drew Breunig’s notes about Sober AI. Outside AI labs, most of the builders I’ve talked to who are successfully shipping product seem to roughly agree – LLMs are powerful technology but not digital gods.

Leaving aside AI labs, here’s a not-very-secret secret of the GenAI startup world: some of this stuff doesn’t quite work yet. There are a lot of people building agents and copilots (some of which I’m using every day) but some of the agents aren’t yet agenting and some of the copilots have an unfortunate tendency to execute a controlled flight into terrain. The technology is getting better, and we’re getting better at using it – but it’s unclear that scaling alone is going to give us another GPT-4-style capability jump, where some use cases went from “doesn’t work” to “nailed it” overnight.

I’m pretty optimistic that models are going to keep getting better, but here’s another secret: in a lot of ways, it doesn’t matter if we ever get another jump in raw AI capabilities.

Over the past few years, we’ve seen such a rapid leap in capability that, unless you’re building your own tools, you’re unlikely to have felt how much the basic user experience of the internet – so, everything – is going to change. Even if OpenAI, Anthropic, or Meta never release another model update, we’ve got technology that works really well for things like:

- Automatic content summarization
  - Relatedly, content transformations like densifying or rephrasing content to be more efficient or understandable for you specifically – if you’ve ever typed “Explain this to me like a child” or “Assume I’m already an expert” into ChatGPT, you’re already familiar with this.
- Automated translation (including into languages most entities couldn’t economically justify paying translators for)
- Automated speech recognition and transcription that’s 100X faster than realtime
- Automated speech generation and now interaction that can blow commercial IVR systems out of the water.
  - Between these two capabilities, with a bit of glue code you can now never listen to another podcast, or consume absolutely all your information as a podcast, although, you know, don’t.
- Ad-hoc, general purpose content classification
  - Prosaically, you can think of this like: “tools that can automatically organize your files and content according to your preferences with only a handful of examples up front.”
- Structured data extraction from basically arbitrary content: define a schema, wire up Instructor and just get started.
- Summarize or answer questions about a document or index it for large-scale search based just on a document image, without having to write custom parsing or data extraction code for a particular file type – if you can take a screenshot of a document, you can get it into an interactive knowledge base. (Our long national nightmare of parsing arbitrary PDF files is finally over.)
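
To make the "define a schema" step of structured extraction a bit more concrete, here is a minimal sketch using only the Python standard library. It is not Instructor's API: the `Invoice` schema, the helper names, and `fake_llm` (a canned reply standing in for a real model call) are all assumptions made up for illustration.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    """Hypothetical target schema: the fields we want pulled out of a document."""
    vendor: str
    total: float
    currency: str


def build_prompt(schema, text: str) -> str:
    """Describe the schema to the model and ask for a JSON-only reply."""
    spec = ", ".join(f'"{f.name}" ({f.type.__name__})' for f in fields(schema))
    return (
        f"Extract the following fields as a single JSON object: {spec}.\n"
        f"Reply with JSON only.\n\nDocument:\n{text}"
    )


def parse_response(schema, raw: str):
    """Validate the model's reply against the schema, coercing simple types."""
    data = json.loads(raw)
    kwargs = {}
    for f in fields(schema):
        if f.name not in data:
            raise ValueError(f"missing field: {f.name}")
        # Coerce to the declared type, e.g. the string "1240.50" becomes a float.
        kwargs[f.name] = f.type(data[f.name])
    return schema(**kwargs)


def fake_llm(prompt: str) -> str:
    """Assumption: stand-in for a real model call; always returns a canned reply."""
    return '{"vendor": "Acme Corp", "total": "1240.50", "currency": "USD"}'


def extract(schema, text: str, llm=fake_llm):
    """Prompt the model, then parse and validate whatever comes back."""
    return parse_response(schema, llm(build_prompt(schema, text)))


invoice = extract(Invoice, "Invoice from Acme Corp. Amount due: $1,240.50 USD")
print(invoice)
```

In practice you would swap `fake_llm` for a real model client and let a library like Instructor take care of this kind of plumbing; the sketch is only meant to show how little scaffolding the pattern needs.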

People are going to quibble about the reliability of GenAI for some of these use cases:

[Image: An X post from @AndrewSchmidtFC containing a humorous failure of Apple's new notification summarization feature]

…and they’re correct. But: we’re super early at building the right glue and form factors around AI, not just early with AI models themselves. Applications are going to get a lot better and a lot more reliable even with zero improvements in the underlying models. And the ability to do all these tasks isn’t being gatekept by large platforms – for five or six of the applications above, you can download a model and run some version of it on your MacBook Air… if it doesn’t have the model baked in already.

Maybe these capabilities are the peace dividends of the AGI wars. Regardless, the genie is not going back in the bottle. Every business, non-profit, government agency and nerdy consumer can deploy more sophisticated intelligence against natural language, audio, images, documents (and increasingly video) than the most powerful internet platforms had 36 months ago. Here’s another way to say it: Maybe we get AGI or maybe we don’t, but current generation AI is the largest power shift towards consumers and independent makers since the dawn of the web, and that’s good enough for me.