Trying to parse all the rumors about OpenAI’s plans for the future is crazymaking: it does, in fact, seem to be driving a not-insignificant number of people sort of insane. Some of this is a natural consequence of its project: New AI models do things that weren’t previously possible in software, and it can be difficult to judge whether a given new breakthrough falls into the category of “cool trick” or “consequential development that will change all of our lives forever.” It’s also a consequence of the company’s messaging, which oscillates in substance and tone, leaning into and away from the most sensational rumors and theories about the company. One moment CEO Sam Altman is posting riddles about being unsure whether or not his company has achieved artificial general intelligence, or AGI, which will either usher in an era of acceleration toward terrifying superintelligence or… “matter much less” than people expect. The next, Altman and his staff are insisting that the hype is getting out of control and that we’re “early” in a new “paradigm,” with lots of work to do on the way to… somewhere.
As a communications strategy, this has clearly been effective, or at least not gotten in the way. Massive amounts of capital are lining up behind OpenAI, in the form of direct investment and, most recently, a joint infrastructure project with the imprimatur of President Trump. (Altman on Trump in 2016: “an unacceptable threat to America”; Altman on Trump this week: “incredible for the country in many ways.”) It relies on a split that’s both natural for a research-led firm like OpenAI and, I think, cultivated by the company, between work at the “frontier” (articulated in terms of specialized benchmarks, promising training and inference methods, “reasoning models,” and the attendant theoretical possibilities with inherently unpredictable consequences) and the company’s actual products, which everyone can try and which hundreds of millions of people have. It’s the former category that’s dominated OpenAI coverage over the last year, and especially the past few months: fallen benchmarks; speculation about potential paths for AGI and ASI; infrastructure needs; and the perhaps uniquely attractive prospect, to investors, of mass labor automation. Meanwhile, although the company has been making frequent updates to its models and products, the mainstream user experience of OpenAI has, in contrast to the sudden and shocking release of ChatGPT in 2022, improved incrementally.
On Thursday, OpenAI made an attempt to recouple its vibes and its product lineup with the release of Operator, “an agent that can go to the web to perform tasks for you”:
Operator can be asked to handle a wide variety of repetitive browser tasks such as filling out forms, ordering groceries, and even creating memes. The ability to use the same interfaces and tools that humans interact with on a daily basis broadens the utility of AI, helping people save time on everyday tasks while opening up new engagement opportunities for businesses.
OpenAI also posted a longer demo video.
This is similar to Anthropic’s “computer use” feature in Claude, which was announced last year. It’s an early step for OpenAI into the vaguely defined category of AI “agents,” which are intended to carry out multi-step tasks on users’ behalf. Agents, and underlying agentic models, are the industry’s obsession of the moment, in no small part because they represent a step toward the intoxicating sales pitch for AI employees. First comes software that reads your screen and books you a hotel. Then comes software that does the entire job. That’s the trillion-dollar idea.
OpenAI, like Anthropic, is clearly well on its way to managing some browser-based tasks for users. But the messy reality of the web, combined with the rising stakes of software that can make purchases or initiate communication on a user’s behalf, brings to mind the race to build autonomous cars. In that case, rapid early progress fostered a false sense of imminence, followed by a longer-than-expected process of working out edge cases, ironing out bugs, and years of testing, with wider deployment still TBD. In early form, according to testers, Operator’s preview is interesting to watch (it’s running your screen! it’s clicking and typing!) but is also unreliable, slow, and easy to confuse. Casey Newton in Platformer:
My most frustrating experience with Operator was my first one: trying to order groceries. “Help me buy groceries on Instacart,” I said, expecting it to ask me some basic questions. Where do I live? What store do I usually buy groceries from? What kinds of groceries do I want?
It didn’t ask me any of that. Instead, Operator opened Instacart in the browser tab and began searching for milk in grocery stores located in Des Moines, Iowa.
At that point, I told Operator to buy groceries from my local grocery store in San Francisco. Operator then tried to enter my local grocery store’s address as my delivery address.
After a surreal exchange in which I tried to explain how to use a computer to a computer, Operator asked for help. “It seems the location is still set to Des Moines, and I wasn’t able to access the store,” it told me. “Do you have any specific suggestions or preferences for setting the location to San Francisco to find the store?”
Lots of money and talent are focused on making this sort of thing actually work, and the big AI firms are all projecting confidence. As with self-driving cars, though, a free-roaming piece of software that inhabits your identity (or even just has your credit card) has to work, or at least not catastrophically fail, basically all the time. An assistant that needs more help than it provides is not worth having; an assistant that screws up is a liability. If buying groceries through a streamlined interface is deceptively complicated, what isn’t?
Whether (or how quickly) software like this becomes more viable — as tools and as products — is one set of questions. But what happens if features like this both work and become widely available — if the hundreds of billions of dollars funneling into AI achieves its purpose?
In OpenAI’s video examples, Operator interacts with the computer in a manner mostly indistinguishable from a (slow-moving, easily confused) person, clicking around to book a restaurant on OpenTable, shopping for groceries, and browsing concert tickets. Currently, Operator is a limited test, available to Pro users who pay $200 a month. But let’s say millions of users are able to deploy agents to browse the web or use apps — or, in a more general sense, interact with businesses or people. The world around them won’t stand still. This is easy to understand on a personal scale. Talking to someone’s human assistant is not the same as talking to that person, even if you still get what you need from them. Likewise, bouncing through a phone tree is different from talking to a human, even if you still eventually get the information you’re looking for. You’re transacting, but you’re not getting attention.
It’s not much harder to think about at a corporate scale, where attention is likewise important, but also measured and monetized. If OpenTable, a business with a long history of fighting attempts to automate and game its systems with bots, began to realize that many of its users were booking tables using agents, would it respond with hostility? In the narrow frame of OpenAI’s product line, Operator is an early demo of new capabilities. In the wider context of the web around it — the web it will need to manipulate and interact with — its clearest precursors are tools for sniping, scalping, running up metrics, and spamming. Because it runs through a browser identifiable as OpenAI’s, Operator already has related problems, according to tester Dan Shipper:
The downside is that many sites like Reddit already block AI agents from browsing so they can’t be accessed by Operator. In this research preview mode, Operator is also blocked by OpenAI from accessing certain resource-intensive sites like Figma or competitor-owned sites like YouTube for performance or legal reasons.
Other early users encountered similar issues:
I was trying to get some pricing from eBay via Operator because I’m always looking for ways to enhance my software with AI. To my disappointment, eBay already flagged it with anti-bot detection which resulted in GPT quickly opting out and responding that it couldn’t proceed…
This blocking isn’t a response to the arrival of “agents,” exactly — it’s the result of earlier measures websites have taken against firms scraping for AI training data. The web is already having a pretty strong immune response to AI. How might it respond to the default bot-ification of users?
But warmer reactions would be complicated, too. A more amenable e-commerce partner might be fine with its customers using agents to make purchases, but it would still find the resulting state of affairs strange, at minimum. The company might ask OpenAI: Why don’t we just do this more directly? If you want your users to be able to order products through your chatbot, why don’t we just let your software browse our product listings in a less error-prone and wasteful way? Maybe we can build an API? Why not work together, so your product actually functions and we don’t get left behind?
You can already order something from Amazon through Alexa not because it has advanced agentic AI capabilities to browse the platform like a person, but because Amazon made special accommodations and built special tooling, invisible to users, to connect one product with another. It’s software talking to software, not humans talking to software pretending to be humans to use software.
OpenAI’s ideal outcome would be a bunch of other firms rushing to help its products work, to integrate as deeply as possible with ChatGPT, and to try to anticipate and eliminate the ways in which brittle “agents” might fail from their end (in other words, to bring the web into something more akin to its own sandbox). Setting aside the AI employee pitch, this is how the company might turn its chatbot into a more versatile tool, an “everything app,” or a chat interface for the rest of the web. (In 2023, it attempted to do this by opening an app store, which it advertised with a similar pitch, minus the emphasis on the word “agent.” It didn’t catch on.) There are two ways OpenAI might get leverage to make this happen. One is that customers demand it: They use ChatGPT, Operator works, and they want the rest of the world to work with Operator, even if other firms are wary of OpenAI. This is the hard way, and the current state of Operator suggests that, even if it’s possible, it would be a long and bumpy road. The other way is simpler and more appealing, at least for OpenAI: Declare your success ahead of time, insist that capable agents are a mere matter of time and scaling, and suggest everyone get in line now rather than later to achieve the inevitable together, thereby making your actual task easier, and achieving truly broad agentic capabilities somewhat less important. A similar story has convinced investors, not to mention the new administration. Will it work on everyone else?