OpenAI on Monday introduced a new model called GPT-4o (as in omni) that the company says “reasons across voice, text, and vision.” In practice, this means ChatGPT now responds more quickly to a wider range of input — text, image, voice — provided in more natural ways. You can talk to it, and it talks back; you can show it things, and it tells you what it sees.
OpenAI’s “Spring Update” event was a brisk affair that, due to runaway speculation by AI influencers, necessitated a few disclaimers. This wasn’t going to be a search engine, CEO Sam Altman warned, nor would it be the long-rumored GPT-5. Instead, he teased some “new stuff,” some of which “feels like magic” to him.
For industry watchers, it was an interesting event in a few ways. For one, OpenAI is releasing GPT-4o to all users, breaking with its current strategy of reserving its most capable models for paid subscribers (who will now get higher usage limits among other, smaller benefits). AI enthusiasts had hypothesized for weeks that a pair of chatbots that had quietly appeared on a testing platform — and that seemed better by some measures than GPT-4 — were actually upcoming OpenAI models, and it turns out they were. What wasn’t apparent from those leaks, which let people prod a text-based chatbot, was what OpenAI spent most of its presentation showing off. ChatGPT is now a lot better at talking:
You’ll probably notice a few strange things about the chatbot’s presentation, and you’re meant to. OpenAI says its new voice functionality — it had voice features before, but they were essentially voice-to-text and text-to-speech layers built on top of a chatbot — is responsive enough that it can be interrupted. It can also interpret and express a range of “emotive styles,” meaning that, as with text-based chatbots, ChatGPT will now attempt to assess and choose appropriate spoken tones. The company staged a live demonstration where a parade of nervous, camera-shy executives spoke to the chatbot, which responded with — at least at first listen — substantially more confidence than its human interlocutors had. It was alternately impressive and strange — here it is singing “Happy Birthday” after seeing a piece of cake with a candle in it:
OpenAI is showing off something technologically new here, and we can assume we’ll see similar demos from its competitors, possibly as soon as this week and perhaps from Google. The release also suggests, at minimum, an upgrade to the style of voice assistant currently epitomized by Siri and Alexa, which had promised big things before being demoted to kitchen timers and light switches. It’s also obviously evocative of representations of AI in science fiction, such as the movie Her, in which the lead character falls in love with a piece of software. This thing flatters, giggles, and does voices. It doesn’t exactly respond to being cut off as a person would, but it doesn’t just keep going or drop the conversation. It will perform whatever tone you ask it to but appears to default to an energetic, positive, supportive persona — a helpful co-worker, someone trying to be your friend, or, if you’re feeling suspicious, someone trying to get something from you.
Months of speculation about a new core model from OpenAI and endless hints at the possibility of “artificial general intelligence” from its executives and boosters have set incredibly high expectations for the company’s forthcoming products. What OpenAI presented was instead primarily a step forward in its products’ ability to perform the part of an intelligent machine. There are risks to doubling down on the personification of AI — if people are made to feel as though they’re talking to a person, their expectations will be both impossibly diverse and very high — but there are benefits, too, which OpenAI knows well.
ChatGPT was initially released as a public tech demo; it went viral because of its capabilities but also because it spoke more convincingly and freely than chatbots had before it. It wrote with confidence in a tone that suggested it was eager to help. It was highly responsive to requests even when it couldn’t fulfill them, though it would often try to anyway. There was (and remains) an enormous gap between what the interface suggested (that you were talking to a real person) and what you were actually doing (prompting a machine). With user expectations where they were, this interplay turned out to be hugely powerful. ChatGPT’s persona invited users to make generous assumptions about the underlying technology and, just as important, about where it would, or at least could, one day go.
Such personification is by definition misleading; whether you think that’s a problem depends a bit on what you think OpenAI and other AI firms are up to and how much potential their projects have. The optimistic outlook is that voice, like chat, is simply a specific, unusually natural interface for computers and that the better the illusion is, the easier it will be to tap into the full productive potential of AI. But OpenAI’s sudden emphasis on ChatGPT’s performance over, well, its performance is worth thinking about in critical terms, too. The new voice features aren’t widely available yet, but what the company showed off was powerfully strange: a chatbot that laughs at its own jokes, uses filler words, and is unapologetically ingratiating. To borrow Altman’s language, the fact that Monday’s demo “feels like magic” could be read as a warning or an admission: ChatGPT is now better than ever at pretending it’s something that it’s not.