ChatGPT is now running inside Snapchat, Notion, and other apps. Will it give them a sustaining advantage — or just line OpenAI's pockets?
We need to start pinning these ChatGPT developers/executives/etc. down and forcing them to answer the following questions:
- Are you aware of the problem of hallucination in LLMs? Have you personally used a service like ChatGPT and observed it hallucinating?
- Will your app work reliably if ChatGPT hallucinates? How often do you expect this to happen?
- If ChatGPT hallucinates and your app misinforms your customers, could this have negative consequences for their well-being? Do you have any mitigation strategies to prevent such hallucinations from reaching your users?
It should be clear from my pissy tone that "I'm sure Sam Altman will figure it out" isn't an acceptable answer to any of these questions :) Indeed, there are very good reasons to think that many of these ChatGPT-enabled applications won't work very well as described:
- Why would you pay for a tutoring service that makes up fictional historical events or screws up basic scientific definitions? ChatGPT is on the record doing both.
- Why would you pay for a note-taking service that inaccurately summarizes your notes? ChatGPT frequently screws up text summarization, ignoring key details and inventing details that aren't in the original.
- Why would you pay to be a "retail partner" of a shopping app if that app misstates your products or prices? See above.
On and on and on. These are not pedantic questions! All of these scenarios are guaranteed to happen with ChatGPT as it stands today - and it's not just GPT-3.5. Bing AI seems even worse about hallucinations, and it cannot be trusted to summarize a single document even after today's update.
People need to be demanding answers before forking over cash. But what seems to be happening instead is that customers and investors alike are senselessly enthralled with ChatGPT and not rationally analyzing its limitations, instead imagining what it *could* be.
Where I think Web 2 failed (after, let’s face it, one of the most impressive starts in the history of mankind, just this side of, say, penicillin and manned flight) is that apps started solving problems that people don’t actually have.
Products should, ideally, make the lives of actual human beings better. As social apps continue to flounder and sputter with the fluctuations in the ad market, I’m always reminded that the solutions those companies are building (expensive 3D glasses) aren’t solving any real problems. They are SOLUTIONS in SEARCH of a problem.
That’s where I think we stand today, so far as I’ve seen anyway, with the rollout of ChatGPT. People are shoe-horning use cases into what is ultimately a large, predictive search engine (I very much agree with what you, Casey, and Kevin said on a recent Hard Fork: we just don’t have the right language to discuss these yet). When companies unleash tools like that into the general world, people will generally fail to prescribe life-changing uses for them. That’s as strong an argument as I can make for why UX isn’t part of the thing…it’s the whole thing.
However, IMO, the real gains of generative AI are NOT going to be for the average consumer. Sure, succinct search and smarter online shopping tools are useful, but they’re not game-changing.
Academic researchers, engineers, theoretical mathematicians, and so on need tools like this far more than we do. In fact, those are the professionals who should have exclusive use of those tools right now. These novel and somewhat insignificant consumer integrations (and light enterprise, like your Notion example) are…fine. Game changing? Hardly.
The moonshot use case for OpenAI’s tech hasn’t happened yet, and I think it needs more time in the hands of sophisticated engineers before we get to it.
Thanks for the tip about reinstalling Tweetbot (😭) in order to click the "I don't need a refund" button.
Great article again. I wanted to highlight a few points on the AI landscape around these chatbots:
1. Regarding "individualization": The models already have different prompts & sampling configurations, and hints of different base models (can dig up Sam Altman interviews if needed, e.g. Bing may use two models in tandem).
2. The interface point is spot on. The winners in this space have the best UX, and fastest (important if OpenAI adds tiers to pay for speed, or more models are at their level).
3. Electricity analogy isn’t as good, because when a cheaper model is deployed it’s easy to switch (no real hardware). You can’t poof get new GPUs, but you can get a new model when it’s on HuggingFace.
I wrote about these issues and the various companies prepping competitor models (both open and closed source) earlier this week: https://robotic.substack.com/p/rlhf-battle-lines-2023
Fantastic take. So many thoughts.
1. I am not sure there is going to be a "most" when it comes to benefits reaped from generative AI. New markets will elbow in and ebb and flow in size. So rather than "most" I think it is better to describe what will benefit "greatly". Then we can allow for multiple answers.
2. Notion and Snap are b2c, but the cloud computing boom was b2b. So the current moves IMO aren't that important.
3. Related to 2, I think containerization and virtualization both are parables to draw conclusions about the evolution of generative AI. By that I mean platforms make a lot of horizontal money and commoditize. Tools that are able to use those platforms for targeted value will grow vertically. The vertical lanes are largely unexplored evolutions of current verticals, while the horizontal bricks are easier to understand but could commoditize.
4. From the perspective of core competency I think there will be many more verticals like Notion and Snap than horizontals.
5. OpenAI may benefit from incentivizing data sharing from developers. Companies would possibly be willing to share data in return for the sweet sweet nectar of free credits.
I'm not convinced a natural language query interface is going to take off. Take your examples and how we'd do this today with existing tools.
> What was I worried about last summer?
Query "worried" with a date range
> When’s the last time I saw my friend Brian?
Query "Brian" sorted reverse chronological
The benefit of natural language here is marginal. Maybe saves some clicks. But there's a downside -- in most cases, natural language takes longer to say / type. And there's a false confidence instilled by current conversational interfaces. Saying your last *note* about Brian is from two years ago is different from saying you saw Brian two years ago. Language models have a risk of conflating these things. Traditional search is more honest.
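To make the comparison concrete, here is a minimal Python sketch of the two traditional queries above. Everything in it is an assumption for illustration: the `Note` type, the sample entries, and the `search_notes` / `latest_mention` helpers are hypothetical, not any real note-taking app's API.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Note:
    written_on: date  # when the note was written, not when the event happened
    text: str


def search_notes(notes: List[Note], keyword: str,
                 start: Optional[date] = None,
                 end: Optional[date] = None) -> List[Note]:
    """Case-insensitive keyword match, optionally limited to a date range."""
    needle = keyword.lower()
    hits = []
    for note in notes:
        if needle not in note.text.lower():
            continue
        if start is not None and note.written_on < start:
            continue
        if end is not None and note.written_on > end:
            continue
        hits.append(note)
    return hits


def latest_mention(notes: List[Note], keyword: str) -> Optional[Note]:
    """Most recent note containing the keyword, or None if there is none."""
    hits = search_notes(notes, keyword)
    return max(hits, key=lambda n: n.written_on) if hits else None


# Hypothetical sample journal.
NOTES = [
    Note(date(2021, 3, 14), "Lunch with Brian downtown."),
    Note(date(2022, 7, 2), "Worried about the move and the new lease."),
    Note(date(2022, 8, 19), "Still worried about money after the move."),
    Note(date(2022, 12, 1), "Holiday planning."),
]

# "What was I worried about last summer?" -> keyword plus date range
summer = search_notes(NOTES, "worried", date(2022, 6, 1), date(2022, 8, 31))

# "When's the last time I saw my friend Brian?" -> keyword, newest first
last_brian = latest_mention(NOTES, "Brian")
```

Note what `latest_mention` actually returns: the last note that mentions Brian, with its date attached. It makes no claim about when you last saw him, which is the honesty I mean.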
There's still a lot language models can help with. Summarization is cool (if we can work around the overconfidence problem). And as an input for generating expensive content like large blocks of text or art, that's cool. But the search-based use cases seem overblown.
I see useful personal roles for generative AI. Fed on your data - journal entries, calendars, emails - to give you back whatever you need. A personal assistant, a guide, an advisor, a therapist, a friend. Of course, we will see in it only what we want to see in it. Some will see more humanity than others. Some will see divinity.