Earlier this month, I wrote about how you can fast-forward through time by using the paid versions of frontier AI products. Today, let’s talk about another of those products, the text-to-image generator DALL-E. The latest version tells us a lot about how OpenAI is thinking about product, policy, and safety as its core technology continues to improve.
These days, ChatGPT gets all the attention. But six months before OpenAI’s chatbot arrived on the scene and shifted the tech world’s collective focus to generative artificial intelligence, the company released another tool that spoke to similar possibilities. DALL-E 2, which I got access to in June of last year, captivated me from its first release.
As a kid, I was an enthusiastic artist, delighting in drawing my own comic books on weekends and after school. But I quickly found the limits of my talent, and despite trying my best to follow the instructions in books designed to make me a better illustrator, I never really got very far. Fast forward a few decades, and suddenly, I could conjure whole worlds just by typing words into a box. It felt like magic, in a way that the tech industry often promises but rarely lives up to.
In the 18 months since DALL-E 2 emerged, that picture has become complicated by questions about copyright, permissions, and what, if anything, the makers of text-to-image generators like DALL-E owe the artists whose work their models are trained on. Stock photo company Getty Images sued Stability AI, the maker of Stable Diffusion, earlier this year, saying the company’s model had been improperly trained on its photos. Similar lawsuits seem likely to follow. Meanwhile, Adobe demonstrated an alternative path forward by creating its own Firefly image generator using only licensed imagery, and it says it will compensate creators whose work was used in the training process.
Despite the legal and ethical uncertainties around text-to-image generators, though, the field has continued to develop rapidly. Midjourney, which launched shortly after DALL-E 2, has attracted 15 million users and is generating hundreds of millions of dollars in annual revenue with its own image generator, even though it is currently available only on Discord. Stable Diffusion reached 10 million users last October. As they grew, the quality of their images improved dramatically, while DALL-E’s, though still impressive by pre-2022 standards, remained stagnant.
Then, on Thursday, DALL-E 3 arrived. After a short time in public beta, the next generation of OpenAI’s image generator is now available to enterprise customers and to subscribers to ChatGPT Plus. (You can also use a free version through the Bing Image Creator.)
I’ve been using DALL-E 3, through both Bing and ChatGPT, for the past few weeks. On Wednesday, I was also briefed on the new version by Gabriel Goh, a research scientist at OpenAI who helped build the new model, and Sandhini Agarwal, who works on AI policy at the company.
Here are five surprising things I’ve learned about DALL-E 3 in my first weeks of using it.