Romanian photographer and artist Aurel Manea has used a new text-to-image AI to create beautiful, almost photorealistic landscape images.
As first spotted by PetaPixel, Manea used Stability AI’s Stable Diffusion, a DALL-E 2-like text-to-image generation tool, to make a series of incredible landscape “photographs.” He used prompts like “landscape photography by Marc Adamus, glacial lake, sunset, dramatic lighting, mountains, clouds, beautiful” to create shots of entirely made-up places. (You can also see other photorealistic images generated by the AI in the Stable Diffusion Facebook group.)
However, unlike DALL-E 2, Stable Diffusion has limited content filters. That’s partly why it can create such realistic scenes, but it also raises some troubling concerns.
How do these AIs work?
Most of the text-to-image generation AIs that are popular at the moment, like DALL-E 2, Google’s Imagen, and even TikTok’s AI Greenscreen feature, are based on the same underlying technique: diffusion models. The mathematics is complicated, but the general idea is fairly simple.
Diffusion models are trained on huge databases of images paired with text descriptions. Stable Diffusion, for example, was trained on image-text pairs from the LAION-5B database, which contains more than five billion of them. When given a prompt, a model starts with a field of random noise and gradually refines it until it resembles the written prompt. The random nature of the initial noise is part of what allows each model to generate multiple results for the same prompt.
In other words, every pixel in an image created by one of these models is original. The models aren’t copying and pasting random parts of different images from a database; they are subtly shaping random noise to resemble a target prompt. This is partly why objects often appear swirly or slightly misshapen, even Van Gogh-esque.
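The noise-to-image loop described above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not Stability AI’s code: the 4-pixel “images,” the made-up `"sunset"` prompt table (`TARGETS`), the noise schedule, and the helper names `predict_noise` and `generate` are all hypothetical. The sampler follows the DDPM-style update from the diffusion literature, and `predict_noise` stands in for the trained neural network by computing the exact best noise estimate for this tiny invented dataset.

```python
import math
import random

# Hypothetical stand-in for "images matching a prompt": two tiny 4-pixel
# images (brightness values) filed under one made-up prompt. A real model
# learns from billions of image-text pairs instead of a lookup table.
TARGETS = {
    "sunset": [[0.9, 0.6, 0.2, 0.1], [0.1, 0.2, 0.6, 0.9]],
}

# Assumed noise schedule: how much noise each of the T steps adds.
T = 50
BETAS = [1e-4 + (0.2 - 1e-4) * t / (T - 1) for t in range(T)]
ALPHAS = [1.0 - b for b in BETAS]
ALPHA_BARS = []
_running = 1.0
for a in ALPHAS:
    _running *= a
    ALPHA_BARS.append(_running)  # fraction of the original signal left after step t


def predict_noise(x, t, prompt):
    """Hypothetical stand-in for the trained network: the exact best guess of
    the noise in x, assuming the only 'images' for this prompt are TARGETS."""
    ab = ALPHA_BARS[t]
    targets = TARGETS[prompt]
    # How plausible is each target as the clean image behind noisy x?
    logw = [
        -sum((xi - math.sqrt(ab) * ti) ** 2 for xi, ti in zip(x, tgt)) / (2 * (1 - ab))
        for tgt in targets
    ]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    total = sum(w)
    # Plausibility-weighted average of the targets = estimate of the clean image.
    x0_hat = [sum(wk * tgt[i] for wk, tgt in zip(w, targets)) / total for i in range(len(x))]
    return [(xi - math.sqrt(ab) * ci) / math.sqrt(1 - ab) for xi, ci in zip(x, x0_hat)]


def generate(prompt, seed):
    """Start from pure random noise and remove a little of it at each step."""
    rng = random.Random(seed)
    size = len(TARGETS[prompt][0])
    x = [rng.gauss(0.0, 1.0) for _ in range(size)]  # the initial field of noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t, prompt)
        beta, alpha, ab = BETAS[t], ALPHAS[t], ALPHA_BARS[t]
        # Subtract the predicted noise (DDPM-style update)...
        x = [(xi - beta / math.sqrt(1 - ab) * ei) / math.sqrt(alpha) for xi, ei in zip(x, eps)]
        # ...then add back a smaller dose of fresh noise, except on the final step.
        if t > 0:
            x = [xi + math.sqrt(beta) * rng.gauss(0.0, 1.0) for xi in x]
    return x


if __name__ == "__main__":
    for seed in (1, 2, 3):
        print(seed, [round(v, 2) for v in generate("sunset", seed)])
```

Different seeds start from different noise and can end on different images, which mirrors the point that the random starting noise is what lets one prompt produce several results. One caveat: because this toy “dataset” holds only two images and the denoiser is exact, each run ends on one of them. A real model replaces `predict_noise` with a network trained on billions of image-text pairs, so read this only as an illustration of the loop, not of how new images emerge.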
The problem with no filters
Most text-to-image generation models either have strict content filters, like DALL-E 2, or are available only to researchers, like Imagen. What’s most unusual about Stable Diffusion is that it has relatively limited content filters, and Stability AI plans to make it available to the general public. This raises a couple of potential issues.
To prevent DALL-E 2 from being used to generate misinformation, OpenAI blocks users from creating images of real people. Stable Diffusion has no such filter. Over on TechCrunch, you can see Stable Diffusion images of Barack Obama and Boris Johnson (the soon-to-be-former British Prime Minister) wielding various weapons, as well as a portrait of Hitler. While these aren’t quite photorealistic yet, the technology is heading that way and could soon be open to abuse.
The other issue is bias. Every machine learning tool is at the mercy of its dataset. DALL-E 2 has had its issues, and most recently, Meta had to shut down its chatbot after it started spouting antisemitic and election-fraud conspiracy theories. TechCrunch notes that the LAION-400M database, the precursor to the one Stable Diffusion uses, “was known to contain depictions of sex, slurs and harmful stereotypes.”
To counter that, Stability AI uses the LAION-Aesthetics database, but it is not yet clear whether it is truly free from bias.
Are these even photos?
For a while now at PopPhoto, we’ve been discussing how computational photography changes the nature of photographs. These types of generated images are just another outgrowth of the same kinds of research. The question here in particular is: if an AI can one day generate a realistic image of a real place, or even of an imagined one, what does that mean for landscape photography?
Obviously we don’t know yet, but we’re going to have fun discussing and debating it from here on out.
How can I try Stable Diffusion?
If you want to try Stable Diffusion, you can apply on Stability AI’s website. Right now, it’s only open to researchers and beta testers.