from the an-ai-did-not-write-this dept
After posting the following AI-generated images, I got private replies asking the same question: “Can you tell me how you made these?” So, here I will provide the background and “how to” of creating such AI portraits, but also describe the ethical considerations and the dangers we should address right now.
Generative AI – as opposed to analytical artificial intelligence – can create novel content. It not only analyzes existing datasets but it generates whole new images, text, audio, videos, and code.
As the ability to generate original images based on written text emerged, it became the hottest hype in tech. It all began with the release of DALL-E 2, an improved AI art program from OpenAI. It allowed users to input text descriptions and get images that looked amazing, adorable, or weird as hell.
Then, people start hearing about Midjourney (and its vibrant Discord) and Stable Diffusion, an open-source project. (Google’s Imagen and Meta’s image generator are not released to the public). Stable Diffusion allowed engineers to train the model on any image dataset to churn out any style of art.
Due to the rapid development of the coding community, more specialized generators were introduced, including new killer apps to create AI-generated art from YOUR pictures: Avatar AI, ProfilePicture.AI, and Astria AI. With them, you can create your own AI-generated avatars or profile pictures. You can change some of your features, as demonstrated by Andrew “Boz” Bosworth, Meta CTO, who used AvatarAI to see himself with hair:
Startups like the ones listed above are booming:
In order to use their tools, you need to follow these steps:
1. How to prepare your photos for the AI training
As of now, training Astria AI with your photos costs $10. Every app charges differently for fine-tuning credits (e.g., ProfilePicture AI costs $24, and Avatar AI costs $40). Please note that those charges change quickly as they experiment with their business model.
Here are a few ways to improve the training process:
- At least 20 pictures, preferably shot or cropped to a 1:1 (square) aspect ratio.
- At least 10 face close-ups, 5 medium from the chest up, 3 full body.
- Variation in background, lighting, expressions, and eyes looking in different directions.
- No glasses/sunglasses. No other people in the pictures.
Approximately 60 minutes after uploading your pictures, a trained AI model will be ready. Where will you probably need the most guidance? Prompting.
2. How to survive the prompting mess
After the training is complete, a few images will be waiting for you on your page. Those are “default prompts” as examples of the app’s capabilities. To create your own prompts, set the className as “person” (this was recommended by Astria AI).
Formulating the right prompts for your purpose can take a lot of time. You’ll need patience (and motivation) to keep refining the prompts. But when a text prompt comes to life as you envisioned (or better than you envisioned), it feels a bit like magic. To get creative inspiration, I used two search engines, Lexica and Krea. You can search for keywords, scroll until you find an image style you like, and copy the prompt (then change the text to “sks person” to make it your self-portrait).
Some prompts are so long that reading them is painful. They usually include the image’s setting (e.g., “highly detailed realistic portrait”) and style (“art by” one of the popular artists). As regular people need help crafting those words, we already have an entirely new role for artists under prompt engineering. It’s going to be a desirable skill. Just bear in mind that no matter how professional your prompts are, some results will look WILD. In one image, I had 3 arms (don’t ask me why).
If you wish to avoid the whole prompts chaos, I have a friend who just used the default ones, was delighted with the results, and shared them everywhere. For those apps to be more popular, I recommend including more “default prompts.”
Potentials and Advantages
1. It’s NOT the END of human creativity
The electronic synthesizer did not kill music, and photography did not kill painting. Instead, they catalyzed new forms of art. AI art is here to stay and can make creators more productive. Creators are going to include such models as part of their creative process. It’s a partnership: AI can serve as a starting point, a sketch tool that provides suggestions, and the creator will improve it further.
2. The path to the masses
Thus far, Crypto boosters didn’t answer the simple question of “what is it good for?” and have failed to articulate concrete, compelling use cases for Web3. All we got was needless complexity, vague future-casting, and “cryptocountries.” On the contrary, AI-generated art has a clear utility for creative industries. It’s already used in various industries, such as advertising, marketing, gaming, architecture, fashion, graphic design, and product design. This Twitter thread provides a variety of use cases, from commerce to the medical imaging domain.
When it comes to AI portraits, I’m thinking of another target audience: teenagers. Why? Because they already spend hours perfecting their pictures with various filters. Make image-generating tools inexpensive and easy to use, and they’ll be your heaviest users. Hopefully, they won’t use it in their dating profiles.
Downsides and Disadvantages
1. Copying by AI was not consented to by the artists
Despite the booming industry, there’s a lack of compensation for artists. Read about their frustration, for example, in how one unwilling illustrator found herself turned into an AI model. Spoiler alert: She didn’t like being turned into a popular prompt for people to mimic, and now thousands of people (soon to be millions) can copy her style of work almost exactly.
Copying artists is a copyright nightmare. The input question is: can you use copyright-protected data to train AI models? The output question is: can you copyright what an AI model creates? Nobody knows the answers, and it’s only the beginning of this debate.
2. This technology can be easily weaponized
A year ago on Techdirt, I summed up the narratives around Facebook: (1) Amplifying the good/bad or a mirror for the ugly, (2) The algorithms’ fault vs. the people who build them or use them, (3) Fixing the machine vs. the underlying societal problems. I believe this discussion also applies to AI-generated art. It should be viewed through the same lens: good, bad, and ugly. Though this technology is delightful and beneficial, there are also negative ramifications of releasing image-manipulation tools and letting humanity play with them.
While DALL-E had a few restrictions, the new competitors had a “hands-off” approach and no safeguards to prevent people from creating sexual or potentially violent and abusive content. Soon after, a subset of users generated deepfake-style images of nude celebrities. (Look surprised). Google’s Dreambooth (which AI-generated avatar tools use) made making deepfakes even easier.
As part of my exploration of the new tools, I also tried Deviant Art’s DreamUp. Its “most recent creations” page displayed various images depicting naked teenage girls. It was disturbing and sickening. In one digital artwork of a teen girl in the snow, the artist commented: “This one is closer to what I was envisioning, apart from being naked. Why DreamUp? Clearly, I need to state ‘clothes’ in my prompt.” That says it all.
According to the new book Data Science in Context: Foundations, Challenges, Opportunities, machine learning advances have made deepfakes more realistic but also enhanced our ability to detect deepfakes, leading to a “cat-and-mouse game.”
In almost every form of technology, there are bad actors playing this cat-and-mouse game. Managing user-generated content online is a headache that social media companies know all too well. Elon Musk’s first two weeks at Twitter magnified that experience — “he courted chaos and found it.” Stability AI released an open-source tool with a belief in radical freedom, courted chaos, and found it in AI-generated porn and CSAM.
Text-to-video isn’t very realistic now, but with the pace at which AI models are developing, it will be in a few months. In a world of synthetic media, seeing will no longer be believing, and the basic unit of visual truth will no longer be credible. The authenticity of every video will be in question. Overall, it will become increasingly difficult to determine whether a piece of text, audio, or video is human-generated or not. It could have a profound impact on trust in online media. The danger is that with the new persuasive visuals, propaganda could be taken to a whole new level. Meanwhile, deepfake detectors are making progress. The arms race is on.
AI-generated art inspires creativity, and enthusiasm as a result. But as it approaches mass consumption, we can also see the dark side. A revolution of this magnitude can have many consequences, some of which can be downright terrifying. Guardrails are needed now.