Artificial Intelligence (AI) is a topic that is generating a lot of attention, as well as a lot of hyperbolic article titles. As it appears that AI-generated (insert object here) aren’t going to go away anytime soon, I decided to look into how I could incorporate AI into my work at Onebridge. I am the Senior Visual Communications Specialist here at Onebridge, so my primary focus has been in the realm of AI-generated art, but I feel that a number of my reflections in this article could apply to other AI-generated topics as well.
Before we get too far ahead of ourselves, I want to stress that the focus of this series will be to provide some clarity into the possibilities (and limits) I have personally experienced with AI, as well as an in-depth walkthrough of creating illustrations and animations with AI for marketing purposes (look for that article later). What I won’t be covering are the moral or ethical considerations surrounding AI.
The positions and opinions on that are both varied and vitriolic, and I don’t feel it is entirely beneficial to add one more needle to the digital haystack. In my opinion, Pandora’s Box has been opened, and while I hope for better governance and ethical guidelines on what data is used to train AI, I don’t believe any amount of litigation will be able to put AI-generated outputs back in the box.
Okay, with that said, let’s look at Artificial Intelligence as it applies to the creative process.
The first thing I would like to define is what AI is and isn’t. My friend (and manager) Chris Hoyt has a fun and insightful video on the misuse of the word (insert link) that I would love to borrow from. In short, he argues that companies and media outlets are using the term Artificial Intelligence, as it exists in the popular imagination, to mischaracterize for the general public the new technology we have before us. He argues (and I agree) that a better definition of our current state of AI would be Automated Intelligence. So, how does this word swap help us?
It de-mystifies what the AI is doing and provides a greater degree of accountability for both the developers and adopters of AI.
Automated Intelligence is not thinking and making decisions cognitively or mysteriously. Instead of a crystal ball, it is more like a mirror that is reflecting and imitating what we put into it (for better or worse). In AI-generated art, we are feeding the models vast amounts of visual information, then using prompts and parameters to extract our desired information. In that sense, AI-generated art is similar to a BI dashboard that compiles and visualizes the data we have fed it into appealing insights and information. Just like with a BI dashboard, when we run into issues or inconsistencies, we shouldn’t start with the platform; we should start with the data.
If the data has issues (i.e. unclean/unrefined data), then our outputs will carry over those same issues. One humorous example of this in the world of AI-generated art is the problem AI frequently has with illustrating hands. If we consider the AI as a mirror, then what it is showing us about ourselves is that artists also struggle with hands! The AI has been fed illustrations from all over the internet, from artists of all skill levels, and a common “dirty” part of that data is poorly drawn hands.
This is the same reason why a necessary aspect of creating AI art is the use of both positive and negative prompts to filter the “data” the AI generator is referencing. In the initial prompt, it is common to use a litany of descriptors to direct the AI. Common ones include references to an art community (like ArtStation), an artist, or a software program (Unreal Engine, Blender, etc.). This helps the AI narrow the data it draws on, and you are more likely to get good initial results from your prompt.
However, things get even more refined and interesting with the use of negative prompts. It is just as important to tell the AI what you would rather not see as it is to tell it what you want it to produce. A common copy/paste negative prompt that is a good baseline example is:
ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft
Negative prompting is not a silver bullet, and typically the more complex and unique the request, the more difficulties the AI has with fulfilling it without some of the above issues.
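To make the positive/negative split a little more concrete, here is a minimal Python sketch of how the two prompts might be assembled before being pasted into a generator. The helper name `build_prompts`, the lighthouse subject, and the shortened negative list are all hypothetical and chosen for illustration; this is not Playground.AI's API or any other platform's, only a way of keeping the two lists separate and tidy.

```python
def build_prompts(subject, style_terms, negative_terms):
    """Return (positive, negative) prompt strings, comma-separated.

    Hypothetical helper for illustration only; most generators simply
    take these as two separate text fields.
    """
    def clean(terms):
        # Drop blanks and duplicates while preserving the original order.
        seen = []
        for term in terms:
            term = term.strip()
            if term and term not in seen:
                seen.append(term)
        return seen

    positive = ", ".join([subject.strip()] + clean(style_terms))
    negative = ", ".join(clean(negative_terms))
    return positive, negative


# Example subject and terms are assumptions, not taken from a real project.
positive, negative = build_prompts(
    "a lighthouse on a cliff at sunset",
    ["ArtStation", "Unreal Engine"],
    ["ugly", "tiling", "poorly drawn hands", "blurry", "blurry", "watermark"],
)
print(positive)
print(negative)
```

Keeping the negative terms in their own list makes it easy to reuse the same baseline across prompts and add to it as new problems show up in your results.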
Here’s an example of a prompt output, the same output with positive prompting, and finally with both positive and negative prompting (prompts will be below each image). These were all done in a free online art generator called Playground.AI. There are additional “controls” that were employed in this example, but for the sake of simplicity I will cover them more in the follow-up article.
In addition to the prompt, in Playground.AI (and many others) you can also choose a filter, which can dramatically affect the end result (see the examples below):
As you can see, the more parameters you include in your input, the closer you will get to your desired final output. Again, to compare it to getting insights from a BI dashboard, you need to “clean” your data inputs to get better insights. To extend the analogy, we are going to talk about some of the “drill down” features you can employ in AI-art generation.
Beyond the positive and negative prompting, there are a host of other parameters that can affect AI-art generation that allow you to “drill down” to a better end result. Let’s look at a few together:
• The Base Model: Not a lot of choice here; for most people beginning their foray into AI-art generation, you’ll start with Stable Diffusion 1.5 (it has the most models built for it), though more people are beginning to develop models for 2.1. The biggest difference between the two is that 2.1 moved away from its heavy reliance on prompts that refer to actors/actresses and particular art styles. 2.1 also requires more practice with negative prompting, but depending on how the legal aspects of AI-art generation play out, 2.1 might become the ONLY model to work with.
• The Model (or Filter): While it is possible to generate AI-Art with only the base model, using one of the refined models available can help accelerate the process of getting to your end result. These models vary with the different platforms available, and I have found that I will use different platforms depending on what style of art I am looking to produce. Some platforms even allow you to train your own model, but that is a process worth its own article.
• A Reference Image: Similar to the model, a reference image isn’t required, but it can help accelerate your process by focusing the AI’s efforts. Many platforms provide several options when using a reference image, from setting the strength of the image’s influence on the final product to choosing whether the image serves as a pose reference or a visual/style reference. This is also really helpful in the refinement process, since you can use one of the iterations created as a reference image to establish a baseline look/feel you are trying to move toward.
• Number of Images Generated: Since AI-art is an iterative process, it helps to cast a wide net at the beginning of the process by having the AI generate multiple versions at once, then narrowing the results as you find a happy baseline to move forward with. Most platforms will allow you to create anywhere from 4 to 8 images at a time.
• Prompt Strength: Just as you can set the strength of a reference image, you can also set the strength of your prompt. Initially it is a good idea to leave this at its default setting, but if you discover the AI isn’t including aspects that are important to you from the prompt, you can increase the strength to try and force the AI to adhere more closely. I find that this in conjunction with the reference image strength is where I spend the most time adjusting in order to refine the final image.
• Seed Number: This is the “secret sauce” of every iteration. The AI randomly generates a seed number that is tied to a set of images generated, and if you look closely at each set of images produced from that seed, you can see some common design elements or influences associated with that seed. Once you find a seed you want to explore further, you can lock that seed number in to reduce the amount of variance in the images.
• Sampler Method: This is one of the more complicated places to drill down, but for all pragmatic purposes you can leave it at its default setting. I have found that some samplers appear better suited to certain subjects than others (for example, portraits of people and faces vs landscapes or backgrounds), but it is probably an area requiring more advanced experience.
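One way to picture how these “drill down” settings fit together is as a single bundle of values that you adjust between iterations. The Python sketch below is an assumption-laden illustration: the class `GenerationSettings`, the helper `resolve_seed`, the field names, and every default value are hypothetical placeholders, not the settings or API of any real platform. It shows the workflow described above: cast a wide net first, then lock the seed and adjust the strengths.

```python
import random
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical bundle of the parameters described in the list above."""
    prompt: str
    negative_prompt: str = ""
    base_model: str = "Stable Diffusion 1.5"
    model_filter: Optional[str] = None      # refined model / filter, if any
    reference_image: Optional[str] = None   # path to an image, if any
    reference_strength: float = 0.5         # placeholder default
    num_images: int = 4
    prompt_strength: float = 7.0            # placeholder default
    seed: Optional[int] = None              # None means "pick one at random"
    sampler: str = "default"


def resolve_seed(settings, rng=None):
    """Return settings with a concrete seed, choosing one if none is set."""
    if settings.seed is not None:
        return settings
    rng = rng or random.Random()
    return replace(settings, seed=rng.randrange(2**32))


# First pass: wide net, random seed.
explore = GenerationSettings(
    prompt="The Mandalorian riding a Vespa at sunset",
    num_images=8,
)
first_pass = resolve_seed(explore)

# Refinement pass: keep the seed that looked promising, change the rest.
refine = replace(first_pass, num_images=4, prompt_strength=9.0)
print(first_pass.seed == refine.seed)  # True: the seed is locked in
```

Treating the settings as an immutable record makes each iteration easy to compare with the last, since only the fields you deliberately changed will differ.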
As you can see, there are definitely more areas of human input and refinement than most people realize with AI-art generation. Yes, you can just type in something like “The Mandalorian riding a Vespa at sunset” and get a pretty good output in just a few iterations, but once you move to subjects that aren’t pop culture references with large datasets for the AI to reference, it requires more human input/guidance.
The other thing to recognize is that almost ALL AI-generated art requires some clean-up by a human to be used in any professional capacity (which will be covered in-depth in the next article). AI-generated art is really good at passing what I call the “at-a-glance” test, where the basic composition or elements are there, but on closer inspection you begin to see issues. Using the Mandalorian picture shown here, if you spend more than just a few seconds looking at the picture, you’ll notice multiple kickstands, weird placement of the side mirrors, nonsensical armor plating, and weird anomalies in the black visor of the helmet.
In conclusion, what should we do with AI? Is it going to steal our jobs and destroy human creativity, or will it become another discipline or tool in the creative toolbox? Can we use it in a professional capacity, or is it only good for pop culture hobby art? In my opinion, AI-generated art gives people who might otherwise lack the ability or opportunity a way to express their thoughts and ideas in a visual medium.
I think it allows creative teams to explore visual mediums and leverage them in creative ways that they might have not been able to do within their limited resources. I think it is also a great place to explore ideas and other ways of thinking that can help creatives think “outside the box”. I think there is a “science” to it, but there is still plenty of room for the “art” aspect of human input.
To tie it back into what Onebridge is known for, data management and consulting: as long as humans are involved in the input (the data, with all of its possibilities and problems), it will still take good, competent humans to achieve a beneficial output (clean, actionable, visualized data that helps facilitate intelligent decisions).
In the next article, I am hoping to further “de-mystify” AI-generated art by taking you through the process of prompting the AI to generate a piece of art for marketing purposes, then taking that artwork from “at-a-glance” ready to finished product. Until then, if you need help from good, competent humans when it comes to your data, contact us today.