Welcome to Waifu Labs v2: How do AIs Create?
If you haven't already had a chance to try out the new Waifu Labs v2 model, check it out now!
In the two years since we first launched our anime-portrait AI project, Waifu Labs, our inhuman artist has been hard at work. Recently, it finished drawing its 20-millionth commission (quite an achievement for any artist!).
To count down to the release of Arrowmancer, our mobile game illustrated using AI, we’re releasing a series of posts on the topic of artificial creativity.
Starting with the number one question: how do AIs (or even humans for that matter) create?
How Does it Work?
(Left: training progress, Right: the finished result)
The type of AI that powers Waifu Labs is called a Generative Adversarial Network. We made an explainer video on how the technology works!
You can think of it as a pair of AIs that spar against each other in order to learn:
- The first AI is called the Generator. Its task is to learn how to draw.
- The second AI is called the Discriminator. Its task is to learn how to tell fake drawings (produced by an AI) from real drawings (produced by a human artist).
Both AIs are exposed to anime data from human artists and offered feedback on how they performed on their respective tasks. At the very end, we separate out the Generator and run it as the artist behind the scenes.
Much like human rivals, it is imperative that the two grow at the same rate: when one AI dominates the other, both stop learning.
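To make that dynamic concrete, here is a minimal sketch of one adversarial training step, assuming toy fully-connected networks and flattened images (the production model is far larger, but the alternating structure is the same):

```python
import torch
import torch.nn as nn

latent_dim, image_dim = 64, 3 * 32 * 32  # toy sizes, not the real model's

# The Generator maps a random latent vector to an image.
G = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, image_dim), nn.Tanh())
# The Discriminator maps an image to a single "realness" logit.
D = nn.Sequential(nn.Linear(image_dim, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def training_step(real_batch):
    n = real_batch.size(0)
    real_label, fake_label = torch.ones(n, 1), torch.zeros(n, 1)

    # 1. The Discriminator learns to tell human art from Generator fakes.
    opt_d.zero_grad()
    d_loss = (bce(D(real_batch), real_label)
              + bce(D(G(torch.randn(n, latent_dim)).detach()), fake_label))
    d_loss.backward()
    opt_d.step()

    # 2. The Generator learns to fool the Discriminator.
    opt_g.zero_grad()
    g_loss = bce(D(G(torch.randn(n, latent_dim))), real_label)
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()
```

Notice that the two optimizers take turns: each rival improves while the other holds still, which is what keeps the match balanced.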
Below, you can see the Generator’s progress: we’ve collected samples of the work it produces as it trains, measured in “steps.” (A sketch of how these snapshots are collected follows the walkthrough.)
Step 0: You can see that the AI starts off with absolutely no idea what “art” is. The first picture it draws is noise, a random splash of primordial test-tube goop. It will bumble around like this for a while, trying out strategies to draw more like the human art that it sees.
Step 1024: We see the beginnings of a face come in.
Step 4096: Style-specific features, like anime strands of hair, start to emerge.
Step 13516: We find that, just like human artists, the AI always learns how to draw the eyes first.
Step 23961: Secondary features like ears and shoulders start to come in.
Step 40564: Gradually, the murky shapes resolve into real features. As the features get smaller and smaller, they give the illustration its “texture” and “style.” Color is the last to come in.
Step 43636: During this phase, the training gets unstable at times, so we have snapshots of occasional horrors like this.
Step 50000: The final result!
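The snapshots above come from simple bookkeeping during training: we decode the same fixed latent coordinates at each checkpoint, so successive images are directly comparable. A sketch continuing the toy setup above (`next_real_batch` is a hypothetical data loader):

```python
# The same fixed "commission" is drawn at every checkpoint, so we can
# watch exactly how the Generator's rendering of it evolves.
fixed_z = torch.randn(16, latent_dim)
snapshot_steps = {0, 1024, 4096, 13516, 23961, 40564, 43636, 50000}

for step in range(50_001):
    training_step(next_real_batch())   # hypothetical data loader
    if step in snapshot_steps:
        with torch.no_grad():
            samples = G(fixed_z)       # save these images to disk
```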
For this particular AI, we were able to parallelize the training by building a mini-supercomputer from scratch in our living room to accommodate its specific needs, though that merits its own story another time.
All in all, the Waifu Labs AI took about 2 weeks to reach the quality you see today.
Building a Mental Representation
Interestingly, through this process the AI is not merely learning to copy the works it has seen: it is forming high-level (shape) and low-level (texture) features for constructing original pictures in its own mental representation.
Human artists also have this mental representation in their meat-brains. For many artists, it lives beyond the realm of conscious thought, in the domain of intuition. This mental representation grows throughout the artist’s lifetime: it matures through a combination of art practice and the life experience that accumulates in the meat-body.
With an AI, the mental representation is locked the moment the training stops, so we can then take it out and peer directly at it. It looks a bit like a coordinate system, and we call this the latent space.
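In code, “peering at” the latent space just means picking coordinates and decoding them. Continuing the toy Generator from the training sketch above:

```python
# Every point in latent space is a coordinate for one possible drawing;
# the trained Generator is the map from coordinates to pictures.
z = torch.randn(1, latent_dim)        # pick a random coordinate
with torch.no_grad():
    portrait = G(z).view(3, 32, 32)   # decode it into an RGB image
```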
By isolating the vectors that control certain features, we can create results like:
different pose, same character:
same pose, different style:
And much more!
Some coordinates in latent space are definitely weird. Our favorite is this gal, affectionately nicknamed “bighead.”
Latent space is immeasurably huge. Parched. Devoid of content. We wade through vast deserts of average artwork to find the coordinates for alluring, captivating characters. Cataloging these coordinates allows us to do cool tricks, like morph between characters.
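Morphing is plain arithmetic on those cataloged coordinates: a straight line between two characters in latent space decodes into a smooth sequence of in-between portraits. A minimal sketch, again using the toy Generator:

```python
# Linear interpolation between two cataloged latent coordinates.
z_a, z_b = torch.randn(1, latent_dim), torch.randn(1, latent_dim)
with torch.no_grad():
    frames = [G((1 - t) * z_a + t * z_b)   # in-between portraits
              for t in torch.linspace(0, 1, 8)]
```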
You can also see the latent space mapping in action on Waifu Labs: by separating the coordinates that control pose, colors, and details, we can create an interface where humans can commission the AI artist to draw what they want!
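A simplified sketch of how such an interface could work, assuming (hypothetically) that disjoint slices of the latent vector govern pose and palette; in the real model these controls are learned, not hand-assigned slices:

```python
# Illustrative split of the latent coordinates into feature groups.
POSE, COLOR = slice(0, 16), slice(16, 32)

def commission(base_z, pose_from=None, color_from=None):
    """Keep a base character, but swap in pose/color coordinates."""
    z = base_z.clone()
    if pose_from is not None:
        z[:, POSE] = pose_from[:, POSE]     # same character, new pose
    if color_from is not None:
        z[:, COLOR] = color_from[:, COLOR]  # same pose, new palette
    with torch.no_grad():
        return G(z)
```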
How do you measure creativity?
A large part of developing the AI is evaluating its performance, which raises a philosophical question: how do you evaluate creativity? Conventionally, FID (Fréchet Inception Distance) is the metric used to score generative models, but it was not quite sufficient for our needs.
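For reference, FID compares the statistics of feature vectors (activations of a pretrained vision network) extracted from real and generated images. A compact sketch of the standard formula:

```python
import numpy as np
from scipy import linalg

def fid(real_feats, gen_feats):
    """FID = ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2)),
    where the inputs are (N, d) arrays of feature vectors."""
    mu_r, mu_g = real_feats.mean(0), gen_feats.mean(0)
    c_r = np.cov(real_feats, rowvar=False)
    c_g = np.cov(gen_feats, rowvar=False)
    covmean = linalg.sqrtm(c_r @ c_g)
    if np.iscomplexobj(covmean):   # discard tiny numerical imaginary parts
        covmean = covmean.real
    return float(((mu_r - mu_g) ** 2).sum()
                 + np.trace(c_r + c_g - 2 * covmean))
```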
To properly score the results of our AI artist, we took a rubric from another domain. Namely, the qualitative rubric that commercial art directors use to evaluate freelance artists.
- Quality: The artist can match the requested style (anime portraits)
- Diversity: The artist can depict a variety of traits at the requested quality
- Customizability: It is easy to select, for a specific portrait, the traits you want from the space the artist is able to depict
- Time: The artist can deliver the piece in a timely manner
Some interesting points to note:
- Quality is defined as conformity to a style. As far as commercial art direction is concerned, arrangement is key: having beautiful art pieces of different styles doesn’t work as well as having less impressive pieces that complement and match each other.
- Customizability is an interesting topic for human artists. This is often referred to in the industry as “professional manner”: the ability to communicate and figure out what the client wants. It’s not necessarily a function of pure art skill as much as it is writing good emails. Similarly, for our machine artist, this is the metric by which we evaluated the UI interaction.
Closing thoughts
Of course, it’s not sufficient to define the sum of creativity by its usefulness in commercial application. However, these metrics gave us a good proxy and let us build an AI that formats its outputs in a way that can drive the core creative design and production pipeline of a commercial product.
The creative industry and the act of creativity are two different, deeply intertwined concepts. Though we may never breach the mystique of the latter, we can peer at it through the lens of the former.
To that end, please take a moment to check out the commercial product in action: our mobile game, Arrowmancer, is made using the AI behind Waifu Labs!
When we first approached studios with our AI work, the primary response we received was bewilderment. While we could fulfill the above requirements for freelance illustration, no existing project made use of the concept of infinitely customizable, combinable illustration.
And so we set out to do it ourselves: we made a game that leverages the unique aspects of generative AI. It is our hope that it will usher in a new age of creativity: a collaborative model of asset production between human and machine creatives!