
My AI photo-generating process

I’m aiming for photorealism, but the only card I can get Stable Diffusion running on is an ‘ancient’ AMD R9 200 series video card with 3 GB of VRAM, so the resolution is severely limited. Still, there’s something magical (and oddly addictive) about being able to render MySpace-like images of fake or imaginary women. It’s as if I’m sucking a photo off someone else’s phone through an inter-dimensional rift. It’s definitely hitting a dopamine trigger in my reptile brain.

Regardless, when I post images on Reddit or elsewhere, people sometimes ask what my process is, so I wanted to get something written down for posterity. Here’s the basic workflow that works for me…

  1. First, I use an ad-supported Stable Diffusion app on my phone that quickly generates tons of “anime” images, and I include terms like “highly detailed” and “realistic”, which nudges that model toward images closer to photorealism.
  2. I usually generate dozens of images with prompts like “woman in pink dress” before even a single one pops out and demands more detail…
[Image: anime model rendering of “woman in pink dress”]
  3. Next, I put that image into the img2img tab of Stable Diffusion Automatic1111 [1] on my workstation and try to bring that person to life with a model like “Edge of Realism” (a scripted sketch of this step follows the example image below).
    • Approximate settings on my 3 GB AMD R9:
      • Resolution around 0.5 MP, e.g. 512×904
      • CFG scale between 10 and 20
      • Denoising strength between 0.45 and 0.55
      • Tiled VAE enabled, with encoder tile size 480 and decoder tile size 48
  4. Before I click “Generate”, I write a prompt from scratch with details that stand out.
    This image had a prompt like,
    “brunette woman in satin metallic dress leaning back against apartment wall”
[Image: brunette woman in satin metallic dress leaning back against apartment wall]
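For anyone who’d rather script this step than click around the web UI, here’s a rough equivalent of the img2img pass using the diffusers library. To be clear, this is a minimal sketch under assumptions, not my actual setup: the model ID, file names, and device are placeholders (I use the Automatic1111 UI with an “Edge of Realism” checkpoint), but the strength, CFG, and resolution mirror the settings listed above.

```python
# A minimal sketch of the img2img step via diffusers instead of the
# Automatic1111 web UI. Model ID, file names, and device are placeholder
# assumptions; strength, CFG, and resolution mirror the settings above.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder; swap in a photorealism checkpoint
    torch_dtype=torch.float16,
).to("cuda")  # on AMD this would be a ROCm or DirectML device instead

# Low-VRAM helpers, roughly in the spirit of the Tiled VAE setting above.
pipe.enable_attention_slicing()
pipe.enable_vae_tiling()

# The anime render from the phone app, resized to ~0.5 MP.
init_image = Image.open("anime_seed.png").convert("RGB").resize((512, 904))

result = pipe(
    prompt="brunette woman in satin metallic dress leaning back against apartment wall",
    image=init_image,
    strength=0.5,         # Automatic1111's "denoising strength"; 0.45-0.55 works here
    guidance_scale=15.0,  # CFG scale; anywhere from 10 to 20
).images[0]
result.save("realistic.png")
```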

5. I ‘listen’ to the image and decide if she’s trying to tell me more about who she could be…

5a. Interestingly (speaking as someone with a media/communications-related degree), SD is trained on our own symbolic interpretations of the world, so you can prompt it with words that are no longer socially acceptable but that still historically described a type of person…

6. Almost finished! Now, I go back through and prompt with different details to see if anything else stands out…

  7. Finally, if I’m feeling brave, I’ll take some of the details I’ve ‘uncovered’ and write a fresh prompt in the txt2img tab to see what kind of person I can generate.
  8. This is where the real magic happens! Adding something like “bathroom mirror selfie” to whatever details you’ve discovered lets you generate an image straight out of “alt-reality 2006” (see the sketch below the example image).
[Image: brunette in tight pink shiny silk dress; bathroom mirror selfie]
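And if you want to script that final step too, the txt2img version looks roughly like this in diffusers. Same caveat as before: the model ID and output path are placeholder assumptions, not my exact setup.

```python
# A minimal txt2img sketch (again via diffusers rather than the web UI).
# The model ID and output path are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")  # ROCm/DirectML on AMD cards
pipe.enable_attention_slicing()

image = pipe(
    prompt="brunette in tight pink shiny silk dress, bathroom mirror selfie",
    width=512,   # dimensions must be multiples of 8
    height=904,
    guidance_scale=12.0,  # CFG scale
).images[0]
image.save("selfie.png")
```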

And that’s that! There’s a debate over whether Stable Diffusion creates “art”, but I find it more interesting to think of it as a tool of discovery. We can use it to discover the biases in our own minds, and also to discover what exists inside the amalgamation of human consciousness that fed it. It’s like lucid dreaming at the grandest scale! I think the future of “neural network artwork” will, for both good and bad, let us learn more about the nature of the human experience, and I’m excited to keep using it.