January 28, 2023
Prompt Engineering for Stories: A Generative AI Children's Book using ChatGPT and Midjourney
ChatGPT and Midjourney are both examples of generative AI, a form of machine learning that lets users create text, images, and other content from specific inputs, such as text or images. Reading about the possibilities of this technology doesn’t have the same impact as trying it yourself. It’s amazing.
I first learned about Midjourney from Stratechery’s interview with its founder, David Holz, who previously co-founded Leap Motion. I had no idea that Midjourney was Discord’s most popular server or that Midjourney is one of the biggest consumers of GPUs in the cloud. The images are amazing. Type in text, and out comes a professional-looking image.
Midjourney, released in July 2022, creates images in 60 seconds, solely through Discord. In the Stratechery interview, Holz talks about how the team was using Discord as an easy way to test the software and then realized how much fun it was, so they decided to make the tool available only through Discord. Midjourney uses a freemium business model, with a couple dozen free images and paid tiers.
OpenAI’s ChatGPT, released in November 2022, is also remarkable. It’s a text chatbot, based on the GPT-3.5 family of large language models (LLMs), and can respond to an impressive variety of questions. Ask it for an essay comparing communism and capitalism, and the essay is ready in less than a minute. Ask for a Python script to ingest a CSV file. Done. Ask for a sonnet about why ChatGPT is at capacity. It displays one on its status page.
Both services take text as an input, referred to as a prompt, and run it through an advanced machine learning model to output a generative AI result. To explore the two services, I thought it would be fun to create children’s books, writing a prompt for ChatGPT so that the output could be fed directly into Midjourney for story artwork. I did one pass myself and then another with my family, and we printed all of them on Shutterfly as 6x6 Instant Books for $4 each.
Prompt Engineering
Prompts are a very new concept, but there is already a term for iterating on them for better results: prompt engineering. My goal was to create a prompt for ChatGPT that would generate a story, and the story sentences would serve as the prompts for artwork in Midjourney. I went with two family favorites: birds and pandas.
ChatGPT: “Write a seven-sentence children’s story titled “Falcon Finds Home” about a peregrine falcon that wakes up in a box, escapes, and flies home above the forest and stream to find its family and rainbow-colored nest, with each sentence written as a prompt.”
- The peregrine falcon woke up inside a small box, feeling confused and alone.
- The falcon quickly realized it needed to escape from the box and began looking for a way out.
- Using its powerful wings, the falcon broke through the top of the box and flew into the open air.
- The falcon soared above the forest, the trees a blur of green beneath it.
- The falcon followed the babbling stream, knowing it would lead it home.
- As the falcon flew, a rainbow appeared in the sky, guiding the falcon towards its destination.
- Finally, the falcon saw its colorful nest built high in a tree, and it knew it was home.
ChatGPT wrote the story above within ten seconds, including the numbering. I made slight alterations to the text so that every line keeps the context of the story for Midjourney.
It’s a story, and it’s a bit unnerving. I came up with the idea in thirty seconds, and I was able to create all of the assets for a book in around ten minutes, lay it out in Shutterfly in ten minutes, and ship it in a week for $20.
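The slight alterations are mechanical enough to sketch in code. The Python below is only an illustration, not the process used for the book: the function name keep_subject, the rule of replacing just the first standalone "it" when the subject has not yet been named, and the sample input sentences (assumed pre-edit wordings, not quoted ChatGPT output) are all assumptions.

```python
import re

# Matches the standalone pronoun "it" but not "its" or words containing "it".
PRONOUN = re.compile(r"\bit\b", re.IGNORECASE)


def keep_subject(sentence: str, subject: str = "the falcon") -> str:
    """Name the subject in a sentence so it can stand alone as an image prompt.

    Hypothetical helper: replaces the first standalone "it" with the subject,
    unless the subject is already named earlier in the sentence.
    """
    match = PRONOUN.search(sentence)
    if match is None:
        return sentence
    # If the subject already appears before the pronoun, the context is there.
    if subject.lower() in sentence[: match.start()].lower():
        return sentence
    replacement = subject
    if match.start() == 0:
        # Capitalize when the pronoun opened the sentence.
        replacement = subject[0].upper() + subject[1:]
    return sentence[: match.start()] + replacement + sentence[match.end():]


if __name__ == "__main__":
    # Assumed pre-edit wordings, used only to exercise the helper.
    samples = [
        "It soared above the forest, the trees a blur of green beneath it.",
        "As it flew, a rainbow appeared in the sky.",
        "The falcon followed the babbling stream, knowing it would lead it home.",
    ]
    for line in samples:
        print(keep_subject(line))
```

A rule this simple will misfire on contractions such as "it's" or on an "it" that refers to something other than the main character, so the output still needs a read-through before it goes to Midjourney.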
ChatGPT: “Write a children’s story titled “Great Wall of Pandas” in seven sentences about a family of panda that must journey over the Great Wall of China to make it home to their bamboo forest.”
The Great Wall of Pandas is a story about a family of pandas who live in a bamboo forest in China. One day, the mother panda, Peaches, and the father panda, Mr. Panda, decide to take their baby panda, Bao, on a journey to see the Great Wall of China. As they walk along the Great Wall, Bao the Panda asks his parents many questions about the history of the wall and how it was built to protect the country. Bao the Panda is so fascinated by the Great Wall that he wants to climb to the top, so with the help of his parents, he does just that. From the top of the wall, Bao the Panda sees a beautiful view of the country below. When it’s time to go home, the family of pandas has a great journey back to their bamboo forest. Bao the Panda can’t wait to tell all of his friends about his adventure and is excited for more adventures with his family.
Connecting ChatGPT to Midjourney turns prompts into visual stories, forming an AI-driven creativity pipeline. With prompt engineering, one person can fuse words and art into a story in minutes instead of months.
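The text-handling half of that pipeline can be sketched in a few lines of Python. This is a rough sketch under stated assumptions, not a description of how the books were made: the helper names (story_prompt, split_sentences, image_prompts), the abbreviation list, and the choice to append a style word after a comma are all hypothetical. Nothing here calls ChatGPT or Midjourney; the strings are meant to be pasted into each tool by hand.

```python
import re

# Hypothetical list of abbreviations that should not end a sentence.
ABBREVIATIONS = ("Mr.", "Mrs.", "Ms.", "Dr.")


def story_prompt(title: str, length: str, plot: str) -> str:
    """Fill in a story request modeled on the wording of the falcon prompt."""
    return (
        f"Write a {length}-sentence children's story titled \"{title}\" "
        f"about {plot}, with each sentence written as a prompt."
    )


def split_sentences(story: str) -> list[str]:
    """Split a paragraph into sentences, rejoining breaks after 'Mr.' etc."""
    parts = re.split(r"(?<=[.!?])\s+", story.strip())
    sentences = []
    carry = ""
    for part in parts:
        candidate = f"{carry} {part}".strip()
        if candidate.endswith(ABBREVIATIONS):
            carry = candidate  # not a real sentence end, keep accumulating
        else:
            sentences.append(candidate)
            carry = ""
    if carry:
        sentences.append(carry)
    return sentences


def image_prompts(sentences: list[str], style: str = "") -> list[str]:
    """Turn each sentence into an image prompt, optionally adding a style."""
    prompts = []
    for sentence in sentences:
        text = sentence.strip()
        if style:
            text = f"{text.rstrip('.')}, {style}"
        prompts.append(text)
    return prompts


if __name__ == "__main__":
    print(story_prompt(
        "Great Wall of Pandas",
        "seven",
        "a family of pandas that must journey over the Great Wall of China",
    ))
    story = (
        "One day, the mother panda, Peaches, and the father panda, Mr. Panda, "
        "decide to take their baby panda, Bao, on a journey to see the Great "
        "Wall of China. When it's time to go home, the family of pandas has a "
        "great journey back to their bamboo forest."
    )
    for prompt in image_prompts(split_sentences(story), style="sketch"):
        print(prompt)
```

Splitting matters for stories that come back as a single paragraph, like the panda story: a naive split on every period would cut the sentence at "Mr." and hand Midjourney a fragment.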
Creativity
My family was uninterested in ChatGPT, but Midjourney was magical. Type in words, and out comes an image. Visualizing ideas was far more interesting for the family than getting the computer to generate a story. They wanted to write their own stories and use Midjourney for the artwork.
They wrote two books, one about a fox and a panda and one about birds, and then we laid out the stories with the artwork from Midjourney on Shutterfly. In fact, printing 6x6 books for $4 with Shutterfly was as much of a hit as Midjourney.
ChatGPT and Midjourney aren’t perfect. I generated many images with Midjourney where the panda had ten toes. ChatGPT repeatedly ignored my requests to use the character’s name in each sentence or to incorporate styles like “cinematic” or “sketch”. Still, this is the state of what’s available now, and I expect those shortcomings to be resolved soon.