I use two different implementations of GPT to generate the captions. There’s the latest GPT-3 Da Vinci model from OpenAI, which does an excellent job, but you have to be enrolled in their beta program to use it. And there’s the open-source GPT-Neo model from EleutherAI. That model is a lot smaller, but it’s free to use.
GPT-3 Da Vinci
OpenAI’s GPT-3 Da Vinci is currently the largest AI model for Natural Language Processing. I’m using their latest “zero-shot” style of prompting with their new Da Vinci Instruct model. Instead of providing examples of what you’re asking the model to do, you can simply ask it what to do directly.
Here is the prompt that creates a caption for the apple pie picture.
Create a funny caption for a new meme about apple pie. The background picture is Simple and easy apple pie served with vanilla ice cream, on a gingham tablecloth in Lysekil, Sweden.
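A prompt of this shape can be assembled from just a theme and an image description. The following is a minimal sketch of that idea; the function name `build_zero_shot_prompt` is hypothetical and is not taken from the AI-Memer code.

```python
def build_zero_shot_prompt(theme: str, image_description: str) -> str:
    """Return an instruction-style prompt asking for a meme caption."""
    return (
        f"Create a funny caption for a new meme about {theme}. "
        f"The background picture is {image_description}"
    )


# The theme and description below are the ones used in the article.
prompt = build_zero_shot_prompt(
    "apple pie",
    "Simple and easy apple pie served with vanilla ice cream, "
    "on a gingham tablecloth in Lysekil, Sweden.",
)
print(prompt)
```

Because the instruction is plain text, swapping in a different theme or background description only requires changing the two arguments.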
I pass the prompt into the call to OpenAI along with some additional parameters. Here’s the Python code.
response = openai.Completion.create(
The max_tokens parameter sets the maximum length of the response, measured in tokens. The temperature and top_p parameters are similar in that they both control the amount of variety in the response. The frequency_penalty and presence_penalty parameters are also similar in that they control how often there are new deviations and new topics in the response. If you want to know what all these parameters do, check out my article from last month.
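To make the effect of temperature and top_p more concrete, here is a small sketch of how these two settings are commonly applied to a model’s raw token scores. It is an illustration only, not OpenAI’s implementation; the function names and the toy scores are made up for this example.

```python
import math


def apply_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens them."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens whose cumulative probability reaches top_p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    # Renormalize so the surviving probabilities sum to 1.
    return {i: probs[i] / total for i in kept}


toy_logits = [2.0, 1.0, 0.5, 0.1]  # made-up scores for four candidate tokens
probs = apply_temperature(toy_logits, temperature=0.7)
candidates = top_p_filter(probs, top_p=0.5)
print(candidates)
```

A lower temperature concentrates probability on the top-scoring token, while a lower top_p discards the unlikely tail entirely. Setting top_p to 1.0 keeps every candidate, so only the temperature shapes the variety.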
Before I show examples of the output from GPT-3, here is the legal disclaimer that OpenAI suggests that I show, which is all true.
The author generated the following text in part with GPT-3, OpenAI’s large-scale language-generation model. Upon generating draft language, the author reviewed and revised the language to their own liking and takes ultimate responsibility for the content of this publication.
Running the code 10 times will yield the following results, at a total cost of $0.03. Note that I formatted the text to be in uppercase.
1: THIS IS THE PERFECT WAY TO END A DAY OF APPLE PICKING
2: NO, IT’S NOT THAT EASY
3: I’LL TAKE THE ONE WITH THE VANILLA ICE CREAM, PLEASE
4: APPLE PIE IS THE BEST!
5: THIS APPLE PIE IS SO GOOD, I CAN’T EVEN!
6: YOU’RE NOT THE ONLY ONE WHO LOVES APPLE PIE
7: IF YOU CAN’T FIND THE RECIPE, JUST GOOGLE IT
8: THE PIE IS GOOD, BUT IT’S NOT AS GOOD AS MY MOM’S
9: I’LL HAVE A SLICE OF THAT APPLE PIE, PLEASE
10: WE’RE GOING TO NEED A BIGGER PIE
OK, these are pretty good. One thing I learned is that GPT-3 Da Vinci can be funny! For example, caption number 2 seems to refer to the “easy as pie” idiom.
Note that GPT-3, like all AI models trained on a large corpus of text, will reflect societal biases. Occasionally the system will produce text that may be inappropriate or offensive. OpenAI has a feature to label generated text with one of three warning levels: 0 (the text is safe), 1 (the text is sensitive), or 2 (the text is unsafe). My code will show a warning for any of the generated captions that are flagged as sensitive or unsafe.
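The warning logic can be as simple as a lookup on the label. The sketch below assumes the label has already been obtained as an integer from 0 to 2; the names `WARNING_LEVELS` and `caption_warning` and the sample labels are hypothetical, and no call to OpenAI is made here.

```python
from typing import Optional

# The three warning levels described above.
WARNING_LEVELS = {
    0: "safe",
    1: "sensitive",
    2: "unsafe",
}


def caption_warning(label: int) -> Optional[str]:
    """Return a warning string for sensitive or unsafe captions, else None."""
    if label not in WARNING_LEVELS:
        raise ValueError(f"unknown content label: {label}")
    if label == 0:
        return None
    return f"Warning: this caption was flagged as {WARNING_LEVELS[label]}."


# Hypothetical (caption, label) pairs; real labels would come from OpenAI's filter.
labeled = [("APPLE PIE IS THE BEST!", 0), ("EXAMPLE FLAGGED CAPTION", 1)]
for caption, label in labeled:
    warning = caption_warning(label)
    print(caption if warning is None else f"{caption}  [{warning}]")
```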
GPT-Neo is a transformer model created primarily by developers known as sdtblck and leogao2 on GitHub. The project is an implementation of “GPT-2 and GPT-3-style models using the mesh-tensorflow library.” So far, their system is the size of GPT-3 Ada, OpenAI’s smallest model. But GPT-Neo is available for free. I used the Hugging Face Transformers interface to access GPT-Neo from my Python code.
Since GPT-Neo doesn’t have “instruct” versions of its pre-trained models, I had to write a “few-shot” prompt, using examples, to get the system to generate captions for memes. Here’s the prompt I wrote using the Disaster Girl and Grumpy Cat memes with example captions.
Create a funny caption for a meme.
Theme: disaster girl
Image description: A picture of a girl looking at us as her house burns down
Caption: There was a spider. It’s gone now.
Theme: grumpy cat
Image description: A face of a cat who looks unhappy
Caption: I don’t like Mondays.
Theme: apple pie.
Image description: Simple and easy apple pie served with vanilla ice cream, on a gingham tablecloth in Lysekil, Sweden.
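A few-shot prompt like this one can be generated from a list of worked examples plus the new theme. The sketch below is one way to do that; the names `EXAMPLES` and `build_few_shot_prompt` are hypothetical, and the trailing "Caption:" line is an assumption added so that the model continues with a caption.

```python
# Example memes taken from the prompt in the article.
EXAMPLES = [
    {
        "theme": "disaster girl",
        "description": "A picture of a girl looking at us as her house burns down",
        "caption": "There was a spider. It's gone now.",
    },
    {
        "theme": "grumpy cat",
        "description": "A face of a cat who looks unhappy",
        "caption": "I don't like Mondays.",
    },
]


def build_few_shot_prompt(theme, description, examples=EXAMPLES):
    """Build a few-shot prompt: instruction, worked examples, then the new meme."""
    lines = ["Create a funny caption for a meme."]
    for example in examples:
        lines.append(f"Theme: {example['theme']}")
        lines.append(f"Image description: {example['description']}")
        lines.append(f"Caption: {example['caption']}")
    lines.append(f"Theme: {theme}")
    lines.append(f"Image description: {description}")
    lines.append("Caption:")  # assumption: left open for the model to complete
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "apple pie",
    "Simple and easy apple pie served with vanilla ice cream, "
    "on a gingham tablecloth in Lysekil, Sweden.",
)
print(prompt)
```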
After setting the temperature parameter to 0.7 and the top_p to 1.0, I pass the prompt into GPT-Neo to generate new captions. Here’s the code to generate a caption.
from transformers import pipeline, AutoTokenizer
generator = pipeline('text-generation',
results = generator(prompt,
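By default, the Transformers text-generation pipeline returns a list of dictionaries whose `generated_text` value contains the prompt followed by the continuation, so the caption has to be cut out of that string. The sketch below shows one possible way to do this. The helper `extract_caption` is hypothetical, and it assumes the caption is the first non-empty line after the prompt. A hand-written stand-in replaces the real pipeline output, so no model is loaded.

```python
def extract_caption(generated_text: str, prompt: str) -> str:
    """Return the first non-empty line after the prompt, in uppercase."""
    if generated_text.startswith(prompt):
        continuation = generated_text[len(prompt):]
    else:
        continuation = generated_text
    for line in continuation.splitlines():
        line = line.strip()
        if line:
            return line.upper()
    return ""  # the model produced no new text


# Hypothetical stand-in for the pipeline output; not real model output.
prompt = "Theme: apple pie\nCaption:"
fake_results = [
    {"generated_text": prompt + " I love apple pie\nTheme: cats\nCaption: ..."}
]
caption = extract_caption(fake_results[0]["generated_text"], prompt)
print(caption)
```

Stopping at the first line matters because a few-shot model tends to keep going and invent further "Theme:" entries of its own.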
Here are the sample results.
1: I LOVE APPLE PIE
2: I CAN’T. I’M NOT ALLOWED
3: I LOVE THE SIMPLICITY OF AN APPLE PIE
4: APPLE PIE. THE ONLY THING BETTER THAN THIS IS A HOT BATH
5: I’M A PIE. YOU’RE A PIE
6: I LOVE PIE, AND THIS IS A GOOD ONE
7: I LOVE APPLES, BUT I’M NOT VERY GOOD AT BAKING
8: THE PIE IS DELICIOUS, BUT THE ICE CREAM IS NOT
9: I LOVE APPLE PIE. IT’S THE BEST
10: THE BEST FOOD IS WHEN YOU CAN TASTE THE DIFFERENCE BETWEEN THE FOOD AND THE TABLECLOTH
Hmmm. These are not as good as the GPT-3 captions. Most of them are pretty simple and not very funny. Number 10 is just plain absurd. But number 4 seems to be OK. Let’s use this as our caption.
The final step is to compose the meme by writing the caption into the background picture.
Adding the captions to memes is fairly straightforward. Most memes are composed using the Impact font designed by Geoffrey Lee in 1965. For AI-Memer, I used some code by Emmanuel Pire for positioning and rendering the caption into the background picture. I give the user the option to adjust the size of the font and place the caption at the top or bottom of the picture.
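The positioning step can be sketched without any imaging library. The example below is not Emmanuel Pire’s code. It is a simplified illustration that wraps the caption by character count rather than by measured pixel width, and returns the vertical offset for each line. The function `layout_caption` and its default values are assumptions made for this sketch. Actual rendering would still need an imaging library to draw the text.

```python
import textwrap


def layout_caption(caption, image_height, font_size, position="bottom",
                   max_chars=30, margin=10):
    """Wrap a caption and return (line, y) pairs for the top or bottom of an image."""
    if position not in ("top", "bottom"):
        raise ValueError("position must be 'top' or 'bottom'")
    lines = textwrap.wrap(caption.upper(), width=max_chars)
    block_height = len(lines) * font_size  # one font_size of height per line
    if position == "top":
        start_y = margin
    else:
        start_y = image_height - margin - block_height
    return [(line, start_y + i * font_size) for i, line in enumerate(lines)]


placed = layout_caption(
    "Apple pie. The only thing better than this is a hot bath",
    image_height=600, font_size=40, position="bottom",
)
for line, y in placed:
    print(y, line)
```

Changing `font_size` or `position` corresponds to the two options offered to the user: the size of the font, and whether the caption sits at the top or the bottom of the picture.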
Here are our two memes. The caption on the left was generated by GPT-3 and the one on the right was generated by GPT-Neo.
With this project, I learned that large-scale language-generation models can create good captions for memes given a description of the image. Although many of the generated captions are straightforward, occasionally they can be very clever and funny. The GPT-3 Da Vinci model, in particular, seems to create clever memes frequently, demonstrating both a command of the language and a seemingly deep understanding of cultural history.
Although the results are pretty good, there is definitely room for improvement. For example, the choices for background images seem somewhat limited, especially for pop culture. This may be because I am restricting the search to freely licensed images. I don’t know if a US court has weighed in yet on whether the background images in memes can be deemed fair use, so I’ll leave that question to the lawyers.
The developers behind GPT-Neo at EleutherAI are continuing to build and train bigger language models. Their next model is called GPT-NeoX. They say their “primary goal is to train an equivalent model to the full-sized GPT-3 and make it available to the public under an open license.”