Make Any Photo Talk with AI
Create talking photo videos for social media in three simple steps.
Choose a Photo
Select a preset, pick a character, or upload your own
How to Make a Photo Talk
Create a talking photo video in three simple steps.
1. Choose a Character or Photo
Select an existing character or upload a clear, front-facing photo with good lighting.
2. Write Script & Preview Audio
Type your text, pick a voice (or use your character's voice), and preview the audio before creating the video.
3. Generate Your Video
Choose your resolution and create a lip-synced talking video. Download the result as MP4.
Talking Photo Examples
See what you can create with AI talking photos — from news anchors to virtual streamers.
Beauty
Chef
Cat
ASMR
Presenter
Businessman
Car POV
Mouse Character
News Anchor
Streamer
Want more control? Use the Face Animator to upload your own audio file, or the Text to Speech tool for standalone voice generation.
Frequently Asked Questions
What type of photos work best?
Clear, high-resolution face photos with good lighting work best. Make sure the face is centered and front-facing for optimal animation results. The photo should contain only one face.
What languages are supported?
The text-to-speech engine automatically detects the language of your text. It supports a wide range of languages including English, Spanish, French, German, Portuguese, and many more.
What is the maximum text length?
You can enter up to 3,000 characters per generation. The resulting audio must be under 60 seconds for the video animation step.
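If you prepare scripts in bulk, the two limits above can be checked before you submit anything. The following is a minimal sketch, not part of the tool: the function name `check_limits` is hypothetical, and it assumes characters are counted the way Python's `len` counts them, which may differ from how the tool counts.

```python
MAX_TEXT_CHARS = 3000      # text limit per generation, from the answer above
MAX_AUDIO_SECONDS = 60     # audio must be under this length for animation


def check_limits(text: str, audio_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means both limits are met.

    Hypothetical helper for illustration, not an official API.
    """
    problems = []
    # Assumption: the tool counts characters like len() does.
    if len(text) > MAX_TEXT_CHARS:
        problems.append(
            f"text is {len(text)} characters; the limit is {MAX_TEXT_CHARS}"
        )
    # "Under 60 seconds" is read here as strictly less than 60.
    if audio_seconds >= MAX_AUDIO_SECONDS:
        problems.append(
            f"audio is {audio_seconds:.1f}s; it must be under {MAX_AUDIO_SECONDS}s"
        )
    return problems


print(check_limits("Hello there!", 4.5))
print(check_limits("x" * 3001, 60))
```

A script can pass the text limit and still produce audio that is too long, so it is worth checking the previewed audio duration as well as the character count.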
What is the difference between 480p and 720p?
480p produces a lower-resolution video at 10 credits per second of audio, while 720p produces a higher-resolution video at 20 credits per second. Choose 480p for quick previews and 720p for final results.
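To get a feel for these rates, the video cost can be estimated as audio duration multiplied by the per-second rate. This is a rough sketch only: the helper name `estimate_video_credits` is hypothetical, and rounding partial seconds up is an assumption, since the page does not say how fractional seconds are billed. It covers the video step only, not the separate audio generation charge.

```python
import math

# Per-second video rates quoted in the answer above.
CREDITS_PER_SECOND = {"480p": 10, "720p": 20}


def estimate_video_credits(audio_seconds: float, resolution: str) -> int:
    """Estimate video credits (hypothetical helper, not an official API)."""
    if resolution not in CREDITS_PER_SECOND:
        raise ValueError(f"unsupported resolution: {resolution}")
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    # Assumption: a partial second is billed as a full second.
    return math.ceil(audio_seconds) * CREDITS_PER_SECOND[resolution]


print(estimate_video_credits(30, "480p"))  # 300
print(estimate_video_credits(30, "720p"))  # 600
```

Under these assumptions, a 30-second clip costs twice as much at 720p as at 480p, which is why previewing at 480p first can save credits.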
How does the two-step billing work?
You pay separately for audio generation (based on text length) and video creation (based on audio duration and resolution). This way you only pay for the video once you're happy with the audio. You can re-generate the audio as many times as needed before creating the video.
How long does the video take to generate?
Audio generation takes just a few seconds. The video animation typically takes 1-3 minutes depending on the audio length and chosen resolution.
Is my uploaded photo kept private?
Yes, all uploaded photos are kept private and secure. They are only used for the animation process and are not stored or shared.
Can I use my own audio instead of text-to-speech?
Yes! To use your own audio file, go to the Face Animator tool, which lets you upload both a photo and an audio file directly.
Can I clone my own voice?
Yes! In the Voice step, switch to the "Clone New" tab. Upload one or more audio samples of your voice (MP3 or WAV), give it a name, and create a reusable voice clone. Your cloned voices are saved and available in the "My Voices" tab for future use.
How many voices can I clone?
You can have up to 5 cloned voices at a time. To rename or delete a voice, use the "Manage Voices" button in the "My Voices" tab.
Smart and easy image editing by @ramos_pincel