Talking Photo API – Significantly worse animation quality compared to UI (same avatar, same audio)
25 days ago
I'm experiencing a significant and reproducible quality difference between videos generated via the API and videos generated through the HeyGen UI, using the exact same avatar and the exact same audio file.
My setup:
- Talking Photo via /v2/video/generate
- AI-generated avatar
- Audio passed via audio_url
- Avatar engine: AvatarIV (added after initial support response)
What I observe:
When generating via the HeyGen UI (uploading the audio file manually):
- Natural full-body movement
- Hand gestures synchronized to speech rhythm
- Body language matches the flow of the spoken content
- Result is excellent
When generating via the API with identical avatar and audio:
- Minimal to no body or hand movement
- Only mouth/face area animated
- Output resembles a static image with lip-sync only
- Result is significantly degraded
What I have already tried:
- Verified it is not an avatar issue – the UI produces great results with the same avatar
- Added "avatar_engine": "AvatarIV" to the request body after support suggested this
- Retested multiple times with different audio files – issue is consistent
My API request body:
{
  "video_inputs": [
    {
      "character": {
        "type": "talking_photo",
        "talking_photo_id": "02941537ac664f93b77c71938a097316"
      },
      "voice": {
        "type": "audio",
        "audio_url": "SOME_AUDIO_URL"
      }
    }
  ],
  "dimension": {
    "width": 1080,
    "height": 1920
  },
  "avatar_engine": "AvatarIV"
}
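
For anyone who wants to reproduce this, here is a minimal Python sketch of how I build and send the request above. The body mirrors my JSON exactly, including the top-level placement of avatar_engine, which is simply where I put it and not a confirmed location. The base URL https://api.heygen.com and the X-Api-Key header are assumptions on my part that are not shown in the request body, and the helper names (build_payload, build_request, submit) are purely illustrative.

```python
import json
import urllib.request

# Assumption: base URL for the /v2/video/generate endpoint mentioned above.
API_URL = "https://api.heygen.com/v2/video/generate"


def build_payload(talking_photo_id, audio_url, width=1080, height=1920,
                  avatar_engine="AvatarIV"):
    """Build the same request body as in the post.

    Pass avatar_engine=None to omit the field, which makes it easy to
    compare output with and without it.
    """
    payload = {
        "video_inputs": [
            {
                "character": {
                    "type": "talking_photo",
                    "talking_photo_id": talking_photo_id,
                },
                "voice": {
                    "type": "audio",
                    "audio_url": audio_url,
                },
            }
        ],
        "dimension": {"width": width, "height": height},
    }
    if avatar_engine is not None:
        # Top-level placement copied from the post; not a confirmed location.
        payload["avatar_engine"] = avatar_engine
    return payload


def build_request(payload, api_key):
    """Wrap the payload in a POST request without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Api-Key": api_key,  # assumption: API key header name
            "Content-Type": "application/json",
        },
        method="POST",
    )


def submit(payload, api_key):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(payload, api_key)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling build_payload once with the default and once with avatar_engine=None gives two bodies that differ only in that field, so the two resulting videos can be compared directly to see whether the parameter has any effect at all.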
My question:
Is there a parameter or configuration I am missing that would bring the API output to the same quality level as the UI? Or is this a known gap between UI and API rendering?