Talking Photo API – Significantly worse animation quality compared to UI (same avatar, same audio)

I'm experiencing a significant and reproducible quality difference between videos generated via the API and videos generated through the HeyGen UI, using the exact same avatar and the exact same audio file.

My setup:

  • Talking Photo via /v2/video/generate
  • AI-generated avatar
  • Audio passed via audio_url
  • Avatar engine: AvatarIV (added after support's initial response)

What I observe:

When generating via the HeyGen UI (uploading the audio file manually):

  • Natural full-body movement
  • Hand gestures synchronized to speech rhythm
  • Body language matches the flow of the spoken content
  • Result is excellent

When generating via the API with identical avatar and audio:

  • Minimal to no body or hand movement
  • Only mouth/face area animated
  • Output resembles a static image with lip-sync only
  • Result is significantly degraded

What I have already tried:

  • Verified it is not an avatar issue – the UI produces great results with the same avatar
  • Added "avatar_engine": "AvatarIV" to the request body after support suggested this
  • Retested multiple times with different audio files – issue is consistent

My API request body:

{
  "video_inputs": [
    {
      "character": {
        "type": "talking_photo",
        "talking_photo_id": "02941537ac664f93b77c71938a097316"
      },
      "voice": {
        "type": "audio",
        "audio_url": "SOME_AUDIO_URL"
      }
    }
  ],
  "dimension": {
    "width": 1080,
    "height": 1920
  },
  "avatar_engine": "AvatarIV"
}
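For reference, here is a minimal sketch of how I send this request in Python. The endpoint and the X-Api-Key header are taken from HeyGen's public v2 API docs; the talking_photo_id and audio URL are the placeholders from above, and avatar_engine is included only because support suggested it (I have not found it in the official docs):

```python
import json
import os
import urllib.request

# Hypothetical sketch of the exact call described above.
API_URL = "https://api.heygen.com/v2/video/generate"

def build_payload(talking_photo_id: str, audio_url: str) -> dict:
    """Assemble the same request body shown in the post."""
    return {
        "video_inputs": [
            {
                "character": {
                    "type": "talking_photo",
                    "talking_photo_id": talking_photo_id,
                },
                "voice": {
                    "type": "audio",
                    "audio_url": audio_url,
                },
            }
        ],
        "dimension": {"width": 1080, "height": 1920},
        # Added per support's suggestion; not listed in the public docs.
        "avatar_engine": "AvatarIV",
    }

def send(payload: dict, api_key: str) -> bytes:
    """POST the payload; requires a valid HEYGEN_API_KEY."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

payload = build_payload("02941537ac664f93b77c71938a097316", "SOME_AUDIO_URL")
print(json.dumps(payload, indent=2))
# send(payload, os.environ["HEYGEN_API_KEY"])  # uncomment to actually submit
```

The resulting JSON is byte-for-byte the body shown above, so the difference in output quality is not caused by a malformed request on my side.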

My question:
Is there a parameter or configuration I am missing that would bring the API output to the same quality level as the UI? Or is this a known gap between UI and API rendering?