Talking Photo API – Significantly worse animation quality compared to UI (same avatar, same audio)

I'm experiencing a significant and reproducible quality difference between videos generated via the API and videos generated through the HeyGen UI, using the exact same avatar and the exact same audio file.

My setup:

  • Talking Photo via /v2/video/generate
  • AI-generated avatar
  • Audio passed via audio_url
  • Avatar engine: AvatarIV (added after support's initial response)

What I observe:

When generating via the HeyGen UI (uploading the audio file manually):

  • Natural full-body movement
  • Hand gestures synchronized to speech rhythm
  • Body language matches the flow of the spoken content
  • Result is excellent

When generating via the API with identical avatar and audio:

  • Minimal to no body or hand movement
  • Only mouth/face area animated
  • Output resembles a static image with lip-sync only
  • Result is significantly degraded

What I have already tried:

  • Verified it is not an avatar issue – the UI produces great results with the same avatar
  • Added "avatar_engine": "AvatarIV" to the request body after support suggested this
  • Retested multiple times with different audio files – issue is consistent

My API request body:

{
  "video_inputs": [
    {
      "character": {
        "type": "talking_photo",
        "talking_photo_id": "02941537ac664f93b77c71938a097316"
      },
      "voice": {
        "type": "audio",
        "audio_url": "SOME_AUDIO_URL"
      }
    }
  ],
  "dimension": {
    "width": 1080,
    "height": 1920
  },
  "avatar_engine": "AvatarIV"
}
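For reference, here is a minimal sketch of how I send this request in Python. The endpoint and the X-Api-Key header are taken from HeyGen's public v2 API docs; the talking_photo_id and audio URL are the placeholders from above, and avatar_engine is included only because support suggested it (I have not found it in the official docs):

```python
import json
import os
import urllib.request

# Hypothetical sketch of the exact call described above.
API_URL = "https://api.heygen.com/v2/video/generate"

def build_payload(talking_photo_id: str, audio_url: str) -> dict:
    """Assemble the same request body shown in the post."""
    return {
        "video_inputs": [
            {
                "character": {
                    "type": "talking_photo",
                    "talking_photo_id": talking_photo_id,
                },
                "voice": {
                    "type": "audio",
                    "audio_url": audio_url,
                },
            }
        ],
        "dimension": {"width": 1080, "height": 1920},
        # Added per support's suggestion; not listed in the public docs.
        "avatar_engine": "AvatarIV",
    }

def send(payload: dict, api_key: str) -> bytes:
    """POST the payload; requires a valid HEYGEN_API_KEY."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

payload = build_payload("02941537ac664f93b77c71938a097316", "SOME_AUDIO_URL")
print(json.dumps(payload, indent=2))
# send(payload, os.environ["HEYGEN_API_KEY"])  # uncomment to actually submit
```

The resulting JSON is byte-for-byte the body shown above, so the difference in output quality is not caused by a malformed request on my side.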

My question:
Is there a parameter or configuration I am missing that would bring the API output to the same quality level as the UI? Or is this a known gap between UI and API rendering?