Which Avatar IV endpoint to use and how to reduce head bobbing while reading transcript.
28 days ago by PythoAI
My use case is to have an avatar read a transcript as realistically as possible. I created an avatar in Avatar Studio (I'm unsure how to tell whether it's Avatar IV or Avatar III). Support told me this:
For Avatar IV via API, you have two endpoint options:
- /v2/video/av4/generate — the endpoint specific to Avatar IV videos
- /v2/video/generate — the general endpoint, which also supports Avatar IV via Talking Photos
Motion prompts can be added through a "Custom Motion" field in your payload to control avatar movements and reduce head bobbing. The structure is: [Body part] + [Action] + [Emotion/intensity] (e.g., "head stays still" or "minimal head movement").
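To make the question concrete, here is a sketch of the payload I'm trying to build. The `video_inputs`, `character`, and `voice` blocks follow the shape I've seen in the /v2/video/generate docs, but the `custom_motion` key and its placement are pure guesses on my part (that's exactly the part I'm asking about), and `MY_AVATAR_ID` is a placeholder:

```python
import json

# Hypothetical payload sketch. The "custom_motion" key name and location
# are assumptions -- the docs don't say where the Custom Motion field goes.
payload = {
    "video_inputs": [
        {
            "character": {
                "type": "talking_photo",   # or "avatar"? unclear for Avatar IV
                "talking_photo_id": "MY_AVATAR_ID",
            },
            "voice": {
                "type": "text",
                "input_text": "Transcript text the avatar should read...",
            },
            # Guessed key, following the [Body part] + [Action] + [Intensity]
            # structure support described:
            "custom_motion": "head stays still, minimal head movement",
        }
    ],
    "dimension": {"width": 1280, "height": 720},
}

print(json.dumps(payload, indent=2))
```

Is this roughly the right shape, and does the motion prompt belong inside `video_inputs` or somewhere else?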
The docs don't make clear what the JSON structure for adding custom motion is, nor which endpoint I should use. Also, what is the difference between Talking Photos and Videos?
Thanks for the help!